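Today we take an in-depth look at the network architecture that sits at the heart of AWS EKS among Kubernetes environments.

Kubernetes networking is usually complex. AWS EKS, however, uses the AWS VPC CNI to integrate pod networking directly with the VPC.

This post is a hands-on walkthrough of EKS networking, from infrastructure deployment through the limits of the VPC CNI and how to overcome them, External DNS integration, and Gateway API, the next-generation routing standard.

1. Provisioning the lab infrastructure (Terraform)

First, build the EKS cluster for the lab with Terraform.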

bash

# Download the code and move to the working directory
git clone https://github.com/gasida/aews.git
cd aews/2w
 
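# Set variables (assumes exactly one EC2 key pair exists in this region; otherwise set TF_VAR_KeyName by hand)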
export TF_VAR_KeyName=$(aws ec2 describe-key-pairs --query "KeyPairs[].KeyName" --output text)
export TF_VAR_ssh_access_cidr=$(curl -s ipinfo.io/ip)/32
echo $TF_VAR_KeyName $TF_VAR_ssh_access_cidr
 
# Deploy: takes about 12 minutes
 
terraform init
terraform plan
nohup sh -c "terraform apply -auto-approve" > create.log 2>&1 &
tail -f create.log
 
 
# Configure credentials (the Terraform output prints the command that follows)
terraform output -raw configure_kubectl
aws eks --region ap-northeast-2 update-kubeconfig --name myeks
 
# Rename the current context to myeks
kubectl config rename-context "$(kubectl config current-context)" myeks

1. `eks.tf`

hcl

########################
# Provider Definitions #
########################
 
# AWS provider: configures AWS resources in the specified region
provider "aws" {
  region = var.TargetRegion
}
 
 
########################
# Security Group Setup #
########################
 
# Security group: create a security group for the EKS worker nodes
resource "aws_security_group" "node_group_sg" {
  name        = "${var.ClusterBaseName}-node-group-sg"
  description = "Security group for EKS Node Group"
  vpc_id      = module.vpc.vpc_id
 
  tags = {
    Name = "${var.ClusterBaseName}-node-group-sg"
  }
}
 
# Security group rule: allow all inbound traffic to the worker nodes from the SSH access CIDR and the VPC CIDR
resource "aws_security_group_rule" "allow_ssh" {
  type        = "ingress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  cidr_blocks = [
    var.ssh_access_cidr,
    var.VpcBlock
  ]
  security_group_id = aws_security_group.node_group_sg.id
}
 
 
########################
# EKS
########################
 
# https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest
module "eks" {
  
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 21.0"
 
  name               = var.ClusterBaseName
  kubernetes_version = var.KubernetesVersion
 
  vpc_id = module.vpc.vpc_id
  subnet_ids = module.vpc.public_subnets
 
  enable_irsa = true
 
  endpoint_public_access = true
  endpoint_private_access = true
  # endpoint_public_access_cidrs = [
  #   var.ssh_access_cidr
  # ]
 
  # controlplane log
  enabled_log_types = []
 
  # Optional: Adds the current caller identity as an administrator via cluster access entry
  enable_cluster_creator_admin_permissions = true
 
  # EKS Managed Node Group(s)
  eks_managed_node_groups = {
    # 1st node group
    primary = {
      name             = "${var.ClusterBaseName}-1nd-node-group"
      use_name_prefix  = false
      instance_types   = ["${var.WorkerNodeInstanceType}"]
      desired_size     = var.WorkerNodeCount
      max_size         = var.WorkerNodeCount + 2
      min_size         = var.WorkerNodeCount - 1
      disk_size        = var.WorkerNodeVolumesize
      subnets          = module.vpc.public_subnets
      key_name         = "${var.KeyName}"
      vpc_security_group_ids = [aws_security_group.node_group_sg.id]
      
      # node label
      labels = {
        tier = "primary"
      }
 
      # userdata injection (AL2023 only)
      cloudinit_pre_nodeadm = [
        {
          content_type = "text/x-shellscript"
          content      = <<-EOT
            #!/bin/bash
            echo "Starting custom initialization..."
            dnf update -y
            dnf install -y tree bind-utils tcpdump nvme-cli links sysstat ipset htop
            echo "Custom initialization completed."
          EOT
        }
      ]
    }
 
    # 2nd node group (additional)
    # secondary = {
    #   name            = "${var.ClusterBaseName}-2nd-node-group"
    #   use_name_prefix = false
    
    #   instance_types  = ["c5.large"] 
    #   desired_size    = 1
    #   max_size        = 1
    #   min_size        = 1
      
    #   subnets          = module.vpc.public_subnets  # module.vpc.private_subnets
    #   key_name         = "${var.KeyName}"
    #   vpc_security_group_ids = [aws_security_group.node_group_sg.id]
      
    #   # node label
    #   labels = {
    #     tier = "secondary"
    #   }
 
    #   # userdata injection (AL2023 only)
    #   cloudinit_pre_nodeadm = [
    #     {
    #       content_type = "text/x-shellscript"
    #       content      = <<-EOT
    #         #!/bin/bash
    #         echo "Starting custom initialization..."
    #         dnf update -y
    #         dnf install -y tree bind-utils tcpdump nvme-cli links sysstat ipset htop
    #         echo "Custom initialization completed."
    #       EOT
    #     }
    #   ]
    # }
 
  }
 
  # add-on
  addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    vpc-cni = {
      most_recent = true
      before_compute = true
      configuration_values = jsonencode({
        env = {
          WARM_ENI_TARGET = "1" # always keep 1 spare ENI in addition to the ENIs in use
          #WARM_IP_TARGET  = "5" # always keep 5 spare IPs beyond those in use; when set, WARM_ENI_TARGET is ignored
          #MINIMUM_IP_TARGET   = "10" # minimum total of 10 IPs to secure when the node starts
          #ENABLE_PREFIX_DELEGATION = "true" 
          #WARM_PREFIX_TARGET = "1" # with PREFIX_DELEGATION, keep 1 spare prefix (/28)
        }
      })
    }
  }
 
  tags = {
    Environment = "cloudneta-lab"
    Terraform   = "true"
  }
 
}

A close look at the deployed eks.tf shows the configuration block for the vpc-cni add-on.

hcl

  # add-on
  addons = {
    vpc-cni = {
      most_recent = true
      before_compute = true
      configuration_values = jsonencode({
        env = {
          WARM_ENI_TARGET = "1" # always keep 1 spare ENI in addition to the ENIs in use
          # WARM_IP_TARGET  = "5"
          # MINIMUM_IP_TARGET   = "10"
          # ENABLE_PREFIX_DELEGATION = "true" 
        }
      })
    }
  }

These env values are the key settings of today's lab. We will uncomment and change them one by one to see how the behavior of EKS networking changes.

2. `vars.tf`

This file defines the variables for the cluster basics and the VPC and subnet CIDR ranges.

hcl

variable "KeyName" {
  # aws ec2 describe-key-pairs --query "KeyPairs[].KeyName" --output text
  # export TF_VAR_KeyName=kp-gasida
  description = "Name of an existing EC2 KeyPair to enable SSH access to the instances."
  type        = string
}
 
variable "ssh_access_cidr" {
  # export TF_VAR_ssh_access_cidr=$(curl -s ipinfo.io/ip)/32
  description = "Allowed CIDR for SSH access"
  type        = string
}
 
variable "ClusterBaseName" {
  description = "Base name of the cluster."
  type        = string
  default     = "myeks"
}
 
variable "KubernetesVersion" {
  description = "Kubernetes version for the EKS cluster."
  type        = string
  default     = "1.34"
}
 
variable "WorkerNodeInstanceType" {
  description = "EC2 instance type for the worker nodes."
  type        = string
  default     = "t3.medium"
}
 
variable "WorkerNodeCount" {
  description = "Number of worker nodes."
  type        = number
  default     = 3
}
 
variable "WorkerNodeVolumesize" {
  description = "Volume size for worker nodes (in GiB)."
  type        = number
  default     = 30
}
 
variable "TargetRegion" {
  description = "AWS region where the resources will be created."
  type        = string
  default     = "ap-northeast-2"
}
 
variable "availability_zones" {
  description = "List of availability zones."
  type        = list(string)
  default     = ["ap-northeast-2a", "ap-northeast-2b", "ap-northeast-2c"]
}
 
variable "VpcBlock" {
  description = "CIDR block for the VPC."
  type        = string
  default     = "192.168.0.0/16"
}
 
variable "public_subnet_blocks" {
  description = "List of CIDR blocks for the public subnets."
  type        = list(string)
  default     = ["192.168.0.0/22", "192.168.4.0/22", "192.168.8.0/22"]
}
 
variable "private_subnet_blocks" {
  description = "List of CIDR blocks for the private subnets."
  type        = list(string)
  default     = ["192.168.12.0/22", "192.168.16.0/22", "192.168.20.0/22"]
}

3. `vpc.tf`

hcl

########################
# VPC
########################
 
# VPC module: creates a VPC with public and private subnets
# https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/6.5.0
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~>6.5"
 
  name = "${var.ClusterBaseName}-VPC"
  cidr = var.VpcBlock
  azs  = var.availability_zones
 
  enable_dns_support   = true # enable the VPC DNS server
  enable_dns_hostnames = true # assign DNS names to instances
 
  public_subnets  = var.public_subnet_blocks
  private_subnets = var.private_subnet_blocks
 
  enable_nat_gateway = false # true
  single_nat_gateway = true
  one_nat_gateway_per_az = false
  
  manage_default_network_acl = false
 
  map_public_ip_on_launch = true
 
  igw_tags = {
    "Name" = "${var.ClusterBaseName}-IGW"
  }
 
  nat_gateway_tags = {
    "Name" = "${var.ClusterBaseName}-NAT"
  }
 
  public_subnet_tags = {
    "Name"                     = "${var.ClusterBaseName}-PublicSubnet"
    "kubernetes.io/role/elb"   = "1"
  }
 
  private_subnet_tags = {
    "Name"                             = "${var.ClusterBaseName}-PrivateSubnet"
    "kubernetes.io/role/internal-elb" = "1"
  }
 
  tags = {
    "Environment" = "cloudneta-lab"
  }
}

4. `outputs.tf`

hcl

output "configure_kubectl" {
  description = "Configure kubectl: make sure you're logged in with the correct AWS profile and run the following command to update your kubeconfig"
  value       = "aws eks --region ${var.TargetRegion} update-kubeconfig --name ${var.ClusterBaseName}"
}

With the Terraform files reviewed, the lab environment is built from them.

Checking basic information after the EKS deployment

bash

# Check the cluster
kubectl cluster-info
eksctl get cluster
 
# Switch the current namespace to default
kubens default
 
# Check node information
kubectl get node --label-columns=node.kubernetes.io/instance-type,eks.amazonaws.com/capacityType,topology.kubernetes.io/zone
kubectl get node -v=6
 
# Check node labels
kubectl get node --show-labels
kubectl get node -l tier=primary
 
 
# Check pod information
kubectl get pod -A
kubectl get pdb -n kube-system
NAME             MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
coredns          N/A             1                 1                     28m
metrics-server   N/A             1                 1                     28m
 
# Check the managed node group
aws eks describe-nodegroup --cluster-name myeks --nodegroup-name myeks-1nd-node-group | jq
 
# Check EKS add-ons
aws eks list-addons --cluster-name myeks | jq
eksctl get addon --cluster myeks 
NAME            VERSION                 STATUS  ISSUES  IAMROLE UPDATE AVAILABLE        CONFIGURATION VALUES            NAMESPACE POD IDENTITY ASSOCIATION ROLES
coredns         v1.13.2-eksbuild.3      ACTIVE  0                                                                       kube-system
kube-proxy      v1.34.5-eksbuild.2      ACTIVE  0                                                                       kube-system
vpc-cni         v1.21.1-eksbuild.5      ACTIVE  0                                       {"env":{"WARM_ENI_TARGET":"1"}} kube-system

The output confirms that coredns, kube-proxy, and vpc-cni are installed as the base add-ons.

2. Core concepts of the AWS VPC CNI

AWS VPC CNI

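With typical Kubernetes CNIs (Flannel, Calico, and so on), the node network (host IP) and the pod network (pod IP) use different address ranges.

Communication therefore requires an overlay network, with packets encapsulated on the way out.

The AWS VPC CNI works differently. Pods are assigned VPC IPs directly, so pods and worker nodes share the same address range and can talk to each other directly. Because the encapsulation step is skipped, its overhead is avoided.

[Note]

In the Kubernetes ecosystem, kube-proxy's iptables mode is gradually giving way to nftables, IPVS, or eBPF (Cilium and others) because of its performance limits.

Understanding the fundamentals of the CNI matters on AWS as well, since later optimization depends on it.

The three VPC CNI modes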

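1. Secondary IP mode

This is the default. Each ENI obtains secondary IP addresses from the subnet, and those addresses are bound to pods.

Each EC2 instance type has a fixed number of ENIs it can attach and a fixed number of IPs per ENI. A t3.medium, for example, supports 3 ENIs with 6 IPs each. The first IP on each ENI belongs to the ENI itself, so pods get 3 * (6 - 1) = 15 addresses; adding the 2 host-network pods gives a maximum of 17 pods (3 * (6 - 1) + 2).

2. Prefix delegation mode

This mode was introduced to solve the pod density problem. Instead of borrowing IPs one at a time, the node is assigned whole /28 blocks (16 addresses each).

3. Custom networking

This mode separates the IP range used by nodes from the IP range used by pods.

A secondary CIDR is added to the VPC and dedicated pod subnets are created in it. Nodes keep using IPs from the original subnet, while pods use IPs from the new dedicated subnet.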
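
As a rough illustration of why prefix delegation raises pod density, the sketch below compares the address capacity of secondary IP mode and prefix delegation mode. It assumes each /28 prefix holds 16 addresses and that every secondary-IP slot on an ENI can hold one prefix instead of one address. `ip_capacity` is a hypothetical helper name, not part of any AWS tool.

```shell
# Hypothetical helper: address capacity per node for a given mode.
# $1 = max ENIs, $2 = IPv4 addresses per ENI, $3 = mode ("secondary" or "prefix")
ip_capacity() {
  local enis="$1" ips_per_eni="$2" mode="$3"
  # The first address on each ENI is the ENI's own primary IP, so it is excluded.
  local slots=$(( enis * (ips_per_eni - 1) ))
  if [ "$mode" = "prefix" ]; then
    # Assumption: each slot holds one /28 prefix, i.e. 16 addresses.
    echo $(( slots * 16 ))
  else
    echo "$slots"
  fi
}

ip_capacity 3 6 secondary   # t3.medium in secondary IP mode
ip_capacity 3 6 prefix      # t3.medium with prefix delegation
```

This is address capacity only. The number of pods a node actually accepts is still bounded by the kubelet's maxPods setting.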

Checking the nodes' basic network information

bash

# Check the EC2 ENI IPs
aws ec2 describe-instances --query "Reservations[*].Instances[*].{PublicIPAdd:PublicIpAddress,PrivateIPAdd:PrivateIpAddress,InstanceName:Tags[?Key=='Name']|[0].Value,Status:State.Name}" --filters Name=instance-state-name,Values=running --output table
 
# Replace the IPs below with the ones from your own environment
N1=3.34.180.78
N2=13.125.198.62
N3=15.164.213.156
 
# SSH into the worker nodes
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh -o StrictHostKeyChecking=no ec2-user@$i hostname; echo; done

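If each node prints its hostname, the connection works.

In my case it did not work right away, so I ran the commands below. They have to be run again once the session ends.

eval $(ssh-agent -s)
ssh-add ~/.ssh/eks.pem

Basic network information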

bash

# Check the aws-node pod details
kubectl get daemonset aws-node --namespace kube-system -owide
kubectl describe daemonset aws-node --namespace kube-system
 
# Check the kube-proxy config: it runs in iptables mode
kubectl describe cm -n kube-system kube-proxy-config
 
kubectl describe cm -n kube-system kube-proxy-config | grep iptables: -A5
 
# Check the env of the aws-node DaemonSet
kubectl get ds aws-node -n kube-system -o json | jq '.spec.template.spec.containers[0].env'
 
# Check node IPs
aws ec2 describe-instances --query "Reservations[*].Instances[*].{PublicIPAdd:PublicIpAddress,PrivateIPAdd:PrivateIpAddress,InstanceName:Tags[?Key=='Name']|[0].Value,Status:State.Name}" --filters Name=instance-state-name,Values=running --output table
 
# Check pod IPs
kubectl get pod -n kube-system -o=custom-columns=NAME:.metadata.name,IP:.status.podIP,STATUS:.status.phase
 
# List pod names
kubectl get pod -A -o name
 
# Count pods
kubectl get pod -A -o name | wc -l

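Upstream Kubernetes is reportedly dropping support for kube-proxy's IPVS mode and recommends nftables instead. The commands below collect all of the IP-related information from the nodes.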

bash

# Check the CNI logs
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i tree /var/log/aws-routed-eni ; echo; done
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo cat /var/log/aws-routed-eni/plugin.log | jq ; echo; done
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo cat /var/log/aws-routed-eni/ipamd.log | jq ; echo; done
 
# Check network information: each eniY interface is one end of a veth pair into a pod network namespace
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo ip -br -c addr; echo; done
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo ip -c addr; echo; done
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo ip -c route; echo; done
 
ssh ec2-user@$N1 sudo iptables -t nat -S
ssh ec2-user@$N1 sudo iptables -t nat -L -n -v

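As the output shows, one node has no additional ENI. That node (the last one in the output) currently has no pods of its own; pods that use the host IP are not counted against the node's IP pool.

The other nodes do run pods, so secondary IPs have already been reserved on them.

Likewise, the one node without an extra ENI has no pod with an IP different from the node's own.

Testing with Network-Multitool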

bash

# [Terminals 1-3] Monitor each node
ssh ec2-user@$N1
watch -d "ip link | egrep 'ens|eni'; echo; echo '[ROUTE TABLE]'; route -n | grep eni"
 
ssh ec2-user@$N2
watch -d "ip link | egrep 'ens|eni'; echo; echo '[ROUTE TABLE]'; route -n | grep eni"
 
ssh ec2-user@$N3
watch -d "ip link | egrep 'ens|eni'; echo; echo '[ROUTE TABLE]'; route -n | grep eni"
 
# Create the Network-Multitool deployment
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: netshoot-pod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: netshoot-pod
  template:
    metadata:
      labels:
        app: netshoot-pod
    spec:
      containers:
      - name: netshoot-pod
        image: praqma/network-multitool
        ports:
        - containerPort: 80
        - containerPort: 443
        env:
        - name: HTTP_PORT
          value: "80"
        - name: HTTPS_PORT
          value: "443"
      terminationGracePeriodSeconds: 0
EOF
 
# Store the pod names in variables
PODNAME1=$(kubectl get pod -l app=netshoot-pod -o jsonpath='{.items[0].metadata.name}')
PODNAME2=$(kubectl get pod -l app=netshoot-pod -o jsonpath='{.items[1].metadata.name}')
PODNAME3=$(kubectl get pod -l app=netshoot-pod -o jsonpath='{.items[2].metadata.name}')
echo $PODNAME1 $PODNAME2 $PODNAME3
 
# Check the pods
kubectl get pod -o wide
kubectl get pod -o=custom-columns=NAME:.metadata.name,IP:.status.podIP
 
# Check the routing information on the nodes
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo ip -c route; echo; done

Pod-to-pod communication

bash

# Store the pod IPs in variables
PODIP1=$(kubectl get pod -l app=netshoot-pod -o jsonpath='{.items[0].status.podIP}')
PODIP2=$(kubectl get pod -l app=netshoot-pod -o jsonpath='{.items[1].status.podIP}')
PODIP3=$(kubectl get pod -l app=netshoot-pod -o jsonpath='{.items[2].status.podIP}')
echo $PODIP1 $PODIP2 $PODIP3
 
# Ping and curl pod 2 from pod 1
kubectl exec -it $PODNAME1 -- ping -c 2 $PODIP2
kubectl exec -it $PODNAME1 -- curl -s http://$PODIP2
kubectl exec -it $PODNAME1 -- curl -sk https://$PODIP2
 
# Ping pod 3 from pod 2
kubectl exec -it $PODNAME2 -- ping -c 2 $PODIP3
 
# Ping pod 1 from pod 3
kubectl exec -it $PODNAME3 -- ping -c 2 $PODIP1
 
 
# Worker node EC2: check with tcpdump
## For Pod to external (outside VPC) traffic, we will program iptables to SNAT using Primary IP address on the Primary ENI.
sudo tcpdump -i any -nn icmp
sudo tcpdump -i ens5 -nn icmp
sudo tcpdump -i ens6 -nn icmp
sudo tcpdump -i eniYYYYYYYY -nn icmp
 
# [Worker node 1]
# Check the routing policy database
ip rule
 
# Check the routing tables
ip route show table local
ip route show table main
ip route show table 2

Pod-to-pod traffic in an overlay setup normally leaves the node wrapped in the node's IP, but here it goes out directly with the pod's actual IP.

Pod-to-external communication

bash

# Ping external hosts from pod 1
kubectl exec -it $PODNAME1 -- ping -c 1 www.google.com
kubectl exec -it $PODNAME1 -- ping -i 0.1 www.google.com
kubectl exec -it $PODNAME1 -- ping -i 0.1 8.8.8.8
 
# Worker node EC2: check with tcpdump
sudo tcpdump -i any -nn icmp
sudo tcpdump -i ens5 -nn icmp
 
# Check the public IP of each node
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i curl -s ipinfo.io/ip; echo; echo; done
 
# Workstation: check outbound access from the pods. Which public IP do they use?
## The right way to check the weather
for i in $PODNAME1 $PODNAME2 $PODNAME3; do echo ">> Pod : $i <<"; kubectl exec -it $i -- curl -s ipinfo.io/ip; echo; echo; done
kubectl exec -it $PODNAME1 -- curl -s wttr.in/seoul
kubectl exec -it $PODNAME1 -- curl -s wttr.in/seoul?format=3
kubectl exec -it $PODNAME1 -- curl -s wttr.in/Moon
kubectl exec -it $PODNAME1 -- curl -s wttr.in/:help
 
 
# Worker node EC2
## Look at the output and work out how the traffic leaves the node
ip rule
ip route show table main
sudo iptables -L -n -v -t nat
sudo iptables -t nat -S
 
# When a pod talks to the outside, it is SNATed by the 'AWS-SNAT-CHAIN-0' rule shown below
# The IP after --to-source is the address of the primary (first) ENI, eth0
# Note the --random-fully option on the SNAT rule
sudo iptables -t nat -S | grep 'A AWS-SNAT-CHAIN'
-A AWS-SNAT-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS SNAT CHAIN" -j RETURN
-A AWS-SNAT-CHAIN-0 ! -o vlan+ -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 192.168.1.251 --random-fully
 
## The 'mark 0x4000/0x4000' match below fails, so the packet is RETURNed
-A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
-A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully
...
 
# The counters show packets matching AWS-SNAT-CHAIN-0: when the destination is outside 192.168.0.0/16, the source is SNATed to 192.168.1.251 (the IP of EC2 node 1)
sudo iptables -t filter --zero; sudo iptables -t nat --zero; sudo iptables -t mangle --zero; sudo iptables -t raw --zero
watch -d 'sudo iptables -v --numeric --table nat --list AWS-SNAT-CHAIN-0; echo ; sudo iptables -v --numeric --table nat --list KUBE-POSTROUTING; echo ; sudo iptables -v --numeric --table nat --list POSTROUTING'
 
# Check conntrack, excluding the EC2 metadata address (169.254.169.254)
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo conntrack -L -n |grep -v '169.254.169'; echo; done
conntrack v1.4.5 (conntrack-tools): 
icmp     1 28 src=172.30.66.58 dst=8.8.8.8 type=8 code=0 id=34392 src=8.8.8.8 dst=172.30.85.242 type=0 code=0 id=50705 mark=128 use=1
tcp      6 23 TIME_WAIT src=172.30.66.58 dst=34.117.59.81 sport=58144 dport=80 src=34.117.59.81 dst=172.30.85.242 sport=80 dport=44768 [ASSURED] mark=128 use=1

For external traffic, the pod's source address is SNATed to the node's IP, so the packets leave through the node's public IP.
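
The decision the SNAT chain makes can be sketched in plain shell: a destination inside the VPC CIDR keeps the pod's own source IP, and anything else is rewritten to the node's primary IP. This illustrates the logic only, not how the CNI implements it. `ip_to_int`, `in_cidr`, and `snat_decision` are hypothetical helper names, and 192.168.0.0/16 is the VPC CIDR default from `vars.tf`.

```shell
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return success when address $1 falls inside CIDR $2 (e.g. 192.168.0.0/16).
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits="${2#*/}"
  if [ "$bits" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  fi
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Print what happens to a pod's packet for a given destination.
snat_decision() {
  local dst="$1" vpc_cidr="${2:-192.168.0.0/16}"
  if in_cidr "$dst" "$vpc_cidr"; then
    echo "no SNAT"
  else
    echo "SNAT to node primary IP"
  fi
}

snat_decision 192.168.5.20   # another pod or node inside the VPC
snat_decision 8.8.8.8        # a destination outside the VPC
```

Comparing the two results with the packet counters on `AWS-SNAT-CHAIN-0` is a quick way to check which rule a given destination should hit.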

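Changing the AWS VPC CNI settings

Edit the eks.tf file.

hcl

          WARM_IP_TARGET    = "5"  # always keep 5 spare IPs beyond those in use; when set, WARM_ENI_TARGET is ignored
          MINIMUM_IP_TARGET = "10" # minimum total of 10 IPs to secure when the node starts

Uncomment these options in the vpc-cni block shown earlier, then apply the change with Terraform.

Once applied, the new settings show up on the add-on.

They can also be checked with the commands below.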

bash

# Check that the aws-node pods were recreated
kubectl get pod -n kube-system -l k8s-app=aws-node
 
# Check the add-on
eksctl get addon --cluster myeks
 
# Check the env of the aws-node DaemonSet
kubectl get ds aws-node -n kube-system -o json | jq '.spec.template.spec.containers[0].env'
kubectl describe ds aws-node -n kube-system | grep -E "WARM_IP_TARGET|MINIMUM_IP_TARGET"
 
# Check node information: an ENI is added even on nodes with no pods (other than hostNetwork ones)
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo ip -c addr; echo; done
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo ip -c route; echo; done
 
 
# Check the CNI logs
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i tree /var/log/aws-routed-eni ; echo; done
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo cat /var/log/aws-routed-eni/plugin.log | jq ; echo; done
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo cat /var/log/aws-routed-eni/ipamd.log | jq ; echo; done
 
# IpamD debugging commands  https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i curl -s http://localhost:61679/v1/enis | jq; echo; done

Because each node keeps a minimum of 10 IPs, the nodes hold 10 addresses even when they run no pods.
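
The 10 addresses on an idle node follow from how the two settings combine. As a simplified model (an assumption for illustration, not the ipamd source code), the number of IPs a node tries to hold is the larger of MINIMUM_IP_TARGET and the number of assigned pod IPs plus WARM_IP_TARGET. `desired_ips` is a hypothetical helper name.

```shell
# Simplified model: IPs a node tries to keep attached.
# $1 = pod IPs currently assigned, $2 = WARM_IP_TARGET, $3 = MINIMUM_IP_TARGET
desired_ips() {
  local assigned="$1" warm="$2" minimum="$3"
  local want=$(( assigned + warm ))
  # The minimum acts as a floor on the total.
  if [ "$want" -lt "$minimum" ]; then
    want="$minimum"
  fi
  echo "$want"
}

desired_ips 0 5 10   # idle node: the minimum applies
desired_ips 3 5 10   # 3 + 5 = 8 is still below the minimum
desired_ips 8 5 10   # 8 pods plus 5 spare IPs
```

The real daemon is additionally bounded by the ENI and IP limits of the instance type, which this sketch leaves out.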

Limiting the number of pods per node

First, install kube-ops-view for a visual check and expose it on a NodePort so it can be reached through a node IP.

bash

# kube-ops-view
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 --set service.main.type=NodePort,service.main.ports.http.nodePort=30000 --set env.TZ="Asia/Seoul" --namespace kube-system
 
# Verify
kubectl get deploy,pod,svc,ep -n kube-system -l app.kubernetes.io/instance=kube-ops-view
 
# Open kube-ops-view
open "http://$N1:30000/#scale=1.5"
open "http://$N1:30000/#scale=1.3"

The cluster is currently in secondary IP mode, so the number of pods (IPs) a node can host is limited by the EC2 instance type.

The command below shows the ENI and IP limits per instance type.

bash

aws ec2 describe-instance-types --filters "Name=instance-type,Values=t3.*" \
  --query "InstanceTypes[].{Type: InstanceType, MaxENI: NetworkInfo.MaximumNetworkInterfaces, IPv4addr: NetworkInfo.Ipv4AddressesPerInterface}" \
  --output table

Formula for the maximum number of pods: (MaxENI * (IPv4addr - 1)) + 2

For t3.medium: (3 * (6 - 1)) + 2 = 17

The aws-node and kube-proxy pods use the host network and do not consume secondary IPs, so excluding those 2, at most 15 regular pods can be deployed.
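
The formula is easy to evaluate for any instance type once MaxENI and IPv4addr are known. A minimal sketch, where `max_pods` is a hypothetical helper name and the inputs are the two columns from the `describe-instance-types` output:

```shell
# $1 = max ENIs, $2 = IPv4 addresses per ENI
max_pods() {
  echo $(( $1 * ($2 - 1) + 2 ))
}

total=$(max_pods 3 6)                    # t3.medium: 3 ENIs, 6 addresses per ENI
echo "max pods: $total"
echo "regular pods: $(( total - 2 ))"    # minus aws-node and kube-proxy
```

For other instance types, pass the values reported by the query instead of 3 and 6.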

Creating pods up to the limit and checking the result

bash

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
EOF

Scale the deployment out to 50 pods.

kubectl scale deployment nginx-deployment --replicas=50

At 50 replicas, not all pods can be created and some remain in the Pending state.

Running describe on a Pending pod shows an error caused by the shortage of IP resources.

Delete the deployment with kubectl delete deploy nginx-deployment.

4. Overcoming the pod limit: prefix delegation mode

The critical weakness of the default mode (secondary IP) is its limit on pod density. For a t3.medium instance, the maximum number of pods works out as follows.

Formula: (MaxENI * (IPv4addr - 1)) + 2
t3.medium = 3 ENIs * (6 IPs per ENI - 1) + 2 = 17
aws-node and kube-proxy account for 2 of these, so only 15 user pods can actually run.

Caution: prefix delegation only works on instances built on the Nitro hypervisor (t3, m5, c5, and so on). Check with the command below.

bash

$ aws ec2 describe-instance-types --instance-types t3.medium --query "InstanceTypes[].Hypervisor"
[
    "nitro"
]

Go back to eks.tf and edit this part.

hcl

    vpc-cni = {
      most_recent = true
      before_compute = true
      configuration_values = jsonencode({
        env = {
          #WARM_ENI_TARGET = "1" # always keep 1 spare ENI in addition to the ENIs in use
          #WARM_IP_TARGET  = "5" # always keep 5 spare IPs beyond those in use; when set, WARM_ENI_TARGET is ignored
          #MINIMUM_IP_TARGET   = "10" # minimum total of 10 IPs to secure when the node starts
          ENABLE_PREFIX_DELEGATION = "true"
          #WARM_PREFIX_TARGET = "1" # with PREFIX_DELEGATION, keep 1 spare prefix (/28)
        }
      })
    }

After the change, apply Terraform and restart the existing system pods so that they pick up the new settings.

bash

terraform plan
terraform apply -auto-approve
 
# Restart the existing pods so the new setting applies to them as well
kubectl rollout restart -n kube-system deployment coredns
kubectl rollout restart -n kube-system deployment kube-ops-view

The output shows that address space is now allocated per prefix rather than per IP.

bash

# Check that the aws-node pods were recreated
kubectl get pod -n kube-system -l k8s-app=aws-node
 
# Check the add-on
eksctl get addon --cluster myeks
 
# Check the env of the aws-node DaemonSet
kubectl get ds aws-node -n kube-system -o json | jq '.spec.template.spec.containers[0].env'
 
 
# Check the delegated IPv4 prefixes
aws ec2 describe-instances --filters "Name=tag-key,Values=eks:cluster-name" "Name=tag-value,Values=myeks" \
  --query 'Reservations[*].Instances[].{InstanceId: InstanceId, Prefixes: NetworkInterfaces[].Ipv4Prefixes[]}' | jq
 
 
# Check the CNI logs
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i tree /var/log/aws-routed-eni; echo; done
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo cat /var/log/aws-routed-eni/plugin.log | jq ; echo; done
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo cat /var/log/aws-routed-eni/ipamd.log | jq ; echo; done
 
 
# IpamD debugging commands  https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i curl -s http://localhost:61679/v1/enis | jq; echo; done

To test, bring up 50 pods again.

bash

# Terminal 1: watch the pods
watch -d 'kubectl get pods -o wide'
 
# Terminal 2
## Create the deployment
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 15
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
EOF

kubectl scale deployment nginx-deployment --replicas=50

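Even though prefix delegation is on and the IP pool is now large, some pods are still stuck.

The reason is that the kubelet configuration on each node still has maxPods set to 17, the default limit for this instance type.

To fix this, the kubelet configuration has to be changed.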

bash

# Run on each worker node
# Check the current values
cat /etc/kubernetes/kubelet/config.json | grep maxPods
cat /etc/kubernetes/kubelet/config.json.d/40-nodeadm.conf | grep maxPods
 
# Change with sed: 17 -> 50
sudo sed -i 's/"maxPods": 17/"maxPods": 50/g' /etc/kubernetes/kubelet/config.json
sudo sed -i 's/"maxPods": 17/"maxPods": 50/g' /etc/kubernetes/kubelet/config.json.d/40-nodeadm.conf 
 
# Apply
sudo systemctl restart kubelet

With maxPods raised temporarily, scale the deployment up again.
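
The `sed` commands above only match when the current value is exactly 17. A slightly more general sketch, assuming the file stores the setting as `"maxPods": <number>`, replaces whatever number is present. `set_max_pods` is a hypothetical helper, and the demo runs against a scratch file rather than the real kubelet config.

```shell
# Hypothetical helper: set "maxPods" to $2 in file $1, whatever the old value was.
set_max_pods() {
  local file="$1" value="$2" tmp
  tmp=$(mktemp)
  # Write through a temp file, then copy the content back so the
  # original file keeps its owner and permissions.
  sed -E "s/(\"maxPods\":[[:space:]]*)[0-9]+/\1${value}/" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demo on a scratch file (not the real /etc/kubernetes/kubelet/config.json).
demo=$(mktemp)
printf '{\n  "maxPods": 17,\n  "clusterDomain": "cluster.local"\n}\n' > "$demo"
set_max_pods "$demo" 50
grep maxPods "$demo"
```

On a node, the helper would need to run with root privileges against the two kubelet files, followed by the same kubelet restart.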

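Scaling to 100 replicas now works. Beyond 100, pods fail because additional IPs cannot be allocated.

5. Service and AWS Load Balancer Controller integration

This section sets up the AWS Load Balancer Controller (LBC), which is required to bring external traffic into the EKS cluster.

Installing the AWS LBC with IRSA

The controller needs IAM permissions to manage AWS resources.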

bash

# OIDC Provider
aws iam list-open-id-connect-providers
 
aws eks describe-cluster --name myeks \
  --query "cluster.identity.oidc.issuer" \
  --output text
https://oidc.eks.ap-northeast-2.amazonaws.com/id/1BB80004FADD0C9E59C6641F386155BD
 
# Find the public subnets
aws ec2 describe-subnets --filters "Name=tag:kubernetes.io/role/elb,Values=1" --output table
 
# Find the private subnets
aws ec2 describe-subnets --filters "Name=tag:kubernetes.io/role/internal-elb,Values=1" --output table

bash

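# Create IRSA: the IAM role is created through CloudFormation
# Assumes the IAM policy AWSLoadBalancerControllerIAMPolicy already exists in this account
CLUSTER_NAME=myeks
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
eksctl get iamserviceaccount --cluster $CLUSTER_NAME
kubectl get serviceaccounts -n kube-system aws-load-balancer-controller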
 
eksctl create iamserviceaccount \
  --cluster=$CLUSTER_NAME \
  --namespace=kube-system \
  --name=aws-load-balancer-controller \
  --attach-policy-arn=arn:aws:iam::$ACCOUNT_ID:policy/AWSLoadBalancerControllerIAMPolicy \
  --override-existing-serviceaccounts \
  --approve
 
# Verify
eksctl get iamserviceaccount --cluster $CLUSTER_NAME
 
# Check the ServiceAccount in Kubernetes
# Inspecting the newly created Kubernetes Service Account, we can see the role we want it to assume in our pod.
kubectl get serviceaccounts -n kube-system aws-load-balancer-controller -o yaml

bash

# Add the Helm chart repository
helm repo add eks https://aws.github.io/eks-charts
helm repo update
 
# Install the AWS Load Balancer Controller Helm chart
# https://github.com/aws/eks-charts/blob/master/stable/aws-load-balancer-controller/values.yaml
helm install aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system \
  --set clusterName=$CLUSTER_NAME \
  --set serviceAccount.name=aws-load-balancer-controller \
  --set serviceAccount.create=false
 
# Verify
helm list -n kube-system
 
# Check that the pods are failing
kubectl get pod -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller
 
 
# Check the logs: the controller fails to obtain the VPC ID
kubectl logs -n kube-system deployment/aws-load-balancer-controller

bash

{"level":"error","ts":"2026-03-23T15:26:17Z","logger":"setup","msg":"unable to initialize AWS cloud","error":"failed to get VPC ID: failed to fetch VPC ID from instance metadata: error in fetching vpc id through ec2 metadata: get mac metadata: operation error ec2imds: GetMetadata, canceled, context deadline exceeded"}

Create the IRSA (IAM Roles for Service Accounts) and install the controller with Helm. Right after installation, however, the controller pod logs may show the error above.

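The IMDSv2 hop limit problem

When a worker node (EC2) accesses instance metadata over IMDSv2, the response hop limit defaults to 1. A pod, however, lives in an isolated network namespace on top of the node, so its requests need a hop limit of 2. Changing the metadata response hop limit of the EC2 instances to 2, in the AWS console or with the CLI, lets the pods come up in the Running state.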
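
From the CLI, the change can be made with `aws ec2 modify-instance-metadata-options`. The wrapper below is a minimal sketch: `set_hop_limit` and the `DRY_RUN` switch are hypothetical conveniences added here so the command can be previewed before it runs, and the instance ID is a placeholder.

```shell
# Hypothetical wrapper: raise the IMDS response hop limit on one instance.
# With DRY_RUN set, the command is printed instead of executed.
set_hop_limit() {
  local instance_id="$1" limit="${2:-2}"
  ${DRY_RUN:+echo} aws ec2 modify-instance-metadata-options \
    --instance-id "$instance_id" \
    --http-put-response-hop-limit "$limit" \
    --http-endpoint enabled
}

# Preview the call for a placeholder instance ID.
DRY_RUN=1 set_hop_limit i-0123456789abcdef0 2
```

The setting is applied per instance, so it would have to be repeated for every worker node, and nodes that the node group launches later may not inherit it.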

In the console, open the instance's metadata options and change the hop limit to 2.

bash

 kubectl get pod -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller
NAME                                            READY   STATUS    RESTARTS        AGE
aws-load-balancer-controller-7875649799-24hph   1/1     Running   5 (2m37s ago)   4m39s
aws-load-balancer-controller-7875649799-2zcnv   1/1     Running   5 (2m43s ago)   4m39s

Once the controller is running normally, NLBs and ALBs can be created.

bash

# Create the deployment and service
cat << EOF > echo-service-nlb.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-echo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: deploy-websrv
  template:
    metadata:
      labels:
        app: deploy-websrv
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: aews-websrv
        image: registry.k8s.io/echoserver:1.10  # open https://registry.k8s.io/v2/echoserver/tags/list
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: svc-nlb-ip-type
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "8080"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  allocateLoadBalancerNodePorts: false  # K8s 1.24+: skip the unnecessary NodePort allocation
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: LoadBalancer
  selector:
    app: deploy-websrv
EOF
 
kubectl apply -f echo-service-nlb.yaml

With target-type: ip, the load balancer (an NLB in this example) sends traffic straight to the pod IPs assigned by the VPC CNI instead of going through a NodePort on the node.

This is the key setting for cutting out an unnecessary network hop.

Get the URL with the command below. Requests to that URL reach the actual pods.

kubectl get svc svc-nlb-ip-type -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' | awk '{ print "Pod Web URL = http://"$1 }'

Running kubectl describe deploy -n kube-system aws-load-balancer-controller | grep -i 'Service Account' shows Service Account: aws-load-balancer-controller; the permissions attached to that service account are what allow the controller to manage ELBs.

Ingress (L7 : HTTP)

bash

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: game-2048
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: game-2048
  name: deployment-2048
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: app-2048
  replicas: 2
  template:
    metadata:
      labels:
        app.kubernetes.io/name: app-2048
    spec:
      containers:
      - image: public.ecr.aws/l6m2t8p7/docker-2048:latest
        imagePullPolicy: Always
        name: app-2048
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  namespace: game-2048
  name: service-2048
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: NodePort
  selector:
    app.kubernetes.io/name: app-2048
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: game-2048
  name: ingress-2048
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: service-2048
              port:
                number: 80
EOF
 
# Verify the created resources
kubectl get ingressclass
kubectl get ingress,svc,ep,pod -n game-2048
kubectl get-all -n game-2048
kubectl get targetgroupbindings -n game-2048
 
# Verify the ALB
aws elbv2 describe-load-balancers --query 'LoadBalancers[?contains(LoadBalancerName, `k8s-game2048`) == `true`]' | jq
ALB_ARN=$(aws elbv2 describe-load-balancers --query 'LoadBalancers[?contains(LoadBalancerName, `k8s-game2048`) == `true`].LoadBalancerArn' | jq -r '.[0]')
aws elbv2 describe-target-groups --load-balancer-arn $ALB_ARN
TARGET_GROUP_ARN=$(aws elbv2 describe-target-groups --load-balancer-arn $ALB_ARN | jq -r '.TargetGroups[0].TargetGroupArn')
aws elbv2 describe-target-health --target-group-arn $TARGET_GROUP_ARN | jq
 
# Inspect the Ingress
kubectl describe ingress -n game-2048 ingress-2048
kubectl get ingress -n game-2048 ingress-2048 -o jsonpath="{.status.loadBalancer.ingress[*].hostname}{'\n'}"
 
# Play the game: open the ALB address in a browser
kubectl get ingress -n game-2048 ingress-2048 -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' | awk '{ print "Game URL = http://"$1 }'
 
# Check the pod IPs
kubectl get pod -n game-2048 -owide

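Deploy the service and check that it rolled out.

Opening the ALB endpoint in a browser shows the deployed game.

6. External DNS Integration (Domain Automation)

External DNS automatically creates Route 53 A records under a domain you own, so you do not have to use the unwieldy load balancer domain (e.g. k8s-default-...elb.amazonaws.com) generated when a Service or Ingress is deployed.

Install External DNS, starting with the IRSA setup.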

bash

# Write the policy file
cat << EOF > externaldns_controller_policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "route53:ChangeResourceRecordSets",
        "route53:ListResourceRecordSets",
        "route53:ListTagsForResources"
      ],
      "Resource": [
        "arn:aws:route53:::hostedzone/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "route53:ListHostedZones"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
EOF
 
# Create the IAM policy
aws iam create-policy \
  --policy-name ExternalDNSControllerPolicy \
  --policy-document file://externaldns_controller_policy.json
 
# Verify
ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
aws iam get-policy --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/ExternalDNSControllerPolicy | jq
 
 
# Create IRSA: eksctl creates the IAM role through CloudFormation
CLUSTER_NAME=myeks
 
eksctl create iamserviceaccount \
  --cluster=$CLUSTER_NAME \
  --namespace=kube-system \
  --name=external-dns \
  --attach-policy-arn=arn:aws:iam::$ACCOUNT_ID:policy/ExternalDNSControllerPolicy \
  --override-existing-serviceaccounts \
  --approve
 
# Verify
eksctl get iamserviceaccount --cluster $CLUSTER_NAME
 
# Check the ServiceAccount in Kubernetes
# Inspecting the newly created Kubernetes Service Account, we can see the role we want it to assume in our pod.
kubectl get serviceaccounts -n kube-system external-dns -o yaml

Installing External DNS

bash

# Set your own domain
MyDomain=devchanki.com
 
# Write the values file
cat << EOF > external-dns-values.yaml
provider: aws
 
# Use the ServiceAccount created above
serviceAccount:
  create: false
  name: external-dns
 
# Filtering (recommended for security)
# Restrict management to specific domains (e.g. example.com)
domainFilters:
  - $MyDomain
 
# Record update policy
# sync: records are also deleted from Route 53 when removed from Kubernetes (use with care)
# upsert-only: only creates/updates records; deletions are manual (safer)
policy: sync
 
# Resource types to watch
sources:
  - service
  - ingress
 
# (Optional) Owner ID stored in TXT records (prevents conflicts when several clusters manage the same domain)
txtOwnerId: "stduy-myeks-cluster"
 
registry: txt
 
# Log level
logLevel: info
EOF
 
 
# Add and update the Helm repository
helm repo add external-dns https://kubernetes-sigs.github.io/external-dns/
helm repo update
 
# Install the chart
helm install external-dns external-dns/external-dns \
  -n kube-system \
  -f external-dns-values.yaml
 
# Verify
helm list -n kube-system
kubectl get pod -l app.kubernetes.io/name=external-dns -n kube-system
 
# Follow the logs
kubectl logs deploy/external-dns -n kube-system -f

I own the devchanki.com domain, so I used it here.

bash

# Deploy the tetris Deployment and Service
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tetris
  labels:
    app: tetris
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tetris
  template:
    metadata:
      labels:
        app: tetris
    spec:
      containers:
      - name: tetris
        image: bsord/tetris
---
apiVersion: v1
kind: Service
metadata:
  name: tetris
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    #service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "80"
spec:
  selector:
    app: tetris
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  type: LoadBalancer
EOF
 
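# Attach a domain to the NLB with ExternalDNS
kubectl annotate service tetris "external-dns.alpha.kubernetes.io/hostname=tetris.$MyDomain"
# Look up the hosted zone ID if MyDnzHostedZoneId is not already set
MyDnzHostedZoneId=${MyDnzHostedZoneId:-$(aws route53 list-hosted-zones-by-name --dns-name "$MyDomain." --query "HostedZones[0].Id" --output text)}
# Watch the A records until the new one appears (Ctrl+C to stop)
while true; do aws route53 list-resource-record-sets --hosted-zone-id "${MyDnzHostedZoneId}" --query "ResourceRecordSets[?Type == 'A']" | jq ; date ; echo ; sleep 1; done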
 
# Check the A record in Route 53
aws route53 list-resource-record-sets --hosted-zone-id "${MyDnzHostedZoneId}" --query "ResourceRecordSets[?Type == 'A']" | jq
 
# Verify DNS resolution
dig +short tetris.$MyDomain @8.8.8.8
dig +short tetris.$MyDomain
 
# Domain propagation checkers
echo -e "My Domain Checker Site1 = https://www.whatsmydns.net/#A/tetris.$MyDomain"
echo -e "My Domain Checker Site2 = https://dnschecker.org/#A/tetris.$MyDomain"
 
# Print the URL and open it in a browser
echo -e "Tetris Game URL = http://tetris.$MyDomain"

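A note on domain propagation: during this exercise the devchanki.com integration did not take effect immediately. The cause was that the domain's name servers (NS) had been moved to Cloudflare.

The A record was created automatically in AWS Route 53 as expected, but Cloudflare, the name server actually answering for the domain, does not hand queries off to Route 53, so the record never became resolvable on the public internet.

Fix: when another DNS provider (such as Cloudflare) is the primary one, delegating a subdomain (e.g. *.aws.devchanki.com) to the Route 53 hosted zone through NS records reportedly makes everything work.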
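
The delegation can be checked by comparing the name servers of the Route 53 hosted zone with the NS records that public DNS returns for the subdomain. This is a rough sketch under assumptions: normalize_ns and ns_sets_match are hypothetical helper names, and the usage in the comments assumes a separate hosted zone exists for aws.$MyDomain with its ID in a hypothetical SUB_ZONE_ID variable.

```shell
# Normalize a newline-separated NS list: lowercase, no trailing dot, sorted.
normalize_ns() {
  tr '[:upper:]' '[:lower:]' | sed 's/\.$//' | sort -u
}

# Succeeds only when both NS lists are non-empty and contain the same servers.
#   $1: name servers of the Route 53 hosted zone
#   $2: NS records returned by public DNS for the subdomain
ns_sets_match() {
  [ -n "$1" ] && [ -n "$2" ] &&
    [ "$(printf '%s\n' "$1" | normalize_ns)" = "$(printf '%s\n' "$2" | normalize_ns)" ]
}

# Usage (SUB_ZONE_ID and the aws. subdomain are assumptions for illustration):
# r53_ns=$(aws route53 get-hosted-zone --id "$SUB_ZONE_ID" \
#            --query 'DelegationSet.NameServers' --output text | tr '\t' '\n')
# pub_ns=$(dig +short NS "aws.$MyDomain")
# ns_sets_match "$r53_ns" "$pub_ns" && echo "delegated" || echo "not delegated"
```

The name servers printed by get-hosted-zone are the values to enter as NS records for the subdomain at the primary DNS provider.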

After removing the resources with kubectl delete deploy,svc tetris, I confirmed that the record was also deleted from Route 53 after a short delay.

Gateway API

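The Ingress API has served well for years, but it has drawbacks: annotations grow unwieldy, and it is hard to separate the roles of developers and infrastructure administrators. Gateway API is the next-generation Kubernetes routing standard designed to address this.

AWS LBC supports Gateway API starting with v2.13.0.

Gateway API separates its resources clearly by role.

GatewayClass: defined by the infrastructure administrator (which controller to use)
Gateway: defined by the cloud administrator (creates the actual L4/L7 load balancer)
HTTPRoute: defined by the application developer (routing rules, backend wiring)

1) Install the Gateway API CRDs and the LBC-specific CRDs

Gateway API is not built into Kubernetes core, so the standard CRDs and the AWS LBC-specific CRDs must be installed first.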

bash

# Requires LBC v2.13.0 or later
kubectl describe pod -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller | grep Image: | uniq
    Image:         public.ecr.aws/eks/aws-load-balancer-controller:v3.1.0
 
 
# Installation of Gateway API CRDs # --server-side=true
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/standard-install.yaml     # [REQUIRED] # Standard Gateway API CRDs
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/experimental-install.yaml # [OPTIONAL: Used for L4 Routes] # Experimental Gateway API CRDs
 
kubectl get crd  | grep gateway.networking
backendtlspolicies.gateway.networking.k8s.io          2026-03-19T08:37:54Z
gatewayclasses.gateway.networking.k8s.io              2026-03-19T08:37:03Z
gateways.gateway.networking.k8s.io                    2026-03-19T08:37:04Z
grpcroutes.gateway.networking.k8s.io                  2026-03-19T08:37:04Z
httproutes.gateway.networking.k8s.io                  2026-03-19T08:37:04Z
referencegrants.gateway.networking.k8s.io             2026-03-19T08:37:05Z
tcproutes.gateway.networking.k8s.io                   2026-03-19T08:37:55Z
tlsroutes.gateway.networking.k8s.io                   2026-03-19T08:37:56Z
udproutes.gateway.networking.k8s.io                   2026-03-19T08:37:56Z
xbackendtrafficpolicies.gateway.networking.x-k8s.io   2026-03-19T08:37:56Z
xlistenersets.gateway.networking.x-k8s.io             2026-03-19T08:37:56Z
 
kubectl api-resources | grep gateway.networking
gatewayclasses                      gc                gateway.networking.k8s.io/v1           false        GatewayClass
gateways                            gtw               gateway.networking.k8s.io/v1           true         Gateway
grpcroutes                                            gateway.networking.k8s.io/v1           true         GRPCRoute
httproutes                                            gateway.networking.k8s.io/v1           true         HTTPRoute
referencegrants                     refgrant          gateway.networking.k8s.io/v1beta1      true         ReferenceGrant
tcproutes                                             gateway.networking.k8s.io/v1alpha2     true         TCPRoute
tlsroutes                                             gateway.networking.k8s.io/v1alpha2     true         TLSRoute
udproutes                                             gateway.networking.k8s.io/v1alpha2     true         UDPRoute
backendtlspolicies                  btlspolicy        gateway.networking.k8s.io/v1alpha3     true         BackendTLSPolicy
xbackendtrafficpolicies             xbtrafficpolicy   gateway.networking.x-k8s.io/v1alpha1   true         XBackendTrafficPolicy
xlistenersets                       lset              gateway.networking.x-k8s.io/v1alpha1   true         XListenerSet
 
kubectl explain gatewayclasses.gateway.networking.k8s.io.spec
kubectl explain gateways.gateway.networking.k8s.io.spec
kubectl explain httproutes.gateway.networking.k8s.io.spec
 
 
# Installation of LBC Gateway API specific CRDs
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/refs/heads/main/config/crd/gateway/gateway-crds.yaml
 
kubectl get crd | grep gateway.k8s.aws
listenerruleconfigurations.gateway.k8s.aws            2026-03-19T06:05:41Z
loadbalancerconfigurations.gateway.k8s.aws            2026-03-19T06:05:42Z
targetgroupconfigurations.gateway.k8s.aws             2026-03-19T06:05:41Z
 
kubectl api-resources | grep gateway.k8s.aws
listenerruleconfigurations                            gateway.k8s.aws/v1beta1                true         ListenerRuleConfiguration
loadbalancerconfigurations                            gateway.k8s.aws/v1beta1                true         LoadBalancerConfiguration
targetgroupconfigurations                             gateway.k8s.aws/v1beta1                true         TargetGroupConfiguration
 
kubectl explain loadbalancerconfigurations.gateway.k8s.aws.spec
kubectl explain listenerruleconfigurations.gateway.k8s.aws.spec
kubectl explain targetgroupconfigurations.gateway.k8s.aws.spec

2) Enable Gateway API in LBC (Feature Gates)

By default, LBC does not listen for Gateway API CRD events. The feature gates must be turned on explicitly in the controller's Deployment.

bash

# Check the current installation
helm list -n kube-system 
helm get values -n kube-system aws-load-balancer-controller # the Helm values currently contain no args or feature gate settings
kubectl describe deploy -n kube-system aws-load-balancer-controller | grep Args: -A2
    Args:
      --cluster-name=myeks
      --ingress-class=alb
 
# Monitor
kubectl get pod -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller --watch
 
# Enable the feature gates on the Deployment: by default, the LBC will not listen to Gateway API CRDs.
KUBE_EDITOR="nano" kubectl edit deploy -n kube-system aws-load-balancer-controller
...
      - args:
        - --cluster-name=myeks
        - --ingress-class=alb
        - --feature-gates=NLBGatewayAPI=true,ALBGatewayAPI=true
...
# Verify
kubectl describe deploy -n kube-system aws-load-balancer-controller | grep Args: -A3
    Args:
      --cluster-name=myeks
      --ingress-class=alb
      --feature-gates=NLBGatewayAPI=true,ALBGatewayAPI=true
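
For a non-interactive alternative to kubectl edit, the same argument can be appended with a JSON patch. This is a sketch, not the method used above: feature_gate_patch is a hypothetical helper name, and the patch assumes the controller is the first container (index 0) in the pod template, as in the output above.

```shell
# Build a JSON patch that appends a --feature-gates argument to the args
# of the first container in a Deployment's pod template.
#   $1: comma-separated feature gates, e.g. NLBGatewayAPI=true,ALBGatewayAPI=true
feature_gate_patch() {
  printf '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--feature-gates=%s"}]' "$1"
}

# Usage against the cluster:
# kubectl patch deploy -n kube-system aws-load-balancer-controller \
#   --type=json -p "$(feature_gate_patch 'NLBGatewayAPI=true,ALBGatewayAPI=true')"
```

Running the patch twice appends the argument twice, so check the Args output first.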

3) Update the External DNS configuration

Edit external-dns-values.yaml so that External DNS watches Gateway API resources such as HTTPRoute, in addition to Ingress, and creates DNS records for them.

bash

  - gateway-httproute
  - gateway-grpcroute
  - gateway-tlsroute
  - gateway-tcproute
  - gateway-udproute

Add the lines above under sources: in external-dns-values.yaml.
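
The same edit can be scripted instead of done by hand. The sketch below assumes the values file still has the sources: list written earlier, ending with a two-space-indented "- ingress" entry; add_gateway_sources is a hypothetical helper name.

```shell
# Insert the Gateway API route sources right after the "- ingress" entry.
# Does nothing when the file already lists gateway-httproute.
#   $1: path to the values file
add_gateway_sources() {
  file="$1"
  grep -q 'gateway-httproute' "$file" && return 0
  awk '
    { print }
    /^  - ingress *$/ {
      print "  - gateway-httproute"
      print "  - gateway-grpcroute"
      print "  - gateway-tlsroute"
      print "  - gateway-tcproute"
      print "  - gateway-udproute"
    }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage:
# add_gateway_sources external-dns-values.yaml
```

Because of the grep guard, rerunning the helper leaves the file unchanged.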

bash

# Apply the Gateway API sources to ExternalDNS
helm upgrade -i external-dns external-dns/external-dns -n kube-system -f external-dns-values.yaml
 
# Verify
kubectl describe deploy -n kube-system external-dns | grep Args: -A15

Next, create a LoadBalancerConfiguration and check it.

bash

cat << EOF | kubectl apply -f -
apiVersion: gateway.k8s.aws/v1beta1
kind: LoadBalancerConfiguration
metadata:
  name: lbc-config
  namespace: default
spec:
  scheme: internet-facing
EOF
 
# Verify
kubectl get loadbalancerconfiguration -owide

[Infrastructure administrator's domain: GatewayClass & Gateway]

Create the GatewayClass. It is the blueprint that answers the question "which kind of load balancer controller does this cluster use?"

bash

cat << EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: aws-alb
spec:
  controllerName: gateway.k8s.aws/alb
  parametersRef:
    group: gateway.k8s.aws
    kind: LoadBalancerConfiguration
    name: lbc-config
    namespace: default
EOF
 
# Check gatewayclasses (short name: gc)
kubectl get gatewayclasses -o wide  # k get gc

Create the Gateway. Based on the blueprint above (GatewayClass), this provisions the actual load balancer and opens the door (Listener) that accepts external traffic.

bash

# Create the Gateway
kubectl explain gateways.spec
cat << EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: alb-http
spec:
  gatewayClassName: aws-alb
  listeners:
  - name: http
    protocol: HTTP
    port: 80
EOF
 
# Check gateways (short name: gtw)
kubectl get gateways  # k get gtw
NAME       CLASS     ADDRESS                                                                     PROGRAMMED   AGE
alb-http   aws-alb   k8s-default-albhttp-8d7d6da11f-126923743.ap-northeast-2.elb.amazonaws.com   Unknown      24s
 
# Verify the ALB
aws elbv2 describe-load-balancers | jq 
aws elbv2 describe-target-groups
 
# Follow the logs
kubectl logs -l app.kubernetes.io/name=aws-load-balancer-controller -n kube-system -f
# or
kubectl stern -l app.kubernetes.io/name=aws-load-balancer-controller -n kube-system

[Application developer's domain: app deployment & HTTPRoute]

bash

# Deploy the game pods and Service
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-2048
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: app-2048
  replicas: 2
  template:
    metadata:
      labels:
        app.kubernetes.io/name: app-2048
    spec:
      containers:
      - image: public.ecr.aws/l6m2t8p7/docker-2048:latest
        imagePullPolicy: Always
        name: app-2048
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: service-2048
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: ClusterIP
  selector:
    app.kubernetes.io/name: app-2048
EOF
 
# Monitor
watch -d kubectl get pod,ingress,svc,ep,endpointslices
 
# Verify
kubectl get svc,ep,pod

TargetGroupConfiguration

TargetGroupConfiguration is an AWS-specific custom resource (CRD) that the AWS Load Balancer Controller (LBC) provides for fine-grained control over ALB (Application Load Balancer) target groups in a Gateway API setup.

bash

# Create the TargetGroupConfiguration
kubectl explain targetgroupconfigurations.gateway.k8s.aws.spec
kubectl explain targetgroupconfigurations.gateway.k8s.aws.spec.defaultConfiguration 
 
cat << EOF | kubectl apply -f -
apiVersion: gateway.k8s.aws/v1beta1
kind: TargetGroupConfiguration
metadata:
  name: backend-tg-config
spec:
  targetReference:
    name: service-2048
  defaultConfiguration:
    targetType: ip
    protocol: HTTP
EOF
 
# Verify
kubectl get targetgroupconfigurations -owide
 
 
# Check the ALB
aws elbv2 describe-load-balancers | jq 
aws elbv2 describe-target-groups | jq

Write the HTTPRoute. It is the routing signpost that sends traffic arriving through the door opened by the Gateway to the right backend Service (pods), based on the domain (gwapi.devchanki.com) or path (/api).

bash

# Set the service domain name
# GWMYDOMAIN=<your own domain name>
GWMYDOMAIN=gwapi.devchanki.com
 
# Create the HTTPRoute
kubectl explain httproutes.spec
kubectl explain httproutes.spec.parentRefs
kubectl explain httproutes.spec.hostnames
kubectl explain httproutes.spec.rules
 
cat << EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: alb-http-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: alb-http
    sectionName: http
  hostnames:
  - $GWMYDOMAIN
  rules:
  - backendRefs:
    - name: service-2048
      port: 80
EOF
 
# Verify
kubectl get httproute    
 
aws elbv2 describe-load-balancers | jq 
aws elbv2 describe-target-groups | jq

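AWS mapping: an HTTPRoute is translated into ALB listener rules. Developers do not need to touch the infrastructure configuration (Gateway); deploying the HTTPRoute for their own app automatically updates the ALB routing rules.

This HTTPRoute exercise ran into the same propagation issue with the $GWMYDOMAIN (gwapi.devchanki.com) domain. As with the earlier External DNS integration, the domain's name servers (NS) point to Cloudflare, so the A record created in Route 53 could not be resolved from outside.

From the AWS side, however, the whole flow worked as expected: External DNS detected the hostnames of the HTTPRoute resource and automatically created the record in the Route 53 hosted zone.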
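
Even while the domain does not resolve publicly, the listener rule itself can be exercised by sending a request to the ALB address with the Host header set to the route's hostname. A minimal sketch, assuming the Gateway status already reports an address; check_route is a hypothetical helper name.

```shell
# Request "/" on the ALB while presenting the HTTPRoute hostname,
# and print only the HTTP status code.
#   $1: hostname configured in the HTTPRoute
#   $2: DNS name of the ALB
check_route() {
  host="$1"
  alb="$2"
  curl -s -o /dev/null -w '%{http_code}\n' -H "Host: $host" "http://$alb/"
}

# Usage against the cluster:
# ALB_DNS=$(kubectl get gateway alb-http -o jsonpath='{.status.addresses[0].value}')
# check_route "$GWMYDOMAIN" "$ALB_DNS"
```

A successful status code here would indicate that the hostname match and backend wiring work, independent of public DNS.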

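Delete the Gateway API resources:

bash

kubectl delete httproute,targetgroupconfigurations,Gateway,GatewayClass --all

9. Wrapping Up and Deleting Cloud Resources

That concludes the exercises. Unused resources left running in the cloud come back as a painful bill, so delete everything cleanly in the reverse order of provisioning.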

bash

CLUSTER_NAME=myeks
eksctl delete iamserviceaccount --cluster=$CLUSTER_NAME --namespace=kube-system --name=external-dns
eksctl delete iamserviceaccount --cluster=$CLUSTER_NAME --namespace=kube-system --name=aws-load-balancer-controller
 
# Verify
eksctl get iamserviceaccount --cluster $CLUSTER_NAME

Finally, run terraform destroy -auto-approve to delete everything.
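
Load balancers created by the controller are not tracked in Terraform state, and any that remain can get in the way of deleting the VPC. The sketch below checks for them before the destroy. It relies on the k8s- name prefix seen in this exercise and assumes no unrelated load balancers share it; leftover_k8s_lbs and safe_to_destroy are hypothetical helper names.

```shell
# List load balancers whose names start with "k8s-".
leftover_k8s_lbs() {
  aws elbv2 describe-load-balancers \
    --query 'LoadBalancers[?starts_with(LoadBalancerName, `k8s-`)].LoadBalancerName' \
    --output text
}

# Succeeds only when no such load balancer is left.
safe_to_destroy() {
  [ -z "$(leftover_k8s_lbs)" ]
}

# Usage:
# safe_to_destroy && terraform destroy -auto-approve || leftover_k8s_lbs
```

If names are printed, delete the corresponding Service, Ingress, or Gateway first and wait for the controller to remove the load balancer.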
