Today I'm writing up the week 5 hands-on lab from the study group with Gasida. In this lab we use Kubespray to build a highly available (HA) cluster, then add a new node and perform upgrades.
First, here is the architecture diagram.
graph TD
subgraph External_Access [External Access & LB]
LB["**admin-lb**<br/>L4 LB in front of k8s-api<br/>(192.168.10.10)"]
end
subgraph K8S_Control_Plane [Control Plane - HA<br/>kcm/scheduler omitted on each node]
subgraph CTR1_Node [k8s-node1: 192.168.10.11]
API1["**kube-apiserver**"]
KUBELET_CTR1["kubelet"]
KPROXY_CTR1["kube-proxy"]
KUBELET_CTR1 -->|local connect| API1
KPROXY_CTR1 -->|svc/ep - watch/list| API1
end
subgraph CTR2_Node [k8s-node2: 192.168.10.12]
API2["**kube-apiserver**"]
KUBELET_CTR2["kubelet"]
KPROXY_CTR2["kube-proxy"]
KUBELET_CTR2 -->|local connect| API2
KPROXY_CTR2 -->|svc/ep - watch/list| API2
end
subgraph CTR3_Node [k8s-node3: 192.168.10.13]
API3["**kube-apiserver**"]
KUBELET_CTR3["kubelet"]
KPROXY_CTR3["kube-proxy"]
KUBELET_CTR3 -->|local connect| API3
KPROXY_CTR3 -->|svc/ep - watch/list| API3
end
end
subgraph K8S_Workers [Worker Nodes]
subgraph W1_Node [k8s-node4: 192.168.10.14]
KUBELET1["kubelet"]
KPROXY1["kube-proxy"]
NGX1["**nginx static pod<br/>(Client-Side LB)**"]
KUBELET1 -->|node mgmt| NGX1
KPROXY1 -->|local connect| NGX1
end
subgraph W2_Node [k8s-node5: 192.168.10.15]
KUBELET2["kubelet"]
KPROXY2["kube-proxy"]
NGX2["**nginx static pod<br/>(Client-Side LB)**"]
KUBELET2 -->|node mgmt| NGX2
KPROXY2 -->|local connect| NGX2
end
end
%% Connections
%% 1. The LB load-balances to the API server on each control plane node
LB ==>|6443/TCP| API1
LB ==>|6443/TCP| API2
LB ==>|6443/TCP| API3
%% 2. Worker nodes (kubelet/kube-proxy) connect to the API servers (typically via a LB)
W1_Node -.->|Watch/Update| API1
W1_Node -.->|Watch/Update| API2
W1_Node -.->|Watch/Update| API3
W2_Node -.->|Watch/Update| API1
W2_Node -.->|Watch/Update| API2
W2_Node -.->|Watch/Update| API3
%% Styling
style LB fill:#f9f,stroke:#333,stroke-width:2px
style API1 fill:#fff9c4,stroke:#fbc02d
style API2 fill:#fff9c4,stroke:#fbc02d
style API3 fill:#fff9c4,stroke:#fbc02d
style KUBELET_CTR1 fill:#e3f2fd,stroke:#1e88e5
style KUBELET_CTR2 fill:#e3f2fd,stroke:#1e88e5
style KUBELET_CTR3 fill:#e3f2fd,stroke:#1e88e5
style KPROXY_CTR1 fill:#ede7f6,stroke:#5e35b1
style KPROXY_CTR2 fill:#ede7f6,stroke:#5e35b1
style KPROXY_CTR3 fill:#ede7f6,stroke:#5e35b1
style KUBELET1 fill:#e3f2fd,stroke:#1e88e5
style KUBELET2 fill:#e3f2fd,stroke:#1e88e5
style KPROXY1 fill:#ede7f6,stroke:#5e35b1
style KPROXY2 fill:#ede7f6,stroke:#5e35b1
style NGX1 fill:#c8e6c9,stroke:#388e3c
style NGX2 fill:#c8e6c9,stroke:#388e3c
style K8S_Control_Plane fill:#e1f5fe,stroke:#01579b
style K8S_Workers fill:#f1f8e9,stroke:#33691e
While working through this lab, no matter how many times I reinstalled on my Windows machine, it never came up cleanly; one server or another always broke during provisioning. The desktop specs (32GB RAM, a 13th-gen CPU) did not seem to be the bottleneck, but in the end I borrowed an acquaintance's Mac for this lab.
On the Mac it succeeded on the first try.
# Create the lab working directory
mkdir k8s-ha-kubespary
cd k8s-ha-kubespary
# Download the files
curl -O https://raw.githubusercontent.com/gasida/vagrant-lab/refs/heads/main/k8s-ha-kubespary/Vagrantfile
curl -O https://raw.githubusercontent.com/gasida/vagrant-lab/refs/heads/main/k8s-ha-kubespary/admin-lb.sh
curl -O https://raw.githubusercontent.com/gasida/vagrant-lab/refs/heads/main/k8s-ha-kubespary/init_cfg.sh
# Deploy the lab environment
vagrant up
vagrant status

# Check connectivity to the managed nodes
cat /etc/hosts
for i in {0..5}; do echo ">> k8s-node$i <<"; ssh 192.168.10.1$i hostname; echo; done
for i in {1..5}; do echo ">> k8s-node$i <<"; ssh k8s-node$i hostname; echo; done
# Check the Python version
python -V && pip -V
# Check the kubespray working directory and files
tree /root/kubespray/ -L 2
cd /root/kubespray/
cat ansible.cfg
cat /root/kubespray/inventory/mycluster/inventory.ini
# Check the NFS server
systemctl status nfs-server --no-pager
tree /srv/nfs/share/
exportfs -rav
exporting *:/srv/nfs/share
cat /etc/exports
/srv/nfs/share *(rw,async,no_root_squash,no_subtree_check)
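A quick sanity check of the export from a client's point of view (my own addition; showmount ships with nfs-utils, which is already present here since the NFS server runs locally):
# List the exports the server advertises; expect /srv/nfs/share
showmount -e 127.0.0.1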
# Confirm that TCP 6443 traffic arriving at the admin-lb IP is distributed across the backend targets k8s-node1~3
cat /etc/haproxy/haproxy.cfg
...
# ---------------------------------------------------------------------
# Kubernetes API Server Load Balancer Configuration
# ---------------------------------------------------------------------
frontend k8s-api
bind *:6443
mode tcp
option tcplog
default_backend k8s-api-backend
backend k8s-api-backend
mode tcp
option tcp-check
option log-health-checks
timeout client 3h
timeout server 3h
balance roundrobin
server k8s-node1 192.168.10.11:6443 check check-ssl verify none inter 10000
server k8s-node2 192.168.10.12:6443 check check-ssl verify none inter 10000
server k8s-node3 192.168.10.13:6443 check check-ssl verify none inter 10000
# Check HAProxy status
systemctl status haproxy.service --no-pager
journalctl -u haproxy.service --no-pager
ss -tnlp | grep haproxy
LISTEN 0 3000 0.0.0.0:6443 0.0.0.0:* users:(("haproxy",pid=4915,fd=7)) # k8s api loadbalancer
LISTEN 0 3000 0.0.0.0:9000 0.0.0.0:* users:(("haproxy",pid=4915,fd=8)) # haproxy stats dashboard
LISTEN 0 3000 0.0.0.0:8405 0.0.0.0:* users:(("haproxy",pid=4915,fd=9)) # metrics exporter
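As a side note (my own addition, not part of the lab): the HAProxy stats page can also be read from the CLI by appending the standard ;csv suffix to the stats URI, which is handy for watching backend health without a browser.
# Dump the stats as CSV and keep only the k8s-api-backend rows (the status column shows UP/DOWN per server)
curl -s "http://192.168.10.10:9000/haproxy_stats;csv" | grep k8s-api-backend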
# Open the stats page
open http://192.168.10.10:9000/haproxy_stats
# (Reference) Prometheus metrics endpoint
curl http://192.168.10.10:8405/metrics

The screenshots below show the HAProxy status page and the Prometheus metrics.


# Check the working inventory directory
cd /root/kubespray/
git describe --tags
git --no-pager tag
...
v2.29.1
v2.3.0
v2.30.0
...
tree inventory/mycluster/
...
# Check inventory.ini
cat /root/kubespray/inventory/mycluster/inventory.ini
[kube_control_plane]
k8s-node1 ansible_host=192.168.10.11 ip=192.168.10.11 etcd_member_name=etcd1
k8s-node2 ansible_host=192.168.10.12 ip=192.168.10.12 etcd_member_name=etcd2
k8s-node3 ansible_host=192.168.10.13 ip=192.168.10.13 etcd_member_name=etcd3
[etcd:children]
kube_control_plane
[kube_node]
k8s-node4 ansible_host=192.168.10.14 ip=192.168.10.14
#k8s-node5 ansible_host=192.168.10.15 ip=192.168.10.15
# See the block below for how the values declared in the inventory show up in hostvars
ansible-inventory -i /root/kubespray/inventory/mycluster/inventory.ini --list
"hostvars": {
"k8s-node1": {
"allow_unsupported_distribution_setup": false,
"ansible_host": "192.168.10.11", # 해당 값은 바로 위 인벤토리에 host에 직접 선언
"bin_dir": "/usr/local/bin",
...
ansible-inventory -i /root/kubespray/inventory/mycluster/inventory.ini --graph
@all:
|--@ungrouped:
|--@etcd:
| |--@kube_control_plane:
| | |--k8s-node1
| | |--k8s-node2
| | |--k8s-node3
|--@kube_node:
| |--k8s-node4
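To inspect a single node's effective variables (my own addition; --host is a standard ansible-inventory option):
# Print the merged hostvars for one host only
ansible-inventory -i /root/kubespray/inventory/mycluster/inventory.ini --host k8s-node1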
# k8s_cluster.yml # for every node in the cluster (not etcd when it's separate)
sed -i 's|kube_owner: kube|kube_owner: root|g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
sed -i 's|kube_network_plugin: calico|kube_network_plugin: flannel|g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
sed -i 's|kube_proxy_mode: ipvs|kube_proxy_mode: iptables|g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
sed -i 's|enable_nodelocaldns: true|enable_nodelocaldns: false|g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
grep -iE 'kube_owner|kube_network_plugin:|kube_proxy_mode|enable_nodelocaldns:' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
## Do not install the CoreDNS autoscaler
echo "enable_dns_autoscaler: false" >> inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# Adjust the flannel settings
echo "flannel_interface: enp0s9" >> inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
grep "^[^#]" inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
# addons
sed -i 's|metrics_server_enabled: false|metrics_server_enabled: true|g' inventory/mycluster/group_vars/k8s_cluster/addons.yml
grep -iE 'metrics_server_enabled:' inventory/mycluster/group_vars/k8s_cluster/addons.yml
## cat roles/kubernetes-apps/metrics_server/defaults/main.yml # metrics-server default variables, for reference
## cat roles/kubernetes-apps/metrics_server/templates/metrics-server-deployment.yaml.j2 # Jinja2 template file, for reference
echo "metrics_server_requests_cpu: 25m" >> inventory/mycluster/group_vars/k8s_cluster/addons.yml
echo "metrics_server_requests_memory: 16Mi" >> inventory/mycluster/group_vars/k8s_cluster/addons.yml
# Check the supported versions
cat roles/kubespray_defaults/vars/main/checksums.yml | grep -i kube -A40
# Deploy: be sure to run ansible-playbook from the ~/kubespray directory as shown below! Takes about 8 minutes
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --list-tasks # review the task list before deploying
ANSIBLE_FORCE_COLOR=true ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml -e kube_version="1.32.9" | tee kubespray_install.log
# Verify the install
more kubespray_install.log
# Check the gathered facts
tree /tmp
├── k8s-node1
├── k8s-node2
├── k8s-node3
...
# Check local_release_dir: "/tmp/releases"
ssh k8s-node1 tree /tmp/releases
ssh k8s-node4 tree /tmp/releases
# Check the applied sysctl values
ssh k8s-node1 grep "^[^#]" /etc/sysctl.conf
ssh k8s-node4 grep "^[^#]" /etc/sysctl.conf
# Check the etcd backups
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i tree /var/backups; echo; done
# Verify k8s API calls by IP and by domain name
cat /etc/hosts
for i in {1..3}; do echo ">> k8s-node$i <<"; curl -sk https://192.168.10.1$i:6443/version | grep Version; echo; done
for i in {1..3}; do echo ">> k8s-node$i <<"; curl -sk https://k8s-node$i:6443/version | grep Version; echo; done
# Check the k8s admin credentials: the control plane nodes run an apiserver pod locally, so their endpoint is set to 127.0.0.1:6443
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i kubectl cluster-info -v=6; echo; done
I0127 17:06:29.481012 27149 loader.go:402] Config loaded from file: /root/.kube/config
...
Kubernetes control plane is running at https://127.0.0.1:6443
mkdir /root/.kube
scp k8s-node1:/root/.kube/config /root/.kube/
cat /root/.kube/config | grep server
# Change the API server address from localhost to control plane node 1's IP: if node 1 fails, you must manually switch to another node's IP.
kubectl get node -owide -v=6
sed -i 's/127.0.0.1/192.168.10.11/g' /root/.kube/config
# or, if node 1 is down, point at another control plane node:
sed -i 's/127.0.0.1/192.168.10.12/g' /root/.kube/config
sed -i 's/127.0.0.1/192.168.10.13/g' /root/.kube/config
kubectl get node -owide -v=6
I0127 17:08:03.347290 14006 loader.go:402] Config loaded from file: /root/.kube/config
...
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node1 Ready control-plane 3m37s v1.32.9 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node2 Ready control-plane 3m31s v1.32.9 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node3 Ready control-plane 3m29s v1.32.9 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node4 Ready <none> 3m3s v1.32.9 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# Compare the nodes included in [kube_control_plane] vs [kube_node]
ansible-inventory -i /root/kubespray/inventory/mycluster/inventory.ini --graph
kubectl describe node | grep -E 'Name:|Taints'
Name: k8s-node1
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Name: k8s-node2
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Name: k8s-node3
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Name: k8s-node4
Taints: <none>
kubectl get pod -A
...
# Check each node's pod CIDR
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
k8s-node1 10.233.64.0/24
k8s-node2 10.233.65.0/24
k8s-node3 10.233.66.0/24
k8s-node4 10.233.67.0/24
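As an extra check (my own addition, assuming flannel's default paths), each node's flannel subnet lease should match the podCIDR above:
# Flannel records the node's assigned subnet in /run/flannel/subnet.env
for i in {1..4}; do echo ">> k8s-node$i <<"; ssh k8s-node$i cat /run/flannel/subnet.env; echo; done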
# Check etcd info: etcd member names
ssh k8s-node1 etcdctl.sh member list -w table
+------------------+---------+-------+----------------------------+----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+-------+----------------------------+----------------------------+------------+
| 8b0ca30665374b0 | started | etcd3 | https://192.168.10.13:2380 | https://192.168.10.13:2379 | false |
| 2106626b12a4099f | started | etcd2 | https://192.168.10.12:2380 | https://192.168.10.12:2379 | false |
| c6702130d82d740f | started | etcd1 | https://192.168.10.11:2380 | https://192.168.10.11:2379 | false |
+------------------+---------+-------+----------------------------+----------------------------+------------+
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i etcdctl.sh endpoint status -w table; echo; done
>> k8s-node1 <<
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | c6702130d82d740f | 3.5.25 | 8.3 MB | true | false | 4 | 2834 | 2834 | |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
...
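A quick health check as well (my own addition; the etcdctl.sh wrapper passes its arguments straight through to etcdctl):
# 'endpoint health' reports whether each member answers within the timeout
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i etcdctl.sh endpoint health; echo; done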
# Run k9s
k9s
# Set up shell completion and aliases
source <(kubectl completion bash)
alias k=kubectl
alias kc=kubecolor
complete -F __start_kubectl k
echo 'source <(kubectl completion bash)' >> /etc/profile
echo 'alias k=kubectl' >> /etc/profile
echo 'alias kc=kubecolor' >> /etc/profile
echo 'complete -F __start_kubectl k' >> /etc/profile
# worker (kubelet, kube-proxy) -> k8s api
# Check from the worker node
ssh k8s-node4 crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD NAMESPACE
3c09f930b22b0 5a91d90f47ddf 15 minutes ago Running nginx-proxy 0 81b36842732ba nginx-proxy-k8s-node4 kube-system
...
ssh k8s-node4 cat /etc/nginx/nginx.conf
error_log stderr notice;
worker_processes 2;
worker_rlimit_nofile 130048;
worker_shutdown_timeout 10s;
...
stream {
upstream kube_apiserver {
least_conn;
server 192.168.10.11:6443;
server 192.168.10.12:6443;
server 192.168.10.13:6443;
}
server {
listen 127.0.0.1:6443;
proxy_pass kube_apiserver;
proxy_timeout 10m;
proxy_connect_timeout 1s;
}
}
http {
...
server {
listen 8081;
location /healthz {
access_log off;
return 200;
...
ssh k8s-node4 curl -s localhost:8081/healthz -I
HTTP/1.1 200 OK
Server: nginx
# From the worker node, call the k8s API through the client-side LB
ssh k8s-node4 curl -sk https://127.0.0.1:6443/version | grep Version
"gitVersion": "v1.32.9",
"goVersion": "go1.23.12",
ssh k8s-node4 ss -tnlp | grep nginx
LISTEN 0 511 0.0.0.0:8081 0.0.0.0:* users:(("nginx",pid=15043,fd=6),("nginx",pid=15042,fd=6),("nginx",pid=15016,fd=6))
LISTEN 0 511 127.0.0.1:6443 0.0.0.0:* users:(("nginx",pid=15043,fd=5),("nginx",pid=15042,fd=5),("nginx",pid=15016,fd=5))
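To see which API servers the local nginx proxy is actually connected to (my own check, not part of the lab), list its established connections toward the control plane:
# Established connections from k8s-node4 toward the three apiservers
ssh k8s-node4 ss -tnp | grep -E '192\.168\.10\.1[1-3]:6443'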
# Endpoint kubelet (client) uses to reach the api-server: https://localhost:6443
ssh k8s-node4 cat /etc/kubernetes/kubelet.conf
ssh k8s-node4 cat /etc/kubernetes/kubelet.conf | grep server
server: https://localhost:6443
# Endpoint kube-proxy (client) uses to reach the api-server
kc get cm -n kube-system kube-proxy -o yaml
kubectl get cm -n kube-system kube-proxy -o yaml | grep 'kubeconfig.conf:' -A18
kubeconfig.conf: |-
apiVersion: v1
kind: Config
clusters:
- cluster:
certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server: https://127.0.0.1:6443
name: default
...
# Task that renders nginx.conf
tree roles/kubernetes/node/tasks/loadbalancer
cat roles/kubernetes/node/tasks/loadbalancer/nginx-proxy.yml
...
- name: Nginx-proxy | Write nginx-proxy configuration
template:
src: "loadbalancer/nginx.conf.j2"
dest: "{{ nginx_config_dir }}/nginx.conf"
owner: root
mode: "0755"
backup: true
# nginx.conf Jinja2 template file
cat roles/kubernetes/node/templates/loadbalancer/nginx.conf.j2
error_log stderr notice;
worker_processes 2;
worker_rlimit_nofile 130048;
worker_shutdown_timeout 10s;
events {
multi_accept on;
use epoll;
worker_connections 16384;
}
stream {
upstream kube_apiserver {
least_conn; # Why is least_conn the default here instead of round robin? Presumably because the clients hold long-lived watch connections, which least_conn spreads more evenly
{% for host in groups['kube_control_plane'] -%}
server {{ hostvars[host]['main_access_ip'] | ansible.utils.ipwrap }}:{{ kube_apiserver_port }};
{% endfor -%}
}
server {
listen 127.0.0.1:{{ loadbalancer_apiserver_port|default(kube_apiserver_port) }};
{% if ipv6_stack -%}
listen [::1]:{{ loadbalancer_apiserver_port|default(kube_apiserver_port) }};
{% endif -%}
proxy_pass kube_apiserver;
proxy_timeout 10m;
proxy_connect_timeout 1s;
}
}
# Check the nginx static pod manifest
cat roles/kubernetes/node/templates/manifests/nginx-proxy.manifest.j2
apiVersion: v1
kind: Pod
metadata:
name: {{ loadbalancer_apiserver_pod_name }}
namespace: kube-system
labels:
addonmanager.kubernetes.io/mode: Reconcile
k8s-app: kube-nginx
annotations:
nginx-cfg-checksum: "{{ nginx_stat.stat.checksum }}"
...

Let's look at the nginx configuration. Checking the config inside the node, you can see that calls to the kube-apiserver are load-balanced.

tree playbooks/
grep -Rni "tags" playbooks -A2 -B1
# tags in files under roles/
tree roles/ -L 2
grep -Rni "tags" roles --include="*.yml" -A2 -B1
grep -Rni "tags" roles --include="*.yml" -A3 | less
This is how you can see which tags are attached where.
# Confirm the apiserver static pod's bind-address is '::'
kubectl describe pod -n kube-system kube-apiserver-k8s-node1 | grep -E 'address|secure-port'
Annotations: kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 192.168.10.11:6443
--advertise-address=192.168.10.11
--secure-port=6443
--bind-address=::
ssh k8s-node1 ss -tnlp | grep 6443
LISTEN 0 4096 *:6443 *:* users:(("kube-apiserver",pid=26124,fd=3))
ssh k8s-node1 ip -br -4 addr
ssh k8s-node1 curl -sk https://127.0.0.1:6443/version | grep gitVersion
ssh k8s-node1 curl -sk https://192.168.10.11:6443/version | grep gitVersion
ssh k8s-node1 curl -sk https://10.0.2.15:6443/version | grep gitVersion
"gitVersion": "v1.32.9",
# Endpoint the admin credentials (client) use to reach the api-server
ssh k8s-node1 cat /etc/kubernetes/admin.conf | grep server
server: https://127.0.0.1:6443
# Endpoint the super-admin credentials (client) use to reach the api-server
ssh k8s-node1 cat /etc/kubernetes/super-admin.conf | grep server
server: https://192.168.10.11:6443
# Endpoint kubelet (client) uses to reach the api-server: https://127.0.0.1:6443
ssh k8s-node1 cat /etc/kubernetes/kubelet.conf
ssh k8s-node1 cat /etc/kubernetes/kubelet.conf | grep server
server: https://127.0.0.1:6443
# Endpoint kube-proxy (client) uses to reach the api-server
kc get cm -n kube-system kube-proxy -o yaml
k get cm -n kube-system kube-proxy -o yaml | grep server
server: https://127.0.0.1:6443
# Endpoint kube-controller-manager (client) uses to reach the api-server
ssh k8s-node1 cat /etc/kubernetes/controller-manager.conf | grep server
server: https://127.0.0.1:6443
# Endpoint kube-scheduler (client) uses to reach the api-server
ssh k8s-node1 cat /etc/kubernetes/scheduler.conf | grep server
server: https://127.0.0.1:6443

Even when the k8s API endpoint is called as 127.0.0.1, the request still succeeds because it is load-balanced across the control plane.

# kube-ops-view
## helm show values geek-cookbook/kube-ops-view
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
# macOS users
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 \
--set service.main.type=NodePort,service.main.ports.http.nodePort=30000 \
--set env.TZ="Asia/Seoul" --namespace kube-system \
--set image.repository="abihf/kube-ops-view" --set image.tag="latest"
# Windows users
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 \
--set service.main.type=NodePort,service.main.ports.http.nodePort=30000 \
--set env.TZ="Asia/Seoul" --namespace kube-system
# Verify the install
kubectl get deploy,pod,svc,ep -n kube-system -l app.kubernetes.io/instance=kube-ops-view
# kube-ops-view access URLs (1.5x and 2x scale): it is a NodePort service, so any node's IP works!
open "http://192.168.10.14:30000/#scale=1.5"
open "http://192.168.10.14:30000/#scale=2"

kube-ops-view shows the state of the nodes as in the screenshot below.

# Deploy a sample application
cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: webpod
spec:
replicas: 2
selector:
matchLabels:
app: webpod
template:
metadata:
labels:
app: webpod
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- webpod
topologyKey: "kubernetes.io/hostname"
containers:
- name: webpod
image: traefik/whoami
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: webpod
labels:
app: webpod
spec:
selector:
app: webpod
ports:
- protocol: TCP
port: 80
targetPort: 80
nodePort: 30003
type: NodePort
EOF

# Verify the deployment
kubectl get deploy,svc,ep webpod -owide
[admin-lb] # the IP may change depending on node operations
while true; do curl -s http://192.168.10.14:30003 | grep Hostname; sleep 1; done
# (Optional) Call the service by name from a k8s node
ssh k8s-node1 cat /etc/resolv.conf
# Generated by NetworkManager
search default.svc.cluster.local svc.cluster.local
nameserver 10.233.0.3
nameserver 168.126.63.1
nameserver 8.8.8.8
options ndots:2 timeout:2 attempts:2
# succeeds
ssh k8s-node1 curl -s webpod -I
HTTP/1.1 200 OK
# succeeds
ssh k8s-node1 curl -s webpod.default -I
HTTP/1.1 200 OK
# fails: the node's search list has no "cluster.local" entry, so these partial names cannot be completed
ssh k8s-node1 curl -s webpod.default.svc -I
ssh k8s-node1 curl -s webpod.default.svc.cluster -I
# succeeds
ssh k8s-node1 curl -s webpod.default.svc.cluster.local -I
HTTP/1.1 200 OK

After confirming connectivity, run the curl loop in a shell so it keeps calling the service.
# [admin-lb] Check which server the kubeconfig credentials point to
cat /root/.kube/config | grep server
server: https://192.168.10.11:6443
# Monitoring: open 4 new terminals
# ----------------------
## [admin-lb]
while true; do kubectl get node ; echo ; curl -sk https://192.168.10.12:6443/version | grep gitVersion ; sleep 1; echo ; done
## [k8s-node2]
watch -d kubectl get pod -n kube-system
kubectl logs -n kube-system nginx-proxy-k8s-node4 -f
## [k8s-node4]
while true; do curl -sk https://127.0.0.1:6443/version | grep gitVersion ; date; sleep 1; echo ; done
# ----------------------
# Reproduce the failure
[k8s-node1] poweroff
# [k8s-node2]
kubectl logs -n kube-system nginx-proxy-k8s-node4 -f
2026/01/28 12:47:08 [error] 20#20: *3145 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: 127.0.0.1:6443, upstream: "192.168.10.11:6443", bytes from/to client:0/0, bytes from/to upstream:0/0
2026/01/28 12:47:08 [warn] 20#20: *3145 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: 127.0.0.1:6443, upstream: "192.168.10.11:6443", bytes from/to client:0/0, bytes from/to upstream:0/0
# [k8s-node4] But with 2 backend servers remaining, the requests below are still served normally!
while true; do curl -sk https://127.0.0.1:6443/version | grep gitVersion ; date; sleep 1; echo ; done
"gitVersion": "v1.32.9",
# [admin-lb] The server address in the kubeconfig below needs to be updated
while true; do kubectl get node ; echo ; curl -sk https://192.168.10.12:6443/version | grep gitVersion ; sleep 1; echo ; done
Unable to connect to the server: dial tcp 192.168.10.11:6443: connect: no route to host # << this one fails!
"gitVersion": "v1.32.9", # << this one succeeds!
sed -i 's/192.168.10.11/192.168.10.12/g' /root/.kube/config
while true; do kubectl get node ; echo ; curl -sk https://192.168.10.12:6443/version | grep gitVersion ; sleep 1; echo ; done
NAME STATUS ROLES AGE VERSION
k8s-node1 NotReady control-plane 4h35m v1.32.9
k8s-node2 Ready control-plane 4h35m v1.32.9
k8s-node3 Ready control-plane 4h35m v1.32.9
k8s-node4 Ready <none> 4h34m v1.32.9
"gitVersion": "v1.32.9",

Power off k8s-node1.
You can see the node in its shut-down state below.
HAProxy also shows the backend as unhealthy.

#
curl -sk https://192.168.10.10:6443/version | grep gitVersion
"gitVersion": "v1.32.9",
#
sed -i 's/192.168.10.12/192.168.10.10/g' /root/.kube/config
# Check the certificate's SAN list
kubectl get node
E0128 23:53:41.079370 70802 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.10.10:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate is valid for 10.233.0.1, 192.168.10.11, 127.0.0.1, ::1, 192.168.10.12, 192.168.10.13, 10.0.2.15, fd17:625c:f037:2:a00:27ff:fe90:eaeb, not 192.168.10.10"
# Check the SAN entries in the certificate
ssh k8s-node1 cat /etc/kubernetes/ssl/apiserver.crt | openssl x509 -text -noout
...
ssh k8s-node1 kubectl get cm -n kube-system kubeadm-config -o yaml
apiServer:
certSANs:
- kubernetes
- kubernetes.default
- kubernetes.default.svc
- kubernetes.default.svc.cluster.local
- 10.233.0.1
- localhost
- 127.0.0.1
- ::1
- k8s-node1
- k8s-node2
- k8s-node3
- lb-apiserver.kubernetes.local
- 192.168.10.11
- 192.168.10.12
- 192.168.10.13
- 10.0.2.15
- fd17:625c:f037:2:a00:27ff:fe90:eaeb
# Add an IP and a domain to the certificate SANs
echo "supplementary_addresses_in_ssl_keys: [192.168.10.10, k8s-api-srv.admin-lb.com]" >> inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
grep "^[^#]" inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "kube-apiserver" --list-tasks
# ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "kubeadm" --list-tasks
# ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "facts" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "control-plane" --list-tasks
...
play #10 (kube_control_plane): Install the control plane TAGS: []
tasks:
...
kubernetes/control-plane : Kubeadm | aggregate all SANs TAGS: [control-plane, facts]
...
# (New terminal) Monitoring
[k8s-node4]
while true; do curl -sk https://127.0.0.1:6443/version | grep gitVersion ; date ; sleep 1; echo ; done
# Completes within 1 minute
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "control-plane" --limit kube_control_plane -e kube_version="1.32.9"
Gather minimal facts ------------------------------------------------------------------------------------------------------- 2.00s
kubernetes/control-plane : Kubeadm | Check apiserver.crt SAN hosts --------------------------------------------------------- 1.57s
kubernetes/control-plane : Kubeadm | Check apiserver.crt SAN IPs ----------------------------------------------------------- 1.33s
kubernetes/control-plane : Backup old certs and keys ----------------------------------------------------------------------- 1.26s
Gather necessary facts (hardware) ------------------------------------------------------------------------------------------ 0.98s
kubernetes/control-plane : Install | Copy kubectl binary from download dir ------------------------------------------------- 0.95s
kubernetes/preinstall : Create other directories of root owner ------------------------------------------------------------- 0.92s
win_nodes/kubernetes_patch : debug ----------------------------------------------------------------------------------------- 0.84s
kubernetes/control-plane : Backup old confs -------------------------------------------------------------------------------- 0.83s
kubernetes/control-plane : Update server field in component kubeconfigs ---------------------------------------------------- 0.78s
kubernetes/control-plane : Kubeadm | Create kubeadm config ----------------------------------------------------------------- 0.76s
kubernetes/preinstall : Create kubernetes directories ---------------------------------------------------------------------- 0.67s
kubernetes/control-plane : Kubeadm | regenerate apiserver cert 2/2 --------------------------------------------------------- 0.50s
kubernetes/control-plane : Renew K8S control plane certificates monthly 2/2 ------------------------------------------------ 0.46s
kubernetes/control-plane : Create kube-scheduler config -------------------------------------------------------------------- 0.41s
Gather necessary facts (network) ------------------------------------------------------------------------------------------- 0.38s
kubernetes/control-plane : Install script to renew K8S control plane certificates ------------------------------------------ 0.37s
kubernetes/control-plane : Kubeadm | regenerate apiserver cert 1/2 --------------------------------------------------------- 0.34s
kubernetes/control-plane : Kubeadm | aggregate all SANs -------------------------------------------------------------------- 0.29s
kubernetes/control-plane : Check which kube-control nodes are already members of the cluster ------------------------------- 0.28s
# Requests to the 192.168.10.10 endpoint now succeed!
kubectl get node -v=6
...
I0129 00:17:13.825729 81610 round_trippers.go:560] GET https://192.168.10.10:6443/api/v1/nodes?limit=500 200 OK in 8 milliseconds
NAME STATUS ROLES AGE VERSION
k8s-node1 Ready control-plane 7h v1.32.9
k8s-node2 Ready control-plane 7h v1.32.9
k8s-node3 Ready control-plane 7h v1.32.9
k8s-node4 Ready <none> 6h59m v1.32.9
# Verify both the IP and the domain
sed -i 's/192.168.10.10/k8s-api-srv.admin-lb.com/g' /root/.kube/config
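For the domain to work it also has to resolve on this host; a quick check (my own addition, assuming the lab's init script added an /etc/hosts entry):
# Expect this to map to 192.168.10.10
getent hosts k8s-api-srv.admin-lb.com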
# Further checks
ssh k8s-node1 cat /etc/kubernetes/ssl/apiserver.crt | openssl x509 -text -noout
X509v3 Subject Alternative Name:
DNS:k8s-api-srv.admin-lb.com, DNS:k8s-node1, DNS:k8s-node2, DNS:k8s-node3, DNS:kubernetes, DNS:kubernetes.default,
DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:lb-apiserver.kubernetes.local,
DNS:localhost, IP Address:10.233.0.1, IP Address:192.168.10.11, IP Address:127.0.0.1,
IP Address:0:0:0:0:0:0:0:1, IP Address:192.168.10.10, IP Address:192.168.10.12, IP Address:192.168.10.13,
IP Address:10.0.2.15, IP Address:FD17:625C:F037:2:A00:27FF:FE90:EAEB
# This ConfigMap is not updated automatically after the initial install and is used during upgrades, so when you change the kubeadm config as above, update the cm manually as well.
kubectl get cm -n kube-system kubeadm-config -o yaml
...
kubectl edit cm -n kube-system kubeadm-config # or k9s -> cm kube-system
...

In my case this part was already done, probably because I had run the lab several times. This step is about adding an IP that is missing from the API server certificate.
I hit the error below. The Gemini response that follows helped me pinpoint the cause.

This error occurred because the --limit=k8s-node5 option was too restrictive.

Cause analysis
The failing Nginx-proxy task is the step where the worker node (k8s-node5) sets up its local load balancer for talking to the Kubernetes API servers (the control plane nodes). That step needs the list of control plane IP addresses, but --limit excluded the control plane nodes from the run, so the variable came back as AnsibleUndefined and the task failed.

Fix
Add the control plane and etcd groups to --limit so that Ansible can read the control plane facts and render the configuration on the new worker node.

Re-run with the command below.

ANSIBLE_FORCE_COLOR=true ansible-playbook \
-i inventory/mycluster/inventory.ini \
-v scale.yml \
--limit="k8s-node5,kube_control_plane,etcd" \
-e kube_version="1.32.9" \
| tee kubespray_add_worker_node.log

(Note) Depending on your shell, the commas may be mis-parsed, so it is safest to wrap the value in double quotes as in the example above.

Summary
Problem: --limit excluded the control plane nodes, so the worker node could not learn the API server addresses.
Fix: include the kube_control_plane and etcd groups in --limit.

# Modify inventory.ini
cat << EOF > /root/kubespray/inventory/mycluster/inventory.ini
[kube_control_plane]
k8s-node1 ansible_host=192.168.10.11 ip=192.168.10.11 etcd_member_name=etcd1
k8s-node2 ansible_host=192.168.10.12 ip=192.168.10.12 etcd_member_name=etcd2
k8s-node3 ansible_host=192.168.10.13 ip=192.168.10.13 etcd_member_name=etcd3
[etcd:children]
kube_control_plane
[kube_node]
k8s-node4 ansible_host=192.168.10.14 ip=192.168.10.14
k8s-node5 ansible_host=192.168.10.15 ip=192.168.10.15
EOF
ansible-inventory -i /root/kubespray/inventory/mycluster/inventory.ini --graph
@all:
|--@ungrouped:
|--@etcd:
| |--@kube_control_plane:
| | |--k8s-node1
| | |--k8s-node2
| | |--k8s-node3
|--@kube_node:
| |--k8s-node4
| |--k8s-node5
# Check ansible connectivity
ansible -i inventory/mycluster/inventory.ini k8s-node5 -m ping
# Monitoring
watch -d kubectl get node
kube-ops-view
# Add the worker node: takes about 3 minutes
ansible-playbook -i inventory/mycluster/inventory.ini -v scale.yml --list-tasks
ANSIBLE_FORCE_COLOR=true ansible-playbook -i inventory/mycluster/inventory.ini -v scale.yml --limit=k8s-node5 -e kube_version="1.32.9" | tee kubespray_add_worker_node.log
# Verify
kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node1 Ready control-plane 48m v1.32.9 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node2 Ready control-plane 48m v1.32.9 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node3 Ready control-plane 48m v1.32.9 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node4 Ready <none> 47m v1.32.9 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node5 Ready <none> 66s v1.32.9 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
kubectl get pod -n kube-system -owide |grep k8s-node5
kube-flannel-ds-arm64-2djxl 1/1 Running 1 (80s ago) 114s 192.168.10.15 k8s-node5 <none> <none>
kube-proxy-x6cmm 1/1 Running 0 114s 192.168.10.15 k8s-node5 <none> <none>
nginx-proxy-k8s-node5 1/1 Running 0 113s 192.168.10.15 k8s-node5 <none> <none>
# Check what changed on the node
ssh k8s-node5 tree /etc/kubernetes
ssh k8s-node5 tree /var/lib/kubelet
ssh k8s-node5 pstree -a
# Redistribute the sample pods
kubectl get pod -owide
kubectl scale deployment webpod --replicas 1
kubectl get pod -owide
kubectl scale deployment webpod --replicas 2
The commands above add k8s-node5. It takes a little while.
#
cat scale.yml
---
- name: Scale the cluster
ansible.builtin.import_playbook: playbooks/scale.yml
cat playbooks/scale.yml
---
- name: Common tasks for every playbooks
import_playbook: boilerplate.yml
- name: Gather facts
import_playbook: internal_facts.yml
- name: Install etcd # the existing etcd cluster is not changed; a new node joins only if it is an etcd member
vars:
etcd_cluster_setup: false
etcd_events_cluster_setup: false
import_playbook: install_etcd.yml
- name: Download images to ansible host cache via first kube_control_plane node # with download_run_once set, this runs only on the first control-plane node: caches images/binaries on the ansible host
hosts: kube_control_plane[0]
gather_facts: false
any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
environment: "{{ proxy_disable_env }}"
roles:
- { role: kubespray_defaults, when: "not skip_downloads and download_run_once and not download_localhost" }
- { role: kubernetes/preinstall, tags: preinstall, when: "not skip_downloads and download_run_once and not download_localhost" }
- { role: download, tags: download, when: "not skip_downloads and download_run_once and not download_localhost" }
- name: Target only workers to get kubelet installed and checking in on any new nodes(engine) # prepare the worker nodes
hosts: kube_node
gather_facts: false
any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
environment: "{{ proxy_disable_env }}"
roles:
- { role: kubespray_defaults }
- { role: kubernetes/preinstall, tags: preinstall }
- { role: container-engine, tags: "container-engine", when: deploy_container_engine }
- { role: download, tags: download, when: "not skip_downloads" }
- role: etcd # (conditional): if a network plugin such as Calico uses etcd directly, configure the worker nodes so they can reach it.
tags: etcd
vars:
etcd_cluster_setup: false
when:
- etcd_deployment_type != "kubeadm"
- kube_network_plugin in ["calico", "flannel", "canal", "cilium"] or cilium_deploy_additionally | default(false) | bool
- kube_network_plugin != "calico" or calico_datastore == "etcd"
- name: Target only workers to get kubelet installed and checking in on any new nodes(node) # install kubelet
hosts: kube_node
gather_facts: false
any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
environment: "{{ proxy_disable_env }}"
roles:
- { role: kubespray_defaults }
- { role: kubernetes/node, tags: node } # installs kubelet and registers it with systemd; does not join the cluster yet
- name: Upload control plane certs and retrieve encryption key # share the kubeadm certificates
## Uploads the certificates via kubeadm so the new node can join the cluster safely, and stores the certificate_key needed for joining in a variable.
hosts: kube_control_plane | first # target: the first control plane node
environment: "{{ proxy_disable_env }}"
gather_facts: false
tags: kubeadm
roles:
- { role: kubespray_defaults }
tasks:
- name: Upload control plane certificates
command: >-
{{ bin_dir }}/kubeadm init phase # kubeadm init phase upload-certs --upload-certs
--config {{ kube_config_dir }}/kubeadm-config.yaml
upload-certs
--upload-certs
environment: "{{ proxy_disable_env }}"
register: kubeadm_upload_cert
changed_when: false
- name: Set fact 'kubeadm_certificate_key' for later use
set_fact:
kubeadm_certificate_key: "{{ kubeadm_upload_cert.stdout_lines[-1] | trim }}"
when: kubeadm_certificate_key is not defined
- name: Target only workers to get kubelet installed and checking in on any new nodes(network) # join the cluster and configure networking
hosts: kube_node
gather_facts: false
any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
environment: "{{ proxy_disable_env }}"
roles:
- { role: kubespray_defaults }
- { role: kubernetes/kubeadm, tags: kubeadm } # runs kubeadm join on the new worker node to formally register it with the cluster.
- { role: kubernetes/node-label, tags: node-label } # applies the labels configured for the node.
- { role: kubernetes/node-taint, tags: node-taint } # applies the taints configured for the node.
- { role: network_plugin, tags: network } # applies the CNI (Calico, Flannel, etc.) configuration so inter-node traffic works.
- name: Apply resolv.conf changes now that cluster DNS is up # DNS settings
hosts: k8s_cluster
gather_facts: false
any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
environment: "{{ proxy_disable_env }}"
roles:
- { role: kubespray_defaults }
- { role: kubernetes/preinstall, when: "dns_mode != 'none' and resolvconf_mode == 'host_resolvconf'", tags: resolvconf, dns_late: true }
# resolvconf: now that cluster DNS (CoreDNS, etc.) is up, update each node's /etc/resolv.conf so the nodes can resolve internal domain names.

The above walks through scale.yml under the playbooks folder.
# Set a PDB on the webpod deployment: the policy requires at least 2 pods to stay Ready, so not a single pod may be evicted during a drain/eviction
kubectl scale deployment webpod --replicas 1
kubectl scale deployment webpod --replicas 2
cat <<EOF | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: webpod
namespace: default
spec:
maxUnavailable: 0
selector:
matchLabels:
app: webpod
EOF
# Verify
kubectl get pdb
NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
webpod N/A 0 0 6s
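The same restriction shows up numerically in the PDB status (my own addition):
# 0 means no pod may be evicted right now
kubectl get pdb webpod -o jsonpath='{.status.disruptionsAllowed}{"\n"}'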
# Node removal fails
ansible-playbook -i inventory/mycluster/inventory.ini -v remove-node.yml --list-tags
ansible-playbook -i inventory/mycluster/inventory.ini -v remove-node.yml -e node=k8s-node5
...
PLAY [Confirm node removal] *******************************************************************************************************
Thursday 29 January 2026 14:10:10 +0900 (0:00:00.106) 0:00:01.562 ******
[Confirm Execution]
Are you sure you want to delete nodes state? Type 'yes' to delete nodes.: yes
...
TASK [remove_node/pre_remove : Remove-node | List nodes] **************************************************************************
ok: [k8s-node5 -> k8s-node1(192.168.10.11)] => {"changed": false, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig", "/etc/kubernetes/admin.conf", "get", "nodes", "-o", "go-template={{ range .items }}{{ .metadata.name }}{{ \"\\n\" }}{{ end }}"], "delta": "0:00:00.159970", "end": "2026-01-31 15:02:13.863633", "msg": "", "rc": 0, "start": "2026-01-31 15:02:13.703663", "stderr": "", "stderr_lines": [], "stdout": "k8s-node1\nk8s-node2\nk8s-node3\nk8s-node4\nk8s-node5", "stdout_lines": ["k8s-node1", "k8s-node2", "k8s-node3", "k8s-node4", "k8s-node5"]}
Saturday 31 January 2026 15:02:13 +0900 (0:00:00.552) 0:00:22.561 ******
FAILED - RETRYING: [k8s-node5 -> k8s-node1]: Remove-node | Drain node except daemonsets resource (3 retries left).
Cancel with CTRL+C
# Delete the pdb
kubectl delete pdb webpod
# Retry the removal: takes about 2 minutes 20 seconds
ansible-playbook -i inventory/mycluster/inventory.ini -v remove-node.yml -e node=k8s-node5
...
PLAY [Confirm node removal] *******************************************************************************************************
Thursday 29 January 2026 14:10:10 +0900 (0:00:00.106) 0:00:01.562 ******
[Confirm Execution]
Are you sure you want to delete nodes state? Type 'yes' to delete nodes.: yes
...
# Verify
kubectl get node -owide
# Confirm the cleanup on the node
ssh k8s-node5 tree /etc/kubernetes
ssh k8s-node5 tree /var/lib/kubelet
ssh k8s-node5 pstree -a
# Modify inventory.ini
cat << EOF > /root/kubespray/inventory/mycluster/inventory.ini
[kube_control_plane]
k8s-node1 ansible_host=192.168.10.11 ip=192.168.10.11 etcd_member_name=etcd1
k8s-node2 ansible_host=192.168.10.12 ip=192.168.10.12 etcd_member_name=etcd2
k8s-node3 ansible_host=192.168.10.13 ip=192.168.10.13 etcd_member_name=etcd3
[etcd:children]
kube_control_plane
[kube_node]
k8s-node4 ansible_host=192.168.10.14 ip=192.168.10.14
EOF
The removal fails midway because the PDB is still in place: the drain cannot evict the webpod pods, so the task fails. Delete the PDB and re-run the playbook, and the node is removed.
# Modify inventory.ini
cat << EOF > /root/kubespray/inventory/mycluster/inventory.ini
[kube_control_plane]
k8s-node1 ansible_host=192.168.10.11 ip=192.168.10.11 etcd_member_name=etcd1
k8s-node2 ansible_host=192.168.10.12 ip=192.168.10.12 etcd_member_name=etcd2
k8s-node3 ansible_host=192.168.10.13 ip=192.168.10.13 etcd_member_name=etcd3
[etcd:children]
kube_control_plane
[kube_node]
k8s-node4 ansible_host=192.168.10.14 ip=192.168.10.14
k8s-node5 ansible_host=192.168.10.15 ip=192.168.10.15
EOF
# Add the worker node again: takes about 3 minutes
ANSIBLE_FORCE_COLOR=true ansible-playbook -i inventory/mycluster/inventory.ini -v scale.yml --limit=k8s-node5 -e kube_version="1.32.9" | tee kubespray_add_worker_node.log
# Verify
kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node1 Ready control-plane 48m v1.32.9 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node2 Ready control-plane 48m v1.32.9 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node3 Ready control-plane 48m v1.32.9 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node4 Ready <none> 47m v1.32.9 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node5 Ready <none> 66s v1.32.9 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# Redistribute the sample pods
kubectl get pod -owide
kubectl scale deployment webpod --replicas 1
kubectl get pod -owide
kubectl scale deployment webpod --replicas 2

Lab goal: upgrade 1.32.9 → 1.32.10 (patch upgrade) → 1.33.7 (minor upgrade) → 1.34.3, with minimal (near-zero) disruption.
# Search for the related variables
grep -Rni "flannel" inventory/mycluster/ playbooks/ roles/ --include="*.yml" -A2 -B1
...
roles/kubespray_defaults/defaults/main/download.yml:115:flannel_version: 0.27.3
roles/kubespray_defaults/defaults/main/download.yml:116:flannel_cni_version: 1.7.1-flannel1
roles/kubespray_defaults/defaults/main/download.yml:219:flannel_image_repo: "{{ docker_image_repo }}/flannel/flannel"
roles/kubespray_defaults/defaults/main/download.yml:220:flannel_image_tag: "v{{ flannel_version }}"
roles/kubespray_defaults/defaults/main/download.yml:221:flannel_init_image_repo: "{{ docker_image_repo }}/flannel/flannel-cni-plugin"
roles/kubespray_defaults/defaults/main/download.yml:222:flannel_init_image_tag: "v{{ flannel_cni_version }}"
# Check the current state
kubectl get ds -n kube-system -owide
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-flannel 0 0 0 0 0 <none> 167m kube-flannel docker.io/flannel/flannel:v0.27.3 app=flannel
ssh k8s-node1 crictl images
IMAGE TAG IMAGE ID SIZE
docker.io/flannel/flannel-cni-plugin v1.7.1-flannel1 e5bf9679ea8c3 5.14MB
docker.io/flannel/flannel v0.27.3 cadcae92e6360 33.1MB
# Pre-pull the images on the nodes: the play downloads them before applying anyway, so this step is not strictly necessary
ssh k8s-node3 crictl pull ghcr.io/flannel-io/flannel:v0.27.4
ssh k8s-node3 crictl pull ghcr.io/flannel-io/flannel-cni-plugin:v1.8.0-flannel1
# Adjust the flannel settings
cat << EOF >> inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
flannel_version: 0.27.4
EOF
grep "^[^#]" inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
# Monitoring
watch -d "ssh k8s-node3 crictl ps"
# flannel tag: Network plugin flannel => all of the attempts below fail
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "flannel" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "flannel" --limit k8s-node3 -e kube_version="1.32.9"
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "network,flannel" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "network,flannel" --limit k8s-node3 -e kube_version="1.32.9"
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "cni,network,flannel" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "cni,network,flannel" --limit k8s-node3 -e kube_version="1.32.9"
## cordon -> recreate the apiserver pods -> uncordon
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --limit k8s-node3 -e kube_version="1.32.9"
# flannel is a DaemonSet, so it cannot be rolled out to just one node via --limit -> for a sensitive cluster it seems better to manage the CNI plugin separately from kubespray and apply it node by node (see the sketch after the command below).
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "flannel" -e kube_version="1.32.9"
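# (My own sketch, not part of the lab) One way to roll a DaemonSet out node by node is to switch its
# updateStrategy to OnDelete and then delete the pod on one node at a time; the DS name and label below
# are taken from the kube-flannel-ds-arm64 / app=flannel objects seen earlier.
kubectl -n kube-system patch ds kube-flannel-ds-arm64 --type merge -p '{"spec":{"updateStrategy":{"type":"OnDelete"}}}'
kubectl -n kube-system delete pod -l app=flannel --field-selector spec.nodeName=k8s-node3  # only node3's pod is recreated with the new image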
# Verify
kubectl get ds -n kube-system -owide
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-flannel 0 0 0 0 0 <none> 3h27m kube-flannel docker.io/flannel/flannel:v0.27.4 app=flannel
...
ssh k8s-node1 crictl images
IMAGE TAG IMAGE ID SIZE
docker.io/flannel/flannel-cni-plugin v1.7.1-flannel1 e5bf9679ea8c3 5.14MB
docker.io/flannel/flannel v0.27.3 cadcae92e6360 33.1MB
docker.io/flannel/flannel v0.27.4 7a52f3ae4ee60 33.2MB
kubectl get pod -n kube-system -l app=flannel -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-flannel-ds-arm64-48r2f 1/1 Running 0 98s 192.168.10.11 k8s-node1 <none> <none>
kube-flannel-ds-arm64-hchn8 1/1 Running 0 108s 192.168.10.15 k8s-node5 <none> <none>
kube-flannel-ds-arm64-jbjw9 1/1 Running 0 2m13s 192.168.10.12 k8s-node2 <none> <none>
kube-flannel-ds-arm64-qf6q9 1/1 Running 0 112s 192.168.10.13 k8s-node3 <none> <none>
kube-flannel-ds-arm64-qtv2m 1/1 Running 0 2m2s 192.168.10.14 k8s-node4 <none> <none>

# Monitoring
[admin-lb]
watch -d kubectl get node
watch -d kubectl get pod -n kube-system -owide
while true; do echo ">> k8s-node1 <<"; ssh k8s-node1 etcdctl.sh endpoint status -w table; echo; echo ">> k8s-node2 <<"; ssh k8s-node2 etcdctl.sh endpoint status -w table; echo ">> k8s-node3 <<"; ssh k8s-node3 etcdctl.sh endpoint status -w table; sleep 1; done
watch -d 'ssh k8s-node1 crictl ps ; echo ; ssh k8s-node1 crictl images'
# Control plane upgrade, about 14 minutes: 1.32.9 → 1.32.10
# download images -> drain ctrl node 1 -> upgrade containerd -> run kubeadm upgrade -> start the new static pods -> (first time only) restart the kube-proxy DS on all nodes -> uncordon the node => then ctrl node 2...
ansible-playbook -i inventory/mycluster/inventory.ini upgrade-cluster.yml --list-tags
ANSIBLE_FORCE_COLOR=true ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.32.10" --limit "kube_control_plane:etcd" | tee kubespray_upgrade.log
# Verify the upgrade
kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node1 Ready control-plane 8h v1.32.10 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node2 Ready control-plane 8h v1.32.10 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node3 Ready control-plane 8h v1.32.10 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node4 Ready <none> 8h v1.32.9 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node5 Ready <none> 7h31m v1.32.9 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# apiserver, kcm, scheduler and kube-proxy upgraded to 1.32.10!
# coredns, pause and etcd keep their existing versions; not affected.
ssh k8s-node1 crictl images
IMAGE TAG IMAGE ID SIZE
registry.k8s.io/kube-apiserver v1.32.10 03aec5fd5841e 26.4MB
registry.k8s.io/kube-apiserver v1.32.9 02ea53851f07d 26.4MB
registry.k8s.io/kube-controller-manager v1.32.10 66490a6490dde 24.2MB
registry.k8s.io/kube-controller-manager v1.32.9 f0bcbad5082c9 24.1MB
registry.k8s.io/kube-proxy v1.32.10 8b57c1f8bd2dd 27.6MB
registry.k8s.io/kube-proxy v1.32.9 72b57ec14d31e 27.4MB
registry.k8s.io/kube-scheduler v1.32.10 fcf368a1abd0b 19.2MB
registry.k8s.io/kube-scheduler v1.32.9 1d625baf81b59 19.1MB
registry.k8s.io/coredns/coredns v1.11.3 2f6c962e7b831 16.9MB
registry.k8s.io/pause 3.10 afb61768ce381 268kB
# Check etcd: no version bump was needed, so etcd is unaffected
ssh k8s-node1 systemctl status etcd --no-pager | grep active
Active: active (running) since Thu 2026-01-29 14:52:07 KST; 6h ago
ssh k8s-node1 etcdctl.sh member list -w table
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i etcdctl.sh endpoint status -w table; echo; done
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i tree /var/backups; echo; done # check the etcd backups

This is how the upgrade proceeds: each control plane node is cordoned, upgraded, and uncordoned in turn, which is why it is called a graceful upgrade.
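Conceptually, what the playbook automates per control plane node is roughly the sequence below (my own illustrative sketch using standard kubectl/kubeadm commands, not taken from the playbook):
kubectl cordon k8s-node1                                                  # stop new pods from landing on the node
kubectl drain k8s-node1 --ignore-daemonsets --delete-emptydir-data       # evict movable pods (respects PDBs)
ssh k8s-node1 kubeadm upgrade node                                        # upgrade the control plane components ('upgrade apply' on the first node)
ssh k8s-node1 systemctl restart kubelet                                   # restart kubelet on the new binary
kubectl uncordon k8s-node1                                                # put the node back into scheduling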
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.32.10" --limit "k8s-node5"
# After verifying, run the remaining node
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.32.10" --limit "k8s-node4"
Since 1.33.7 is the highest version in the supported list, that is the next upgrade target.

## Monitoring
[admin-lb]
watch -d kubectl get node
watch -d kubectl get pod -n kube-system -owide
while true; do echo ">> k8s-node1 <<"; ssh k8s-node1 etcdctl.sh endpoint status -w table; echo; echo ">> k8s-node2 <<"; ssh k8s-node2 etcdctl.sh endpoint status -w table; echo ">> k8s-node3 <<"; ssh k8s-node3 etcdctl.sh endpoint status -w table; sleep 1; done
[k8s-node1] watch -d 'crictl ps ; echo ; crictl images'
# Control plane upgrade, about 14 minutes: 1.32.10 → 1.33.7
# download images -> bump coredns and metrics-server -> drain ctrl node 1 -> upgrade containerd -> run kubeadm upgrade (continued below)
# -> start the new static pods -> (first time only) restart the kube-proxy DS on all nodes => then ctrl node 2...
ansible-playbook -i inventory/mycluster/inventory.ini upgrade-cluster.yml --list-tags
ANSIBLE_FORCE_COLOR=true ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.33.7" --limit "kube_control_plane:etcd" | tee kubespray_upgrade-2.log
# Verify the upgrade
kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node1 Ready control-plane 9h v1.33.7 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node2 Ready control-plane 9h v1.33.7 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node3 Ready control-plane 9h v1.33.7 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node4 Ready <none> 9h v1.32.10 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node5 Ready <none> 8h v1.32.10 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# apiserver, kcm, scheduler and kube-proxy upgraded to 1.33.7, plus coredns!
# pause and etcd keep their existing versions; not affected.
ssh k8s-node1 crictl images
IMAGE TAG IMAGE ID SIZE
registry.k8s.io/coredns/coredns v1.11.3 2f6c962e7b831 16.9MB
registry.k8s.io/coredns/coredns v1.12.0 f72407be9e08c 19.1MB
registry.k8s.io/kube-apiserver v1.32.10 03aec5fd5841e 26.4MB
registry.k8s.io/kube-apiserver v1.32.9 02ea53851f07d 26.4MB
registry.k8s.io/kube-apiserver v1.33.7 6d7bc8e445519 27.4MB
registry.k8s.io/kube-controller-manager v1.32.10 66490a6490dde 24.2MB
registry.k8s.io/kube-controller-manager v1.32.9 f0bcbad5082c9 24.1MB
registry.k8s.io/kube-controller-manager v1.33.7 a94595d0240bc 25.1MB
registry.k8s.io/kube-proxy v1.32.10 8b57c1f8bd2dd 27.6MB
registry.k8s.io/kube-proxy v1.32.9 72b57ec14d31e 27.4MB
registry.k8s.io/kube-proxy v1.33.7 78ccb937011a5 28.3MB
registry.k8s.io/kube-scheduler v1.32.10 fcf368a1abd0b 19.2MB
registry.k8s.io/kube-scheduler v1.32.9 1d625baf81b59 19.1MB
registry.k8s.io/kube-scheduler v1.33.7 94005b6be50f0 19.9MB
registry.k8s.io/pause 3.10 afb61768ce381 268kB
# Check etcd: no version bump was needed, so etcd is unaffected
ssh k8s-node1 systemctl status etcd --no-pager | grep active
Active: active (running) since Thu 2026-01-29 14:52:07 KST; 6h ago
ssh k8s-node1 etcdctl.sh member list -w table
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i etcdctl.sh endpoint status -w table; echo; done
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i tree /var/backups; echo; done # check the etcd backups

# Check that the worker pods are up
kubectl get pod -A -owide | grep node4
kubectl get pod -A -owide | grep node5
# Worker upgrade: (first time only) the kube-proxy DS is restarted on all nodes
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.33.7" --limit "kube_node"
# Verify
kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node1 Ready control-plane 9h v1.33.7 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node2 Ready control-plane 9h v1.33.7 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node3 Ready control-plane 9h v1.33.7 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node4 Ready <none> 9h v1.33.7 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node5 Ready <none> 8h v1.33.7 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# Check the current state: etcd 3.5.25
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i etcdctl.sh endpoint status -w table; echo; done
>> k8s-node1 <<
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | c6702130d82d740f | 3.5.25 | 5.4 MB | true | false | 3 | 2399 | 2399 | |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
...
# containerd 2.1.5
kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node1 Ready control-plane 3h47m v1.33.7 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# Check the supported versions
git --no-pager tag
git describe --tags
cat roles/kubespray_defaults/vars/main/checksums.yml | grep -i kube -A40
kubelet_checksums:
arm64:
1.33.7: sha256:3035c44e0d429946d6b4b66c593d371cf5bbbfc85df39d7e2a03c422e4fe404a
1.33.6: sha256:7d8b7c63309cfe2da2331a1ae13cce070b9ba01e487099e7881a4281667c131d
1.33.5: sha256:c6ad0510c089d49244eede2638b4a4ff125258fd29a0649e7eef05c7f79c737f
...
cat /root/kubespray/requirements.txt | grep -v "^#"
ansible==10.7.0
cryptography==46.0.2
jmespath==1.0.1
netaddr==1.3.0
git checkout v2.30.0
git describe --tags
cat roles/kubespray_defaults/vars/main/checksums.yml | grep -i kube -A40
kubelet_checksums:
arm64:
1.34.3: sha256:765b740e3ad9c590852652a2623424ec60e2dddce2c6280d7f042f56c8c98619
1.34.2: sha256:3e31b1bee9ab32264a67af8a19679777cd372b1c3a04b5d7621289cf137b357c
1.34.1: sha256:6a66bc08d6c637fcea50c19063cf49e708fde1630a7f1d4ceca069a45a87e6f1
1.34.0: sha256:e45a7795391cd62ee226666039153832d3096c0f892266cd968936e18b2b40b0
1.33.7: sha256:3035c44e0d429946d6b4b66c593d371cf5bbbfc85df39d7e2a03c422e4fe404a
...
# (Optional) python venv: prevents version conflicts when projects need different library versions by fully isolating the Python packages
python -m venv venv
tree venv -L 2
source venv/bin/activate # cat venv/bin/activate
# Upgrade Python Dependencies
cat /root/kubespray/requirements.txt | grep -v "^#"
ansible==10.7.0
cryptography==46.0.3
jmespath==1.1.0
netaddr==1.3.0
pip3 install -r /root/kubespray/requirements.txt
pip list | grep -E 'cryptography|jmespath'
# Control plane upgrade, about 15 minutes: includes the etcd version bump and restart
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.34.3" --limit "kube_control_plane:etcd"
ssh k8s-node1 tree /var/backups
ssh k8s-node1 tree /tmp/releases
# Confirm the containerd 2.2.1 upgrade
kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node1 Ready control-plane 32m v1.34.3 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
...
# Worker upgrade, about 4 minutes
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.34.3" --limit "kube_node"
# Check the kubectl version: if the client was never upgraded past v1.32, a WARNING like the one below is printed
kubectl version
Client Version: v1.32.11
Kustomize Version: v5.5.0
Server Version: v1.34.3
WARNING: version difference between client (1.32) and server (1.34) exceeds the supported minor version skew of +/-1
# Install the upgraded kubectl
cat << EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.34/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.34/rpm/repodata/repomd.xml.key
exclude=kubectl
EOF
dnf install -y -q kubectl --disableexcludes=kubernetes
kubectl version
# Update the admin kubeconfig
scp k8s-node1:/root/.kube/config /root/.kube/
cat /root/.kube/config | grep server
sed -i 's/127.0.0.1/192.168.10.10/g' /root/.kube/config
kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node1 Ready control-plane 35m v1.34.3 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
k8s-node2 Ready control-plane 34m v1.34.3 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
k8s-node3 Ready control-plane 34m v1.34.3 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
k8s-node4 Ready <none> 34m v1.34.3 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
k8s-node5 Ready <none> 34m v1.34.3 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
# Upgrade Helm https://helm.sh/ko/docs/v3/topics/version_skew , https://github.com/helm/helm/tags
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | DESIRED_VERSION=v3.20.0 bash # or v3.19.5
helm version
version.BuildInfo{Version:"v3.20.0", GitCommit:"b2e4314fa0f229a1de7b4c981273f61d69ee5a59", GitTreeState:"clean", GoVersion:"go1.25.6"}

It seems the way to go is to check the kubespray tags on GitHub, pull the tag you need, and confirm the highest supported Kubernetes version before installing.
The more I use kubespray, the more convenient it feels. Each release has a maximum supported Kubernetes version, and it is impressive how thoroughly the upgrade process has been automated.
My Kubernetes knowledge is still lacking, but I will keep filling in the gaps.