Fixing a broken K8s cluster after a system update

Today, with time on my hands, I carelessly ran the following on my test K8s cluster:

yum -y update

After that, the cluster would not come up:

kubectl get nodes
E1213 12:23:44.334665    1480 memcache.go:238] couldn't get current server API group list: Get "https://172.16.250.100:6443/api?timeout=32s": dial tcp 172.16.250.100:6443: connect: connection refused
The connection to the server 172.16.250.100:6443 was refused - did you specify the right host or port?

SELinux is disabled, swap is off, firewalld is stopped, and no containers are running:

[root@k8s-master ~]# getenforce
Disabled
[root@k8s-master ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)
[root@k8s-master ~]# docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
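The same checks can be bundled into one report. This is only a sketch, assuming a CentOS 7 style host where getenforce, systemctl and docker are available; the function names are hypothetical, and any missing tool is reported as "n/a" instead of aborting.

```shell
#!/usr/bin/env bash
# /proc/swaps always starts with a header line, so a file with at most one
# line means no swap device is active. The path is a parameter so it can be tested.
swap_is_off() {
  local swaps_file=${1:-/proc/swaps}
  [ "$(wc -l < "$swaps_file")" -le 1 ]
}

# Print one status line per check: SELinux, swap, firewalld, running containers.
preflight_report() {
  if command -v getenforce >/dev/null 2>&1; then
    echo "selinux: $(getenforce)"
  else
    echo "selinux: n/a"
  fi

  if [ -r /proc/swaps ] && swap_is_off; then
    echo "swap: off"
  else
    echo "swap: on (or unknown)"
  fi

  if command -v systemctl >/dev/null 2>&1; then
    # is-active prints "inactive" for a stopped firewalld
    echo "firewalld: $(systemctl is-active firewalld 2>/dev/null)"
  else
    echo "firewalld: n/a"
  fi

  if command -v docker >/dev/null 2>&1; then
    # -q prints one container ID per line, so wc -l counts running containers
    echo "containers running: $(docker ps -q 2>/dev/null | wc -l)"
  else
    echo "containers running: n/a"
  fi
}
```

On the master above, preflight_report would be expected to show SELinux Disabled, swap off, firewalld inactive and zero running containers.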

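With no containers running at all, not even the control-plane ones that kubelet starts, the next step is the kubelet service log: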

[root@k8s-master ~]# journalctl -xefu kubelet
-- Logs begin at Tue 2022-12-13 12:21:55 CST. --
Dec 13 12:22:52 k8s-master systemd[1]: Started kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished starting up.
--
-- The start-up result is done.
Dec 13 12:22:55 k8s-master kubelet[1113]: E1213 12:22:55.394853    1113 run.go:74] "command failed" err="failed to parse kubelet flag: unknown flag: --network-plugin"
Dec 13 12:22:55 k8s-master systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Dec 13 12:22:55 k8s-master systemd[1]: Unit kubelet.service entered failed state.
Dec 13 12:22:55 k8s-master systemd[1]: kubelet.service failed.
Dec 13 12:23:05 k8s-master systemd[1]: kubelet.service holdoff time over, scheduling restart.
Dec 13 12:23:05 k8s-master systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished shutting down
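The key line is the "unknown flag" error. A small sketch that pulls the rejected flag out of the log and greps the usual kubeadm/RPM config locations for it; the three paths are assumptions based on a standard kubeadm install on CentOS 7 (the drop-in path matches the systemctl output later in this post), and both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Read kubelet log text on stdin and print the first flag kubelet rejected,
# e.g. "--network-plugin".
unknown_kubelet_flag() {
  grep -o 'unknown flag: --[A-Za-z0-9-]*' | head -n 1 | sed 's/^unknown flag: //'
}

# Show which config file passes the flag to kubelet.
# Paths are assumed kubeadm/RPM defaults; files that do not exist are skipped.
find_flag_source() {
  local flag=$1
  grep -Hn -- "$flag" \
    /var/lib/kubelet/kubeadm-flags.env \
    /etc/sysconfig/kubelet \
    /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf 2>/dev/null
}
```

For example, journalctl -u kubelet --no-pager | unknown_kubelet_flag should print --network-plugin for the log above, and find_flag_source --network-plugin should then show the file that sets it.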

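Searching for the error message showed that the update had also upgraded the Kubernetes packages to 1.26. The --network-plugin flag belonged to dockershim, the built-in Docker support that was deprecated in 1.20 and removed in 1.24, so kubelet 1.26 refuses to start with it and the Docker-based cluster becomes unusable: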

[root@k8s-master ~]# rpm -qa|grep kube
kubernetes-cni-1.1.1-0.x86_64
kubelet-1.26.0-0.x86_64
kubectl-1.26.0-0.x86_64
kubeadm-1.26.0-0.x86_64
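To check quickly whether an installed kubelet can still run on Docker, compare its minor version against 1.24, the release that removed dockershim. A minimal sketch with a hypothetical function name, assuming version strings of the form 1.x.y or v1.x.y:

```shell
#!/usr/bin/env bash
# Succeed if this kubelet version still ships dockershim (older than 1.24).
dockershim_available() {
  local ver=${1#v}          # accept "v1.22.0" as well as "1.22.0"
  local major=${ver%%.*}    # "1"
  local rest=${ver#*.}      # "22.0"
  local minor=${rest%%.*}   # "22"
  [ "$major" -eq 1 ] && [ "$minor" -lt 24 ]
}
```

On an RPM system the installed version can be fed in with dockershim_available "$(rpm -q --qf '%{VERSION}' kubelet)"; with kubelet 1.26.0 installed, as above, the check fails.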

Solution:

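Downgrade the Kubernetes packages on every node in the cluster to 1.22. Although Docker support (dockershim) has been deprecated since 1.20, it was not removed until 1.24, so 1.22 still works with Docker.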

This is a test environment; in production, stick strictly to the official requirements.

[root@k8s-master ~]# yum downgrade kubelet-1.22.0-0.x86_64 \
kubeadm-1.22.0-0.x86_64 \
kubectl-1.22.0-0.x86_64
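Because the downgrade must be repeated on every node, it can be driven from the master in a loop. This is a hypothetical helper, assuming passwordless root SSH to each node; with DRY_RUN=1 (the default) it only prints what it would run. It adds an explicit kubelet restart so the node does not have to wait for systemd's automatic retry.

```shell
#!/usr/bin/env bash
# downgrade_nodes VERSION NODE... : downgrade kubelet/kubeadm/kubectl on each node.
# Assumes root SSH access; set DRY_RUN=0 to actually execute.
downgrade_nodes() {
  local version=$1; shift
  local pkgs="kubelet-${version}-0.x86_64 kubeadm-${version}-0.x86_64 kubectl-${version}-0.x86_64"
  local cmd="yum -y downgrade ${pkgs} && systemctl daemon-reload && systemctl restart kubelet"
  local node
  for node in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ssh root@${node} '${cmd}'"
    else
      ssh "root@${node}" "${cmd}"
    fi
  done
}
```

For this cluster the call would be DRY_RUN=0 downgrade_nodes 1.22.0 172.16.250.100 172.16.250.101 172.16.250.102; running it once without DRY_RUN=0 first shows the exact commands.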

Reload systemd and check the service status:

[root@k8s-master ~]# systemctl daemon-reload
[root@k8s-master ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Tue 2022-12-13 12:33:16 CST; 2min 1s ago
……
Hint: Some lines were ellipsized, use -l to show in full.
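The earlier log shows systemd rescheduling kubelet every 10 seconds ("holdoff time over, scheduling restart"), so once the package is downgraded kubelet comes back on its own and then brings up the control-plane containers. Instead of polling by hand, a small retry helper can wait for that; wait_for is a hypothetical name and the attempt counts in the usage line are arbitrary.

```shell
#!/usr/bin/env bash
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds, sleeping between tries; fail after ATTEMPTS tries.
wait_for() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

Usage: wait_for 30 2 systemctl is-active --quiet kubelet && wait_for 60 5 kubectl get nodes waits first for kubelet and then for the API server to answer again.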

After the downgrade, the whole cluster is back to normal:

[root@k8s-master ~]# kubectl get nodes,pods,svc -o wide
NAME              STATUS   ROLES                  AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
node/k8s-master   Ready    control-plane,master   13d   v1.22.0   172.16.250.100   <none>        CentOS Linux 7 (Core)   5.4.226-1.el7.elrepo.x86_64   docker://20.10.21
node/k8s-node1    Ready    <none>                 13d   v1.22.0   172.16.250.101   <none>        CentOS Linux 7 (Core)   5.4.226-1.el7.elrepo.x86_64   docker://20.10.21
node/k8s-node2    Ready    <none>                 13d   v1.22.0   172.16.250.102   <none>        CentOS Linux 7 (Core)   5.4.226-1.el7.elrepo.x86_64   docker://20.10.21

NAME                         READY   STATUS    RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
pod/nginx-6799fc88d8-2tcbt   1/1     Running   2          13d   10.244.1.4   k8s-node1   <none>           <none>

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE   SELECTOR
service/kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        13d   <none>
service/nginx        NodePort    10.99.187.196   <none>        80:32226/TCP   13d   app=nginx
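The root cause was that a plain yum -y update also upgraded kubelet, kubeadm and kubectl. The official kubeadm installation instructions guard against this by adding an exclude line to the Kubernetes yum repo and installing with --disableexcludes=kubernetes. A sketch that adds that line, assuming the repo file is /etc/yum.repos.d/kubernetes.repo with a [kubernetes] section and GNU sed; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Add "exclude=kubelet kubeadm kubectl" right after the [kubernetes] header so
# a plain "yum update" skips these packages. Does nothing if already present.
# The default path is an assumption; pass your own repo file if it differs.
pin_kube_packages() {
  local repo=${1:-/etc/yum.repos.d/kubernetes.repo}
  if grep -q '^exclude=.*kubelet' "$repo"; then
    return 0   # already pinned
  fi
  sed -i '/^\[kubernetes\]/a exclude=kubelet kubeadm kubectl' "$repo"
}
```

After pinning, deliberate upgrades have to lift the exclusion explicitly, e.g. yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes, so a routine system update no longer touches the cluster.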