c4rt1y

k8s: Troubleshooting Problems in Use

0x01.Kubernetes Installation Problems

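Kubernetes is an open-source platform for automating the deployment, scaling, and operation of application containers. After installing it by following the previous post,

k8s: installing with yum

a few problems showed up later, such as image downloads failing, Pods failing to be created, and the network being unreachable.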

0x02.Symptoms and Solutions

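The steps below follow the troubleshooting flow and reproduce the problems as they came up. You will not necessarily hit every one of them; treat this as notes.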

#After creating the dashboard with kubectl, list the pods: the pod stays stuck in ContainerCreating
[root@k8s-master /]# kubectl get pod --namespace=kube-system
NAME                         READY     STATUS              RESTARTS   AGE
kubernetes-dashboard-2k7tm   0/1       ContainerCreating   0          20m
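When many pods are listed, spotting the stuck ones by eye gets tedious. A minimal sketch, assuming the default column layout shown above (the helper names `parse_pods` and `not_ready` are hypothetical), that parses the `kubectl get pod` table and lists pods whose READY count is below their container total. For anything beyond a quick check, `kubectl get pod -o json` is the more robust input.

```python
# Hypothetical helper: parse the plain-text table printed by
# `kubectl get pod` and list pods that are not fully ready.
SAMPLE = """\
NAME                         READY     STATUS              RESTARTS   AGE
kubernetes-dashboard-2k7tm   0/1       ContainerCreating   0          20m
"""


def parse_pods(text):
    """Return one dict per pod row; assumes no column value contains spaces."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, row.split())) for row in lines[1:]]


def not_ready(pods):
    """Names of pods whose READY count (e.g. 0/1) is below the total."""
    result = []
    for pod in pods:
        ready, total = pod["READY"].split("/")
        if int(ready) < int(total):
            result.append(pod["NAME"])
    return result


if __name__ == "__main__":
    for name in not_ready(parse_pods(SAMPLE)):
        print(name, "is not ready")
```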


#Problem 1: describe the pod (it lives in kube-system, so the namespace must be given)
[root@k8s-master /]# kubectl describe pod kubernetes-dashboard-2k7tm --namespace=kube-system

Error: image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request.  details: (open /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt: no such file or directory)

Fix: yum install -y '*rhsm*'  (the master needs to be restarted afterwards)
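The error names a certificate file Docker expects to find. A small sketch (the helper name `missing_path` is hypothetical) that extracts that path from the message and checks whether it exists, so the same check can be repeated after installing the `*rhsm*` packages. The regex assumes the `open <path>: no such file or directory` wording shown above.

```python
import os
import re

# The error text from Problem 1, copied from the describe output above.
ERROR = ("image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, "
         "this may be because there are no credentials on this request.  details: "
         "(open /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt: "
         "no such file or directory)")


def missing_path(message):
    """Return the path from an 'open <path>: no such file or directory' error, or None."""
    match = re.search(r"open (\S+): no such file or directory", message)
    return match.group(1) if match else None


if __name__ == "__main__":
    path = missing_path(ERROR)
    status = "exists" if path and os.path.exists(path) else "is missing"
    print(path, status)
```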

#Problem 2: after that fix, check again
[root@k8s-master /]# kubectl get pod --namespace=kube-system
NAME                         READY     STATUS        RESTARTS   AGE
kubernetes-dashboard-2k7tm   0/1       Terminating   0          40m

Error: Failed to create pod infra container: ImagePullBackOff; Skipping pod "redis-master-jj6jw_default(fec25a87-cdbe-11e7-ba32-525400cae48b)": Back-off pulling image "registry.access.redhat.com/rhel7/pod-infrastructure:latest"

Fix: the image is blocked by China's Great Firewall, so downloading it directly requires getting around the firewall; the workaround is to use an image someone else has made available.
docker pull registry.access.redhat.com/rhel7/pod-infrastructure:latest
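The pull goes to registry.access.redhat.com because that host is the first component of the image name. A simplified sketch (hypothetical helper `split_image`; it ignores digests and treats a bare name like `nginx` as a Docker Hub image) that splits an image reference into registry, repository, and tag, which makes clear what name a substitute image has to end up with locally.

```python
def split_image(ref):
    """Split an image reference into (registry, repository, tag). Simplified sketch."""
    registry = "docker.io"
    first, sep, rest = ref.partition("/")
    # The first component is a registry host if it contains "." or ":"
    # or is "localhost"; otherwise the image is assumed to be on Docker Hub.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, ref = first, rest
    repo, _, tag = ref.rpartition(":")
    if not repo or "/" in tag:
        # No tag was given, so Docker would default to "latest".
        repo, tag = ref, "latest"
    return registry, repo, tag


if __name__ == "__main__":
    print(split_image("registry.access.redhat.com/rhel7/pod-infrastructure:latest"))
```

For example, an image pulled from a mirror could be given this full name with `docker tag <mirror-image> registry.access.redhat.com/rhel7/pod-infrastructure:latest`, where `<mirror-image>` is a placeholder for whatever mirror you use.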

#Problem 3: after that fix, check again
[root@k8s-master /]# kubectl get pod --namespace=kube-system
NAME                         READY     STATUS        RESTARTS   AGE
kubernetes-dashboard-2k7tm   0/1       Terminating   0          1h

Error: Event(api.ObjectReference{Kind:"ReplicationController", Namespace:"default", Name:"kubernetes-dashboard", UID:"a4f26317-cdbe-11e7-bad0-525400cae48b", APIVersion:"v1", ResourceVersion:"13245", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' Error creating: No API token found for service account "default", retry after the token is automatically created and added to the service account

Fix: disable the API server's ServiceAccount admission plugin by removing ServiceAccount from the admission control list in /etc/kubernetes/apiserver, then restart kube-apiserver:
KUBE_ADMISSION_CONTROL="--admission_control=NamespaceLifecycle,NamespaceExists,LimitRanger,ResourceQuota"
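The edit boils down to removing one entry from a comma-separated list. A minimal sketch (hypothetical helper `drop_plugin`) that does exactly that on a KUBE_ADMISSION_CONTROL line; the "before" line used in the test is an assumption, namely the final line above with ServiceAccount put back in.

```python
import re


def drop_plugin(line, plugin="ServiceAccount"):
    """Remove one plugin from an --admission_control / --admission-control list."""
    match = re.search(r'--admission[_-]control=([^"\s]+)', line)
    if not match:
        return line  # no admission control flag on this line
    kept = [p for p in match.group(1).split(",") if p != plugin]
    # Rebuild the line, keeping everything around the list (e.g. the closing quote).
    return line[:match.start(1)] + ",".join(kept) + line[match.end(1):]


if __name__ == "__main__":
    before = ('KUBE_ADMISSION_CONTROL="--admission_control=NamespaceLifecycle,'
              'NamespaceExists,LimitRanger,ServiceAccount,ResourceQuota"')
    print(drop_plugin(before))
```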

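#Problem 4: after that fix, check again
[root@k8s-master /]# kubectl get pod --namespace=kube-system
NAME                         READY     STATUS    RESTARTS   AGE
kubernetes-dashboard-2k7tm   1/1       Running   0          1h

No errors, so in theory everything is OK. However, k8s-master could not reach the Docker containers on k8s-node-1. Reviewing the whole configuration showed the k8s installation itself was fine (on a separate server with the master and node installed on the same host, everything worked), so I suspected flanneld and started by checking the data in etcd.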

#Check etcd health
[root@k8s-master /]# etcdctl -C http://127.0.0.1:2379 cluster-health

#Check the flannel network data, method 1 (etcdctl)
[root@k8s-master /]# etcdctl --endpoints http://127.0.0.1:2379 get /atomic.io/network/config
[root@k8s-master /]# etcdctl --endpoints http://127.0.0.1:2379 ls /atomic.io/network/subnets

#Check the flannel network data, method 2 (etcd v2 HTTP API)
[root@k8s-master /]# curl -L http://127.0.0.1:2379/v2/keys/atomic.io/network/config
[root@k8s-master /]# curl -L http://127.0.0.1:2379/v2/keys/atomic.io/network/subnets
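The subnets query returns one lease per node running flanneld, so a node missing from the list never registered. A sketch (hypothetical helper `subnet_leases`) that turns the JSON from the second curl into a subnet-to-host map. The sample response and its IP addresses are made up, and the `PublicIP` field and the `<network>-<prefixlen>` key format are assumptions about how flannel stores its leases.

```python
import json

# Made-up sample shaped like an etcd v2 response for
# GET /v2/keys/atomic.io/network/subnets; real values will differ.
SAMPLE = json.dumps({
    "action": "get",
    "node": {
        "key": "/atomic.io/network/subnets",
        "dir": True,
        "nodes": [
            {"key": "/atomic.io/network/subnets/172.17.86.0-24",
             "value": json.dumps({"PublicIP": "192.168.0.11"})},
            {"key": "/atomic.io/network/subnets/172.17.12.0-24",
             "value": json.dumps({"PublicIP": "192.168.0.10"})},
        ],
    },
})


def subnet_leases(body):
    """Map each flannel subnet (as a CIDR) to the host IP that holds it."""
    node = json.loads(body)["node"]
    leases = {}
    for child in node.get("nodes", []):
        # A lease key ends in "<network>-<prefixlen>", e.g. "172.17.86.0-24".
        name = child["key"].rsplit("/", 1)[1]
        leases[name.replace("-", "/")] = json.loads(child["value"])["PublicIP"]
    return leases


if __name__ == "__main__":
    for cidr, host in subnet_leases(SAMPLE).items():
        print(cidr, "->", host)
```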

#Check where docker gets its network settings (DOCKER_NETWORK_OPTIONS)
[root@k8s-node-1 /]# cat /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=network.target rhel-push-plugin.socket registries.service
Wants=docker-storage-setup.service
Requires=docker-cleanup.timer

[Service]
Type=notify
NotifyAccess=all
EnvironmentFile=-/run/containers/registries.conf
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
Environment=GOTRACEBACK=crash
Environment=DOCKER_HTTP_HOST_COMPAT=1
Environment=PATH=/usr/libexec/docker:/usr/bin:/usr/sbin
ExecStart=/usr/bin/dockerd-current \
          --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current \
          --default-runtime=docker-runc \
          --exec-opt native.cgroupdriver=systemd \
          --userland-proxy-path=/usr/libexec/docker/docker-proxy-current \
          --seccomp-profile=/etc/docker/seccomp.json \
          $OPTIONS \
          $DOCKER_STORAGE_OPTIONS \
          $DOCKER_NETWORK_OPTIONS \
          $ADD_REGISTRY \
          $BLOCK_REGISTRY \
          $INSECURE_REGISTRY \
          $REGISTRIES
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
Restart=on-abnormal
MountFlags=slave
KillMode=process

#A drop-in file makes docker also read /run/flannel/docker
[root@k8s-node-1 /]# cat /usr/lib/systemd/system/docker.service.d/flannel.conf
[Service]
EnvironmentFile=-/run/flannel/docker
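This is how DOCKER_NETWORK_OPTIONS in ExecStart gets its value: systemd reads the EnvironmentFile= entries in order, a later assignment of the same variable overrides an earlier one, and a leading "-" means a missing file is silently skipped. The drop-in's entry is added after those in docker.service, so /run/flannel/docker is read last. A simplified sketch of those rules (hypothetical helpers; it ignores systemd details such as line continuations):

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines roughly the way an EnvironmentFile is read (simplified)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank lines and comments are skipped
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop one pair of surrounding quotes
        env[key.strip()] = value
    return env


def load_env_files(paths):
    """Read files in order; later files override earlier ones, "-" makes a file optional."""
    env = {}
    for entry in paths:
        optional = entry.startswith("-")
        path = entry[1:] if optional else entry
        try:
            with open(path) as fh:
                text = fh.read()
        except FileNotFoundError:
            if optional:
                continue
            raise
        env.update(parse_env_text(text))
    return env


if __name__ == "__main__":
    # Order as in docker.service, with the flannel.conf drop-in entry last.
    files = [
        "-/run/containers/registries.conf",
        "-/etc/sysconfig/docker",
        "-/etc/sysconfig/docker-storage",
        "-/etc/sysconfig/docker-network",
        "-/run/flannel/docker",
    ]
    print(load_env_files(files).get("DOCKER_NETWORK_OPTIONS"))
```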

#Looking further, DOCKER_NETWORK_OPTIONS is fine
[root@k8s-node-1 /]# cat /run/flannel/docker
DOCKER_OPT_BIP="--bip=172.17.86.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=true"
DOCKER_OPT_MTU="--mtu=1450"
DOCKER_NETWORK_OPTIONS=" --bip=172.17.86.1/24 --ip-masq=true --mtu=1450"

[root@k8s-node-1 /]# cat /run/flannel/subnet.env
FLANNEL_NETWORK=172.17.0.0/16
FLANNEL_SUBNET=172.17.86.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false
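"Fine" here means the two files agree: docker's --bip is the lease flannel obtained, that lease lies inside FLANNEL_NETWORK, and the MTUs match. A sketch of that consistency check using the values shown above (hypothetical helper `check_flannel`; it assumes every docker flag has the `--name=value` form, as in the file above, and needs Python 3.7+ for `subnet_of`):

```python
import ipaddress

# Values copied from /run/flannel/subnet.env and /run/flannel/docker above.
SUBNET_ENV = {"FLANNEL_NETWORK": "172.17.0.0/16",
              "FLANNEL_SUBNET": "172.17.86.1/24",
              "FLANNEL_MTU": "1450"}
DOCKER_OPTS = " --bip=172.17.86.1/24 --ip-masq=true --mtu=1450"


def check_flannel(subnet_env, docker_opts):
    """Return a list of mismatches; an empty list means the files agree."""
    flags = dict(opt.lstrip("-").split("=", 1) for opt in docker_opts.split())
    network = ipaddress.ip_network(subnet_env["FLANNEL_NETWORK"])
    subnet = ipaddress.ip_interface(subnet_env["FLANNEL_SUBNET"])
    problems = []
    if not subnet.network.subnet_of(network):
        problems.append("FLANNEL_SUBNET is outside FLANNEL_NETWORK")
    if flags.get("bip") != subnet_env["FLANNEL_SUBNET"]:
        problems.append("docker --bip does not match FLANNEL_SUBNET")
    if flags.get("mtu") != subnet_env["FLANNEL_MTU"]:
        problems.append("docker --mtu does not match FLANNEL_MTU")
    return problems


if __name__ == "__main__":
    print(check_flannel(SUBNET_ENV, DOCKER_OPTS) or "flannel and docker agree")
```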

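#The puzzle: the docker environment all looks fine, yet the network is unreachable. On k8s-node-1 itself, the docker containers can be reached and pinged. After searching Google, some articles pointed to iptables, and CentOS 7 does not install iptables by default (it uses firewalld).

#Install iptables
[root@k8s-node-1 /]# yum install -y iptables
#Restart flanneld
[root@k8s-node-1 /]# systemctl restart flanneld
Still not working, so I added an iptables rule on the node:
[root@k8s-node-1 /]# iptables -A FORWARD -s 172.16.0.0/16 -j ACCEPT
#Test passed: the hosts can now reach each other's containers
Note: a rule like this is meant to cover the flannel pod network. The subnet.env above shows FLANNEL_NETWORK=172.17.0.0/16, which 172.16.0.0/16 does not include, so use your own FLANNEL_NETWORK as the -s range.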
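Whether a FORWARD rule's `-s` range actually matches pod traffic is plain CIDR arithmetic. A sketch (hypothetical helpers; 172.17.86.2 is an example pod address inside the FLANNEL_SUBNET shown above) that checks whether a source range covers a given address or the whole flannel network:

```python
import ipaddress


def rule_covers(source_cidr, address):
    """True if an iptables `-s source_cidr` match would include `address`."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(source_cidr)


def covers_network(source_cidr, flannel_network):
    """True if every address in flannel_network falls inside source_cidr."""
    return ipaddress.ip_network(flannel_network).subnet_of(ipaddress.ip_network(source_cidr))


if __name__ == "__main__":
    for rule in ("172.16.0.0/16", "172.17.0.0/16"):
        print(rule, covers_network(rule, "172.17.0.0/16"))
```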

0x03.Summary

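While learning this, I found that Service IPs are forwarded by iptables rules. I had assumed the forwarding happened at some lower netfilter level and did not expect it to be iptables (iptables rules are themselves the userspace interface to netfilter). The other problems were fairly simple and easy to track down. All I can say is: think more, hit more bugs, and practice makes perfect.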
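As a simplified model of that iptables forwarding: in kube-proxy's iptables mode, a Service with n endpoints gets n rules tried in order, where rule i (counting from 0) matches with probability 1/(n-i) via the `statistic` match and the last rule always matches. The sketch below (hypothetical helpers; exact rule layout varies by version) shows why that cascade gives every endpoint an equal share.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability for n endpoints; the last rule is effectively 1."""
    return [Fraction(1, n - i) for i in range(n)]


def endpoint_shares(n):
    """Overall chance that each endpoint is chosen when rules are tried in order."""
    shares, remaining = [], Fraction(1)
    for p in rule_probabilities(n):
        shares.append(remaining * p)  # reach this rule, then match it
        remaining *= 1 - p            # reach this rule, then fall through
    return shares


if __name__ == "__main__":
    print([str(p) for p in rule_probabilities(3)], [str(s) for s in endpoint_shares(3)])
```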

0x04.Sources

https://blog.csdn.net/magerguo/article/details/72123353
http://blog.51cto.com/tsing/1983480
https://tonybai.com/2017/01/17/understanding-flannel-network-for-kubernetes/