Troubleshooting k8s in DHCP mode

Environment:

  • Kubernetes
  • Bridge+DHCP

The kubelet log shows the following:

Oct 10 16:52:58 k8node3 kubelet[38221]: E1010 16:52:58.848278   38221 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "log-pilot-msvj7_default" network: error calling DHCP.Allocate: no more tries
Oct 10 16:52:58 k8node3 kubelet[38221]: E1010 16:52:58.848293 38221 kuberuntime_manager.go:646] createPodSandbox for pod "log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "log-pilot-msvj7_default" network: error calling DHCP.Allocate: no more tries
Oct 10 16:52:58 k8node3 kubelet[38221]: E1010 16:52:58.848349 38221 pod_workers.go:186] Error syncing pod 88d40907-cc60-11e8-9598-ecebb88a11d4 ("log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)"), skipping: failed to "CreatePodSandbox" for "log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)" with CreatePodSandboxError: "CreatePodSandbox for pod \"log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"log-pilot-msvj7_default\" network: error calling DHCP.Allocate: no more tries"
Oct 10 16:53:43 k8node3 kubelet[38221]: E1010 16:53:43.433003 38221 cni.go:259] Error adding network: error calling DHCP.Allocate: no more tries
Oct 10 16:53:43 k8node3 kubelet[38221]: E1010 16:53:43.433036 38221 cni.go:227] Error while adding to cni network: error calling DHCP.Allocate: no more tries
Oct 10 16:53:43 k8node3 kubelet[38221]: E1010 16:53:43.505258 38221 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "log-pilot-msvj7_default" network: error calling DHCP.Allocate: no more tries
Oct 10 16:53:43 k8node3 kubelet[38221]: E1010 16:53:43.505313 38221 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "log-pilot-msvj7_default" network: error calling DHCP.Allocate: no more tries
Oct 10 16:53:43 k8node3 kubelet[38221]: E1010 16:53:43.505327 38221 kuberuntime_manager.go:646] createPodSandbox for pod "log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "log-pilot-msvj7_default" network: error calling DHCP.Allocate: no more tries
Oct 10 16:53:43 k8node3 kubelet[38221]: E1010 16:53:43.505388 38221 pod_workers.go:186] Error syncing pod 88d40907-cc60-11e8-9598-ecebb88a11d4 ("log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)"), skipping: failed to "CreatePodSandbox" for "log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)" with CreatePodSandboxError: "CreatePodSandbox for pod \"log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"log-pilot-msvj7_default\" network: error calling DHCP.Allocate: no more tries"
Oct 10 16:54:29 k8node3 kubelet[38221]: E1010 16:54:29.112837 38221 cni.go:259] Error adding network: error calling DHCP.Allocate: no more tries
Oct 10 16:54:29 k8node3 kubelet[38221]: E1010 16:54:29.112865 38221 cni.go:227] Error while adding to cni network: error calling DHCP.Allocate: no more tries
Oct 10 16:54:29 k8node3 kubelet[38221]: E1010 16:54:29.190742 38221 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "log-pilot-msvj7_default" network: error calling DHCP.Allocate: no more tries
Oct 10 16:54:29 k8node3 kubelet[38221]: E1010 16:54:29.190800 38221 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "log-pilot-msvj7_default" network: error calling DHCP.Allocate: no more tries
Oct 10 16:54:29 k8node3 kubelet[38221]: E1010 16:54:29.190815 38221 kuberuntime_manager.go:646] createPodSandbox for pod "log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "log-pilot-msvj7_default" network: error calling DHCP.Allocate: no more tries
Oct 10 16:54:29 k8node3 kubelet[38221]: E1010 16:54:29.190899 38221 pod_workers.go:186] Error syncing pod 88d40907-cc60-11e8-9598-ecebb88a11d4 ("log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)"), skipping: failed to "CreatePodSandbox" for "log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)" with CreatePodSandboxError: "CreatePodSandbox for pod \"log-pilot-msvj7_default(88d40907-cc60-11e8-9598-ecebb88a11d4)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"log-pilot-msvj7_default\" network: error calling DHCP.Allocate: no more tries"
Oct 10 16:55:13 k8node3 kubelet[38221]: E1010 16:55:13.604965 38221 cni.go:259] Error adding network: error calling DHCP.Allocate: no more tries
Oct 10 16:55:13 k8node3 kubelet[38221]: E1010 16:55:13.605013 38221 cni.go:227] Error while adding to cni network: error calling DHCP.Allocate: no more tries
Oct 10 16:55:13 k8node3 kubelet[38221]: E1010 16:55:13.692412 38221 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "log-pilot-msvj7_default" network: error calling DHCP.Allocate: no more tries

The CNI plugin log shows the following:

2018/10/10 16:55:15 d61d6dd5cbda1c51769489334a566a1c4ca6f258efee60375aaf2e71bc116c32/zzcloudnet: acquiring lease
2018/10/10 16:55:20 resource temporarily unavailable
2018/10/10 16:55:29 resource temporarily unavailable
2018/10/10 16:55:43 resource temporarily unavailable
2018/10/10 16:56:00 bb55ff4ecac38220cbe88d0e34c7bc4296c6faf181fb24499694f9c76881e02c/zzcloudnet: acquiring lease
2018/10/10 16:56:05 resource temporarily unavailable
2018/10/10 16:56:15 resource temporarily unavailable
2018/10/10 16:56:28 resource temporarily unavailable
2018/10/10 16:56:45 07c373e24b35ec12d66f991b32a206217288ba018648071cea4f87fc9e055658/zzcloudnet: acquiring lease
2018/10/10 16:56:50 resource temporarily unavailable
2018/10/10 16:57:00 resource temporarily unavailable
2018/10/10 16:57:12 resource temporarily unavailable
2018/10/10 16:57:30 c3d52c2c4ea0db096b079da202ba0733b91e9c68d3b2db7ba6dc4d67c71e6546/zzcloudnet: acquiring lease
2018/10/10 16:57:35 resource temporarily unavailable
2018/10/10 16:57:44 resource temporarily unavailable
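
The pattern in this log (one "acquiring lease" line followed by several "resource temporarily unavailable" lines, then a new sandbox ID) can be summarized with a small script. The following is only a sketch: the function name `summarize_lease_attempts` and the regular expression are illustrative assumptions based on the log format shown above, not part of the CNI plugin itself.

```python
import re
from collections import OrderedDict

# Assumed line format, taken from the log excerpt above:
#   <date> <time> <container-id>/<network>: acquiring lease
ACQUIRE = re.compile(r"^\S+ \S+ (?P<cid>[0-9a-f]+)/(?P<net>\S+): acquiring lease$")
RETRY_ERROR = "resource temporarily unavailable"

def summarize_lease_attempts(lines):
    """Count failed tries per container ID, in order of first appearance."""
    attempts = OrderedDict()
    current = None
    for raw in lines:
        line = raw.strip()
        match = ACQUIRE.match(line)
        if match:
            # A new lease attempt starts; later errors belong to this ID.
            current = match.group("cid")
            attempts.setdefault(current, 0)
        elif line.endswith(RETRY_ERROR) and current is not None:
            attempts[current] += 1
    return attempts

SAMPLE = """\
2018/10/10 16:55:15 d61d6dd5cbda1c51769489334a566a1c4ca6f258efee60375aaf2e71bc116c32/zzcloudnet: acquiring lease
2018/10/10 16:55:20 resource temporarily unavailable
2018/10/10 16:55:29 resource temporarily unavailable
2018/10/10 16:55:43 resource temporarily unavailable
2018/10/10 16:56:00 bb55ff4ecac38220cbe88d0e34c7bc4296c6faf181fb24499694f9c76881e02c/zzcloudnet: acquiring lease
2018/10/10 16:56:05 resource temporarily unavailable
2018/10/10 16:56:15 resource temporarily unavailable
2018/10/10 16:56:28 resource temporarily unavailable
"""

if __name__ == "__main__":
    for cid, failures in summarize_lease_attempts(SAMPLE.splitlines()).items():
        print(cid[:12], "failed tries:", failures)
```

Every sandbox failing after the same number of tries suggests a systematic problem on the network path rather than an intermittent one.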

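Capturing packets on the DHCP server with tcpdump -n -i cni0 shows that the server receives the client's DISCOVER and replies with an OFFER, but never receives a REQUEST from the client. That narrows the problem down: either the OFFER never actually leaves the server, or the client never receives it. Focus on the firewall settings and the network connectivity between the two.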
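
The reasoning above follows the usual DHCP exchange (DISCOVER, OFFER, REQUEST, ACK): find the first message that is missing from the capture, and that tells you which direction to inspect. A minimal sketch of that logic follows; the function name `first_missing_step` and the hint texts are illustrative assumptions, and the message types are assumed to have been read off the tcpdump output by hand.

```python
# Order of the standard DHCP lease handshake.
HANDSHAKE = ["DISCOVER", "OFFER", "REQUEST", "ACK"]

# Hypothetical hints, one per missing step, as seen from the server-side capture.
HINTS = {
    "DISCOVER": "client broadcast never reaches the server: check the bridge and the client side",
    "OFFER": "server sees DISCOVER but does not answer: check the DHCP server config and address pool",
    "REQUEST": "OFFER is sent but the client never replies: check firewall rules and the path back to the client",
    "ACK": "server sees REQUEST but does not confirm: check the DHCP server logs",
}

def first_missing_step(observed):
    """Return the first handshake message absent from the capture, or None."""
    seen = {message.upper() for message in observed}
    for step in HANDSHAKE:
        if step not in seen:
            return step
    return None

if __name__ == "__main__":
    # Message types as observed in this case: repeated DISCOVER/OFFER, no REQUEST.
    capture = ["DISCOVER", "OFFER", "DISCOVER", "OFFER"]
    missing = first_missing_step(capture)
    if missing is None:
        print("handshake complete")
    else:
        print(missing, "missing:", HINTS[missing])
```

In this case the first missing message is REQUEST, which matches the conclusion above: the fault lies between the server sending the OFFER and the client receiving it.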