kubernetes - Occasionally pods will be created with no network which results in the pod failing repeatedly with CrashLoopBackOff

Question

Occasionally, I see an issue where a pod starts up without network connectivity. Because of this, the pod goes into CrashLoopBackOff and is unable to recover. The only way I can get the pod running again is to run kubectl delete pod and wait for it to be rescheduled. Here's an example of a liveness probe failing due to this issue:

Liveness probe failed: Get http://172.20.78.9:9411/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I've also noticed that there are no iptables entries for the pod IP when this happens. Once the pod has been deleted and rescheduled (and is in a working state), the iptables entries are present.
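To make the iptables check repeatable, something like the following could be used. It is only a minimal sketch: the function name rules_mentioning_ip is made up for illustration, and it assumes you have already captured the text output of iptables-save on the node where the pod is scheduled.

```python
import re
from typing import List


def rules_mentioning_ip(iptables_save_output: str, pod_ip: str) -> List[str]:
    """Return the appended rules ("-A ...") that reference pod_ip.

    Hypothetical helper, not part of Kubernetes or iptables.
    """
    # Guard both sides of the address so that 172.20.78.9 does not
    # also match 172.20.78.90 or 1172.20.78.9.
    pattern = re.compile(r"(?<![\d.])" + re.escape(pod_ip) + r"(?!\d|\.\d)")
    return [
        line
        for line in iptables_save_output.splitlines()
        if line.startswith("-A") and pattern.search(line)
    ]


# Possible way to obtain the input on the node (needs root), shown as a
# comment so the sketch stays side-effect free:
#   import subprocess
#   output = subprocess.run(["iptables-save", "-t", "nat"],
#                           capture_output=True, text=True, check=True).stdout
#   print(rules_mentioning_ip(output, "172.20.78.9"))
```

An empty list for the broken pod's IP, compared with a non-empty list after the pod is rescheduled, would match what I described above. Note that the rules may only exist if a Service selects the pod, so an empty result is a hint rather than proof.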

If I turn off the liveness probe in the container and exec into it, I can confirm that it has no network connectivity to the cluster, the local network, or the internet.
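For anyone who wants to repeat the connectivity check from inside the container, a small TCP probe along these lines may help. This is a sketch under assumptions: it requires a Python interpreter in the image (many images do not ship one), and the function name can_connect is hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, unreachable networks
        # and DNS resolution failures.
        return False
```

Calling it against one in-cluster address, one address on the local network, and one public address (all chosen by you) should show which of the three is unreachable. In the failing state described here, all three would be expected to return False.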

I would like to hear any suggestions as to what the cause could be, or what else I can look into to troubleshoot this further.

Currently running:

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.7",
GitCommit:"92b4f971662de9d8770f8dcd2ee01ec226a6f6c0", 
GitTreeState:"clean", BuildDate:"2016-12-10T04:49:33Z", 
GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.7",  
GitCommit:"92b4f971662de9d8770f8dcd2ee01ec226a6f6c0", 
GitTreeState:"clean", BuildDate:"2016-12-10T04:43:42Z", 
GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

OS:

NAME=CoreOS
ID=coreos
VERSION=1235.0.0
VERSION_ID=1235.0.0
BUILD_ID=2016-11-17-0416
PRETTY_NAME="CoreOS 1235.0.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

score 0 · Accepted Answer

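I think I'm hitting this bug: https://github.com/coreos/bugs/issues/1785. I have confirmed that I can reproduce it with the docker/coreos versions mentioned in the bug report. I am now verifying with a different coreos/docker version.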
