Occasionally, a pod starts up without network connectivity. Because of this, the pod goes into CrashLoopBackOff and is unable to recover. The only way I can get the pod running again is to run kubectl delete pod and wait for it to be rescheduled. Here's an example of a liveness probe failing due to this issue:
Liveness probe failed: Get http://172.20.78.9:9411/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I've also noticed that there are no iptables entries for the pod IP when this happens. Once the pod is deleted and rescheduled (and is in a working state), the iptables entries are present.
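For anyone wanting to reproduce the iptables check, here is a minimal sketch of how it could be scripted on the node. It assumes Python 3 is available on the host and that the script runs with enough privileges to call iptables-save; the function names are my own, not part of any Kubernetes tooling.

```python
import re
import subprocess
from typing import List


def rules_mentioning_ip(iptables_dump: str, pod_ip: str) -> List[str]:
    """Return the lines of an iptables-save dump that reference pod_ip.

    The lookarounds keep 172.20.78.9 from also matching 172.20.78.90
    or 1172.20.78.9.
    """
    pattern = re.compile(r"(?<![\d.])" + re.escape(pod_ip) + r"(?!\d)")
    return [line for line in iptables_dump.splitlines() if pattern.search(line)]


def node_iptables_dump() -> str:
    """Capture the node's current rules (assumes iptables-save is on PATH
    and the caller has root privileges)."""
    result = subprocess.run(
        ["iptables-save"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Running rules_mentioning_ip(node_iptables_dump(), "172.20.78.9") on the affected node should return an empty list while the pod is broken and a non-empty list after it has been rescheduled successfully, if the behaviour matches what is described above.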
When I turn off the liveness probe for the container and exec into it, I can confirm it has no network connectivity to the cluster, the local network, or the internet.
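The connectivity check from inside the container could be sketched roughly like this. It assumes the container image ships Python 3 (many images do not, in which case a tool such as nc or curl would have to stand in), and the helper names are hypothetical.

```python
import socket
from typing import Dict, Iterable, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_targets(targets: Iterable[Tuple[str, int]]) -> Dict[str, bool]:
    """Probe each (host, port) pair and report reachability per target."""
    return {f"{host}:{port}": can_connect(host, port) for host, port in targets}
```

To cover the three cases above, check_targets would be called with one address inside the cluster, one on the local network, and one on the public internet. Those addresses depend on the environment, so I have not filled them in here.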
I would like to hear any suggestions as to what could be causing this, or what else I can look into to troubleshoot it further.
Currently running:
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.7",
GitCommit:"92b4f971662de9d8770f8dcd2ee01ec226a6f6c0",
GitTreeState:"clean", BuildDate:"2016-12-10T04:49:33Z",
GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.7",
GitCommit:"92b4f971662de9d8770f8dcd2ee01ec226a6f6c0",
GitTreeState:"clean", BuildDate:"2016-12-10T04:43:42Z",
GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
OS:
NAME=CoreOS
ID=coreos
VERSION=1235.0.0
VERSION_ID=1235.0.0
BUILD_ID=2016-11-17-0416
PRETTY_NAME="CoreOS 1235.0.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"