On occasion, a Tectonic cluster may hang during an update because of a pre-warm cache failure. One possible cause is the node-agent pod on a specific host looping on the download of the hyperkube image.
First, identify the node that is still running the old kubelet version (the VERSION column shows each node's kubelet version):
kubectl get nodes -o wide
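On a cluster with many nodes, the stale ones can be picked out with a short filter. This is a minimal sketch, assuming a POSIX shell with awk; stale_nodes is a hypothetical helper, not part of Tectonic or kubectl. It reads tab-separated node names and kubelet versions (as produced by the kubectl JSONPath query in the comment) and prints every node whose version differs from a target. The exact target string is an assumption: copy it from the VERSION column of a node that has already updated.

```shell
#!/bin/sh
# Hypothetical helper: reads "<node-name><TAB><kubelet-version>" lines on stdin
# and prints the name of every node whose kubelet version differs from $1.
stale_nodes() {
  awk -F '\t' -v target="$1" '$2 != target { print $1 }'
}

# Example usage against a live cluster (requires kubectl access):
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
#   | stale_nodes '<target-version>'
```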
Next, SSH to the node and make sure the correct hyperkube image exists on the host. You can check with
docker images | grep hyperkube
or by attempting to pull the image. For example, if the cluster is updating to 1.9.6, the pull command is:
docker pull quay.io/coreos/hyperkube:v1.9.6_coreos.0
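The check and the pull can be combined so the image is only downloaded when it is missing. This is a sketch, assuming image tags of the form vX.Y.Z_coreos.0 under quay.io/coreos/hyperkube; hyperkube_image and ensure_hyperkube_image are hypothetical helpers. The local check uses docker image inspect, which exits non-zero when the image is not present on the host.

```shell
#!/bin/sh
# Hypothetical helper: builds the hyperkube image reference for a Kubernetes
# version, assuming the "vX.Y.Z_coreos.0" tag format.
hyperkube_image() {
  echo "quay.io/coreos/hyperkube:v${1}_coreos.0"
}

# Hypothetical helper: checks for the image locally and pulls it only if missing.
ensure_hyperkube_image() {
  image=$(hyperkube_image "$1")
  if docker image inspect "$image" >/dev/null 2>&1; then
    echo "present: $image"
  else
    echo "pulling: $image"
    docker pull "$image"
  fi
}

# Usage on the node: ensure_hyperkube_image 1.9.6
```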
Finally, after verifying that the node has the correct hyperkube image, remove the
node-agent.v1.coreos.com/image-cache-list annotation from the node:
kubectl annotate node <node-name> node-agent.v1.coreos.com/image-cache-list-
This should allow the update to proceed.
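If several nodes are stuck, the annotation can be removed from each of them in one pass. This is a minimal sketch; clear_image_cache_annotation and the DRY_RUN switch are hypothetical, added so the commands can be reviewed before running them. The trailing "-" on the key is what tells kubectl annotate to remove the annotation rather than set it.

```shell
#!/bin/sh
# Annotation key used by the node agent for its image cache list.
ANNOTATION="node-agent.v1.coreos.com/image-cache-list"

# Hypothetical helper: removes the annotation from every node named in the
# arguments. Set DRY_RUN=1 to print the kubectl commands instead of running them.
clear_image_cache_annotation() {
  for node in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "kubectl annotate node $node ${ANNOTATION}-"
    else
      kubectl annotate node "$node" "${ANNOTATION}-"
    fi
  done
}

# Usage: DRY_RUN=1 clear_image_cache_annotation node-a node-b
```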
To watch the latest status of the node as it updates, run:
kubectl get nodes -w
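As an alternative to watching interactively, a script can poll until every node reports the target kubelet version. This is a sketch under stated assumptions: all_updated and wait_for_update are hypothetical helpers, the 30-second interval is arbitrary, and the target version string should be copied from an already-updated node. A failed or empty kubectl call is treated as "not yet updated" so the loop does not exit early.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 only if every "<name><TAB><version>" line on
# stdin reports the target kubelet version given as $1.
all_updated() {
  awk -F '\t' -v target="$1" '$2 != target { bad = 1 } END { exit bad }'
}

# Hypothetical polling loop (requires kubectl access).
wait_for_update() {
  while :; do
    versions=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}') || versions=""
    if [ -n "$versions" ] && printf '%s\n' "$versions" | all_updated "$1"; then
      echo "all nodes report $1"
      return 0
    fi
    sleep 30
  done
}

# Usage: wait_for_update '<target-version>'
```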