Occasionally, a Tectonic cluster may hang during an update because of a pre-warm cache failure. One possible cause is the node-agent pod on a specific host looping while downloading the quay.io/coreos/hyperkube image.
First, identify the node that has the old kubelet version:
kubectl get nodes -o wide
Next, SSH to the node and ensure the correct hyperkube image exists on the host. This can be done by listing the local images:
docker images | grep hyperkube
or by attempting to pull the image. For example, if the cluster is updating to 1.9.6, the pull command is:
docker pull quay.io/coreos/hyperkube:v1.9.6_coreos.0
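The image check can be sketched as a small shell helper to run on the node. The function names are hypothetical, and the tag pattern v<version>_coreos.0 is an assumption based on the 1.9.6 example; other releases may use a different suffix, so confirm the exact tag before relying on it.

```shell
# Build the hyperkube image reference for a given Kubernetes version.
# Assumption: the tag follows the v<version>_coreos.0 pattern of the 1.9.6 example.
hyperkube_image() {
  printf 'quay.io/coreos/hyperkube:v%s_coreos.0\n' "$1"
}

# Report the image if it is already present locally; otherwise try to pull it.
ensure_hyperkube_image() {
  image="$(hyperkube_image "$1")"
  if [ -n "$(docker images -q "$image")" ]; then
    echo "present: $image"
  else
    docker pull "$image"
  fi
}
```

For an update to 1.9.6, this would be called as: ensure_hyperkube_image 1.9.6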
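Finally, once you have verified that the node has the right hyperkube image, remove the node-agent.v1.coreos.com/image-cache-list annotation from the node, replacing <node-name> with the name of the affected node (the trailing "-" tells kubectl to delete the annotation):
kubectl annotate node <node-name> node-agent.v1.coreos.com/image-cache-list-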
This should allow the update to proceed. To view the latest status of the node as it updates, run:
kubectl get nodes -w
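The last two steps can be combined into one helper, sketched here under some assumptions: the function names and the DRY_RUN switch are hypothetical conveniences, not part of Tectonic or kubectl. With DRY_RUN=1 the commands are only printed, which makes it possible to review them before touching the cluster.

```shell
ANNOTATION='node-agent.v1.coreos.com/image-cache-list'

# Run a command, or only print it when DRY_RUN=1 (hypothetical safety switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Remove the image-cache-list annotation from the given node, then watch the nodes.
# The trailing "-" after the annotation key tells kubectl to delete it.
clear_image_cache_annotation() {
  run kubectl annotate node "$1" "${ANNOTATION}-" &&
    run kubectl get nodes -w
}
```

To preview the commands for a node named worker-1 (a placeholder name): DRY_RUN=1; clear_image_cache_annotation worker-1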