How Kubernetes Updates Work on Container Engine

One question I often get when I talk about Container Engine (GKE) is:

How are upgrades to Kubernetes handled?

Masters

As we spell out in the documentation, upgrades to Kubernetes masters on GKE are handled by us and rolled out automatically. However, you can speed that up if you would like to upgrade before the automatic update happens. You can do it via the command line:
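A minimal sketch of the command line step, assuming a cluster named my-cluster in zone us-central1-a (both are placeholder names, not from the original post):

```shell
# Upgrade the master of an existing GKE cluster to the latest supported version.
# "my-cluster" and the zone are hypothetical; substitute your own.
gcloud container clusters upgrade my-cluster \
    --master \
    --zone us-central1-a
```

The `--master` flag tells gcloud to upgrade the master rather than the nodes; without it, the same command upgrades the nodes instead.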

You can also do it via the web interface as illustrated below.

GKE notifies you that upgrades are available.
You can then upgrade the master, if the automatic upgrade hasn’t happened yet.
Once there, you’ll see that the master upgrade is a one way trip.

Nodes

Updating nodes is a different story. Node upgrades can be a little more disruptive, and therefore you should control when they happen.  

What do I mean by “disruptive?”

GKE will take down each node of your cluster, killing the resident pods. If your pods are managed by a Replication Controller or are part of a ReplicaSet-backed Deployment, they will be rescheduled on other nodes of the cluster, and you shouldn’t see a disruption of the services those pods serve. However, if you are running a PetSet, using a single replica to serve a stateful service, or manually creating your own pods, you will see a disruption. Basically, if you are being completely “containery,” there is no problem. If you are trying to run a pet as a containerized service, you can see some downtime unless you intervene manually to prevent it, for example with a manually configured backup or another type of replica. You can also take advantage of node pools to help with that. But even if you don’t intervene, as long as anything you need to be persistent is hosted on a persistent disk, you will be fine after the upgrade.
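One way to use node pools for this, sketched with hypothetical names (the cluster, pool, zone, and node names below are placeholders I’ve invented for illustration):

```shell
# Create a second node pool so workloads have somewhere to land
# before the old nodes are upgraded. All names are hypothetical.
gcloud container node-pools create new-pool \
    --cluster my-cluster \
    --zone us-central1-a

# Cordon an old node (no new pods scheduled there), then drain it so
# controller-managed pods are rescheduled onto the new pool.
kubectl cordon gke-my-cluster-default-pool-abcd1234-node1
kubectl drain gke-my-cluster-default-pool-abcd1234-node1 --ignore-daemonsets
```

Draining moves controller-managed pods gracefully; pods you created by hand are exactly the “pets” that will still see disruption.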

You can perform a node update via the command line:
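A sketch of the node upgrade, again assuming the hypothetical cluster name and zone from above, and an example version number that stands in for whatever GKE shows as available:

```shell
# Upgrade all nodes in the cluster to the master's current version.
# "my-cluster" and the zone are placeholders.
gcloud container clusters upgrade my-cluster \
    --zone us-central1-a

# Or pin a specific version (it must not exceed the master's version);
# 1.3.5 here is only an example value.
gcloud container clusters upgrade my-cluster \
    --zone us-central1-a \
    --cluster-version 1.3.5
```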

Or you can use the web interface.

Again, you get the “Upgrade Available” prompt.
You have a bunch of options. (We recommend you stay within 2 minor revs of your master.)

A couple things to consider:

  • As stated in the caption above, we recommend you stay within 2 minor revs of your master. These recommendations come from the Kubernetes project, and are not unique to GKE.
  • Additionally, you should not upgrade the nodes to a version higher than the master. The web UI specifically prevents this. Again, this comes from Kubernetes.
  • Nodes don’t automatically update, but the masters eventually do. It’s possible for the masters to be automatically updated to a version more than 2 minor revs ahead of the nodes, which can cause compatibility issues. So we recommend timely upgrades of your nodes. Minor revs come out about once every 3 months, so you are looking at doing this every 6 months or so.
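To see how far your nodes trail the master, you can compare the server version against the node versions (a quick sketch; output depends on your cluster):

```shell
# Show the client and server (master) Kubernetes versions.
kubectl version --short

# List nodes; the VERSION column shows each node's kubelet version,
# which you can compare against the master's version above.
kubectl get nodes
```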

As you can see, it’s pretty straightforward. There are a couple of things to watch out for, so please read the documentation.

3 thoughts on “How Kubernetes Updates Work on Container Engine”

  1. You mention PetSet deployment on GKE. I’m currently trying to get it to work. The apps/v1alpha1 is enabled (at least kubectl api-versions indicated it is). However, when I try to create a PetSet, I get:

    unable to decode "pet.yaml": no kind "PetSet" is registered for version "apps/v1alpha1"

  2. Hmm, I am not using PetSets, but I have been unable to do a zero-downtime K8s upgrade on GKE. I spun up a new node pool, then tried to do a cordon and drain to migrate the pods scheduled on the old nodes over to the new nodes, but I get the following error:
    kubectl drain gke-k8s-cluster-1-us-west-1b-32607730-jznb --ignore-daemonsets
    node "gke-k8s-cluster-1-us-west-1b-32607730-jznb" already cordoned
    error: pods not managed by ReplicationController, ReplicaSet, Job, or DaemonSet (use --force to override): fluentd-cloud-logging-gke-k8s-cluster-1-us-west-1b-32607730-jznb, kube-proxy-gke-k8s-cluster-1-us-west-1b-32607730-jznb

    When I "--force" it, my applications drop until they eventually get rescheduled onto the new node pool.

    I actually opened a ticket with GKE support and they said zero-downtime upgrades are not yet supported, but they are working on it. Any ideas?
