Tag Archives: Google Cloud Platform

Makefile – Start, Stop or Delete 20 VMs at once

Last time I created 20 virtual machines at once.  Now I want to stop those machines, or start them back up, or delete them. Basically, I want to do bulk operations on all of the machines that I am using in this scenario.

If you look at the create 20 VMs post, I gave each one of them a similar name, based on the pattern “load-xxx” where load is the operation I am using them for and xxx is a three digit sequential id with 0s prefixed. (This makes them order correctly in our UI.)

Because I know their names, I can count them up and not have to explicitly tell these operations how many machines I want to operate on.  To do that, I create a make variable that contains the count of all VMs prefixed by “load.”

Once I have that, I can perform batch operations very simply.

To stop 20 running VMs:

Just to explain, like the previous post, we loop from i to COUNT, creating a variable that contains the name of our server, and running a function call to execute the gcloud stop instances command.  Why is this a separate function?  Because I usually do more than just stop the VM.

I also wrap the call in parentheses and append the & to allow multiple calls to execute in parallel.

To start them back up:

To delete them all:

And in this case, I do a little bit more here in delete.  I make sure all of the disks are deleted, and I set the request to quiet. Why? Because I don’t want to confirm this 20 times, silly.

In any case, doing batch operations on my set of VMs is as easy as:

There you have it, fleets of VMs responding in concert to your requests.  As it should be.

Makefile – Launch 20 Compute Engine virtual machines at once.

We’re going to try something a lot more complex in make now. I’m going to dynamically create 20 Compute Engine virtual machines that are absolutely the same. This requires quite a bit more complexity, so we’ll break it down step by step.

Let’s start with the gcloud command to create an instance.  

I encapsulated this into a Makefile function. Why?  Well, as I have it here, it is a pretty simple event with adding apt-get update but I usually do more then just create the node and install software. I often set environmental information or start services, etc. So by putting all of the instance specific instructions in a function, I make it just slightly easier to grok.

Let’s go through this part step by step.  

  • Define a function with the define keyword, and end it with the endef keyword
  • It appears that functions must be one line, so use ;\ to organize multiple calls into one function
  • Wrap all of the real work in a parenthesis. Why? It turns it into one operation, so that each step of the function doesn’t block parallel execution of other operations in the makefile.
  • Capture the first argument – $(1) – passed into this function – we’ll use it as the name of the instance
  • Create a machine using gcloud compute instances create. Note setting the machine type.  If you are creating a lot of instances, make sure you don’t run afoul of quota or spend.
  • SSH into machine and run apt-get update.
  • Tell us this machine is ready.   

Okay, that handles the instance creation, but now we have to loop through and create a variable amount of machines. I said 20, but I often spin up anywhere from 10 to 150 using this method.

Again, step by step:

  • Use @ so that the commands aren’t echoed to the output.
  • Set up a while loop with iterator – i, that will run as long as i is less than the explicitly passed variable named count
  • Use ;\ to make the command one logical line.
  • Use printf to create a variable named server to name the instances. In this case each instance is named “load-xxx” where xxx is a sequential id number for the node that always has three digits. This makes it easier to go back later and do more group operations on the entire set of machines. 
  • Call the function using the syntax $(call function_namevalue_to_pass)
  • Wrap call in parentheses and append a &.  This shoves the call to the background so you can create 20, or 100, or 150 of these in parallel instead of sequentially.
  • We then increment the counter.   

Finally we call the whole thing with:

Pretty straightforward. I frequently use this technique to launch of fleet of VMs to send large amounts of load at App Engine. Next I’ll tell you how to delete them all.   

Don’t forget the count=N, or the call will bail.

Makefile – Clean App Engine flexible environment

One of the more interesting quirks of App Engine flexible environment is that App Engine launches Compute Engine virtual machines that you can’t spin down directly. The way to spin down App Engine flex is to delete all versions of the app.  This will close down all of the VMs, and shut down your App Engine app.

You can do it manually through the web interface, you can do it manually by listing versions in gcloud then deleting them, or you can have a Makefile do it for you.

First I use the trick I wrote about capturing dynamic data from gcloud. Then I pipe that to a Makefile command that will delete the versions.

Note that I add -q to the command because I don’t want to be prompted; I just want them gone.

Makefile – Delete Forwarding Rules

I have a demo where I build a Kubernetes cluster on Container Engine to run a LAMP app. In the demo, I script out a complete build process from an empty project to the full running app.  Testing this requires a clean up that takes me all the way back to an empty project with no cluster.  

There is one thing I do not tear down – static IP addresses.  I don’t tear these down because they are locked to host names in Google Domains, and I use those IPs in my Kubernetes setup to make sure that my cluster app is available at a nice URL and not just a randomly assigned IP.

But I have been running into a problem with this. Sometimes the static IPs hold on to Forwarding Rules that are autogenerated with crazy randomized names by Container Engine. It appears to  happen only when I do a full clean.  I suspect that I am deleting the cluster before it has a chance to issue the command to delete the forwarding rules itself.

In any case, I got tired of dealing with this manually, so I made a Makefile solution. First I get the dynamic list of crazy random forwarding rule names using the Makefile technique I outlined earlier.  Then I pass that list to a gcloud command:

Note that I had to make sure I passed a region, otherwise the command would have prompted me to enter it manually.

Makefile – Get dynamic values from gcloud

Most of the time when I create something in my environment on Google Cloud Platform, I give it a specific name.  For example, I create servers and call them “ThingIWillReferenceLaterWhenIDeleteYou” or more boringly, “Server1.”

Having set names, as I alluded to, makes it easier to clean up after yourself. But there are some cases when you cannot name things when they are created. So it would be nice to get a list of these names. For example, App Engine flexible environment versions for cleaning up after a test.

You can get a list of them with this command:

Which yields this:

Now normally I would have to add extra code to my Makefile to rip out the version names.

But gcloud actually has a robust formatting tool. So instead of running the command above I can run:

And get the JSON representation, which looks like this:

Using a JSON parser might make this easier, but there is an even easier way:

Which yields:

What will this do?

It will list just the value of version.id and it will separate each record it returns with a ” “, not a line break.  This allows me to drop this generated list into any command that takes multiple names and run them. The gcloud CLI takes multiple arguments in this way. 

So to make this applicable to Makefiles I have to do one more thing – take this data and put it in a variable.

Here we are, ready to use this variable in other Make commands. This works for most of the other places in GCP where you see random values spitting out, like IP forwarding rules, and GKE nodes, to name two. 

To learn more about how to filter and format your gcloud commands, check out the Google Cloud Platform Blog.


How Kubernetes Updates Work on Container Engine

I often get asked when I talk about Container Engine (GKE):

How are upgrades to Kubernetes handled?


As we spell out in the documentation, upgrades to Kubernetes masters on GKE are handled by us. They get rolled out automatically.  However, you can speed that up if you would like to upgrade before the automatic update happens.  You can do it via the command line:

You can also do it via the web interface as illustrated below.

GKE notifies you that upgrades are available.
GKE notifies you that upgrades are available.
You can then upgrade the master, if the automatic upgrade hasn’t happened yet.
You can then upgrade the master, if the automatic upgrade hasn’t happened yet.
Once there, you’ll see that the master upgrade is a one way trip.
Once there, you’ll see that the master upgrade is a one way trip.


Updating nodes is a different story. Node upgrades can be a little more disruptive, and therefore you should control when they happen.  

What do I mean by “disruptive?”

GKE will take down each node of your cluster killing the resident pods.  If your pods are managed via a Replication Controller or part of a Replica Set deployment, then they will be rescheduled on other nodes of the cluster, and you shouldn’t see a disruption of the services those pods serve. However if you are running a Pet Set deployment, using a single Replica to serve a stateful service or manually creating your own pods, then you will see a disruption. Basically, if you are being completely “containery” then no problem.  If you are trying to run a Pet as a containerized service you can see some downtime if you do not intervene manually to prevent that downtime.  You can use a manually configured backup or other type of replica to make that happen.  You can also take advantage of node pools to help make that happen.   But even if you don’t intervene, as long as anything you need to be persistent is hosted on a persistent disk, you will be fine after the upgrade.

You can perform a node update via the command line:

Or you can use the web interface.

Again, you get the “Upgrade Available” prompt.
Again, you get the “Upgrade Available” prompt.
You have a bunch of options. (We recommend you stay within 2 minor revs of your master.)
You have a bunch of options. (We recommend you stay within 2 minor revs of your master.)

A couple things to consider:

  • As stated in the caption above, we recommend you say within 2 minor revs of your master. These recommendations come from the Kubernetes project, and are not unique to GKE.
  • Additionally, you should not upgrade the nodes to a version higher than the master. The web UI specifically prevents this. Again, this comes from Kubernetes.
  • Nodes don’t automatically update.  But the masters eventually do.  It’s possible that the masters could automatically update to a version more than 2 minor revs beyond the nodes. This can a cause compatibility issues. So we recommend timely upgrades of your nodes. Minor revs come out at about once every 3 months.  Therefore you are looking at this every 6 months or so.

As you can see, it’s pretty straightforward. There are a couple of things to watch out for, so please read the documentation.

Making Kubernetes IP addresses static on Google Container Engine

I’ve been giving a talk and demo about Kubernetes for a few months now, and during my demo, I have to wait for an ephemeral, external IP address from a load balancer to show off that Kubernetes does in fact work.  Consequently, I get asked “Is there any way to have a static address so that you can actually point a hostname at it?” The answer is: of course you can.

Start up your Kubernetes environment, making sure to configure a service with a load balancer.

Once your app is up, make note of the External IP using kubectl get services.


Now go to the Google Cloud Platform Console -> Networking -> External IP Addresses.

Find the IP you were assigned earlier. Switch it from “Ephemeral” to “Static.” You will have to give it a name and it would be good to give it a description so you know why it is static.


Then modify your service (or service yaml file) to point to this static address. I’m going to modify the yaml.   


Once your yaml is modified you just need to run it; use kubectl apply -f service.yaml.

To prove that the IP address works, you should kubectl delete the service and then kubectl apply, but you don’t have to do that. If you do that though, please be aware that although your IP address is locked in, your load balancer still needs a little bit of time to fire up.  

Instead of this method, you can create a static IP address ahead of time and create the forwarding rules manually. I think that’s its own blog post, and  I think it is just easier to let Container Engine do it.

I got lots of help for this post from wernight’s answer on StackOverflow, and the documentation on Kubernetes Services.

I can confirm this works with Google Container Engine. It should work with a Kubernetes cluster installed by hand on Google Cloud Platform.  I couldn’t ascertain if it works on other cloud providers.

Kubernetes Secrets Directly to Environment Variables

kubernetes-secretsI’ve found myself wanting to use Kubernetes Secrets for a while, but I every time I did, I ran into the fact that secrets had to be mounted as files in the container, and then you had to programmatically grab those secrets and turn them into environment variables.  This works and there are posts like this great one from my coworker, Aja Hammerly that tell you how to do it.

It always seemed a little suboptimal for me though.  Mostly because you had to alter your Docker image in order to use secrets. Then you lose some of the flexibility to use a Dockerfile in both Docker and Kubernetes. It’s not the end of the world – you can write a conditional script –  but I never liked doing this.  It would be awesome if you could just write Secrets directly to ENV variables.

Well it turns out you can. Right there in the documentation there’s a whole section on Using Secrets as Environment Variables. It’s pretty straightforward:

Make a Secrets file, remembering to base64 encode your secrets.

Then configure your pod definition to use the secrets.

That’s it. It’s a great addition to the secrets API.  I’m trying to track down when it was added. It looks like it came in 1.2.  The first reference I could find to it in the docs was in this commit  updating Kubernetes Documentation for 1.2.

Autoresizing Persistent Disks in Compute Engine

Got a challenge the other day:

Is it possible to automatically resize a Persistent Disk in Google Compute Engine?

The answer is yes – with a few caveats.  

This solution really only works with Persistent Disks that are not root. Root disks seem to need a reboot to make this work – and automatically rebooting seems like a bad idea. So if you run it on a root disk it will work, but the extra space won’t be available until you manually reboot the machine.

Be careful with quotas. My solution here has a default max disk size of 64TB because that is the max disk that GCE disks can be. You may want to be more conservative with your limits because disk size = money. Also you have a quota on your account for the amount of SSD you can assign.  As of this writing it is 2TB.  You can always raise it, but this script cannot get around your quota, and will fail if it tries to.

All that out of the way, let’s give this a shot.

Step 1 – Script it

The first step is to put together a script that:

  • Checks the utilization of a disk.
  • If the utilization is too high, resizes the disk in Google Cloud Platform
  • Then also resizes the disk on the host OS.

There are a couple of other things we want to configure in this script:

  • What is the threshold percent that is high enough to resize the disk?
  • What is the factor by that we’ll increase the disk? Double it? Triple it?
  • What is the maximum limit to which we will increase the disk?

Keeping all of that in mind, here is my solution in Bash for Debian (our default OS choice on Compute Engine.) As you can see it’s a mix of gcloud commands and df.

Source is also available in GitHub.

You can find the reference for the gcloud commands in the documentation.

Step 2 – Authorize it

The next step is to make sure this script can run at all.  To do that we have to delve into Cloud IAM.

First we want to create a service account. During this process we have the option to ‘Furnish a new private key’. This will cause a key file to be downloaded at the end of file creation. Choose JSON and keep track of the JSON file that gets downloaded after you click ‘Create’.


Add the service account to the IAM role – Compute Storage Admin. Then remove the service account from the project level role – Editor. We want it to have as little permission as it needs.    


Copy the JSON file to the Compute Engine machine to which the disk you wish to monitor is attached.

Authorize the service account using the following command.


My co-worker, Sandeep, has a good video tutorial about service accounts if you need more information.

Step 3 Test it

Assuming you have installed the autoscale-disk script from step 1,  and you set up permissions correctly, you are ready to test it.  

To check the permissions, run:

If you see the output of a gcloud compute disk list there, you got it right. If you do not, you will see a FAILURE message.

Step 4 – Cron it

Once you have the script installed, and you have tested it – it’s time to set it and forget it. Add it to crontab with your desired settings.


I’m setting this up to check every minute, because it’s pretty lightweight when it isn’t actually resizing disks. However do what you will. You might also want to pipe the output to a log. Again, your call.


There you have it, autoscaling a disk based on utilization with a cron job. What I love about this idea is that it is so very cloudy. On prem, even if you have a pool of storage, eventually you run out, so sizing up a disk isn’t a sure thing.  But in a cloud world, if you need more it’s always just an API call away.


Migrating App Engine Standard to Cloud SQL v2

I recently discovered that Google Cloud SQL v2 now supports App Engine standard runtime.  This is very exciting for me. I wanted to try out the process and make sure there were no gotchas.


  • I created a new Cloud SQL v2 instance.
  • I used my syncing script from my blog post Migrating between Cloud SQL databases to move the data to the new instance.
  • I created a new App Engine module by store a new version of my app using the old code base.
  • I changed the connection string from the old database to the new one. The pattern to make this happen has changed a bit, more down below.
  • That was it.  The new code served up just fine. I served up on the old module until I made the connection string config tweak to the old code base.

New connection string

The new connection strings are only a little different than the old ones, and should require just a change to one string in your config.

The directions will tell you to look for your Instance connection name in the Instance properties of your Cloud SQL Developer Console. There are two patterns that these strings come in. 

  • V1 connection names follow the pattern projectid:instancename
  • V2 connection names follow the pattern projectid:regionname:instancename.

It’s a pretty simple change, but I can see someone accidentally (or willfully) not reading the documentation and getting tripped up on this. The new connection strings require region name; that’s all there is to it.  I’ve tested this on PHP. I assume it works everywhere. But your mileage may vary.  Golang tests are coming soon; I will update when I make that change.