Tag Archives: Google Compute Engine

Makefile – Start, Stop or Delete 20 VMs at once

Last time I created 20 virtual machines at once.  Now I want to stop those machines, or start them back up, or delete them. Basically, I want to do bulk operations on all of the machines that I am using in this scenario.

If you look at the create 20 VMs post, I gave each one of them a similar name, based on the pattern “load-xxx” where load is the operation I am using them for and xxx is a zero-padded, three-digit sequential id. (This makes them sort correctly in our UI.)

Because I know their names, I can count them up and not have to explicitly tell these operations how many machines I want to operate on.  To do that, I create a make variable that contains the count of all VMs prefixed by “load.”
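A minimal sketch of that count, written as a plain shell function so it can be tried outside of make. It assumes the “load-” prefix described above; the name count_vms is made up for illustration.

```shell
# Count Compute Engine instances whose names start with a prefix.
# Assumption: the "load-" prefix from the naming pattern; count_vms is a
# hypothetical helper name.
count_vms() {
  prefix="${1:-load-}"
  # value(name) prints one bare instance name per line.
  # grep -c exits non-zero when nothing matches, so "|| true" keeps a
  # count of 0 from looking like an error.
  gcloud compute instances list --format="value(name)" \
    | grep -c "^${prefix}" || true
}
```

In a Makefile, the same pipeline can sit inside a `$(shell ...)` call assigned to a variable such as COUNT.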

Once I have that, I can perform batch operations very simply.

To stop 20 running VMs:
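Here is a rough shell sketch of that recipe rather than the exact Makefile text. It assumes the counter starts at 0, that every VM lives in one zone taken from a ZONE variable (the default shown is a placeholder), and it adds a wait at the end that a make recipe would not strictly need.

```shell
# Stop a single VM. Kept as its own function so extra per-VM steps
# have somewhere to go.
# Assumption: one zone for all VMs; the default is only a placeholder.
stop_vm() {
  gcloud compute instances stop "$1" --zone "${ZONE:-us-central1-a}"
}

# Stop "count" VMs named load-000, load-001, ... in parallel.
stop_all() {
  count="$1"
  i=0
  while [ "$i" -lt "$count" ]; do
    server="$(printf 'load-%03d' "$i")"
    (stop_vm "$server") &   # subshell plus & sends each call to the background
    i=$((i + 1))
  done
  wait                      # return only after every background call finishes
}
```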

Just to explain, like the previous post, we loop with a counter i up to COUNT, building a variable that contains the name of our server, and calling a function that runs the gcloud compute instances stop command.  Why is this a separate function?  Because I usually do more than just stop the VM.

I also wrap the call in parentheses and append the & to allow multiple calls to execute in parallel.

To start them back up:
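Starting follows the loop described above with the gcloud verb swapped. A compact sketch, under the same assumptions (counter from 0, a single zone in ZONE with a placeholder default, hypothetical function name):

```shell
# Start "count" VMs named load-000, load-001, ... in parallel.
start_all() {
  count="$1"
  i=0
  while [ "$i" -lt "$count" ]; do
    (gcloud compute instances start "$(printf 'load-%03d' "$i")" \
       --zone "${ZONE:-us-central1-a}") &
    i=$((i + 1))
  done
  wait   # added here so the function returns once all starts are done
}
```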

To delete them all:
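A sketch of the per-VM delete call that the loop would run for each name. The function name and the zone default are assumptions; the two extra flags are standard gcloud options.

```shell
# Delete one VM together with its disks, without prompting.
# --delete-disks all removes boot and data disks along with the instance.
# --quiet suppresses the interactive confirmation.
# Assumption: the zone comes from $ZONE; delete_vm is a hypothetical name.
delete_vm() {
  gcloud compute instances delete "$1" \
    --zone "${ZONE:-us-central1-a}" \
    --delete-disks all \
    --quiet
}
```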

And in this case, I do a little bit more here in delete.  I make sure all of the disks are deleted, and I set the request to quiet. Why? Because I don’t want to confirm this 20 times, silly.

In any case, each batch operation on my set of VMs is now a single make call.

There you have it, fleets of VMs responding in concert to your requests.  As it should be.

Makefile – Launch 20 Compute Engine virtual machines at once.

We’re going to try something a lot more complex in make now. I’m going to dynamically create 20 identical Compute Engine virtual machines. This takes a few more moving parts, so we’ll break it down step by step.

Let’s start with the gcloud command to create an instance.  

I encapsulated this into a Makefile function. Why?  Well, as I have it here, it is pretty simple, just creating the instance and running apt-get update, but I usually do more than just create the node and install software. I often set environment information or start services, etc. So by putting all of the instance-specific instructions in a function, I make it just slightly easier to grok.

Let’s go through this part step by step.  

  • Define a function with the define keyword, and end it with the endef keyword
  • Each line of a make recipe runs in its own shell, so use ;\ to join multiple calls into one logical line
  • Wrap all of the real work in parentheses. Why? It runs the steps in a subshell as one operation, so that each step of the function doesn’t block parallel execution of other operations in the makefile.
  • Capture the first argument – $(1) – passed into this function – we’ll use it as the name of the instance
  • Create a machine using gcloud compute instances create. Note setting the machine type.  If you are creating a lot of instances, make sure you don’t run afoul of quota or spend.
  • SSH into machine and run apt-get update.
  • Tell us this machine is ready.   
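The steps above can be sketched as a shell function, which is roughly what the body of the make function boils down to. The machine type and zone defaults are placeholders, the name create_node is an assumption, and I chain the steps with && instead of ;\ so that a failed create skips the SSH step.

```shell
# Create one instance, update its packages, then report that it is ready.
# The parentheses run all three steps in a subshell, so the whole unit
# can later be sent to the background in one go.
# Assumptions: MACHINE_TYPE and ZONE defaults are placeholders.
create_node() {
  (
    gcloud compute instances create "$1" \
      --machine-type "${MACHINE_TYPE:-n1-standard-1}" \
      --zone "${ZONE:-us-central1-a}" &&
    gcloud compute ssh "$1" --zone "${ZONE:-us-central1-a}" \
      --command "sudo apt-get update" &&
    echo "$1 ready"
  )
}
```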

Okay, that handles the instance creation, but now we have to loop through and create a variable amount of machines. I said 20, but I often spin up anywhere from 10 to 150 using this method.

Again, step by step:

  • Use @ so that the commands aren’t echoed to the output.
  • Set up a while loop with iterator – i, that will run as long as i is less than the explicitly passed variable named count
  • Use ;\ to make the command one logical line.
  • Use printf to create a variable named server to name the instances. In this case each instance is named “load-xxx” where xxx is a sequential id number for the node that always has three digits. This makes it easier to go back later and do more group operations on the entire set of machines. 
  • Call the function using the syntax $(call function_name,value_to_pass)
  • Wrap call in parentheses and append a &.  This shoves the call to the background so you can create 20, or 100, or 150 of these in parallel instead of sequentially.
  • We then increment the counter.   
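Put together, the loop looks roughly like this in plain shell. Inside a Makefile recipe each `$` aimed at the shell would be doubled and the lines joined with ;\. The create_node here is a one-line stand-in, the counter starting at 0 is an assumption, and the closing wait is my addition.

```shell
# Stand-in for the per-instance function; a fuller one would also SSH in
# and run apt-get update. The zone default is a placeholder.
create_node() {
  gcloud compute instances create "$1" --zone "${ZONE:-us-central1-a}"
}

# Launch "count" instances named load-000, load-001, ... in parallel.
create_all() {
  count="$1"
  i=0
  while [ "$i" -lt "$count" ]; do
    server="$(printf 'load-%03d' "$i")"   # always three digits
    (create_node "$server") &             # background each creation
    i=$((i + 1))
  done
  wait
}
```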

Finally, we call the whole thing by running the make target with count=N on the command line, for example count=20.

Pretty straightforward. I frequently use this technique to launch a fleet of VMs to send large amounts of load at App Engine. Next I’ll tell you how to delete them all.

Don’t forget the count=N, or the call will bail.

Autoresizing Persistent Disks in Compute Engine

Got a challenge the other day:

Is it possible to automatically resize a Persistent Disk in Google Compute Engine?

The answer is yes – with a few caveats.  

This solution really only works with Persistent Disks that are not root. Root disks seem to need a reboot to make this work – and automatically rebooting seems like a bad idea. So if you run it on a root disk it will work, but the extra space won’t be available until you manually reboot the machine.

Be careful with quotas. My solution here has a default max disk size of 64TB because that is the largest a GCE disk can be. You may want to be more conservative with your limits because disk size = money. You also have a quota on your account for the amount of SSD you can assign.  As of this writing it is 2TB.  You can always raise it, but this script cannot get around your quota, and will fail if it tries to.

All that out of the way, let’s give this a shot.

Step 1 – Script it

The first step is to put together a script that:

  • Checks the utilization of a disk.
  • If the utilization is too high, resizes the disk in Google Cloud Platform
  • Then also resizes the disk on the host OS.

There are a couple of other things we want to configure in this script:

  • What is the threshold percent that is high enough to resize the disk?
  • What is the factor by which we’ll increase the disk? Double it? Triple it?
  • What is the maximum limit to which we will increase the disk?

Keeping all of that in mind, here is my solution in Bash for Debian (our default OS choice on Compute Engine.) As you can see it’s a mix of gcloud commands and df.
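To give a feel for the shape of such a script, here is a condensed sketch and not the full version from the repository. The variable names, their defaults, and the function names are assumptions, and it assumes an ext filesystem sitting directly on the device so that resize2fs applies.

```shell
# All names and defaults below are assumptions for illustration.
THRESHOLD="${THRESHOLD:-90}"   # resize when usage reaches this percent
FACTOR="${FACTOR:-2}"          # multiply the disk size by this factor
MAX_GB="${MAX_GB:-65536}"      # upper bound: 64TB expressed in GB

# Print the used-space percentage (digits only) for a mount point.
usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Print the next size in GB: current size times FACTOR, capped at MAX_GB.
next_size_gb() {
  next=$(( $1 * FACTOR ))
  if [ "$next" -gt "$MAX_GB" ]; then next="$MAX_GB"; fi
  echo "$next"
}

# Grow the disk in Google Cloud, then grow the filesystem on the host.
# Arguments: disk name, zone, block device, mount point.
autoscale_disk() {
  disk="$1"; zone="$2"; device="$3"; mount="$4"
  # Nothing to do while utilization is below the threshold.
  [ "$(usage_percent "$mount")" -ge "$THRESHOLD" ] || return 0
  current="$(gcloud compute disks describe "$disk" --zone "$zone" \
    --format="value(sizeGb)")"
  new="$(next_size_gb "$current")"
  # Already at the cap: leave the disk alone.
  [ "$new" -gt "$current" ] || return 0
  gcloud compute disks resize "$disk" --zone "$zone" \
    --size "${new}GB" --quiet &&
    sudo resize2fs "$device"
}
```

A call might look like `autoscale_disk data-disk us-central1-a /dev/sdb /mnt/data`, where all four values are placeholders for your own disk.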

Source is also available in GitHub.

You can find the reference for the gcloud commands in the documentation.

Step 2 – Authorize it

The next step is to make sure this script can run at all.  To do that we have to delve into Cloud IAM.

First we want to create a service account. During this process we have the option to ‘Furnish a new private key’. This will cause a key file to be downloaded at the end of account creation. Choose JSON and keep track of the JSON file that gets downloaded after you click ‘Create’.


Add the service account to the IAM role – Compute Storage Admin. Then remove the service account from the project level role – Editor. We want it to have only the permissions it needs.


Copy the JSON file to the Compute Engine machine to which the disk you wish to monitor is attached.

Authorize the service account using the following command.
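The underlying gcloud call is `gcloud auth activate-service-account` with the key file passed in. A small wrapper sketch follows; the wrapper name is hypothetical, and the key path is whatever you copied to the machine.

```shell
# Activate a service account from its downloaded JSON key.
# Assumption: authorize_service_account is a made-up wrapper name.
authorize_service_account() {
  key_file="$1"
  # Fail early with a clear message if the key was not copied over.
  if [ ! -f "$key_file" ]; then
    echo "key file not found: $key_file" >&2
    return 1
  fi
  gcloud auth activate-service-account --key-file "$key_file"
}
```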


My co-worker, Sandeep, has a good video tutorial about service accounts if you need more information.

Step 3 – Test it

Assuming you have installed the autoscale-disk script from step 1,  and you set up permissions correctly, you are ready to test it.  

To check the permissions, run:
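The script's own interface may differ from this. As a hand-rolled sketch of an equivalent check, the function below (a hypothetical name) lists disks with the active credentials and prints a FAILURE line if that does not work.

```shell
# Check that the active credentials can list disks.
# Prints the disk list on success, or a FAILURE line otherwise.
check_permissions() {
  if gcloud compute disks list; then
    return 0
  fi
  echo "FAILURE: could not list disks with the current credentials"
  return 1
}
```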

If you see the output of a gcloud compute disks list there, you got it right. If you do not, you will see a FAILURE message.

Step 4 – Cron it

Once you have the script installed, and you have tested it – it’s time to set it and forget it. Add it to crontab with your desired settings.
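As a sketch, this helper prints a crontab line that runs a script every minute and appends its output to a log. Both paths are placeholders, and any arguments your copy of the script needs would go after the script path.

```shell
# Print a crontab line: run the script every minute, append all output
# (stdout and stderr) to a log file.
# Assumption: cron_entry is a hypothetical helper; paths are your own.
cron_entry() {
  script="$1"
  log="$2"
  echo "* * * * * $script >> $log 2>&1"
}
```

To install it alongside existing entries, something like `(crontab -l 2>/dev/null; cron_entry /usr/local/bin/autoscale-disk /var/log/autoscale-disk.log) | crontab -` should do.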


I’m setting this up to check every minute, because it’s pretty lightweight when it isn’t actually resizing disks. However, do what you will. You might also want to pipe the output to a log. Again, your call.


There you have it, autoscaling a disk based on utilization with a cron job. What I love about this idea is that it is so very cloudy. On prem, even if you have a pool of storage, eventually you run out, so sizing up a disk isn’t a sure thing.  But in a cloud world, if you need more it’s always just an API call away.


Compute Engine and App Engine – a Comparison


I want to show you a little demo of how Compute Engine and App Engine work. Both techs have their strengths and weaknesses, and I wanted to make something to showcase them. 

Compute Engine allows you to spin up Virtual Machines (henceforth to be referred to as “VMs” due to the fact that I can’t be bothered to write “irtual” and “achine”.) VMs give you a lot of control over your system. You can run a number of OSes, with variable processor, memory, and disk configurations. You interact with it by configuring a VM through the Developer Console or on the command line. You then SSH into your VM.

App Engine on the other hand just takes code.  You upload it and we run it. No SSH, no machine, just an upload site and a URL. App Engine by default gives you no control over the hardware running the code. The trade off is that we can immediately scale from zero load to any load you muster.

So how do these compare? App Engine scales in milliseconds? What does that look like? Compute Engine starts up in 10s of seconds? What does that mean? This demo shows off how you can build Compute Engine machines vs how fast you can spin up App Engine instances. This isn’t a one-is-better-than-the-other comparison; there are reasons to use both of these techs, and they aren’t mutually exclusive. Let me know what you think.