Musings on Devops, Systems, and Infrastructure

Save money by shutting down idle AWS EC2 instances (part 1)

The cloud provides a ton of flexibility for configuring dev and test environments, but it comes at a price. Providers make it easy to start services at the click of a button, but remembering to shut them down takes effort. Reducing cloud spend often tops the list of cost saving priorities for an IT organization because cloud resources aren't always used effectively.

The cost of unused resources sitting idle adds up quickly, particularly if you've created an entire infrastructure with Terraform or a similar tool. You're charged for every compute instance that's running, even if it's not doing anything.

To help solve this problem for AWS EC2 instances, I've developed some automation to shut down instances once they have been idle for a set period of time. A hundred different methods probably exist for powering off unused virtual machines, but the method I describe here works well for my use case.

Methodology

The basic steps are:

  • Quantify what it means for an instance to be "idle".
  • Configure monitoring to record these metrics.
  • Generate an alert when a metric reaches a threshold.
  • Send the alert to a CI/CD server to trigger a pipeline.
  • The pipeline runs with permissions to shut down the instance.
  • Tell your boss how much money you saved.

Defining "idle"

For an instance to be considered idle enough to shut down, I would expect several conditions to hold simultaneously:

  • No one is logged into the machine
  • Every process running is owned by a system account
  • No containers or applications are running
  • The CPU load has been very low for an extended period of time

The first three can be checked with a few lines of shell script, but measuring system performance over time usually needs a tool designed for storing time series data. In this demo, I'll be using Prometheus for collecting data and generating alerts.
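As a rough illustration of the first three checks, here is a minimal Python sketch (a shell script would work just as well). Several things in it are my assumptions rather than anything fixed: that system accounts have UIDs below 1000, that containers run under Docker, and the function names themselves. Note that the "nobody" account (UID 65534 on many distributions) would be treated as a regular user here, so adjust to taste.

```python
import subprocess

# Assumption: UIDs at or below this value belong to system accounts.
# Many Linux distributions start regular users at 1000 (see UID_MIN
# in /etc/login.defs), but this varies.
SYSTEM_UID_MAX = 999


def logged_in_users(who_output: str) -> list[str]:
    """Return the user names listed in the output of `who`."""
    return [line.split()[0] for line in who_output.splitlines() if line.strip()]


def non_system_uids(ps_output: str, system_uid_max: int = SYSTEM_UID_MAX) -> set[int]:
    """Return UIDs above the system range that own at least one process.

    Expects the output of `ps -eo uid=` (one numeric UID per line).
    """
    uids = {int(field) for field in ps_output.split()}
    return {uid for uid in uids if uid > system_uid_max}


def looks_idle(who_output: str, ps_output: str, container_ids: str) -> bool:
    """True when nobody is logged in, only system accounts own processes,
    and no running container IDs were reported."""
    return (
        not logged_in_users(who_output)
        and not non_system_uids(ps_output)
        and not container_ids.strip()
    )


def run(cmd: list[str]) -> str:
    """Run a command and return its stdout, or '' if it is not installed."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True).stdout
    except FileNotFoundError:
        return ""


if __name__ == "__main__":
    who = run(["who"])
    ps = run(["ps", "-eo", "uid="])
    # Assumption: Docker is the container runtime; `docker ps -q` prints
    # the IDs of running containers, one per line.
    containers = run(["docker", "ps", "-q"])
    print("idle" if looks_idle(who, ps, containers) else "busy")
```

The parsing functions take plain strings so they can be tested without touching a live machine. The fourth condition, sustained low CPU load, is left to the monitoring system.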

Using a CI/CD pipeline

From the perspective of the operating system, powering off an instance only requires root to enter "shutdown". Or at the platform level, an AWS administrator need only click through the web console to shut it down, use the AWS command line tools, or run a script. Unfortunately there are issues with both approaches.

The first is that the process is manual. This works for a few instances, but if you're scaling to hundreds or thousands, you need some way to automate things. The second issue is that it's not immediately obvious who or what triggered the shutdown, and any output generated is likely lost. These methods are also often run from an administrator account of some kind, which can introduce security risks.
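For reference, the scripted version of the platform-level approach can be very short. The sketch below assumes a boto3 EC2 client is passed in; the function name is hypothetical, and the instance ID in the usage note is a placeholder.

```python
def stop_instance(ec2_client, instance_id: str) -> str:
    """Ask EC2 to stop one instance and return the state it reports.

    `ec2_client` is expected to behave like boto3.client("ec2").
    """
    response = ec2_client.stop_instances(InstanceIds=[instance_id])
    # The response lists each affected instance with its new state,
    # typically "stopping" right after the call.
    return response["StoppingInstances"][0]["CurrentState"]["Name"]
```

With boto3 installed and credentials configured, a call would look like `stop_instance(boto3.client("ec2"), "i-0123456789abcdef0")`. Run by hand from an administrator account, it still suffers from the issues described above.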

Using a CI/CD pipeline solves these problems. I'll be using GitLab, but any tool supporting continuous deployment would suffice.
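To give a sense of how an alert could kick off a pipeline, here is a minimal sketch that builds a request for GitLab's pipeline trigger endpoint (`POST /projects/:id/trigger/pipeline`). The server URL, project ID, token, and the `INSTANCE_ID` variable name are all placeholders I've made up for illustration; this is not necessarily how the final setup wires things together.

```python
import urllib.parse
import urllib.request


def build_trigger_request(gitlab_url, project_id, trigger_token, ref, instance_id):
    """Build (but do not send) a POST request that triggers a pipeline.

    The instance to shut down is passed as a pipeline variable so the
    job knows which machine the alert was about.
    """
    url = f"{gitlab_url}/api/v4/projects/{project_id}/trigger/pipeline"
    form = {
        "token": trigger_token,
        "ref": ref,
        # Hypothetical variable name; the pipeline job would read it
        # from its environment.
        "variables[INSTANCE_ID]": instance_id,
    }
    data = urllib.parse.urlencode(form).encode()
    return urllib.request.Request(url, data=data, method="POST")
```

Passing the returned request to `urllib.request.urlopen()` would send it. Because the pipeline run is recorded by the CI/CD server, the trigger, the job output, and the outcome are all kept in one place.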

Configuring permissions

Because we're operating in the AWS cloud, it's relatively easy to implement the principle of least privilege. Instead of using a single administrator with unlimited permissions, we can configure an IAM user account that has only the permissions necessary. Not only is this a security best practice, it also prevents accidentally shutting down the wrong machines.

Limiting the scope requires defining a set of permissions (an IAM policy), creating a role that carries those permissions for shutting down the instance (an IAM role), and then assigning our regular user to this role as a trusted entity.
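As a sketch of what those two pieces might contain, the Python below builds a permissions policy and a trust policy as plain dictionaries. Restricting the policy to instances carrying a particular tag is my assumption about how to narrow the scope, and the tag name, account number, and user name are placeholders.

```python
import json


def stop_instances_policy(tag_key: str, tag_value: str) -> dict:
    """Permissions policy: allow stopping only instances with a given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ec2:StopInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # Assumption: instances eligible for shutdown are tagged.
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
                },
            }
        ],
    }


def trust_policy(user_arn: str) -> dict:
    """Trust policy: let one IAM user assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(stop_instances_policy("AutoShutdown", "true"), indent=2))
    print(json.dumps(trust_policy("arn:aws:iam::123456789012:user/ci-runner"), indent=2))
```

The permissions policy is attached to the role and the trust policy names who may assume it, so the regular user holds no shutdown rights until it assumes the role.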

Digging into the configuration

In the next post, I'll go through a sample configuration that can be used as a generalized template.

Find Me

 

Boston, MA

erik "at" erikpatton.com

About Me

 

I'm an old school sysadmin turned Cloud/DevOps/Platform/SRE/whatever still doing systems work since before the dot com crash. Back then I lived in Silicon Valley but am now in Boston.

I've written a lot of code, mostly in Python and Perl (old school, remember), although I'm finding JavaScript/Node on Astro.js a fun diversion. I'll never admit to knowing how to debug C pointers for fear of someone asking me to do it.

Currently I'm working on a hybrid AWS Kubernetes cluster using OpenShift. Fun with IAM policies, persistent storage, and the like. Underneath it all though is still a Linux OS that even non-gray beards like me can still tinker with.

I do a bit of consulting and contract work, so if you need help with a project please don't be shy about reaching out.