1. Infrastructure as Code
Infrastructure as Code (IaC) has taken the world by storm. If you’re not using it, you’d better start.
Why? Well, traditionally, to create or configure resources in the cloud, you would use the console or the command line. Each resource had to be set up manually, through a series of steps performed in the right order. This workflow is unsuitable for real development projects:
- It requires manual steps - often many - and every manual step is an opportunity for introducing errors.
- It requires the same manual steps to be repeated for every environment. Because each repetition can introduce errors, we have no assurance that our development, QA, and production environments look the same.
- If there is a disaster and the production environment needs to be rebuilt in another region, all the same manual steps need to be performed. Only this time, it’s an emergency because production is down!
These interactive mechanisms are great for debugging, experimentation, or viewing resources. But for creating and managing cloud resources, Infrastructure as Code (IaC) is the way to go. There are many different IaC tools, including CloudFormation, Terraform, CDK, SAM, and ARM templates. We're not going to compare them right now, but suffice it to say there are a lot of options.
But why stop there?
2. (Infrastructure as Code) as Code
What does THAT mean?
Let’s take a step back from IaC. Does it matter how we write any kind of code? Let’s think about code we might write for another purpose. As you might guess from my user name, I’m a big fan of Frank Sinatra. I have many pictures of Mr. Sinatra in my basement.
So let’s say I'm writing code to keep track of all my Sinatra pictures and albums: a Sinatra tracker. (SinaTracker! You read it here first. Ahem.)
Does it matter how I write my code? Does it matter if I just keep it in a single file on my laptop's hard drive, emailing copies to others when they want to run my program too, and hoping I never make a mistake when adding new features? Years ago, this was how we wrote code. But today, we keep it in git, modularize it, write unit tests, and so forth; in short, we apply modern software development practices.
Yet we’ve all seen Infrastructure as Code handled in ways little better than that single-file implementation of SinaTracker. Let's say we want to automate provisioning EC2 instances. So we bang out a shell script of CLI commands on our local machine. Or maybe we have seen the light and write more sophisticated desired-state definitions using tools like CloudFormation or Terraform.
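To see what "desired state" buys us over a script of CLI commands, here is a minimal Python sketch. It does not call any real cloud API: `Resource`, `plan`, and `apply` are hypothetical names, and the "cloud" is just a dictionary. A script records the steps to run; a desired-state tool records what should exist, compares it with what does exist, and derives the steps itself, which is why a second run has nothing to do.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical, simplified resource: a name plus one setting."""
    name: str
    instance_type: str


def plan(desired: dict[str, Resource], actual: dict[str, Resource]) -> list[tuple[str, str]]:
    """Compute the (action, name) steps that turn `actual` into `desired`."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(actions: list[tuple[str, str]], desired: dict[str, Resource],
          actual: dict[str, Resource]) -> dict[str, Resource]:
    """Carry out the plan against an in-memory stand-in for the cloud."""
    state = dict(actual)
    for action, name in actions:
        if action == "delete":
            del state[name]
        else:  # "create" or "update"
            state[name] = desired[name]
    return state


desired = {"web": Resource("web", "t3.micro"), "worker": Resource("worker", "t3.small")}
actual = {"web": Resource("web", "t2.micro"), "legacy": Resource("legacy", "t2.micro")}

steps = plan(desired, actual)
print(steps)  # [('update', 'web'), ('create', 'worker'), ('delete', 'legacy')]

state = apply(steps, desired, actual)
print(plan(desired, state))  # [] (a second run has nothing to do)
```

Real tools such as Terraform and CloudFormation do far more (dependency ordering, partial failures, drift detection), but the core idea of diffing desired against actual state is the same.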
But we can do better. We can apply the same modern software development practices to our IaC code - really treat Infrastructure as Code as Code. Let’s look at some ways to do that.
a. Version control - git
Use of version control for "real" code is uncontroversial, and today git should be the default option. So, it should go without saying that IaC should always be managed in a git repo. You wouldn't leave your "real" code sitting in a "scripts" directory on your local machine or distributed as an attachment to a page in your company's wiki. Your IaC code should be no different.
Unfortunately, git is not always easy to use. As the well-known xkcd comic implies, most of us get by knowing a handful of git commands and “phone a friend” when we need to do something beyond our limited knowledge.
I recommend we all go beyond Cueball’s level of knowledge and then help others do the same. There are good free books and video courses available to improve our knowledge of git. Unfortunately, my experience mirrors the comic's hover text: we tend to get enamored with git’s “clever” design and terminology (see commit-ish and tree-ish). Let’s not be that guy. Let's learn from these good free sources and then pass on the love in a way that really helps others.
But let's go beyond just using git. Since git is going to be tracking the progress of our code, we want git to be the single source of truth. This means everything - all configuration files, all documentation, all environment definitions, even the operations themselves - is stored in a git repo, and preferably the same one. Everything is right there, and versioned together: you don't have to pull together pieces from a number of places, and you always know which version of configuration went with which version of code. This will come into play more later when we talk about Continuous Delivery.
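To make the "one repo, versioned together" idea concrete, here is a minimal Python sketch in which the environment definitions live next to the code that uses them. `ENVIRONMENTS` and `build_environment` are hypothetical names, and the instance types are placeholder values. Because development, QA, and production are all built by the same function from data in the same commit, checking out any commit tells you exactly how every environment was defined at that point.

```python
from __future__ import annotations

# Hypothetical per-environment settings, committed to the same repo as the IaC code.
ENVIRONMENTS: dict[str, dict] = {
    "dev": {"instance_count": 1, "instance_type": "t3.micro"},
    "qa": {"instance_count": 2, "instance_type": "t3.small"},
    "prod": {"instance_count": 4, "instance_type": "m5.large"},
}


def build_environment(env: str) -> list[dict]:
    """Build one environment's resources from the shared, versioned definition."""
    if env not in ENVIRONMENTS:
        raise KeyError(f"unknown environment: {env!r}")
    settings = ENVIRONMENTS[env]
    # Every environment goes through the same code path; only the values differ.
    return [
        {"name": f"{env}-web-{i}", "instance_type": settings["instance_type"]}
        for i in range(settings["instance_count"])
    ]


for env in ENVIRONMENTS:
    print(env, [r["name"] for r in build_environment(env)])
```

Adding a new environment, or changing an existing one, then becomes an ordinary reviewed commit rather than a page on a wiki.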
b. Branching and Developer Workflow
Even when IaC code is in git, it is often just checked straight into the master branch. That is fine if you have made a conscious decision to use Trunk-Based Development and have the tools and practices in place to make the best use of that strategy. Unfortunately, in a lot of IaC projects it is simply the default, and people skip code reviews and the other practices that are considered standard operating procedure for "real" code.
IaC projects should decide on a branching strategy, e.g. Trunk-Based Development or GitHub Flow, and then use the same processes they would use with that strategy for "real" code. This becomes even more important if you decide to use a GitOps Continuous Delivery tool such as Atlantis, since such tools primarily operate on Merge or Pull Requests. The Merge/Pull Request thus becomes the mechanism for introducing changes into the master branch in a controlled way.
Speaking of control, don't forget code reviews. With "real" code, there should always be a mechanism for code reviews, whether synchronous with the developer's workflow (usually through the Merge/Pull Request in GitHub Flow or similar) or asynchronous after the fact (often used with Trunk-Based Development). I've seen large IaC projects, with thousands of lines of IaC code, that were essentially a single developer checking into master. True, this isn't code that runs in production, but it probably creates or manages production infrastructure. Shouldn't we make sure it works right?
c. Modularity
Modularity is, of course, common with "real" software. You would never create a single source file that contains all your code in one long linear structure. Rather, you divide your code into modules, using tools provided by the language such as classes, and allow reuse by instantiating and composing objects.
Yet it's not uncommon to see one huge YAML or JSON source file that defines all of our infrastructure. If there is any modularity, it's extremely simple: the code is divided into a set of source files that are all pulled in through something like a server-side include. Over the past few years, Terragrunt has been used to "keep your Terraform templates DRY", and Terraform itself has been improving its ability to modularize, to the point where Terragrunt may no longer be needed. (Given that Terragrunt introduces another somewhat opaque layer, I welcome this change.) These are improvements, but can we do better?
One of the most exciting IaC tools to come out recently is the Cloud Development Kit (CDK), along with its Terraform counterpart, CDK-TF. These tools let you use a "real" programming language like TypeScript or Python to define your infrastructure. In my experience, the CDK version of a given piece of infrastructure is generally shorter and easier to understand than the equivalent template.
But the difference is not merely syntactic. Once you're using a real programming language, you can abstract, modularize, and compose your IaC code using native language features such as classes and member variables. The next time you have to make the same change to fifty different resources, you'll be thankful you only have to make it once. With a real programming language, you are also not restricted to the specific modularization capabilities the tool developer decided to add: you have the full power of a general-purpose language at your disposal. Finally, IDEs support these languages out of the box. Yes, there are third-party extensions for Visual Studio Code to help edit DSLs like HCL or CloudFormation YAML, but why not use native VS Code capabilities like search, hover, and control-click to explore the SDK?
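Here is a minimal sketch of that "change it once" benefit. To keep it runnable without installing `aws-cdk-lib`, it uses plain Python classes rather than the real CDK API (where you would typically subclass `Construct`); `Bucket`, `CompanyBucket`, `Stack`, and `synth` are hypothetical names that only mimic the shape of CDK code. The point is that fifty buckets share one definition of what a bucket should look like.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


@dataclass
class Bucket:
    """Hypothetical, simplified definition of a storage bucket resource."""
    name: str
    versioned: bool
    encryption: str
    tags: dict[str, str] = field(default_factory=dict)


class CompanyBucket:
    """Reusable building block: every bucket gets the team's standard settings."""

    # Change a default here and every bucket built from this class picks it up.
    DEFAULT_ENCRYPTION = "aws:kms"
    DEFAULT_TAGS = {"owner": "platform-team"}

    def __init__(self, name: str, versioned: bool = True) -> None:
        self.definition = Bucket(
            name=name,
            versioned=versioned,
            encryption=self.DEFAULT_ENCRYPTION,
            tags=dict(self.DEFAULT_TAGS),  # copy, so buckets never share one dict
        )


class Stack:
    """Composes building blocks into one deployable unit."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.resources: list[Bucket] = []

    def add(self, block: CompanyBucket) -> None:
        self.resources.append(block.definition)

    def synth(self) -> list[dict]:
        """Render every resource to plain data, as a template generator would."""
        return [asdict(resource) for resource in self.resources]


# Fifty buckets, one definition of what a bucket should look like.
stack = Stack("sinatracker-storage")
for i in range(50):
    stack.add(CompanyBucket(f"sinatra-photos-{i}"))

template = stack.synth()
print(len(template), template[0]["encryption"])  # 50 aws:kms
```

Switching every bucket to a different encryption setting or adding a tag is now a one-line change to `CompanyBucket`, reviewed once.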
d. Continuous Integration - Linting and Unit Testing
Once we have our code in git and modularized, we can start doing the same kinds of tasks in Continuous Integration that we do with "real" code:
- Writing unit tests and executing them, preferably before we build real resources from the code. There are tools like Terratest that let you write tests against your IaC, but again CDK and CDK-TF come to the rescue: if we define our infrastructure in TypeScript or Python, we can easily write unit tests against it using the normal unit testing tools for those languages.
- Linting and code quality checks. While unit testing finds actual errors in the code, linting finds potential errors: things that don't break anything now, but that experience shows can cause problems later. You can use special-purpose tools like tflint or cfn-lint, or general-purpose code linting tools.
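Because the definition is ordinary code, it can be tested with ordinary tools. The sketch below uses Python's built-in `unittest`. `synth_template()` is a hypothetical stand-in for whatever your IaC code generates; in a CDK project you would more likely synthesize a real stack and check it with the `aws_cdk.assertions` module (`Template.from_stack(...)`). The two policies being tested (versioned buckets, on-demand DynamoDB billing) are just examples.

```python
import unittest


def synth_template() -> dict:
    """Hypothetical IaC output, shaped like a CloudFormation template."""
    return {
        "Resources": {
            "PhotosBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {"VersioningConfiguration": {"Status": "Enabled"}},
            },
            "AlbumsTable": {
                "Type": "AWS::DynamoDB::Table",
                "Properties": {"BillingMode": "PAY_PER_REQUEST"},
            },
        }
    }


class TemplateTests(unittest.TestCase):
    def setUp(self) -> None:
        # Synthesize before each test; nothing is deployed to a real account.
        self.resources = synth_template()["Resources"]

    def of_type(self, resource_type: str) -> list[dict]:
        return [r for r in self.resources.values() if r["Type"] == resource_type]

    def test_buckets_are_versioned(self) -> None:
        buckets = self.of_type("AWS::S3::Bucket")
        self.assertTrue(buckets, "expected at least one bucket")
        for bucket in buckets:
            versioning = bucket["Properties"].get("VersioningConfiguration", {})
            self.assertEqual(versioning.get("Status"), "Enabled")

    def test_tables_use_on_demand_billing(self) -> None:
        for table in self.of_type("AWS::DynamoDB::Table"):
            self.assertEqual(table["Properties"].get("BillingMode"), "PAY_PER_REQUEST")
```

Run it with `python -m unittest` like any other test module. No cloud credentials are needed, because nothing is deployed.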
You can run these tests manually, but if you're in git and creating Merge/Pull Requests anyway, why not build a pipeline like you do with "real" code, and configure your git repository to block merging the request if the pipeline fails?
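The gate itself usually comes down to an exit status: the pipeline step fails when a check returns non-zero, and a repository that requires that check to pass will refuse the merge. The minimal Python sketch below shows a home-grown lint step built that way. It is not tflint or cfn-lint; `RESOURCES`, the two rules, and the "owner" tag policy are hypothetical examples.

```python
from __future__ import annotations

import sys

# Hypothetical resources, as the IaC code in a Merge/Pull Request might render them.
RESOURCES: dict[str, dict] = {
    "WebSecurityGroup": {
        "Type": "AWS::EC2::SecurityGroup",
        "Properties": {
            "SecurityGroupIngress": [
                {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "CidrIp": "0.0.0.0/0"}
            ],
            "Tags": [{"Key": "owner", "Value": "platform-team"}],
        },
    },
    "PhotosBucket": {"Type": "AWS::S3::Bucket", "Properties": {}},
}


def ssh_open_to_world(name: str, resource: dict) -> list[str]:
    """Flag security group rules that allow port 22 from any IPv4 address."""
    if resource["Type"] != "AWS::EC2::SecurityGroup":
        return []
    return [
        f"{name}: SSH (port 22) is open to the internet"
        for rule in resource["Properties"].get("SecurityGroupIngress", [])
        if rule.get("CidrIp") == "0.0.0.0/0"
        and rule.get("FromPort", 0) <= 22 <= rule.get("ToPort", 65535)
    ]


def missing_owner_tag(name: str, resource: dict) -> list[str]:
    """Flag resources without an 'owner' tag (a hypothetical team policy)."""
    keys = {tag["Key"] for tag in resource["Properties"].get("Tags", [])}
    return [] if "owner" in keys else [f"{name}: missing 'owner' tag"]


RULES = [ssh_open_to_world, missing_owner_tag]


def lint(resources: dict[str, dict]) -> list[str]:
    """Run every rule against every resource and collect the findings."""
    return [msg for name, res in resources.items() for rule in RULES for msg in rule(name, res)]


def main() -> int:
    findings = lint(RESOURCES)
    for msg in findings:
        print(f"LINT: {msg}", file=sys.stderr)
    # Non-zero means failure: the CI job fails, and a required check blocks the merge.
    return 1 if findings else 0
```

A pipeline step would run this as a script ending in `raise SystemExit(main())`. With the sample data above, both rules fire, so `main()` returns 1 and the merge is blocked until the findings are fixed.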
e. Continuous Delivery - GitOps
Finally, just as with "real" code, the holy grail is Continuous Delivery or Deployment: using git as the source of truth not only for the code itself, but also for how and when it is deployed. This approach is sometimes called GitOps, and it typically operates through Merge/Pull Requests: an external tool watches your git repo for Merge/Pull Requests, analyzes the proposed changes, and gives feedback to the developer as comments posted to the Merge/Pull Request via the git host's API.
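The feedback loop described above can be sketched in plain Python. This is not how Argo CD, Flux CD, or Atlantis are implemented; it only illustrates the idea of comparing the infrastructure definitions on the target branch with those on the proposed branch and summarizing the difference as a review comment. `diff_definitions`, `render_comment`, and the PR number 42 are hypothetical, and posting the comment through the git host's API is left out. (Atlantis, for example, posts real `terraform plan` output instead.)

```python
from __future__ import annotations


def diff_definitions(base: dict[str, dict], proposed: dict[str, dict]) -> dict[str, list[str]]:
    """Classify resources as added, changed, or removed between two branches."""
    return {
        "added": sorted(proposed.keys() - base.keys()),
        "changed": sorted(n for n in base.keys() & proposed.keys() if base[n] != proposed[n]),
        "removed": sorted(base.keys() - proposed.keys()),
    }


def render_comment(pr_number: int, diff: dict[str, list[str]]) -> str:
    """Format the analysis as a Markdown comment for the Merge/Pull Request."""
    lines = [f"### Infrastructure plan for #{pr_number}"]
    for kind, symbol in (("added", "+"), ("changed", "~"), ("removed", "-")):
        for name in diff[kind]:
            lines.append(f"- `{symbol}` {name} ({kind})")
    if len(lines) == 1:
        lines.append("No infrastructure changes.")
    return "\n".join(lines)


# Hypothetical definitions rendered from the target branch and from the PR branch.
base = {"web": {"type": "t3.micro"}, "legacy": {"type": "t2.micro"}}
proposed = {"web": {"type": "t3.small"}, "worker": {"type": "t3.micro"}}

comment = render_comment(42, diff_definitions(base, proposed))
print(comment)
# A real tool would now post `comment` through the git host's API (omitted here).
```

Because the comment is generated from the same commit that will be merged, the reviewer sees exactly what the merge would change before approving it.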
There are many implementations of GitOps, including Argo CD, Flux CD, and Atlantis. They target different toolsets (Argo CD and Flux CD deploy to Kubernetes, while Atlantis runs Terraform), typically support at least GitHub, GitLab, and Bitbucket, and have different strengths and weaknesses. Check them out and pick what works for you.
3. Conclusion
Hopefully by now, everybody agrees we should be writing our Infrastructure as Code. Let's apply the lessons software development has taught us over the past 40 years, and write Infrastructure as Code... as Code!