
DevOps with Terraform

Jonathan Reeves
14 min read · Oct 19, 2021


Hello, in this article I want to discuss Infrastructure as Code, or IaC, using Terraform. For those of you already in the DevOps scene, you might already be using Terraform or have heard of it. Terraform is a tool similar to Ansible, Chef, or Puppet. These tools allow you, the developer, to write and execute code to define, deploy, update, and destroy your infrastructure. This has created a different mindset in which all aspects of operations are thought of as software, even those aspects that represent hardware (setting up physical servers, for instance). You can manage almost everything in code, including servers, databases, networks, log files, application configuration, documentation, automated tests, deployment processes, etc.

What Is Terraform?

Terraform is an open source tool created by HashiCorp and written in the Go programming language. The code compiles down into a single binary (one binary for each supported operating system, to be more precise) which is named, unsurprisingly, terraform. This binary is then used to deploy infrastructure from your computer or a build server, without you needing to run any extra infrastructure to make that happen. This is possible because, under the hood, the binary makes API calls on your behalf to one or more providers: AWS, Azure, Google Cloud, DigitalOcean, OpenStack, etc. Terraform gets to leverage the infrastructure those providers are already running for their API servers, as well as the authentication mechanisms you’re already using with those providers.

Terraform knows what API calls to make because you create the configurations: text files that specify what infrastructure you want to create. They make up the “code” in “infrastructure as code”. An example Terraform config:

resource "aws_instance" "ex1" {
  ami           = "ami-0a66c260acgbgh2g9"
  instance_type = "t2.micro"
}

resource "google_dns_record_set" "g" {
  name         = "ex1.google-ex1.com"
  managed_zone = "ex1-zone"
  type         = "A"
  ttl          = 300
  rrdatas      = [aws_instance.ex1.public_ip]
}

This example is pretty straightforward. If you’ve used something like a Dockerfile or set up a YAML file for Kubernetes, this should look familiar. If you haven’t, you should still be able to loosely figure out what’s going on just by reading it. And if you are completely lost: this snippet tells Terraform to make API calls to AWS to deploy a server, then make another API call to Google Cloud to create a DNS entry pointing to the AWS server’s IP address. Terraform, with its simple syntax, allows you to deploy interconnected resources across multiple cloud providers with ease.

How to Use the Config Files?

Let’s say you have defined your entire infrastructure (servers, databases, load balancers, the topology of your network, and so on) in Terraform config files. You then commit those files to version control such as GitLab or GitHub. From there, you can run Terraform commands like terraform apply, which deploys that infrastructure. The binary parses your code, translates it into a series of API calls to the cloud providers specified in the code, and makes those API calls as efficiently as possible.

Now let’s say we have all of that up and running, but another developer needs to update the infrastructure. Instead of updating everything manually on the servers, they would:

  1. Pull down the Terraform config files.
  2. Make their changes directly in them.
  3. Validate the changes using automated tests and code reviews.
  4. Commit the updated code to version control.
  5. Rerun the terraform apply command and have Terraform make the necessary API calls to deploy the changes.
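On the command line, the steps above usually look something like the following (the terraform subcommands shown are real CLI commands; the file name and commit message are just placeholders):

```
$ git pull                        # 1. pull down the latest config files
$ vim main.tf                     # 2. make the changes
$ terraform validate              # 3. check the config as part of review
$ terraform plan                  # 3. preview what would change
$ git commit -am "update config"  # 4. commit the updated code
$ terraform apply                 # 5. make the API calls to deploy
```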

Does This Mean that Transparent Portability Exists?

Not quite. Each cloud provider offers different kinds of servers, load balancers, and databases, so the infrastructure available differs from provider to provider. Where one provider offers a feature, another might not have that option available at all.

What Terraform’s approach allows you to do is write code that is specific to each provider, taking advantage of that provider’s unique functionality, while using the same language, toolset, and IaC practices for all providers. So while you won’t simply be copying, pasting, and lightly modifying a small amount of code between providers, you can still apply the same concepts when connecting and setting up infrastructure on each of them.

How Terraform Compares

Infrastructure as code is great. The process of choosing the right IaC tool, well, not so much. A good portion of the tools overlap in what they do, are mostly open source, and have commercial support. If you haven’t used each one yourself, though, trying to determine which one meets your needs can be daunting. In this section I want to give some insight into the other IaC tools on the market to hopefully help you determine which one will be best for you.

The trade-offs to consider that I have found most helpful are:

  • Configuration management vs provisioning
  • Mutable infrastructure vs immutable infrastructure
  • Procedural language vs declarative language
  • Master vs masterless
  • Agent vs agentless
  • Mature vs cutting-edge
  • Using multiple tools together

Configuration Management vs Provisioning

Chef, Puppet, and Ansible all fall under the configuration management side of things, while Terraform is a provisioning tool. The distinction is not so cut and dry, of course: configuration management tools can typically do some degree of provisioning (i.e., you can deploy a server with Ansible), and provisioning tools can do some degree of configuration (i.e., you can run configuration scripts on each server you provision with Terraform). You will most likely want to pick the tool that best fits your use case.

If, for example, you are using Docker, the majority of your configuration management needs are already taken care of. Once the image is created from the Dockerfile, all that’s left to do is provision the infrastructure for running those images. A provisioning tool such as Terraform will be your best bet in this scenario.

Now, if you’re not using a tool like Docker, an alternative is to use a configuration management tool and a provisioning tool together: for example, Terraform to provision the servers and Ansible to configure each one.
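One simple way to wire the two together is Terraform’s local-exec provisioner, which runs a command on the machine running Terraform after the resource is created. A minimal sketch, assuming a hypothetical playbook site.yml, SSH user, and AMI ID (local-exec and self.public_ip are real Terraform features):

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0a66c260acgbgh2g9"  # hypothetical AMI ID
  instance_type = "t2.micro"

  # After Terraform provisions the server, hand it off to Ansible.
  # The trailing comma makes the single IP a valid ad hoc inventory.
  provisioner "local-exec" {
    command = "ansible-playbook -i '${self.public_ip},' -u ubuntu site.yml"
  }
}
```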

Mutable Vs Immutable Infrastructure

Configuration management tools such as Ansible, Chef, and Puppet typically default to a mutable infrastructure paradigm. For instance, if you have Chef install a new version of OpenSSL, the software will run the update on your existing servers and the changes will happen in place. Over time, as you apply more and more updates, each server builds up a unique history of changes. As a result, each server becomes slightly different from all the others, leading to subtle configuration bugs that can be difficult to diagnose and reproduce. Even if you’re using automated tests, these bugs are difficult to catch. A configuration management change might behave as expected on a test server, but that same change could behave differently on a production server because the production server has accumulated changes that aren’t reflected in the test environment.

If you’re using Terraform to deploy machine images built with a tool like Docker or Packer, most “changes” are actually deployments of a completely new server. To use our OpenSSL example again: to deploy a new version of OpenSSL, you build a new image with the new version of OpenSSL baked in, deploy that image across a set of new servers, and then terminate the old servers. Because every deployment uses immutable images on fresh servers, this approach reduces the likelihood of configuration drift bugs and makes it easier to know exactly what software is running on each server, which in turn allows you to easily deploy any previous version of the software at any time.
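Terraform has first-class support for this replace-rather-than-update pattern. As a sketch: changing the ami below makes Terraform plan a replacement of the whole server, and the lifecycle block (a real Terraform setting) tells it to stand up the new server before terminating the old one:

```hcl
resource "aws_instance" "app" {
  ami           = "ami-23abaa913f1468cb"  # changing this forces a new server
  instance_type = "t2.micro"

  lifecycle {
    # Create the replacement first, then destroy the old instance,
    # so rolling out a new image causes minimal downtime.
    create_before_destroy = true
  }
}
```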

Now, you can force configuration management tools to do immutable deployments if you want, but this isn’t the idiomatic approach for those tools, whereas it’s a natural way to use provisioning tools. It’s also worth mentioning that the immutable approach has downsides of its own. Rebuilding an image and redeploying all of your servers for a trivial change can take a long time. Also, immutability lasts only until you actually run the image: after a server is up and running, it will begin making changes to its hard drive and experiencing some degree of configuration drift.

Procedural Vs Declarative Language

Ansible and Chef encourage a procedural style in which you write code that specifies, step by step, how to achieve some desired end state. Terraform encourages a more declarative style in which you write code that specifies your desired end state, and the IaC tool itself is responsible for figuring out how to achieve that state.

To see the difference, let’s first look at the procedural approach in Ansible:

- ec2:
    count: 15
    image: ami-1a77c270acgcgh2g9
    instance_type: t2.micro

And here is Terraform’s approach, using a declarative style:

resource "aws_instance" "ex1" {
  count         = 15
  ami           = "ami-1a77c270acgcgh2g9"
  instance_type = "t2.micro"
}

Looking at these two approaches from a bird’s-eye view, they might look similar, and when you execute them with Ansible or Terraform you will get similar results. The interesting part is what happens when you need or want to make a change. For instance, let’s say traffic has gone up and you want to increase the number of servers to 20. With Ansible, the procedural code example is no longer useful. If you just updated the count to 20 and reran the code, it would deploy 20 new servers, giving you a total of 35. So instead, you need to be aware of what is already deployed and write a totally new procedural script to add the five new servers:

- ec2:
    count: 5
    image: ami-1a77c270acgcgh2g9
    instance_type: t2.micro

With the declarative Terraform example, all you do is declare the end state that you want, and Terraform figures out how to get there. Terraform is also aware of any state it created in the past, which means that to deploy the five additional servers, all you need to do is go back to the same config file and update the count from 15 to 20:

resource "aws_instance" "ex1" {
  count         = 20
  ami           = "ami-1a77c270acgcgh2g9"
  instance_type = "t2.micro"
}

If this were applied, Terraform would realize it had already created 15 servers and would know that all it needs to do is create five new ones. In fact, you can use Terraform’s plan command to preview the changes it would make:

$ terraform plan

# aws_instance.ex1[15] will be created
+ resource "aws_instance" "ex1" {
    + ami           = "ami-1a77c270acgcgh2g9"
    + instance_type = "t2.micro"
    + (...)
  }

# aws_instance.ex1[16] will be created
+ resource "aws_instance" "ex1" {
    + ami           = "ami-1a77c270acgcgh2g9"
    + instance_type = "t2.micro"
    + (...)
  }

... // more of the same, but I think the point is made :)

Plan: 5 to add, 0 to change, 0 to destroy.

Now I know you’re thinking, “OK, this is great, but what happens if I have a new AMI ID that I need to use to update the app? How do I do that?” With the procedural approach, both of your previous Ansible templates are again not useful, so you need to write yet another template to track down the 20 servers you deployed previously and carefully update each one to the new version. With the declarative approach of Terraform, you just go back to the same config file and update the AMI ID to the new one:

resource "aws_instance" "ex1" {
  count         = 20
  ami           = "ami-23abaa913f1468cb"
  instance_type = "t2.micro"
}

Now, Ansible does allow you to use tags to search for existing EC2 instances before deploying new ones (using the instance_tags and count_tag parameters), but having to manually figure out this sort of logic for every single resource you manage with Ansible, based on each resource’s past history, can be surprisingly complicated: finding existing instances not only by tag, but also by image version, and so on.
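For reference, here is a sketch of that tag-based logic using the classic Ansible ec2 module (exact_count, count_tag, and instance_tags are real parameters of that module; the tag values are assumptions for illustration):

```yaml
- ec2:
    image: ami-23abaa913f1468cb
    instance_type: t2.micro
    exact_count: 20          # converge on exactly 20 instances...
    count_tag:
      Name: ex1              # ...counting only instances carrying this tag
    instance_tags:
      Name: ex1              # tag new instances so future runs can find them
```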

Master Vs Masterless

Chef and Puppet require that you run a master server for storing the state of your infrastructure and distributing updates. Every time you want to update something in your infrastructure, you use a CLI tool to issue new commands to the master server. From there the master server either pushes the updates out to the other servers or the other servers then pull the latest updates down from the master server on a regular basis.

A master server offers a few advantages. First, it’s a single, central place where you can see and manage the status of your infrastructure; many configuration management tools even provide a nice web interface to make it easier to see what’s going on. Second, some master servers can run continuously in the background and enforce your configuration. That way, if someone makes a manual change on a server, the master server can revert that change to prevent configuration drift.

But like all things, there are drawbacks as well. Extra infrastructure: you need to deploy an extra server, or even a cluster of extra servers (for high availability and scalability), just to run the master. Maintenance: you have to maintain, upgrade, back up, monitor, and scale the master server(s). Security: you must provide a way for the client to communicate with the master server(s) and a way for the master server(s) to communicate with all the other servers, which typically means opening extra ports and configuring extra authentication systems, increasing your surface area to attackers.

Chef and Puppet do have support for masterless modes, where you run just their agent software on each of your servers, typically on a periodic schedule (think cron jobs), and use that to pull down the latest updates from version control. This significantly reduces the number of moving parts, but it still leaves a number of unanswered questions, especially about how to provision the servers and install the agent software on them in the first place.

Terraform and Ansible are masterless by default. Or, to be more accurate, some of them might rely on a master server, but it’s already part of the infrastructure you’re using and not an extra piece that you need to manage. Terraform, for instance, communicates with cloud providers using the cloud providers’ APIs. In a sense, the API servers are master servers, except that they don’t require any extra infrastructure or any extra authentication mechanisms (i.e., you just use your API keys). Ansible works by connecting directly to each server over SSH, so again, you don’t need to run any extra infrastructure or manage extra authentication mechanisms.
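For example, pointing Terraform at AWS takes nothing more than a provider block; credentials are typically picked up from the standard environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), and the region below is just an assumption for illustration:

```hcl
# No master server to run: Terraform calls the cloud provider's API
# directly, using credentials you already have.
provider "aws" {
  region = "us-east-2"
}
```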

Agent Vs Agentless

Chef and Puppet require you to install their agent software on each server that you want to configure. The agent typically runs in the background on each server and is responsible for installing the latest configuration management updates. This approach has a few drawbacks:

Bootstrapping: How do you provision your servers and install the agent software on them in the first place? Some configuration management tools kick the can down the road, assuming that some external process will take care of this for them (e.g., using Terraform to deploy a bunch of servers with an AMI that has the agent already installed); others have a special bootstrapping process in which you run one-off commands to provision the servers using the cloud provider APIs and install the agent software on those servers over SSH.

Maintenance: You need to carefully update the agent software on a periodic basis, being careful to keep it synchronized with the master server if there is one. Monitoring the agent also becomes crucial, as you will need to restart it if it crashes.

Security: If the agent software pulls down configuration from a master server (or some other server if no master is present), you need to open outbound ports on every server. If the master server pushes config to the agent, you need to open inbound ports on every server. In either case, you must figure out how to authenticate the agent to the server it is communicating with. All of this increases your surface area to attackers.

Again, Chef and Puppet have some level of support for agentless modes, but these feel like afterthoughts and don’t support the full feature set of the configuration management tool.

Ansible and Terraform do not require you to install any extra agents. Or, to be more accurate, some of them require agents, but these are typically already installed as part of the infrastructure you’re using. AWS, Azure, Google Cloud, and all of the other cloud providers take care of installing, managing, and authenticating agent software on each of their physical servers. As a Terraform user, you don’t need to worry about any of that: you just issue commands, and the cloud provider’s agents execute them for you on all of your servers.

Mature Vs Cutting Edge

Another key factor, which I feel isn’t considered as much as it should be, is maturity. Terraform is the youngest of these IaC tools: it only recently reached a 1.0 release, so its stability and backward compatibility guarantees are new, and bugs are relatively common, though most of them are minor. This is Terraform’s biggest weakness. It has become extremely popular in a short time, and the price you pay for using this newer tool is that it is not as mature as some of the other IaC tools out there.

Using Multiple Tools Together

The previous sections compared the tools against each other, but you can also use multiple tools together to build your infrastructure. Each of these tools has strengths and weaknesses, and it’s your job to pick the right tool for the right job. A common team-up, if you will, is Terraform with Ansible: Terraform deploys all the underlying infrastructure, including the network topology, data stores, load balancers, and servers, and then Ansible deploys your apps on top of those servers.

This can be an easy approach to get started with. There is no extra infrastructure to run and there are many ways to get Ansible and Terraform to work together. The huge downside is that using Ansible typically means that you’re writing a lot of procedural code, with mutable servers, so as your codebase, infrastructure and team grow, maintenance can become more difficult.

Terraform and Packer is another good team-up. You use Packer to package your apps as VM images, and then use Terraform to deploy (a) servers running those VM images and (b) the rest of your infrastructure, including the network topology, data stores, and load balancers.
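As a rough sketch of the Packer side, here is a minimal template in Packer’s HCL2 syntax (the image name, region, base AMI, and install script are all assumptions for illustration):

```hcl
# Bake the app into an AWS AMI; Terraform then deploys servers from it.
source "amazon-ebs" "app" {
  ami_name      = "my-app-v1"              # hypothetical image name
  instance_type = "t2.micro"
  region        = "us-east-2"
  source_ami    = "ami-0a66c260acgbgh2g9"  # hypothetical base AMI
  ssh_username  = "ubuntu"
}

build {
  sources = ["source.amazon-ebs.app"]

  provisioner "shell" {
    script = "install-app.sh"  # hypothetical script that installs the app
  }
}
```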

This can also be an easy approach because there is no extra infrastructure to run, and it is an immutable infrastructure approach, which will make maintenance easier. However, there are two pain points. First, VMs can take a long time to build and deploy, which will slow down your iteration speed. Second, the deployment strategies you can implement with Terraform are limited (e.g., you can’t implement blue-green deployment natively in Terraform), so you either end up writing lots of complicated deployment scripts, or you turn to orchestration tools like Kubernetes.

One other possible team-up is Terraform, Packer, Docker, and Kubernetes. You use Packer to create a VM image that has Docker and Kubernetes installed, then use Terraform to deploy (a) a cluster of servers, each of which runs this VM image, and (b) the rest of your infrastructure, including the network topology, data stores, and load balancers. Finally, when the cluster of servers boots up, it forms a Kubernetes cluster that you use to run and manage your Dockerized apps.

This approach has the advantage that Docker images build fairly quickly, you can run and test them on your local computer, and you can take advantage of all of the built-in functionality of Kubernetes, including various deployment strategies, auto healing, auto scaling, and so on. The drawback is the added complexity: Kubernetes clusters are difficult and expensive to deploy and operate, though most major cloud providers now provide managed Kubernetes services, which can offload some of this work.

Conclusion

I know this is a pretty long article, and I hope that I was able to enlighten anyone looking into the best tools to handle infrastructure as code. I personally use Terraform, but that doesn’t mean it’s the right choice for your particular job(s). I promise it won’t hurt my feelings if you end up choosing Ansible instead. I just wanted to get the word out about Terraform in the hopes that more and more people will pick up the tool, so that we, as a community, can make it the best tool it can be.
