Creating and Updating Infrastructure using Terraform from your Continuous Deployment/Integration Pipeline on GitLab CI

I’ve written about things you can do in your CI/CD and just a little about infrastructure as code before. The thing is, I love infrastructure as code. Having everything in one place, with a self-contained project and a self-sufficient team that owns the entire thing, is just great. Yet even in that scenario, I was not happy with how we were handling updates to our infrastructure. In this article I will share why.

Infrastructure as code

Consider this scenario. At first, as a single developer, you deploy your applications from your local workstation. Then the project grows to have multiple developers, and now all your teammates can do that as well. Then someone deploys from the wrong branch. Nightmare. If this were a podcast there would be a scream here. Now, imagine that happening to your infrastructure. Someone makes a mistake and removes a database, recreates a cluster or drops an S3 bucket. That’s an entire horror show. Cue terror music.

All the main reasons why we apply CI/CD for our applications are more than valid for our infrastructure.

The code is shared. No matter if it's an application or infrastructure, you are probably going to have more than one person adding and modifying stuff, so the code should be kept up to date in the repo. You should not allow updates to your infrastructure to be performed from a local workstation, because even with a shared remote state things can go wrong. This was happening to us, and we wanted to remove the fear of messing up infrastructure from any machine and make the process safer and more streamlined.

The solution — CI/CD

Let's make our CI/CD pipeline perform the infrastructure changes for us; we are only running commands anyway. Of course, a CI/CD pipeline can run the commands of whatever tool you are using, be it an update stack, a terraform apply or anything else, but there are a couple of concerns:

How do you make this a safe solution?

GitLab CI pipelines run on runners, which jobs select by tag. These are instances that execute the Docker-in-Docker instructions from your configuration. You could share one instance across the pipelines of multiple projects. That's pretty cool because you don't have to create as much stuff as you would on AWS CodePipeline, but it also means you could end up with a single instance that has enough permissions to create the horror show on your account.

In our scenario, we wanted to try and use assume-role for each project. That way our runner instances would have as few permissions as possible, and they would assume a role per project. This has the advantage of eliminating bottlenecks, since each dev team can maintain the pipeline roles for its own projects. Teams end up with all the permissions needed to perform infrastructure changes for their projects only.

The challenge here was that we were used to seeing assume-role from a service to a role, but not from roleA to roleB. With roleA as the runner role and roleB as the project role, a trust relationship can be established that maps runners to projects. It’s kind of straightforward, but I spent some time figuring that out.
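As a rough illustration of that roleA-to-roleB relationship, here is a minimal Python sketch that builds the two policy documents involved. The account ID and role names are hypothetical placeholders, not the ones we used, and depending on your setup the runner role may or may not need the second policy.

```python
import json

# Hypothetical ARNs, for illustration only.
RUNNER_ROLE_ARN = "arn:aws:iam::111111111111:role/gitlab-runner"         # roleA
PROJECT_ROLE_ARN = "arn:aws:iam::111111111111:role/my-project-pipeline"  # roleB


def build_trust_policy(runner_role_arn: str) -> dict:
    """Trust policy for the project role (roleB): lets the runner role (roleA) assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": runner_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_assume_role_permission(project_role_arn: str) -> dict:
    """Identity policy for the runner role (roleA): allows calling sts:AssumeRole on roleB."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": project_role_arn,
            }
        ],
    }


trust_policy = build_trust_policy(RUNNER_ROLE_ARN)
runner_permission = build_assume_role_permission(PROJECT_ROLE_ARN)

print(json.dumps(trust_policy, indent=2))
print(json.dumps(runner_permission, indent=2))
```

The point is that the principal in roleB's trust policy is another role rather than a service, which is the part that took me a while to see.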

Also, since assume-role gives you temporary credentials, you need a few tricks to use them in the pipeline.
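One such trick, sketched in Python under some assumptions: take the response of an STS AssumeRole call and turn its temporary credentials into the environment variables that the AWS SDKs and Terraform's AWS provider read. The sample response below is made up; in a real job it would come from something like `aws sts assume-role` or boto3's `assume_role`, and the helper names are mine.

```python
import shlex


def credentials_to_env(assume_role_response: dict) -> dict:
    """Map the Credentials block of an AssumeRole response to standard AWS env vars."""
    creds = assume_role_response["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


def to_export_lines(env: dict) -> str:
    """Render the variables as shell export lines a CI job could eval."""
    return "\n".join(f"export {name}={shlex.quote(value)}" for name, value in env.items())


# Fake response with placeholder values (not real credentials).
sample_response = {
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLEKEY",
        "SecretAccessKey": "exampleSecret",
        "SessionToken": "exampleSessionToken",
        "Expiration": "2030-01-01T00:00:00Z",
    }
}

env = credentials_to_env(sample_response)
export_lines = to_export_lines(env)
print(export_lines)
```

Because the credentials expire, each job that talks to AWS has to assume the role again rather than reuse values from an earlier job.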

You should also make the environment selection dynamic and save the plan as an artifact so that the apply step can use it.
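A minimal Python sketch of those two ideas, assuming the environment is derived from the branch name and that each environment has its own `.tfvars` file. The branch-to-environment mapping and the file names are hypothetical; `CI_COMMIT_REF_NAME` is the variable GitLab CI sets to the branch or tag name.

```python
import os


def environment_for_branch(branch: str) -> str:
    """Hypothetical mapping from branch name to target environment."""
    mapping = {"master": "production", "develop": "staging"}
    return mapping.get(branch, "dev")


def plan_command(environment: str, plan_file: str = "tfplan") -> list:
    """Command that writes the plan to a file so it can be kept as an artifact."""
    return ["terraform", "plan", f"-var-file={environment}.tfvars", f"-out={plan_file}"]


def apply_command(plan_file: str = "tfplan") -> list:
    """Command that applies exactly the saved plan, not a freshly computed one."""
    return ["terraform", "apply", plan_file]


# In a GitLab CI job the branch name is available in CI_COMMIT_REF_NAME.
branch = os.environ.get("CI_COMMIT_REF_NAME", "develop")
environment = environment_for_branch(branch)

print(" ".join(plan_command(environment)))
print(" ".join(apply_command()))
```

Applying the saved plan file is what makes the artifact worthwhile: the apply step runs the changes that were planned earlier instead of planning again.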

The other dangerous part of this assignment is that we make mistakes. Everyone does, so how do you minimize that? On a pipeline like this, a commit means a deployment. That is kind of scary: your app could have automated rollbacks to previous versions, but with infra, even if you are prepared, rolling back is not that easy.

So we used manual actions. These are basically steps on the pipeline that sit there until someone with merge permissions hits a play button. With this setup, you can make sure someone reviews the infrastructure changes before they are applied. Yes, that can happen in code reviews, but those happen on different branches. With this approach you are actually reviewing what will get applied.
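To make the shape of such a pipeline visible, here is a small Python sketch that describes a two-stage configuration as a dictionary and prints it. The job names and plan file name are hypothetical, steps such as `terraform init` and the assume-role call are left out, and a real configuration would be written as YAML in `.gitlab-ci.yml`; only the keywords (`stages`, `script`, `artifacts`, `when: manual`) are GitLab CI's own.

```python
import json

PLAN_FILE = "tfplan"  # hypothetical file name

pipeline = {
    "stages": ["plan", "apply"],
    "plan": {
        "stage": "plan",
        "script": [f"terraform plan -out={PLAN_FILE}"],
        # Keep the plan so the later stage can use it.
        "artifacts": {"paths": [PLAN_FILE]},
    },
    "apply": {
        "stage": "apply",
        "script": [f"terraform apply {PLAN_FILE}"],
        # The job waits until someone presses the play button.
        "when": "manual",
    },
}


def manual_jobs(config: dict) -> list:
    """Return the names of jobs that need a manual trigger."""
    return [
        name
        for name, job in config.items()
        if isinstance(job, dict) and job.get("when") == "manual"
    ]


print(json.dumps(pipeline, indent=2))
print(manual_jobs(pipeline))
```

The reviewer can read the plan job's output, and only then trigger the apply job.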