
Defining your cloud computing infrastructure as code (IaC) is becoming an industry standard that enables enterprise IT teams to scale effectively as their development teams and applications grow.

Terraform, by Hashicorp, is an infrastructure as code tool that lets you define both cloud and on-prem resources using human-readable configuration files that you can version, reuse, and share across your various projects and development teams.

This post will focus on several methods and patterns to make the most out of your terraform code, specifically on keeping your terraform code and configurations organized and easy to maintain; we will strive to implement DRY principles wherever possible. This post assumes you’ve worked with Terraform before and have a general understanding of how to use it. If you want to know more about Terraform and what it can offer, look at HashiCorp’s website. Also, check out this video by HashiCorp’s cofounder for a short summary of what Terraform has to offer.

Modularization & Terraform Registry

Terraform has a few different offerings, each providing various features. The most basic is the open-source Terraform CLI. Although the Terraform CLI is an excellent tool on its own, it is significantly more helpful when best practices are implemented.

One of the first things to do is organize your terraform code. Break it apart into child modules or components that encapsulate smaller pieces of your infrastructure. Instead of having one module file that provisions all of your resources, break up your architecture into several components (e.g., an App Service plan, an App Service, a storage account, a Redis cache) and reference them as dependencies in your encapsulating module (e.g., main.tf).
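As a rough sketch (the module paths and names here are illustrative, not from any particular project), a root main.tf might compose child modules like this:

# main.tf – root module composing smaller child modules (illustrative)
module "app_service_plan" {
  source   = "./modules/app_service_plan"
  name     = "${var.name_prefix}-plan"
  location = var.location
}

module "app_service" {
  source          = "./modules/app_service"
  name            = "${var.name_prefix}-app"
  location        = var.location
  service_plan_id = module.app_service_plan.id
}

module "redis_cache" {
  source   = "./modules/redis_cache"
  name     = "${var.name_prefix}-redis"
  location = var.location
}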

HashiCorp also gives you access to the public Terraform Registry for various providers like AWS and Azure. The public registry already has reusable modules for the simplest cloud resource blocks for you to extend and utilize.

Although these pre-defined modules exist, you may want to create your own modules that wrap the ones in the public Terraform Registry in order to standardize resource naming conventions, compute sizing/scaling, and any other restrictions you want to impose on your development teams. Create them with clearly defined validation rules for their input variables.
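For instance, a wrapper module might enforce a naming convention and restrict allowed SKUs through input validation (a minimal sketch; the variable names and allowed values are hypothetical):

# variables.tf of a hypothetical wrapper module
variable "name_prefix" {
  type        = string
  description = "Prefix applied to every resource name"

  validation {
    condition     = can(regex("^[a-z0-9]{3,10}$", var.name_prefix))
    error_message = "name_prefix must be 3-10 lowercase alphanumeric characters."
  }
}

variable "sku_name" {
  type        = string
  description = "App Service plan SKU allowed by the platform team"
  default     = "P1v2"

  validation {
    condition     = contains(["P1v2", "P2v2", "S1"], var.sku_name)
    error_message = "sku_name must be one of P1v2, P2v2, or S1."
  }
}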

If you want to read more about creating Terraform modules, check out this blog post by the creators of Terragrunt, as well as Hashicorp’s documentation.

Now the question arises: where do I keep the code for my child modules so it is easily distributable, reusable, and maintainable? Creating these child modules and storing the code within the application code repository has some disadvantages. What if you want to reuse these submodules in other repositories or different projects? You’d have to duplicate the child module code and track it in several places. That’s not ideal.

To address this, you should utilize a shared registry or storage location for your child modules to reference them from multiple repositories and even distribute different versions to different projects. This would involve moving each submodule to its individual repo to be maintained independently and then uploading it to your registry or central storage location. Acceptable methods that Terraform can work with are:

  • GitHub
  • Bitbucket
  • Generic Git, Mercurial repositories
  • HTTP URLs
  • S3 buckets
  • GCS buckets

See the Terraform documentation on module sources for more information.
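For example, a consuming project could reference a versioned submodule from a Git repository or a private registry like this (the organization, repository, and module names are hypothetical):

# Git source pinned to a release tag
module "app_service" {
  source = "git::https://github.com/example-org/terraform-azure-modules.git//app_service?ref=v1.2.0"

  name     = "example-app"
  location = "eastus"
}

# Private registry source with a version constraint
module "redis_cache" {
  source  = "app.terraform.io/example-org/redis-cache/azurerm"
  version = "~> 1.0"

  name     = "example-redis"
  location = "eastus"
}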

Utilizing these methods allows you to maintain your submodules independently and distribute different versions that larger applications can choose to inherit. You might go from Terraform projects like this:

Maintain Sub Modules

to this:

Consolidating duplicate code

Terragrunt

Preface

Utilizing registries to modularize your infrastructure is only a small part of the improvements you can make to your Terraform code. One of the significant concepts of Terraform is how it tracks the state of your infrastructure with a state file. In Terraform, you need to define a “remote state” for each grouping of infrastructure you are trying to deploy/provision. This remote state could be stored in an S3 bucket, Azure Storage account, Terraform Cloud, or another applicable service. This is how Terraform knows where to track the state of your infrastructure to determine if any changes need to be applied or if the configuration has drifted away from the baseline defined in your source code. The organization of your state files is essential, especially if you are managing them yourself and not using Terraform Cloud to perform your infrastructure runs.

In addition, keep in mind that as your development teams and applications grow, you will frequently need to manage multiple development, testing, quality assurance, and production environments for several projects/components simultaneously; the number of configurations, variable files, CLI arguments, and provider settings will become untenable over time.

How do you maintain all this infrastructure reliably? Do you track all the applications for one tier in one state file? Or do you break them up and track them separately?

How do you efficiently share variables across environments and applications without defining them multiple times? How do you successfully apply these terraform projects in a continuous deployment pipeline that is consistent and repeatable for different types of Terraform projects?

At one of our current clients, we were involved with onboarding Terraform as an Infrastructure as Code (IaC) tool. However, we ran into many challenges when trying to deploy multiple tiers (dev, test, stage, production, etc.) across several workstreams, specifically in a continuous manner within a deployment pipeline.

The client I work for has the following requirements for the web UI portion of the services they offer (consider Azure Cloud provider for context):

  • Each tier has *six applications* for different areas of the United States
  • *Each application* has a web server, a Redis cache, an App Service plan, a storage account, and a key vault access policy to access a central key store
  • Development and test tiers are deployed to a single region
  • Applications in the development and test tiers share a Redis cache
  • Applications in staging and production environments have individual Redis caches
  • Stage and production environments are deployed to two regions, east and central

Stage and Production Environments

Stage and production tiers have up to 48 resources each; the diagram above only represents three applications and excludes some services. Our client also had several other IT services that needed similar architectural setups; most projects involved deploying six application instances (one for each service area of the United States), each configured differently through application settings.

Initially, our team decided to use the Terraform CLI and track our state files using an Azure Storage Account. Within the application repository, we would store several backend.tf files alongside our terraform code for each tier and pass them dynamically to terraform init --backend-config=<PATH> when we wanted to initialize a specific environment. We also passed variable files dynamically to terraform [plan|apply|destroy] --var-file=<PATH> to combine common and tier-specific application setting template files. We adopted this process in our continuous deployment pipeline by ensuring the necessary authentication principals and terraform CLI packages were available on our deployment agents and then running the appropriately constructed terraform command in the appropriate directory on that agent.
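As a rough illustration of that setup (the file names and values below are hypothetical, not taken from the client project), each tier had its own partial backend configuration that was supplied at init time:

# backend.tf – backend block left empty so it can be configured per tier
terraform {
  backend "azurerm" {}
}

# backend.dev.tfvars – passed via: terraform init --backend-config=backend.dev.tfvars
resource_group_name  = "rg-terraform-state"
storage_account_name = "exampletfstatestore"
container_name       = "tfstate"
key                  = "myproject-dev.tfstate"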

This worked, but it presented a few problems when scaling our process. The process we used initially allowed developers to create their own terraform modules specific to their application, utilizing either local or shared modules in our private registry. One of the significant problems came when trying to apply these modules in a continuous integration pipeline. Each project had its own unique terraform project and its own configurations that our continuous deployment pipeline needed to adhere to.

Let’s also consider something relatively simple, like the naming convention of all the resources in a particular project. Usually, you would want the same naming prefix on your resources (or apply it as a resource tag) to visually identify which project the resources belong to. Since we had to maintain multiple tiers of environments for these projects (dev, test, stage, prod), we wanted to share this variable across environments, only needing to declare it once. We also wanted to declare other variables and optionally override them in specific environments. With the Terraform CLI, there is no way to merge inputs and share variables across environments. In addition, you cannot use expressions, functions, or variables in the terraform remote state configuration blocks, forcing you to either hardcode your configuration or apply it dynamically through the CLI; see this issue.

We began to wonder if there was a better way to organize ourselves, and this is where Terragrunt comes into play. Instead of keeping track of the various terraform commands, remote state configurations, variable files, and input parameters we needed to consolidate to provision our terraform projects, what if we had a declarative way of defining how our terraform projects were configured?

Terragrunt is a minimal wrapper around Terraform that allows you to dynamically assign and inherit remote state configurations, provider information, local variables, and module inputs for your terraform projects through a hierarchical folder structure with declarative configuration files. It also gives you a flexible and unopinionated way of consolidating everything Terraform does before it runs a command. Every option you pass to Terraform can be specifically configured through a configuration file that inherits, merges, or overrides components from other higher-level configuration files.

Terragrunt allowed us to do these important things:

  • Define a configuration file that determines which remote state file to use based on application and tier (using the folder structure)
  • Run multiple terraform projects at once with a single command
  • Pass outputs from one terraform project to another using a dependency chain
  • Define a configuration file that tells terraform which application setting template file to apply to an application (we used .tpl files to apply application settings to our Azure compute resources)
  • Define a configuration file that tells terraform which variable files to include in your terraform commands
  • Merge common input variables with tier-specific input variables with the desired precedence
  • Consistently name and create state files

Example

Let’s consider the situation where we want to maintain the infrastructure for a system with two major components: an API and a database solution. You must also deploy dev, test, stage, and production environments for this system. Dev and Test environments are deployed to one region, while stage and production environments are deployed to two regions.

We’ve created a preconfigured sample repository to demonstrate how we might handle something like this with Terragrunt. Now, although the requirements and scenario described above may not pertain to you, the preconfigured sample repository should give you a good idea of what you can accomplish with Terragrunt and the benefits it provides in the context of keeping your Terraform code organized. Also, remember that Terragrunt is unopinionated and allows you to configure it in several ways to accomplish similar results; we will only cover a few of the benefits Terragrunt provides, but be sure to check out their documentation site for more information.

To get the most out of the code sample, you should have the following:

  • Terraform CLI
  • Terragrunt CLI
  • An Azure Subscription
  • AZ CLI

Run through the setup steps if you need to. This will involve running a mini terraform project to provision a few resource groups in addition to a storage account to store your Terraform state files.
The sample repo contains several top-level directories:

  • /_base_modules
  • /bootstrap
  • /dev
  • /test
  • /stage
  • /prod
    • _base_modules – contains the top-level Terraform modules that your applications will use. There are subfolders for each application type: the API (/api) and the storage solution (/sql). The API subfolder contains the terraform code for your API application, and the SQL subfolder contains the terraform code for your storage/database solution; take note of the main.tf, variables.tf, and outputs.tf files in each subfolder. Each application type folder also contains a .hcl file with global configuration values for all environments that consume that application type.
    • [dev/test/stage/prod] – environment folders that contain subfolders for each application type. Each application type subfolder contains Terragrunt configuration files with variables and inputs specific to that environment.
    • bootstrap – a small, isolated terraform project that spins up placeholder resource groups in addition to a storage account used to maintain remote Terraform state files.

As mentioned above, there are several .hcl files in a few different places within this folder structure. These are Terragrunt configuration files. You will see one within each subfolder inside the _base_modules directory and one in every subfolder within each environment folder. These files are how Terragrunt knows which terraform commands to use, where to store each application’s remote state, and which variable files and input values to use for the terraform modules defined in the _base_modules directory. Read more about how this file is structured on Gruntwork’s website. With this sample repository, global configurations are maintained in the /_base_modules folder and consumed by configurations in the environment folders.
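Putting that together, the layout looks roughly like this (abbreviated to the files referenced in this post):

/_base_modules
  global.hcl
  /api
    api.hcl
    main.tf, variables.tf, outputs.tf
  /sql
    sql.hcl
    main.tf, variables.tf, outputs.tf
/dev
  dev.hcl
  /api
    terragrunt.hcl
  /sql
    terragrunt.hcl
/test, /stage, /prod – same pattern as /dev
/bootstrap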

Let’s go over some of the basic features that Terragrunt offers.

Keeping your Remote State Configuration DRY

I immediately noticed when writing my first bits of Terraform code that I couldn’t use variables, expressions, or functions within the Terraform configuration block. You can override specific parts of this configuration through the command line, but there was no way to do this from code.

Terragrunt allows you to keep your backend and remote state configuration DRY by letting you share the code for backend configuration across multiple environments. Look at the /_base_modules/global.hcl file in conjunction with the /dev/api/terragrunt.hcl file.

/_base_modules/global.hcl:

remote_state {
  backend = "azurerm"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config = {
    resource_group_name  = "shared"
    storage_account_name = "4a16aa0287e60d48tf"
    container_name       = "example"
    key                  = "example/${path_relative_to_include()}.tfstate"
  }
}

This file defines the remote state configuration that will be used for all environments and application modules. Take special note of the ${path_relative_to_include()} expression – more on this later.

A remote state Terragrunt block that looks like this:

remote_state {
  backend = "s3"
  config = {
    bucket = "mybucket"
    key    = "path/to/my/key"
    region = "us-east-1"
  }
}

is equivalent to a Terraform backend block that looks like this:

terraform {
  backend "s3" {
    bucket = "mybucket"
    key    = "path/to/my/key"
    region = "us-east-1"
  }
}

To inherit this configuration into a child subfolder or environment folder, you can do this:

/dev/api/terragrunt.hcl


include "global" {
  path = "${get_terragrunt_dir()}/../../_base_modules/global.hcl"
  expose = true
  merge_strategy = "deep"
}

The include statement above tells Terragrunt to merge the configuration file found at _base_modules/global.hcl with its local configuration. The path_relative_to_include() expression in the global.hcl file is a built-in function that returns the relative path of the calling .hcl file, in this case, /dev/api/terragrunt.hcl. Therefore, the resulting state file for this module would be in the example container at dev/api.tfstate. For the SQL application in the dev environment, the resulting state file would be dev/sql.tfstate; look at the _base_modules/sql/sql.hcl file. For the API application in the test environment, the resulting state file would be test/api.tfstate. Be sure to check out all of the built-in functions Terragrunt offers out of the box.
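Because of the generate block in global.hcl, Terragrunt writes the resolved backend settings into a backend.tf file next to the module before it runs Terraform. Assuming path_relative_to_include() resolves to dev/api as described above, the generated file would look roughly like this:

# backend.tf generated by Terragrunt for /dev/api (sketch)
terraform {
  backend "azurerm" {
    resource_group_name  = "shared"
    storage_account_name = "4a16aa0287e60d48tf"
    container_name       = "example"
    key                  = "example/dev/api.tfstate"
  }
}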

Using the feature just mentioned, we only define the details of the remote state once, allowing us to cut down on code repetition. Read more about the remote_state and include blocks and how you can configure them by visiting the Terragrunt documentation. Pay special attention to merge strategy options, how you can override includes in child modules, and the specific limitations of configuration inheritance in Terragrunt.

Keeping your Terraform Configuration DRY

Merging configuration files does not only apply to remote state configurations; you can also apply it to the sources and inputs of your modules.

In Terragrunt, you can define the source of your module (the main.tf or top-level terraform module) within the terraform block. Let’s consider the API application:

/_base_modules/api/api.hcl

terraform {
  source = "${get_terragrunt_dir()}/../../_base_modules/api"

  extra_arguments "common_vars" {
    commands = get_terraform_commands_that_need_vars()

    required_var_files = [
      
    ]
  }
}

You’ll notice this is referencing a local path; alternatively, you can also set this to use a module from a remote git repo or terraform registry.
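For instance, the source could point at a tagged Git repository instead of the local path (the repository URL below is hypothetical):

terraform {
  source = "git::https://github.com/example-org/terraform-modules.git//api?ref=v1.0.0"
}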

The api.hcl configuration is then included in each environment folder for the API application type:

Ex. /dev/api/terragrunt.hcl

include "env" {
  path = "${get_terragrunt_dir()}/../../_base_modules/api/api.hcl"
  expose = true
  merge_strategy = "deep"
}

Include statements with specific merge strategies can also be overwritten by configurations in child modules, allowing you to configure each environment separately if needed.

Merging inputs before they are applied to your terraform module is also extremely helpful if you need to share variables across environments. For example, all the names of your resources in your project might be prefixed with a particular character set. You can define any global inputs in the inputs section of the _base_modules/global.hcl file. Because Terragrunt configuration files are written in the HCL language, you can also utilize all the expressions and functions you use in Terraform to modify or restructure input values before they are applied. Look at how we are defining the identifier input variable found in both SQL and API modules:

Here is the terraform variable:

/_base_modules/api/variables.tf and /_base_modules/sql/variables.tf

variable "identifier" {
  type = object({
      primary = string
      secondary = string
      type = string
  })
}

Here is the primary property being assigned from the global env:

/_base_modules/global.hcl

... 
inputs = {
    identifier = {
        primary = "EXAMPLE"
    }
}
...

Here is the secondary property being assigned from the dev/dev.hcl file:

/dev/dev.hcl

inputs = {
  identifier = {
    secondary = "DEV"
  }
}

And here is the type property being applied in the module folders:

/_base_modules/sql/sql.hcl

...
inputs = {
    identifier = {
        type = "SQL"
    }
}

/_base_modules/api/api.hcl

...
inputs = {
    identifier = {
        type = "API"
    }
}

All configurations are included in the environment configuration files with:

include "global" {
  path = "${get_terragrunt_dir()}/../../_base_modules/global.hcl"
  expose = true
  merge_strategy = "deep"
}

include "api" {
  path = "${get_terragrunt_dir()}/../../_base_modules/api/api.hcl"
  expose = true
  merge_strategy = "deep"
}

include "dev" {
  path = "../dev.hcl"
  expose = true
  merge_strategy = "deep"
}

would result in something like:

inputs = {
  identifier = {
    primary   = "EXAMPLE"
    secondary = "DEV"
    type      = "API"
  }
}

We utilize this pattern to share variables across all environments and applications within a specific environment without having to declare them multiple times.

It is also important to note that because Terragrunt configuration files are written in HCL, you have access to all of Terraform’s functions and expressions. And because you can inherit Terragrunt configuration files into a specific environment, you can restructure, merge, or alter input variables before they are sent to terraform to be processed.
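As a small illustration (the values here are hypothetical), an environment configuration could combine shared locals with ordinary Terraform functions before handing the result to terraform as inputs:

locals {
  common_tags = {
    project = "example"
    owner   = "platform-team"
  }
}

inputs = {
  # merge() and lower() are standard Terraform functions, usable in Terragrunt configuration
  tags        = merge(local.common_tags, { environment = "dev" })
  name_prefix = lower("EXAMPLE-DEV")
}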

Running Multiple Modules at once

You can also run multiple terraform modules with one command using Terragrunt. For example, if you wanted to provision dev, test, stage, and prod with one command, you could run the following command in the root directory:

terragrunt run-all [init|plan|apply]

If you wanted to provision the infrastructure for a specific tier, you could run the same command inside an environment folder (dev, test, stage, etc.). This allows you to neatly organize your environments instead of maintaining everything in one state file or trying to remember what variable, backend, and provider configurations to pass in your CLI commands when you want to target a specific environment.

It is important to note that you can maintain dependencies between application types within an environment (between the SQL and API applications) and pass outputs from one application to another. Look at the dev/api environment configuration file:

/dev/api/terragrunt.hcl

dependency "sql" {
  config_path = "../sql"
    mock_outputs = {
    database_id = "temporary-dummy-id"
  }
}

locals {
}
inputs = {
    database_id = dependency.sql.outputs.database_id
    ...
}

Notice that it references the dev/sql environment as a dependency. The dev/sql environment uses the _base_modules/sql application so look at that module, specifically the outputs.tf file.

/_base_modules/sql/outputs.tf

output "database_id" {
  value = azurerm_mssql_database.test.id
}

Notice that this output is being referenced in the /dev/api/terragrunt.hcl file as a dependency.

The client requirements described earlier in this post proved to be especially difficult to maintain without the benefit of being able to configure separate modules that depend on one another. With the ability to isolate different components of each environment and share their code and dependencies across environments, we could maintain multiple environments effectively and efficiently with different configurations.

Conclusion

Terraform as an IaC tool has helped us reliably develop, maintain, and scale our infrastructure demands. However, because our client work involved maintaining multiple environments and projects simultaneously, we needed specific declarative design patterns to organize our infrastructure development. Terragrunt offered us a simple way to develop numerous environments and components of a given application in a way that was repeatable and distributable to other project pipelines.

There are several features of Terragrunt we did not discuss in this post:

  • Before, After, and Error Hooks
  • Maintaining CLI flags

We would like to see some of the functionality Terragrunt offers baked into Terraform by default. However, we do not feel that Terragrunt is a final solution; Terraform is rather unopinionated and less concerned with how you set up your project structure, while Terragrunt is only slightly more opinionated in its setup. Terragrunt claims to be DRY, but there is still a lot of code duplication involved when creating multiple environments or trying to duplicate infrastructure across regions. For example, creating the folder structure for an environment is cumbersome, especially when you want to add another tier.

What is Azure Databricks?

Azure Databricks is a data analytics platform whose powerful computing capability comes from Apache Spark clusters. In addition, Azure Databricks provides a collaborative platform for data engineers to share clusters and workspaces, which yields higher productivity. Azure Databricks plays a major role in modern data warehouse architecture alongside Azure Synapse, Data Lake, Azure Data Factory, etc., and integrates well with these resources.

Data engineers and data architects work together to develop data pipelines for data ingestion and processing. Data engineers work in a sandbox environment, and once they have verified the data ingestion process, the data pipeline is ready to be moved to Dev/Staging and Production.

Manually moving the data pipeline to staging/production environments via the Azure portal can introduce differences between environments and adds the tedious task of repeating manual steps in multiple environments. Automated deployment with service principal credentials is the only practical solution for moving all your work to higher environments, where there is no privilege to configure resources via the Azure portal as a user. As data engineers complete the data pipeline, cloud automation engineers use IaC (Infrastructure as Code) to deploy all Azure resources and configure them via the automation pipeline. That includes all data-related Azure resources and Azure Databricks.

Data engineers work in Databricks with their user accounts, and integrating Azure Databricks with Azure Key Vault using a Key Vault secret scope works very well: all the secrets are persisted in Key Vault, and Databricks gets the secret values directly through the linked scope, using the user’s credentials to authenticate against Key Vault. This does not work with service principal (SPN) access from Azure Databricks to the Key Vault; that functionality has been requested but is not yet available, as per this GitHub issue.


Let’s Look at a Scenario

The data team has given the automation engineers two requirements:

  • Deploy Azure Databricks, a cluster, a DBC archive file that contains multiple notebooks in a single compressed file (for more information on DBC files, read here), and a secret scope, and trigger a post-deployment script.
  • Create a Key Vault secret scope local to Azure Databricks so the data ingestion process will have a secret scope local to Databricks.

Azure Databricks is an Azure-native resource, but any configuration within that workspace is not native to Azure. Azure Databricks itself can be deployed with HashiCorp Terraform code, and for Databricks workspace-related artifacts, the Databricks provider needs to be added. For creating a cluster, use this implementation. If you are only uploading a single notebook file, then use a Terraform implementation like this to create the notebook. If not, there is an example below that uses the Databricks CLI to upload multiple notebook files as a single DBC archive file. The link to my GitHub repo for the complete code is at the end of this blog post.

Terraform implementation

terraform {
  required_providers {
    azurerm = "~&amp;amp;gt; 2.78.0"
    azuread = "~&amp;amp;gt; 1.6.0"
    databricks = {
      source = "databrickslabs/databricks"
      version = "0.3.7"
    }
  }

  backend "azurerm" {
    resource_group_name  = "tf_backend_rg"
    storage_account_name = "tfbkndsapoc"
    container_name       = "tfstcont"
    key                  = "data-pipe.tfstate"
  }
}

provider "azurerm" {
  features {}
}

provider "azuread" {
}

data "azurerm_client_config" "current" {
}

// Create Resource Group
resource "azurerm_resource_group" "rgroup" {
  name     = var.resource_group_name
  location = var.location
}

// Create Databricks
resource "azurerm_databricks_workspace" "databricks" {
  name                          = var.databricks_name
  location                      = azurerm_resource_group.rgroup.location
  resource_group_name           = azurerm_resource_group.rgroup.name
  sku                           = "premium"
}

// Databricks Provider
provider "databricks" {
  azure_workspace_resource_id = azurerm_databricks_workspace.databricks.id
  azure_client_id             = var.client_id
  azure_client_secret         = var.client_secret
  azure_tenant_id             = var.tenant_id
}

resource "databricks_cluster" "databricks_cluster" {
  depends_on              = [azurerm_databricks_workspace.databricks]
  cluster_name            = var.databricks_cluster_name
  spark_version           = "8.2.x-scala2.12"
  node_type_id            = "Standard_DS3_v2"
  driver_node_type_id     = "Standard_DS3_v2"
  autotermination_minutes = 15
  num_workers             = 5
  spark_env_vars          = {
    "PYSPARK_PYTHON" : "/databricks/python3/bin/python3"
  }
  spark_conf = {
    "spark.databricks.cluster.profile" : "serverless",
    "spark.databricks.repl.allowedLanguages": "sql,python,r"
  }
  custom_tags = {
    "ResourceClass" = "Serverless"
  }
}

GitHub Actions workflow with Databricks CLI implementation

deploydatabricksartifacts:
    needs: [terraform]
    name: 'Databricks Artifacts Deployment'
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2.3.4
   
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        # a recent Python 3.x release is required for databricks-cli
        python-version: 3.8

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip

    - name: Download Databricks CLI
      id: databricks_cli
      shell: pwsh
      run: |
        pip install databricks-cli
        pip install databricks-cli --upgrade

    - name: Azure Login
      uses: azure/login@v1
      with:
        creds: ${{ secrets.AZURE_CREDENTIALS }}
   
    - name: Databricks management
      id: api_call_databricks_manage
      shell: bash
      run: |
        # Set DataBricks AAD token env
        export DATABRICKS_AAD_TOKEN=$(curl -X GET -d "grant_type=client_credentials&client_id=${{ env.ARM_CLIENT_ID }}&resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d&client_secret=${{ env.ARM_CLIENT_SECRET }}" https://login.microsoftonline.com/${{ env.ARM_TENANT_ID }}/oauth2/token | jq -r ".access_token")

        # Log into Databricks with SPN
        databricks_workspace_url="https://${{ steps.get_databricks_url.outputs.DATABRICKS_URL }}/?o=${{ steps.get_databricks_url.outputs.DATABRICKS_ID }}"
        databricks configure --aad-token --host $databricks_workspace_url

        # Check if workspace notebook already exists
        export DB_WKSP=$(databricks workspace ls /${{ env.TF_VAR_databricks_notebook_name }})
        if [[ "$DB_WKSP" != *"RESOURCE_DOES_NOT_EXIST"* ]];
        then
          databricks workspace delete /${{ env.TF_VAR_databricks_notebook_name }} -r
        fi

        # Import DBC archive to Databricks Workspace
        databricks workspace import Databricks/${{ env.databricks_dbc_name }} /${{ env.TF_VAR_databricks_notebook_name }} -f DBC -l PYTHON

While the above example shows how to leverage the Databricks CLI to automate operations within Databricks, Terraform also provides richer capabilities through the Databricks provider. Here is an example of how to add a service principal to the Databricks ‘admins’ group in the workspace using Terraform. This is essential for the Databricks API to work when connecting as a service principal.
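A minimal sketch of that group assignment, assuming the databrickslabs/databricks provider configured as shown earlier and the service principal’s application ID exposed as var.client_id:

// Look up the built-in admins group in the workspace
data "databricks_group" "admins" {
  display_name = "admins"
}

// Register the deployment service principal in the workspace
resource "databricks_service_principal" "automation" {
  application_id = var.client_id
  display_name   = "deployment-automation"
}

// Add the service principal to the admins group
resource "databricks_group_member" "automation_admin" {
  group_id  = data.databricks_group.admins.id
  member_id = databricks_service_principal.automation.id
}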

Databricks cluster deployed via Terraform
No jobs have been deployed via Terraform
Job deployed using Databricks CLI in GitHub Actions workflow
Job triggered via Databricks CLI in GitHub Actions workflow

Not just Terraform and the Databricks CLI; the Databricks API also provides similar options for accessing and managing Databricks artifacts. For example, to access the clusters in Databricks:

  • To access clusters, first authenticate, either as a workspace user via automation or using a service principal.
  • If your service principal is already part of the workspace admins group, use this API to get the clusters list.
  • If the service principal (SPN) is not part of the workspace, use this API, which uses access and management tokens.
  • If you would rather add the service principal to the Databricks workspace admins group, use this API (the same as the Terraform option above for adding the SPN).

The secret scope in Databricks can be created using Terraform, the Databricks CLI, or the Databricks API!
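For example, a Databricks-backed secret scope and a secret could be declared in Terraform along these lines (the scope and secret names are hypothetical, using the same databrickslabs/databricks provider as above):

// Databricks-backed secret scope local to the workspace
resource "databricks_secret_scope" "ingestion" {
  name = "data-ingestion"
}

// A secret stored in that scope, supplied through a Terraform variable
resource "databricks_secret" "storage_key" {
  scope        = databricks_secret_scope.ingestion.name
  key          = "storage-account-key"
  string_value = var.storage_account_key
}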

Databricks and the other Azure resources have pretty good documentation, and for automating deployments, these options are essential: learn them and use the option that best suits your needs!

Here is the link to my GitHub repo with the complete code for using Terraform and the Databricks CLI in GitHub Actions! As a bonus, you can also learn how to deploy Synapse, ADLS, etc., as part of a modern data warehouse deployment, which I will cover in my next blog post.

Until then, happy automating!

From C# Developer to DevOps Engineer

Over the last couple of years, I’ve become a DevOps Engineer after having been primarily a C# developer. Instead of working primarily with C# and SQL, I was now working almost exclusively with JSON, YAML, and PowerShell. While I was very familiar with Visual Studio 2013/2015/2017 and its excellent support for the .NET work I did over the years, I found the experience for building DevOps solutions to be underwhelming. At the time, the IntelliSense for Azure Resource Manager (ARM) or Terraform templates, GitLab or Azure DevOps pipelines, and PowerShell was either non-existent or incomplete. In addition, Visual Studio was quite the resource hog when I wasn’t needing all the extras it provides.

Enter Visual Studio (VS) Code

Now, I had downloaded VS Code soon after it was released with the intent to use it at some point, if only to say I had. However, after seeing Visual Studio Code used in some ARM template videos where snippets were used, I decided to try it out. Like most Integrated Development Environments (IDEs), VS Code isn’t truly ready to go right after installation. It’s taken me some time to build up my configuration to where I am today, and I’m still learning about new features and extensions that can improve my productivity. I want to share some of my preferences.

I want to point out a couple of things. First, I’ve been working primarily with GitLab Enterprise, Azure DevOps Services, and the Azure US Government Cloud. Some of these extensions are purely focused on those platforms. Second, I use the Visual Studio Code – Insiders release rather than the regular Visual Studio Code version. I have both installed, but I like having the newest stuff as soon as I can. For this post, that shouldn’t be an issue.

Theming

As long as there’s a decent dark color theme, I’m content. The bright/light themes give me headaches over time. VS Code’s default dark theme, Dark+, fits the bill for me.

One of the things I didn’t know I needed before I stumbled across them was icon themes. I used to have the standard, generic folder and file icons of the Minimal theme in VS Code. That made it difficult to differentiate between PowerShell scripts, ARM templates, and other file types at a glance. There are a few included icon themes, but I’m using the VSCode Icons Theme. It’s one of the better options, but I’m contemplating making a custom one, as this one doesn’t have an icon for Terraform variables files (.tfvars), and I’d like a different icon for YAML files. If the included themes aren’t suitable for you, there are several options for both color and file icon themes, as well as Product Icon themes, through the marketplace.

Figure 1 – VS Code’s Minimal icon theme

Workspaces

Workspaces are, as the VS Code docs put it, the “collection of one or more folders that are opened in a VS Code window.” A workspace file is created that contains the list of folders and any settings for VS Code and extensions. I’ve only recently started using workspaces because I wanted to have settings configured differently for different projects.

Extensions in Visual Studio Code provide enhancements that improve productivity. Extensions include code snippets, new language support, debuggers, formatters, and more. I have nearly 60 installed (including several Microsoft pre-installs), but we will focus on the handful that I rely on regularly.

Figure 2 – VS Code Workspace configuration. Also shows the choice of Azure Cloud referenced in the Azure Account extension section below.

Azure Account

The Azure Account extension provides login support for other Azure extensions. By itself, it’s not flashy, but a few dozen other Azure extensions can use the logged-in account to reference the Azure resources they target. This extension has a setting, Azure Cloud, that was the main reason I started adopting workspaces. The default is the commercial cloud, AzureCloud. I’ve changed it at the user level to AzureUSGovernment, but some of my recent projects use AzureCloud, so I’ve set the workspace-level setting for those.

Azure Resource Manager (ARM) Tools

This extension makes ARM template tasks much more manageable! It provides an extensive collection of code snippets for scaffolding out many different Azure resources. Schema support provides template autocompletion and linting-like validation, and a template navigation pane makes finding resources in a larger template easy. There is also support for parameter files, linked templates, and more.

HashiCorp Terraform

Terraform is a HashiCorp offering, and they provide an extension that supports Terraform files (.tf and .tfvars), including syntax highlighting. While only a few snippets are included, the autocompletion when defining new blocks (e.g., resources and data sources) is quite extensive.

Figure 3 – Terraform autocompletion

GitLens – Git Supercharged

GitLens is full of features that make tracking changes in code easily accessible. I installed this extension for the “Current Line Blame” feature, which shows who changed the current line last, when they changed it, and more. In addition, there are sidebar views for branches, remotes, commits, and file history that I use regularly. There are several other features that I either don’t use or wasn’t even aware of until writing this post; all in all, this is an excellent tool for anyone working in a Git repo.
GitLens Line Blame

MSBuild Project Tools

I had a recent project that contained a relatively large MSBuild deployment package that needed to be updated to work with the changes made to migrate the application to Azure. I haven’t worked with MSBuild in several years. When I did, I didn’t have all the syntax and keywords committed to memory. This extension provides some essential support, including element completion and syntax highlighting. It did make the project a little easier to modify.

PowerShell Preview

I’ve become a bit of a PowerShell fan. I had been introduced to it when I was working with SharePoint, but since I’ve been doing DevOps work in conjunction with Azure, I’ve started enjoying writing scripts. The less-than-ideal support for PowerShell (at the time, at least) in Visual Studio 20xx was the main reason I gave VS Code a shot. This extension (or the stable PowerShell extension) provides the excellent IntelliSense, code snippets, and syntax highlighting you’d expect. However, it also has “Go to Definition” and “Find References” features that I relied on when writing C#. In addition, it incorporates linting/code analysis with PowerShell Script Analyzer, which helps you develop clean code that follows best practices.


Wrapping Up

I have far more than these extensions installed, but these are the ones I use the most when doing DevOps work. Some of the others either haven’t been used enough yet, aren’t helpful for a DevOps Engineer, or weren’t interesting enough to list for the sake of brevity.

However, I’ve created a Gist on my GitHub that contains the complete list of extensions I have installed if that’s of interest. Visual Studio Code is an amazing tool that, along with the proper configuration and extensions, has increased my productivity as a DevOps Engineer.

A DoD client requested support with automated file transfers. The client has files placed in a common folder that can be accessed by the standard File Transfer Protocol (FTP). Given the FTP server’s connection information, the client requested that the files be moved to an Amazon Web Services (AWS) S3 bucket that their analysis tools are configured to use.

Automating the download and upload process saves users time by allowing a scheduled process to transfer the data files. This can be achieved using a combination of AWS Lambda and EC2 services. AWS Lambda provides a plethora of triggering and scheduling options and the power to create EC2 instances. By creating an EC2 instance, a program or script can avoid Lambda’s limitations and perform programmatic tasks such as downloading and uploading. Additionally, this can all be done with Terraform to allow for deployment into any AWS space.

Writing a Script to Do the Work

Before turning to Terraform or the AWS console, write a script that can log in to the FTP server, fetch/download files, and copy them to an S3 bucket. This can be done effectively with Python’s built-in ftplib and the AWS boto3 library. There are plenty of libraries and examples online showing how to download files from an FTP server with Python and copy them to S3 with boto3.

When writing the script, keep in mind that file size plays a significant role in how ftplib and boto3’s copy functions behave. Anything over 5 GB will need to be chunked from the FTP server and uploaded with the AWS API’s multipart upload methods.
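
As a rough illustration, here is a minimal sketch of such a script using ftplib and boto3. The function name, paths, and environment variable names are placeholders, and boto3’s upload_file is used because it switches to multipart uploads automatically for large objects.

# Minimal sketch of an FTP-to-S3 transfer (names and paths are placeholders).
import os
import tempfile
from ftplib import FTP

import boto3


def transfer_file(ftp_host, ftp_user, ftp_password, remote_path, bucket, s3_key):
    """Download a single file from the FTP server and upload it to S3."""
    s3 = boto3.client("s3")
    with FTP(ftp_host) as ftp:
        ftp.login(user=ftp_user, passwd=ftp_password)
        # Stream the download into a temporary file so large files are not held in memory.
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            ftp.retrbinary(f"RETR {remote_path}", tmp.write)
            tmp_path = tmp.name
    # upload_file switches to multipart uploads automatically for large objects,
    # which covers the 5 GB single-PUT limit mentioned above.
    s3.upload_file(tmp_path, bucket, s3_key)
    os.remove(tmp_path)


if __name__ == "__main__":
    transfer_file(
        ftp_host=os.environ["FTP_HOST"],
        ftp_user=os.environ["FTP_USERNAME"],
        ftp_password=os.environ["FTP_PASSWORD"],
        remote_path="data/example.csv",  # placeholder path
        bucket=os.environ["S3_BUCKET_NAME"],
        s3_key="incoming/example.csv",  # placeholder key
    )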

Creating an Instance with Our Script Loaded

Amazon provides Amazon Machine Images (AMIs) to start up a basic instance. The provided Linux x86 AMI is the perfect starting place for creating a custom instance and, eventually, a custom AMI.

With Terraform, creating an instance is like creating any other module: it requires Identity and Access Management (IAM) permissions, security group settings, and other configuration. The following shows everything needed to create an EC2 instance with a key-pair, permissions to write to S3, Python 3.8 and the required libraries installed, and the file-transfer script copied into the ec2-user home directory.

First, generate a key-pair: a private key and a public key used to prove identity when connecting to the instance. The benefit of creating the key-pair in the AWS Console is access to the generated .pem file. Having a local copy lets you connect to the instance from the command line, which is great for debugging but not great for deployment. Instead, Terraform can generate the key-pair and store it in its state so sensitive information never has to be passed around.

# Generate a ssh key that lives in terraform
# https://registry.terraform.io/providers/hashicorp/tls/latest/docs/resources/private_key
resource "tls_private_key" "instance_private_key" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "aws_key_pair" "instance_key_pair" {
  key_name   = "${var.key_name}"
  public_key = "${tls_private_key.instance_private_key.public_key_openssh}"

}

Next is the security group setup. The security group the instance runs in must open the port for Secure Shell (SSH), which Secure Copy Protocol (SCP) also uses to copy the script file(s) to the instance. A security group acts as a virtual firewall for your EC2 instances, controlling incoming and outgoing traffic. Open other ports for ingress and egress as needed, e.g., 443 for HTTPS traffic. The security group requires the vpc_id for your project; this is the Virtual Private Cloud (VPC) the instance will run in, so the security group should match your VPC settings.

resource "aws_security_group" "instance_sg" {
  name   = "allow-all-sg"
  vpc_id = "${var.vpc_id}"
…
  ingress {
    description = "ftp port"
    cidr_blocks = ["0.0.0.0/0"]
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
  }
…
}

The IAM policy for the instance requires PutObject access to the S3 bucket. The Terraform module takes the S3 bucket name as a variable, and an instance profile is created for the role. When creating the IAM role in the AWS Console, an instance profile is created automatically, but in Terraform it has to be defined explicitly.

#iam instance profile setup
resource "aws_iam_role" "instance_s3_access_iam_role" {
  name               = "instance_s3_access_iam_role"
  assume_role_policy = &lt;&lt;EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}
resource "aws_iam_policy" "iam_policy_for_ftp_to_s3_instance" {
  name = "ftp_to_s3_access_policy"

  policy = &lt;&lt;EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::${var.s3_bucket}/*"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy_attachment" "ftp_to_s3" {
  role       = aws_iam_role.instance_s3_access_iam_role.name
  policy_arn = aws_iam_policy.iam_policy_for_ftp_to_s3_instance.arn
}

resource "aws_iam_instance_profile" "ftp_to_s3_instance_profile" {
  name = "ftp_to_s3_instance_profile"
  role = "instance_s3_access_iam_role"
}

Defining the instance to start from, and to create the custom AMI from, requires the following Terraform variables:

  • ami – the AMI ID of the Linux x86 image
  • instance_type – the type of instance, e.g., t2.micro
  • subnet_id – the ID of the subnet (within your VPC) that the instance will run on
  • key_name – the name of the key; it should match the key-pair generated above or the one from the AWS console (a variable reference works here too)

Define the connection and provisioner attributes to copy the Python file-transfer script to the ec2-user home folder. The connection uses the default ec2-user account with the secure key generated above and then copies over the Python file. If using a key downloaded from the AWS Console, point to that file instead: private_key = "${file("path/to/key-pair-file.pem")}".

Complete the instance setup with the correct Python version and libraries. The user_data attribute sends a bash script that installs whatever is needed: in this case, it updates Python to 3.8 and installs the boto3 and paramiko libraries.

# Instance that we want to build out
resource "aws_instance" "ftp-to-s3-instance" {
  ami           = var.ami
  instance_type = var.instance_type
  subnet_id     = var.subnet_id
  key_name      = var.key_name #use your own key for testing

  # Security groups are referenced by ID because the instance runs inside a VPC
  vpc_security_group_ids = ["${aws_security_group.instance_sg.id}"]
  iam_instance_profile   = "${aws_iam_instance_profile.ftp_to_s3_instance_profile.id}"

  # Connection used by the file provisioner below; "self" refers to this instance
  connection {
    type        = "ssh"
    user        = "ec2-user"
    host        = self.public_ip
    private_key = "${tls_private_key.instance_private_key.private_key_pem}"
  }

  # Copies the python file to /home/ec2-user
  # depending on how the install of python works we may need to change this location
  provisioner "file" {
    source      = "${path.module}/ftp_to_s3.py"
    destination = "/home/ec2-user/ftp_to_s3.py"
  }

  # Installs Python 3.8 and the libraries the script needs on first boot
  user_data = <<EOF
#!/bin/sh
sudo amazon-linux-extras install python3.8
python3.8 -m pip install -U pip
pip3.8 --version
pip3.8 install boto3
pip3.8 install paramiko
EOF
}

The last step is to create the custom AMI. This will allow our Lambda to duplicate this setup and spin up as many of these instances as needed.

resource "aws_ami_from_instance" "ftp-to-s3-ami" {
  name               = "ftp-to-s3_ami"
  description        = "ftp transfer to s3 bucket python 3.8 script"
  source_instance_id = "${aws_instance.ftp-to-s3-instance.id}"

  depends_on = [aws_instance.ftp-to-s3-instance]

  tags = {
    Name = "ftp-to-s3-ami"
  }
}

Creating Instances on the Fly in Lambda

Using a Lambda function, which can be triggered in various ways, is a straightforward way to spin up EC2 instances. The following Python code shows how to pass values from the Lambda event into an EC2 instance, both as environment variables on the instance and as arguments to the Python script. The variables needed by the Python script for this example are as follows (a hypothetical example event is sketched after the list):

  • FTP_HOST – the URL of the FTP server
  • FTP_PATH – the path to the files on the FTP server
  • FTP_USERNAME, FTP_PASSWORD, FTP_AUTH_KEY – used for any authentication against the FTP server
  • S3_BUCKET_NAME – the name of the bucket for the files
  • S3_PATH – the folder or path the files should be uploaded to in the S3 bucket
  • files_to_download – a Python list of dictionary objects, each with the filename and size of a file to download
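
Here is a hypothetical example of the event payload the Lambda might receive; every value below is a placeholder, and the exact shape of files_to_download is only illustrative.

# Hypothetical example event for the Lambda (all values are placeholders).
example_event = {
    "ftp_url": "ftp.example.com",
    "ftp_path": "/outbound/daily",
    "username": "ftp_user",
    "password": "ftp_password",
    "auth_key": "optional-auth-key",
    "s3_bucket": "my-data-bucket",
    "s3_path": "incoming/daily",
    # Each entry carries the filename and size so the caller can decide how to split the work.
    "files_to_download": [
        {"filename": "report_2021_01.csv", "size": 104857600},
        {"filename": "report_2021_02.csv", "size": 524288000},
    ],
}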

For this example, the logic for checking for duplicate files is done before the Lambda that invokes the transfer instance is called. This allows the script on the instance to remain singularly focused on downloading and uploading. It is important to note that the files_to_download variable is converted to a string and the quotes are made into double quotes; not doing this makes the single quotes disappear when the value is passed to the EC2 instance.

The init_script variable uses the passed-in event variables to set up the environment variables and the Python script arguments. Just like when creating the instance, the user_data script is run by the instance’s root user. The root user needs to use the ec2-user’s Python packages to run our script, hence the following bash command: PYTHONUSERBASE=/home/ec2-user/.local python3.8 /home/ec2-user/ftp_to_s3.py {S3_PATH} {files_to_download}.

    # convert to a comma-separated string with double quotes so the value
    # survives being passed through user_data to the EC2 instance
    files_to_download = ",".join(map('"{0}"'.format, files_to_download))
    vars = {
        "FTP_HOST": event["ftp_url"],
        "FTP_PATH": event["ftp_path"],
        "FTP_USERNAME": event["username"],
        "FTP_PASSWORD": event["password"],
        "FTP_AUTH_KEY": event["auth_key"],
        "S3_BUCKET_NAME": event["s3_bucket"],
        "files_to_download": files_to_download,
        "S3_PATH": event["s3_path"],
    }
    print(vars)

    # user_data script run as root on boot: export the environment variables,
    # run the transfer script with the ec2-user's Python packages, then shut down
    init_script = """#!/bin/bash
                /bin/echo "**************************"
                /bin/echo "* Running FTP to S3.     *"
                /bin/echo "**************************"
                export S3_BUCKET_NAME={S3_BUCKET_NAME}
                export FTP_HOST={FTP_HOST}
                export FTP_PATH={FTP_PATH}
                export FTP_USERNAME={FTP_USERNAME}
                export FTP_PASSWORD={FTP_PASSWORD}
                PYTHONUSERBASE=/home/ec2-user/.local python3.8 /home/ec2-user/ftp_to_s3.py {S3_PATH} {files_to_download}
                shutdown now -h""".format(
        **vars
    )

Invoke the instance with the boto3 library, providing the parameters for the custom AMI, instance type, key-pair, subnet, and instance profile, all of which Terraform supplies through the Lambda’s environment variables. Optionally, raise the volume size to 50 GB from the default 8 GB for larger files.

    # "ec2" is assumed to be a boto3 EC2 client created earlier in the Lambda,
    # e.g. ec2 = boto3.client("ec2")
    instance = ec2.run_instances(
        ImageId=AMI,
        InstanceType=INSTANCE_TYPE,
        KeyName=KEY_NAME,
        SubnetId=SUBNET_ID,
        MaxCount=1,
        MinCount=1,
        InstanceInitiatedShutdownBehavior="terminate",
        UserData=init_script,
        IamInstanceProfile={"Arn": INSTANCE_PROFILE},
        BlockDeviceMappings=[{"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 50}}],
    )

Conclusion

After deploying to AWS, Terraform will have created a Lambda that invokes an EC2 instance running the script passed to it during its creation. Triggering the Lambda function to invoke the custom instance can be done from a DynamoDB Stream update, a scheduled timer, or even another Lambda function. This provides flexibility in how and when the instance is called.

Ultimately, this solution provides a flexible means of downloading files from an FTP server. The Lambda that invokes the instance could be changed to split the file list across several smaller instances running simultaneously, moving files to the AWS S3 bucket faster. That choice depends greatly on the client’s needs and the cost of operating the AWS services.

Changes can also be made to the script that downloads the files. One option would be to use a more robust FTP library than the built-in one Python provides. Larger files may require more effort, as FTP servers can time out when network latency and file sizes come into play; Python’s ftplib does not auto-reconnect, nor does it keep track of incomplete file downloads.