Spinning up infrastructure with Ansible and Terraform

This guide walks through the core automated workflow I use to manage my Docker Swarm environment.

I use DigitalOcean for the control plane and management layer (but other options are available—see the roles and Terraform modules).

This setup uses Ansible as the orchestration layer, meaning Ansible is responsible for running Terraform (for instance provisioning) as well as setting up the hosts and local environment. This gives us a single entry point for any script or automation task.
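To make that concrete, here is a minimal sketch of what "Ansible runs Terraform" can look like, using the community.general.terraform module. The project path and variable names are illustrative assumptions, not the repository's actual layout:

```yaml
# Sketch only: a localhost play that drives Terraform from Ansible.
# project_path and the variables below are illustrative assumptions.
- name: Provision droplets with Terraform
  hosts: localhost
  connection: local
  tasks:
    - name: Apply the Terraform configuration
      community.general.terraform:
        project_path: ./terraform/digitalocean
        state: present
        force_init: true   # run 'terraform init' if needed
        variables:
          public_key_openssh: "{{ tf_var_public_key_openssh }}"
      register: tf_result
```

Because Terraform runs inside a play, its outputs (registered in tf_result above) can feed directly into later plays, which is what makes the single-entry-point design work.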

Prerequisites (1Password Secrets)

Before the deployment workflow can run, you need a way to manage secrets. My setup uses 1Password, but this is entirely optional. Because all required variables are standard Ansible variables defined in host_vars and group_vars, you can define them using Ansible Vault, environment variables, or plain text as usual.

However, for the sake of explaining my specific setup, I will refer to using the community.general.onepassword lookup plugin to fetch these secrets dynamically at runtime.

The workflow uses an OP_SERVICE_ACCOUNT_TOKEN (provided as a GitHub Secret) to authenticate the 1Password CLI within the CI/CD pipeline.

The following secrets must be present in the infra-bootstrap-tools vault in 1Password:

| Secret Name | Fields | Used For |
| --- | --- | --- |
| Ansible SSH Key | private key (implicit via the ssh lookup), public key | Adding the SSH key to the local ssh-agent (private_key_openssh) and adding the public key to new DigitalOcean droplets (tf_var_public_key_openssh). |
| DIGITALOCEAN_ACCESS_TOKEN | credential | Used by Terraform to authenticate and provision resources on DigitalOcean (tf_digitalocean_access_token). |
| TF_S3_BACKEND | username, credential | AWS Access Key ID and Secret Access Key used for storing Terraform state remotely (tf_aws_access_key_id, tf_aws_secret_access_key). |
| RCLONE_DIGITALOCEAN | username, credential | Used by the Rclone Docker plugin to sync volumes to DigitalOcean Spaces/S3 (docker_swarm_plugin_rclone_digitalocean_access_key_id, ...secret_access_key). |
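In practice these secrets are mapped to Ansible variables with the community.general.onepassword lookup. The group_vars file path below is an assumption about layout; the item names, fields, and vault match the table above:

```yaml
# group_vars/all.yml (illustrative location)
# Each variable is resolved from 1Password at runtime; nothing is stored on disk.
private_key_openssh: "{{ lookup('community.general.onepassword', 'Ansible SSH Key', field='private key', vault='infra-bootstrap-tools') }}"
tf_var_public_key_openssh: "{{ lookup('community.general.onepassword', 'Ansible SSH Key', field='public key', vault='infra-bootstrap-tools') }}"
tf_digitalocean_access_token: "{{ lookup('community.general.onepassword', 'DIGITALOCEAN_ACCESS_TOKEN', field='credential', vault='infra-bootstrap-tools') }}"
tf_aws_access_key_id: "{{ lookup('community.general.onepassword', 'TF_S3_BACKEND', field='username', vault='infra-bootstrap-tools') }}"
tf_aws_secret_access_key: "{{ lookup('community.general.onepassword', 'TF_S3_BACKEND', field='credential', vault='infra-bootstrap-tools') }}"
```

Because these are ordinary variables, swapping 1Password out for Ansible Vault or environment variables only means changing the right-hand side of each assignment.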

The GitHub Deployment Workflow

The true power of this setup lies in the automated deployment pipeline managed by .github/workflows/ansible.yml. Instead of running playbooks manually from a laptop, the entire infrastructure lifecycle is managed through pull requests.

Here is how the workflow breathes life into the infrastructure:

  • Trigger: It springs into action on Pull Requests to the main branch (specifically when changes occur in the ansible/ directory or requirements.txt) and can also be triggered manually (workflow_dispatch) for ad-hoc deployments.
  • Concurrency Control: Deploying infrastructure requires careful coordination. The workflow uses a custom .github/actions/pr-lock action to ensure that only one deployment runs at a time. This acts as a traffic controller, preventing race conditions and infrastructure state conflicts when multiple PRs are open simultaneously.
  • Validation: Before any code touches the servers, it runs ansible-lint to ensure all playbooks and roles follow strict best practices.
  • Environment Setup: It prepares a pristine, isolated environment: installing Python 3.12, caching pip dependencies for speed, and using a custom setup script (./bin/bash/setup.sh ansible 1password-cli) to equip the runner with Ansible and the 1Password CLI.
  • Execution: Finally, it executes the main playbook (ansible-playbook -i ansible/playbooks/inventory ansible/playbooks/main.yml), securely passing in the OP_SERVICE_ACCOUNT_TOKEN so Ansible can fetch the necessary secrets on the fly.
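A condensed sketch of such a workflow file ties these steps together. The step names are simplified, and the concurrency group stands in for the custom pr-lock action described above:

```yaml
# .github/workflows/ansible.yml (condensed sketch, not the full file)
name: Ansible
on:
  pull_request:
    branches: [main]
    paths: ["ansible/**", "requirements.txt"]
  workflow_dispatch: {}

# Assumption: the real workflow uses the custom pr-lock action instead
# of GitHub's built-in concurrency groups shown here.
concurrency:
  group: ansible-deploy
  cancel-in-progress: false

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: pip
      - name: Install Ansible and the 1Password CLI
        run: ./bin/bash/setup.sh ansible 1password-cli
      - name: Lint playbooks
        run: ansible-lint
      - name: Run the main playbook
        run: ansible-playbook -i ansible/playbooks/inventory ansible/playbooks/main.yml
        env:
          OP_SERVICE_ACCOUNT_TOKEN: ${{ secrets.OP_SERVICE_ACCOUNT_TOKEN }}
```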

The Standard Playbook and Roles

The standard deployment playbook (ansible/playbooks/main.yml) is the conductor of the orchestra. It’s broken down into several distinct plays, each targeting a specific subset of hosts and applying roles to achieve the final state.

The playbook leverages the following roles:

1. Provision Infrastructure & Setup Execution Environment (localhost)

This initial play runs locally (or on the GitHub Actions runner). It fetches secrets, adds the SSH key to the agent, and uses Terraform to provision the DigitalOcean droplets.

  • xnok.infra_bootstrap_tools.utils_ssh_add: Adds the fetched SSH private key to the local ssh-agent.
  • diodonfrost.terraform: Installs the Terraform CLI.
  • xnok.infra_bootstrap_tools.terraform_digitalocean: A wrapper role that executes the Terraform modules to actually create the DigitalOcean droplets using the provided API token and SSH public key.
  • xnok.infra_bootstrap_tools.utils_affected_roles: Detects which roles have changed to optimize subsequent execution steps.
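Assembled into main.yml, this first play might look roughly like the following sketch (the role order matches the list above; everything else is an assumption):

```yaml
# Play 1 (sketch): provision infrastructure from the control node
- name: Provision infrastructure and set up the execution environment
  hosts: localhost
  connection: local
  roles:
    - xnok.infra_bootstrap_tools.utils_ssh_add        # load the SSH key into ssh-agent
    - diodonfrost.terraform                           # install the Terraform CLI
    - xnok.infra_bootstrap_tools.terraform_digitalocean  # create the droplets
    - xnok.infra_bootstrap_tools.utils_affected_roles    # detect changed roles
```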

2. Docker Swarm Managers (managers)

This play targets the newly created Manager nodes.

  • xnok.infra_bootstrap_tools.docker_swarm_controller: Sets up one specific node as the controller (installing required Python tools) to initialize the Swarm.
  • xnok.infra_bootstrap_tools.docker_swarm_manager: Initializes the Docker Swarm and joins the remaining manager nodes to it.
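Under the hood, initializing a Swarm with Ansible typically relies on the community.docker.docker_swarm module. A minimal sketch of the idea (not the role's actual tasks):

```yaml
# Sketch: init the swarm on the first manager, then join the rest as managers.
- name: Initialize the swarm on the controller node
  community.docker.docker_swarm:
    state: present
    advertise_addr: "{{ ansible_default_ipv4.address }}"
  register: swarm_info
  when: inventory_hostname == groups['managers'][0]

- name: Join remaining managers
  community.docker.docker_swarm:
    state: join
    remote_addrs: ["{{ hostvars[groups['managers'][0]].ansible_default_ipv4.address }}"]
    join_token: "{{ hostvars[groups['managers'][0]].swarm_info.swarm_facts.JoinTokens.Manager }}"
  when: inventory_hostname != groups['managers'][0]
```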

3. Docker Swarm Nodes (nodes)

This play targets the Worker nodes.

  • xnok.infra_bootstrap_tools.docker_swarm_node: Joins the worker droplets to the Docker Swarm cluster.
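Joining workers mirrors the manager join, only with the worker token. A sketch of the core task (not the role's actual implementation):

```yaml
# Sketch: join a worker using the token held by the first manager.
- name: Join the swarm as a worker
  community.docker.docker_swarm:
    state: join
    remote_addrs: ["{{ hostvars[groups['managers'][0]].ansible_default_ipv4.address }}"]
    join_token: "{{ hostvars[groups['managers'][0]].swarm_info.swarm_facts.JoinTokens.Worker }}"
```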

4. Plugins (all)

This play targets all nodes in the cluster.

  • xnok.infra_bootstrap_tools.docker_swarm_plugin_rclone: Installs and configures the Rclone Docker plugin, enabling the cluster to mount and sync volumes to DigitalOcean object storage using the RCLONE_DIGITALOCEAN credentials.
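Installing a Docker plugin from Ansible can be done with the community.docker.docker_plugin module; the plugin name below is the upstream Rclone volume plugin, and everything else is a sketch of what the role might do (it also has to write an rclone config carrying the RCLONE_DIGITALOCEAN credentials, omitted here):

```yaml
# Sketch: install and enable the Rclone volume plugin on every node.
- name: Install the Rclone Docker volume plugin
  community.docker.docker_plugin:
    plugin_name: rclone/docker-volume-rclone:latest
    state: present

- name: Enable the Rclone Docker volume plugin
  community.docker.docker_plugin:
    plugin_name: rclone/docker-volume-rclone:latest
    state: enable
```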

5. Applications (managers[0])

This play targets a single manager node to deploy core cluster applications.

  • xnok.infra_bootstrap_tools.docker_swarm_app_caddy: Deploys Caddy as a reverse proxy for the cluster.
  • xnok.infra_bootstrap_tools.docker_swarm_app_portainer: Deploys Portainer to provide a web UI for managing the Docker Swarm, automatically configuring it to work behind the Caddy proxy.
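Deploying a stack from a single manager is typically a community.docker.docker_stack task; the compose file path here is an assumption, not the role's actual layout:

```yaml
# Sketch: deploy an application stack from the first manager only.
- name: Deploy Portainer behind Caddy
  community.docker.docker_stack:
    name: portainer
    state: present
    compose:
      - /opt/stacks/portainer/docker-compose.yml  # hypothetical path
```

Running it only on managers[0] is enough: the Swarm scheduler distributes the stack's services across the cluster from there.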