Docker Swarm Setup with Ansible

This guide explains how the infra-bootstrap-tools collection automates the creation and management of a Docker Swarm cluster. By the end, you will understand the inventory model, the provisioning workflow, and what services are running on your cluster.

Architecture Overview

A Docker Swarm cluster consists of two types of nodes:

  • Managers (docker_swarm_managers): These nodes coordinate the cluster. They handle scheduling, service discovery, and maintain the desired state of your applications. At least one manager is required to initialize the Swarm.
  • Nodes (docker_swarm_nodes): These are worker nodes that run your application containers. They receive instructions from the managers and execute tasks. Workers are optional—a manager-only cluster works for small deployments.

┌─────────────────────────────────────────────────────┐
│                Docker Swarm Cluster                 │
│                                                     │
│  ┌──────────────┐  ┌──────────────┐                 │
│  │   Manager 0  │  │   Manager N  │  ← Manage the   │
│  │  (controller)│  │              │    cluster      │
│  └──────┬───────┘  └──────┬───────┘                 │
│         │                 │                         │
│  ┌──────┴───────┐  ┌──────┴───────┐                 │
│  │   Worker 0   │  │   Worker N   │  ← Run your     │
│  │              │  │              │    containers   │
│  └──────────────┘  └──────────────┘                 │
│                                                     │
│  ┌───────────────────────────────────────────────┐  │
│  │  Portainer (Web UI) + Caddy (Reverse Proxy)   │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘

Inventory: Defining Your Hosts

The inventory determines which hosts participate in the cluster and what role they play. All inventory files live in ansible/playbooks/inventory/, and Ansible merges them automatically when the directory is passed with -i.

Inventory Groups

The playbooks use service-specific group names to avoid ambiguity:

Group                  Purpose
docker_swarm_managers  Hosts that initialize and manage the Docker Swarm
docker_swarm_nodes     Hosts that join the Swarm as workers

This is important because the same inventory directory can contain hosts for other services (e.g., k3s_servers, k3s_agents) without conflict.

Static Inventory (Manual Hosts)

To add a host manually, create a YAML file in ansible/playbooks/inventory/. For example, a single server acting as a Swarm manager:

# ansible/playbooks/inventory/my_server.yml
docker_swarm_managers:
  hosts:
    my_server:
      ansible_host: 203.0.113.10
      ansible_user: ubuntu
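
A cluster with workers follows the same pattern; the hostnames and addresses below are placeholders:

```yaml
# ansible/playbooks/inventory/my_cluster.yml (hypothetical hosts)
docker_swarm_managers:
  hosts:
    manager-0:
      ansible_host: 203.0.113.10
      ansible_user: ubuntu
docker_swarm_nodes:
  hosts:
    worker-0:
      ansible_host: 203.0.113.11
      ansible_user: ubuntu
```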

Dynamic Inventory (Terraform-Provisioned Hosts)

When provisioning infrastructure with Terraform (DigitalOcean, AWS, or GCP), the corresponding Ansible role automatically generates an inventory file and places it in the same inventory/ directory. After a meta: refresh_inventory task, Ansible merges these dynamically created hosts into the existing groups.

The Terraform roles are service-agnostic: the group names they produce are configurable via variables. They default to docker_swarm_managers and docker_swarm_nodes, but can be overridden for any use case:

Variable                                         Default                Description
terraform_digitalocean_inventory_managers_group  docker_swarm_managers  Inventory group for manager nodes
terraform_digitalocean_inventory_nodes_group     docker_swarm_nodes     Inventory group for worker nodes

The same pattern applies to terraform_aws_inventory_* and terraform_gcp_inventory_*.
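
For example, to reuse the DigitalOcean role for a k3s cluster, you could override the group variables (the values below are illustrative):

```yaml
# group_vars/all.yml (illustrative override)
terraform_digitalocean_inventory_managers_group: k3s_servers
terraform_digitalocean_inventory_nodes_group: k3s_agents
```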

The Deployment Workflow

The full deployment follows a clear sequence. Each step is handled by an Ansible role:

1. Provision Infrastructure (Terraform)
       │
       ▼
2. Generate & Merge Inventory
       │
       ▼
3. Install Docker on All Hosts
       │
       ▼
4. Initialize Swarm on Manager(s)
       │
       ▼
5. Join Workers to the Swarm
       │
       ▼
6. Install Plugins (e.g., Rclone)
       │
       ▼
7. Deploy Apps (Portainer, Caddy)
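
This sequence maps naturally onto a play-per-step layout. A hedged sketch follows (the role names come from this guide; the play boundaries and options are illustrative, not the collection's exact playbook):

```yaml
- hosts: localhost
  roles:
    - terraform_digitalocean      # Steps 1-2: provision VMs, generate inventory

- hosts: docker_swarm_managers:docker_swarm_nodes
  become: true
  roles:
    - docker                      # Step 3: install Docker Engine everywhere

- hosts: docker_swarm_managers
  become: true
  roles:
    - docker_swarm_controller     # Step 4: Python deps for the Swarm modules
    - docker_swarm_manager        # Step 4: init Swarm, join extra managers

- hosts: docker_swarm_nodes
  become: true
  roles:
    - docker_swarm_node           # Step 5: join workers

- hosts: docker_swarm_managers:docker_swarm_nodes
  become: true
  roles:
    - docker_swarm_plugin_rclone  # Step 6: volume plugin on all nodes

- hosts: docker_swarm_managers
  roles:
    - docker_swarm_app_portainer  # Step 7: deploy core apps
    - docker_swarm_app_caddy
```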

Step 1 — Provision Infrastructure

The terraform_digitalocean role (or terraform_aws / terraform_gcp) runs Terraform to create the required VMs. This step is only needed when provisioning cloud infrastructure—if you are using pre-existing hosts (like a dedicated server), you skip this step entirely by defining your hosts in a static inventory file.

For the full provisioning workflow details, see Spinning up infrastructure with Ansible and Terraform.

Secrets used:

Secret (1Password)         Variable                                        Purpose
Ansible SSH Key            private_key_openssh, tf_var_public_key_openssh  SSH key added to the agent and injected into new VMs
DIGITALOCEAN_ACCESS_TOKEN  tf_digitalocean_access_token                    Terraform authenticates with DigitalOcean
TF_S3_BACKEND              tf_aws_access_key_id, tf_aws_secret_access_key  Remote Terraform state backend (S3)

These secrets are defined in host_vars/localhost.yml and fetched via the community.general.onepassword lookup plugin. See Understanding Ansible Concepts for more on how variables and secrets are organized.
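
As a sketch, such a lookup-backed variable looks like the following (the 1Password item and field names are assumptions for illustration):

```yaml
# host_vars/localhost.yml (illustrative; item/field names are assumptions)
tf_digitalocean_access_token: "{{ lookup('community.general.onepassword', 'DIGITALOCEAN_ACCESS_TOKEN', field='password') }}"
```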

Step 2 — Generate & Merge Inventory

After Terraform completes, the role generates an inventory file from the Terraform outputs (IP addresses, SSH host keys) and writes it to ansible/playbooks/inventory/. The inventory is rendered from a Jinja2 template (inventory.tpl) that uses configurable group names—so the same Terraform role can produce hosts in docker_swarm_managers, k3s_servers, or any custom group depending on the variables you set. A meta: refresh_inventory task then tells Ansible to re-read the entire inventory directory, merging the newly created hosts into the existing groups alongside any static inventory files.
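
The refresh itself is a single built-in task; a minimal sketch:

```yaml
- name: Re-read the inventory directory to pick up Terraform-created hosts
  ansible.builtin.meta: refresh_inventory
```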

Step 3 — Install Docker

The docker role installs Docker Engine on all hosts that will participate in the Swarm (both managers and workers). Under the hood, it adds the official Docker APT repository and GPG key, installs docker-ce, docker-ce-cli, docker-compose, and containerd.io, adds the specified users to the docker group, and verifies the installation by pulling and running a hello-world container.
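
The core of such a role can be sketched with standard apt tasks (keyring path and distribution details are illustrative, not the role's exact tasks):

```yaml
- name: Add Docker's GPG key
  ansible.builtin.get_url:
    url: https://download.docker.com/linux/ubuntu/gpg
    dest: /etc/apt/keyrings/docker.asc

- name: Add the official Docker APT repository
  ansible.builtin.apt_repository:
    repo: >-
      deb [signed-by=/etc/apt/keyrings/docker.asc]
      https://download.docker.com/linux/ubuntu
      {{ ansible_distribution_release }} stable

- name: Install Docker Engine and companion packages
  ansible.builtin.apt:
    name: [docker-ce, docker-ce-cli, docker-compose, containerd.io]
    update_cache: true
```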

Step 4 — Initialize the Swarm

Two roles work together here. First, the docker_swarm_controller role installs Python dependencies (python3-docker, python3-jsondiff, python3-passlib) on all managers so they can use Ansible’s docker_swarm module. Then docker_swarm_manager initializes the Swarm on the first manager using docker_swarm: state=present, registers the result (including join tokens), and joins any additional managers using the manager join token from the first node’s hostvars.
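
A hedged sketch of the init-and-join logic, using the community.docker.docker_swarm module (task layout and variable names are illustrative):

```yaml
- name: Initialize the Swarm on the first manager
  community.docker.docker_swarm:
    state: present
    advertise_addr: "{{ ansible_default_ipv4.address }}"
  register: swarm_info  # includes swarm_facts.JoinTokens
  when: inventory_hostname == groups['docker_swarm_managers'][0]

- name: Join any additional managers
  community.docker.docker_swarm:
    state: join
    advertise_addr: "{{ ansible_default_ipv4.address }}"
    join_token: "{{ hostvars[groups['docker_swarm_managers'][0]].swarm_info.swarm_facts.JoinTokens.Manager }}"
    remote_addrs:
      - "{{ hostvars[groups['docker_swarm_managers'][0]].ansible_default_ipv4.address }}:2377"
  when: inventory_hostname != groups['docker_swarm_managers'][0]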

Step 5 — Join Workers

The docker_swarm_node role retrieves the worker join token from the first manager's hostvars (set during Step 4) and calls docker_swarm: state=join on each worker node, passing the token and the manager addresses. This is fully automatic—no manual token copying is needed.
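
The worker side mirrors the manager join; a sketch assuming the init task registered its result as swarm_info:

```yaml
- name: Join the Swarm as a worker
  community.docker.docker_swarm:
    state: join
    advertise_addr: "{{ ansible_default_ipv4.address }}"
    join_token: "{{ hostvars[groups['docker_swarm_managers'][0]].swarm_info.swarm_facts.JoinTokens.Worker }}"
    remote_addrs:
      - "{{ hostvars[groups['docker_swarm_managers'][0]].ansible_default_ipv4.address }}:2377"
```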

Step 6 — Install Plugins

The docker_swarm_plugin_rclone role installs the Rclone Docker volume plugin on all nodes, enabling persistent volume storage backed by S3-compatible object storage (e.g., DigitalOcean Spaces). It installs fuse as a system dependency, creates the plugin’s config and cache directories, templates an rclone.conf file with the storage credentials, enables the rclone/docker-volume-rclone Docker plugin, and creates a test volume to verify the setup.
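
The plugin and test-volume steps can be sketched with the community.docker modules (the remote and bucket names are hypothetical):

```yaml
- name: Install the Rclone Docker volume plugin
  community.docker.docker_plugin:
    plugin_name: rclone/docker-volume-rclone
    state: present

- name: Enable the plugin
  community.docker.docker_plugin:
    plugin_name: rclone/docker-volume-rclone
    state: enable

- name: Create a test volume backed by object storage
  community.docker.docker_volume:
    volume_name: rclone-test
    driver: rclone/docker-volume-rclone
    driver_options:
      remote: "spaces:my-test-bucket"  # hypothetical rclone remote/bucket
```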

Secrets used:

Secret (1Password)   Variable                                                   Purpose
RCLONE_DIGITALOCEAN  docker_swarm_plugin_rclone_digitalocean_access_key_id      S3-compatible access key for object storage
RCLONE_DIGITALOCEAN  docker_swarm_plugin_rclone_digitalocean_secret_access_key  S3-compatible secret key for object storage

These secrets are defined in group_vars/all.yml and apply to every node in the cluster.

Step 7 — Deploy Applications

Finally, the docker_swarm_app_caddy and docker_swarm_app_portainer roles deploy core services to the Swarm:

  • Caddy — an automatic HTTPS reverse proxy that handles TLS certificates, DNS management, and GitHub OAuth authentication for your services.
  • Portainer — a web-based management UI for Docker Swarm, providing visibility into services, containers, volumes, and networks. It supports GitOps workflows and built-in app templates.

These applications run as Docker Swarm services and are deployed to a single manager node (docker_swarm_managers[0]).
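
A Swarm service deployment of this kind can be sketched with community.docker.docker_swarm_service (the image tag, port, and constraint below are assumptions, not the roles' exact configuration):

```yaml
- name: Deploy Portainer as a Swarm service (illustrative)
  community.docker.docker_swarm_service:
    name: portainer
    image: portainer/portainer-ce:latest
    publish:
      - published_port: 9000
        target_port: 9000
    mounts:
      - source: /var/run/docker.sock
        target: /var/run/docker.sock
        type: bind
    placement:
      constraints:
        - node.role == manager
```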

Secrets used (Caddy):

Secret (1Password)            Variable                                       Purpose
CADDY_GITHUB_APP              docker_swarm_app_caddy_github_client_id        GitHub OAuth Client ID for the authentication portal
CADDY_GITHUB_APP              docker_swarm_app_caddy_github_client_secret    GitHub OAuth Client Secret
CADDY_JWT_SHARED_KEY          docker_swarm_app_caddy_jwt_shared_key          JWT signing key for caddy-security tokens
CADDY_DIGITALOCEAN_API_TOKEN  docker_swarm_app_caddy_digitalocean_api_token  DNS challenge for automatic HTTPS certificates

These secrets are defined in group_vars/docker_swarm_managers.yaml and only loaded on manager hosts.

Available Playbooks

Several playbooks are provided for different levels of the stack:

Playbook                          What it does
docker-swarm.yml                  Base Swarm setup: Docker + Swarm init + join workers
docker-swarm-portainer.yml        Swarm + Portainer web UI
docker-swarm-portainer-caddy.yml  Swarm + Portainer + Caddy reverse proxy
main.yml                          Full lifecycle: Terraform provisioning + Swarm + Plugins + Apps

Running It

Use the Makefile targets to build the collection and run the playbooks:

# Full lifecycle (provision DigitalOcean + configure everything)
make up

# Tear down DigitalOcean infrastructure
make down

# Docker Swarm only (against existing hosts in inventory)
make swarm

# Docker Swarm + Portainer
make swarm-portainer

# Docker Swarm + Portainer + Caddy
make swarm-portainer-caddy

All targets point to the ansible/playbooks/inventory/ directory, so the inventory is the single source of truth for what runs where.

What You Get

After a successful deployment, your cluster has:

  • A Docker Swarm cluster with one or more managers and optional worker nodes.
  • Docker installed and configured on every node.
  • Portainer accessible via a web browser for managing services, stacks, containers, and volumes.
  • Caddy handling HTTPS termination and reverse proxying traffic to your services with automatic Let’s Encrypt certificates.
  • Rclone Docker plugin enabling persistent volumes backed by object storage.
  • An inventory directory that serves as the living documentation of your infrastructure—every host and its role is explicitly declared.