Docker Swarm Setup with Ansible
This guide explains how the infra-bootstrap-tools collection automates the creation and management of a Docker Swarm cluster. By the end, you will understand the inventory model, the provisioning workflow, and what services are running on your cluster.
Architecture Overview
A Docker Swarm cluster consists of two types of nodes:
- Managers (docker_swarm_managers): These nodes coordinate the cluster. They handle scheduling, service discovery, and maintain the desired state of your applications. At least one manager is required to initialize the Swarm.
- Nodes (docker_swarm_nodes): These are worker nodes that run your application containers. They receive instructions from the managers and execute tasks. Workers are optional; a manager-only cluster works for small deployments.
```
┌──────────────────────────────────────────────────────────┐
│                   Docker Swarm Cluster                    │
│                                                            │
│   ┌──────────────┐    ┌──────────────┐                     │
│   │  Manager 0   │    │  Manager N   │  ← Manage the       │
│   │ (controller) │    │              │    cluster          │
│   └──────┬───────┘    └──────┬───────┘                     │
│          │                   │                             │
│   ┌──────┴───────┐    ┌──────┴───────┐                     │
│   │   Worker 0   │    │   Worker N   │  ← Run your         │
│   │              │    │              │    containers       │
│   └──────────────┘    └──────────────┘                     │
│                                                            │
│   ┌────────────────────────────────────────────────┐       │
│   │   Portainer (Web UI) + Caddy (Reverse Proxy)   │       │
│   └────────────────────────────────────────────────┘       │
└──────────────────────────────────────────────────────────┘
```
Inventory: Defining Your Hosts
The inventory determines which hosts participate in the cluster and what role they play. All inventory files live in ansible/playbooks/inventory/, and Ansible merges them automatically when the directory is passed with -i.
Inventory Groups
The playbooks use service-specific group names to avoid ambiguity:
| Group | Purpose |
|---|---|
| docker_swarm_managers | Hosts that initialize and manage the Docker Swarm |
| docker_swarm_nodes | Hosts that join the Swarm as workers |
This is important because the same inventory directory can contain hosts for other services (e.g., k3s_servers, k3s_agents) without conflict.
Static Inventory (Manual Hosts)
To add a host manually, create a YAML file in ansible/playbooks/inventory/. For example, a single server acting as a Swarm manager:
```yaml
# ansible/playbooks/inventory/my_server.yml
docker_swarm_managers:
  hosts:
    my_server:
      ansible_host: 203.0.113.10
      ansible_user: ubuntu
```
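If you also have dedicated workers, the same directory can declare them under docker_swarm_nodes. A hypothetical two-group inventory, with placeholder hostnames and addresses:

```yaml
# ansible/playbooks/inventory/my_cluster.yml (illustrative; hostnames and IPs are placeholders)
docker_swarm_managers:
  hosts:
    manager-0:
      ansible_host: 203.0.113.10
      ansible_user: ubuntu

docker_swarm_nodes:
  hosts:
    worker-0:
      ansible_host: 203.0.113.11
      ansible_user: ubuntu
    worker-1:
      ansible_host: 203.0.113.12
      ansible_user: ubuntu
```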
Dynamic Inventory (Terraform-Provisioned Hosts)
When provisioning infrastructure with Terraform (DigitalOcean, AWS, or GCP), the corresponding Ansible role automatically generates an inventory file and places it in the same inventory/ directory. After a meta: refresh_inventory task, Ansible merges these dynamically created hosts into the existing groups.
The Terraform roles are service-agnostic: the group names they produce are configurable via variables. They default to docker_swarm_managers and docker_swarm_nodes, but can be overridden for any use case:
| Variable | Default | Description |
|---|---|---|
| terraform_digitalocean_inventory_managers_group | docker_swarm_managers | Inventory group for manager nodes |
| terraform_digitalocean_inventory_nodes_group | docker_swarm_nodes | Inventory group for worker nodes |
The same pattern applies to terraform_aws_inventory_* and terraform_gcp_inventory_*.
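For example, to reuse the DigitalOcean role for a k3s cluster instead of a Swarm, the group variables can be overridden wherever the role is applied. A minimal sketch (the play structure and variable placement are illustrative; only the variable names come from the table above):

```yaml
# Illustrative: point the generated inventory at k3s groups instead of Swarm groups
- name: Provision droplets for a k3s cluster
  hosts: localhost
  roles:
    - role: terraform_digitalocean
      vars:
        terraform_digitalocean_inventory_managers_group: k3s_servers
        terraform_digitalocean_inventory_nodes_group: k3s_agents
```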
The Deployment Workflow
The full deployment follows a clear sequence. Each step is handled by an Ansible role:
```
1. Provision Infrastructure (Terraform)
         │
         ▼
2. Generate & Merge Inventory
         │
         ▼
3. Install Docker on All Hosts
         │
         ▼
4. Initialize Swarm on Manager(s)
         │
         ▼
5. Join Workers to the Swarm
         │
         ▼
6. Install Plugins (e.g., Rclone)
         │
         ▼
7. Deploy Apps (Portainer, Caddy)
```
Step 1 – Provision Infrastructure
The terraform_digitalocean role (or terraform_aws / terraform_gcp) runs Terraform to create the required VMs. This step is only needed when provisioning cloud infrastructure; if you are using pre-existing hosts (like a dedicated server), you skip this step entirely by defining your hosts in a static inventory file.
For the full provisioning workflow details, see Spinning up infrastructure with Ansible and Terraform.
Secrets used:
| Secret (1Password) | Variable | Purpose |
|---|---|---|
| Ansible SSH Key | private_key_openssh, tf_var_public_key_openssh | SSH key added to the agent and injected into new VMs |
| DIGITALOCEAN_ACCESS_TOKEN | tf_digitalocean_access_token | Terraform authenticates with DigitalOcean |
| TF_S3_BACKEND | tf_aws_access_key_id, tf_aws_secret_access_key | Remote Terraform state backend (S3) |
These secrets are defined in host_vars/localhost.yml and fetched via the community.general.onepassword lookup plugin. See Understanding Ansible Concepts for more on how variables and secrets are organized.
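As an illustration of that pattern, a host_vars/localhost.yml entry might look roughly like this (the 1Password item and field names are placeholders; check your own vault):

```yaml
# host_vars/localhost.yml (sketch; 1Password item and field names are placeholders)
tf_digitalocean_access_token: >-
  {{ lookup('community.general.onepassword', 'DIGITALOCEAN_ACCESS_TOKEN', field='credential') }}
private_key_openssh: >-
  {{ lookup('community.general.onepassword', 'Ansible SSH Key', field='private key') }}
```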
Step 2 – Generate & Merge Inventory
After Terraform completes, the role generates an inventory file from the Terraform outputs (IP addresses, SSH host keys) and writes it to ansible/playbooks/inventory/. The inventory is rendered from a Jinja2 template (inventory.tpl) that uses configurable group names, so the same Terraform role can produce hosts in docker_swarm_managers, k3s_servers, or any custom group depending on the variables you set. A meta: refresh_inventory task then tells Ansible to re-read the entire inventory directory, merging the newly created hosts into the existing groups alongside any static inventory files.
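Conceptually, the provision-then-refresh pattern looks something like this (a simplified sketch, not the collection's exact tasks):

```yaml
# Sketch: run Terraform, write the inventory file, then re-read the inventory directory
- name: Provision and register new hosts
  hosts: localhost
  tasks:
    - name: Run Terraform and render the generated inventory file
      ansible.builtin.include_role:
        name: terraform_digitalocean

    - name: Merge the new hosts into the in-memory inventory
      ansible.builtin.meta: refresh_inventory
```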
Step 3 – Install Docker
The docker role installs Docker Engine on all hosts that will participate in the Swarm (both managers and workers). Under the hood, it adds the official Docker APT repository and GPG key, installs docker-ce, docker-ce-cli, docker-compose, and containerd.io, adds the specified users to the docker group, and verifies the installation by pulling and running a hello-world container.
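A minimal play applying the role to every future Swarm member might look like this (a sketch; the shipped playbooks may use fully qualified role names and extra variables):

```yaml
# Sketch: install Docker Engine on managers and workers alike
- name: Install Docker on all Swarm hosts
  hosts: docker_swarm_managers:docker_swarm_nodes
  become: true
  roles:
    - role: docker
```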
Step 4 – Initialize the Swarm
Two roles work together here. First, the docker_swarm_controller role installs Python dependencies (python3-docker, python3-jsondiff, python3-passlib) on all managers so they can use Ansible’s docker_swarm module. Then docker_swarm_manager initializes the Swarm on the first manager using docker_swarm: state=present, registers the result (including join tokens), and joins any additional managers using the manager join token from the first node’s hostvars.
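The heart of the initialization boils down to a task along these lines (a simplified sketch using the community.docker.docker_swarm module; the registered variable name is illustrative):

```yaml
# Sketch: initialize the Swarm on the first manager and capture the join tokens
- name: Initialize Docker Swarm
  community.docker.docker_swarm:
    state: present
    advertise_addr: "{{ ansible_default_ipv4.address }}"
  register: swarm_init
  when: inventory_hostname == groups['docker_swarm_managers'][0]
# swarm_init.swarm_facts.JoinTokens.Manager and .Worker hold the tokens used by the later joins
```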
Step 5 – Join Workers
The docker_swarm_node role retrieves the worker join token from the first manager’s hostvars (set during Step 4) and calls docker_swarm: state=join on each worker node, passing the token and the manager addresses. This is fully automatic; no manual token copying is needed.
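Expressed as a module call, the join is roughly the following (a sketch; the real role resolves the token and addresses from the first manager's hostvars, here assumed to be registered as swarm_init in Step 4):

```yaml
# Sketch: join a worker using the token captured on the first manager
- name: Join the Swarm as a worker
  community.docker.docker_swarm:
    state: join
    advertise_addr: "{{ ansible_default_ipv4.address }}"
    join_token: "{{ hostvars[groups['docker_swarm_managers'][0]].swarm_init.swarm_facts.JoinTokens.Worker }}"
    remote_addrs:
      - "{{ hostvars[groups['docker_swarm_managers'][0]].ansible_default_ipv4.address }}:2377"
```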
Step 6 – Install Plugins
The docker_swarm_plugin_rclone role installs the Rclone Docker volume plugin on all nodes, enabling persistent volume storage backed by S3-compatible object storage (e.g., DigitalOcean Spaces). It installs fuse as a system dependency, creates the plugin’s config and cache directories, templates an rclone.conf file with the storage credentials, enables the rclone/docker-volume-rclone Docker plugin, and creates a test volume to verify the setup.
Secrets used:
| Secret (1Password) | Variable | Purpose |
|---|---|---|
| RCLONE_DIGITALOCEAN | docker_swarm_plugin_rclone_digitalocean_access_key_id | S3-compatible access key for object storage |
| RCLONE_DIGITALOCEAN | docker_swarm_plugin_rclone_digitalocean_secret_access_key | S3-compatible secret key for object storage |
These secrets are defined in group_vars/all.yml and apply to every node in the cluster.
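For orientation, the templated configuration follows rclone's standard S3 remote format for DigitalOcean Spaces. A hedged sketch of the task that writes it (the destination path, remote name, and endpoint are assumptions; the role's actual template may differ):

```yaml
# Sketch: render rclone.conf for the volume plugin (path, remote name, and endpoint are placeholders)
- name: Write rclone.conf for the Rclone Docker volume plugin
  ansible.builtin.copy:
    dest: /var/lib/docker-plugins/rclone/config/rclone.conf
    mode: "0600"
    content: |
      [spaces]
      type = s3
      provider = DigitalOcean
      access_key_id = {{ docker_swarm_plugin_rclone_digitalocean_access_key_id }}
      secret_access_key = {{ docker_swarm_plugin_rclone_digitalocean_secret_access_key }}
      endpoint = fra1.digitaloceanspaces.com
```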
Step 7 – Deploy Applications
Finally, the docker_swarm_app_caddy and docker_swarm_app_portainer roles deploy core services to the Swarm:
- Caddy – an automatic HTTPS reverse proxy that handles TLS certificates, DNS management, and GitHub OAuth authentication for your services.
- Portainer – a web-based management UI for Docker Swarm, providing visibility into services, containers, volumes, and networks. It supports GitOps workflows and built-in app templates.
These applications run as Docker Swarm services and are deployed to a single manager node (docker_swarm_managers[0]).
Secrets used (Caddy):
| Secret (1Password) | Variable | Purpose |
|---|---|---|
| CADDY_GITHUB_APP | docker_swarm_app_caddy_github_client_id | GitHub OAuth Client ID for the authentication portal |
| CADDY_GITHUB_APP | docker_swarm_app_caddy_github_client_secret | GitHub OAuth Client Secret |
| CADDY_JWT_SHARED_KEY | docker_swarm_app_caddy_jwt_shared_key | JWT signing key for caddy-security tokens |
| CADDY_DIGITALOCEAN_API_TOKEN | docker_swarm_app_caddy_digitalocean_api_token | DNS challenge for automatic HTTPS certificates |
These secrets are defined in group_vars/docker_swarm_managers.yaml and only loaded on manager hosts.
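Once the apps are deployed, you can confirm everything came up by querying the Swarm from a manager, for example with community.docker.docker_swarm_info (a verification sketch, not part of the roles):

```yaml
# Sketch: list Swarm services (you should see the Portainer and Caddy services here)
- name: Gather Swarm service information
  community.docker.docker_swarm_info:
    services: true
  register: swarm_info

- name: Show running services
  ansible.builtin.debug:
    var: swarm_info.services
```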
Available Playbooks
Several playbooks are provided for different levels of the stack:
| Playbook | What it does |
|---|---|
| docker-swarm.yml | Base Swarm setup: Docker + Swarm init + join workers |
| docker-swarm-portainer.yml | Swarm + Portainer web UI |
| docker-swarm-portainer-caddy.yml | Swarm + Portainer + Caddy reverse proxy |
| main.yml | Full lifecycle: Terraform provisioning + Swarm + Plugins + Apps |
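To give a feel for how the roles compose, a stripped-down equivalent of the base docker-swarm.yml looks roughly like this (a sketch of the role sequence described above, not the shipped file verbatim):

```yaml
# Sketch: the role sequence behind the base Swarm setup
- name: Install Docker everywhere
  hosts: docker_swarm_managers:docker_swarm_nodes
  become: true
  roles:
    - docker

- name: Initialize the Swarm on the managers
  hosts: docker_swarm_managers
  become: true
  roles:
    - docker_swarm_controller
    - docker_swarm_manager

- name: Join the workers
  hosts: docker_swarm_nodes
  become: true
  roles:
    - docker_swarm_node
```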
Running It
Use the Makefile targets to build the collection and run the playbooks:
```bash
# Full lifecycle (provision DigitalOcean + configure everything)
make up

# Tear down DigitalOcean infrastructure
make down

# Docker Swarm only (against existing hosts in inventory)
make swarm

# Docker Swarm + Portainer
make swarm-portainer

# Docker Swarm + Portainer + Caddy
make swarm-portainer-caddy
```
All targets always point to the ansible/playbooks/inventory/ directory, so the inventory is the single source of truth for what runs where.
What You Get
After a successful deployment, your cluster has:
- A Docker Swarm cluster with one or more managers and optional worker nodes.
- Docker installed and configured on every node.
- Portainer accessible via a web browser for managing services, stacks, containers, and volumes.
- Caddy handling HTTPS termination and reverse proxying traffic to your services with automatic Let’s Encrypt certificates.
- Rclone Docker plugin enabling persistent volumes backed by object storage.
- An inventory directory that serves as the living documentation of your infrastructure: every host and its role is explicitly declared.