Docker Swarm

Do you actually need Kubernetes? Netflix needs it, a few others probably do too. If you need to learn it for work or something, go right ahead, but chances are good that YOU don’t need to scale to 10,000 nodes at home. I didn’t want Kubernetes (or k8s, or k3s, or Minikube) at home since I am pretty limited for resources and want as much bang for my virtual buck as I can get.

I built mine across Proxmox host(s) with Debian 10 VMs (not containers) and virtual machines on TrueNAS (also Debian 10), but it doesn't really matter; just try to stay on the same version of Docker across them for your own sanity. You can probably do this with multiple Raspberry Pis, but mine are otherwise occupied. I don't know if you can (or should) mix the different machine types in one cluster; that seems like it would be bad, and many things need different images for the Pi, which you'd have to account for. I'm setting up a Pi 1 with an 8TB Easystore to live at my parents' house for backups, and it has to use different images for minio and possibly Kuma. It's not fast, but it doesn't need to be.

Steps:

  • Base OS
  • Docker (clone them here if needed)
  • Swarm init
  • Swarm join cluster
  • Test it!
  • NFS Mounts for data (optional, depends on your needs)
  • Traefik
  • Configure deployments
  • Swarmpit (optional)
  • Swarmprom (optional)
  • Portainer (optional)
  • Apps and apps and apps (this is why you’re here)
  • Proxied services outside of the swarm (optional)
  • Add basic authentication to an app that doesn’t have any (optional)
  • Apps that are docker but NOT public (optional)

Code and scripts are all here!

https://github.com/8layer8/swarm-public/tree/main

While you don’t need all of this, it does help to have something to start with.

Base OS

Do a basic Debian 10 install, set up disks, networks and hostnames as you need them, and deselect the GUI; you really only want SSH and the base utilities. Once it is up, ssh into it and run:

apt-get update
apt-get -y install apt-transport-https ca-certificates curl gnupg-agent software-properties-common sudo vim mc

Docker (clone them here if needed)

curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
apt-get update
apt-get -y install docker-ce docker-ce-cli
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker brad
newgrp docker
sudo curl -L https://github.com/docker/compose/releases/download/1.18.0/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
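
Quick sanity check before moving on (optional, but cheap):

docker --version
docker-compose --version
docker run --rm hello-world
# If that last one complains about permissions, log out and back in so the docker group membership sticks.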

Swarm init (master host)

# If you are going to clone the box for a worker node, stop here and clone it first!
# Use the IP or FQDN of the host
sudo docker swarm init --advertise-addr 192.168.0.123
# this prints the full join command (with the token) for the cluster, keep it somewhere

Swarm join cluster (worker nodes)

# If needed, regenerate the token on the master node:
docker swarm join-token worker
# Then run it on any/all worker nodes like:
docker swarm join --token SWMTKN-1-02wqm9v19hey4js984yg8hrhg9tmai80pbwtwiwiery8b-8e9857y49herughkc6jhtc47y5y 192.168.0.123:2377

Test it!

docker node ls
docker service create --name webserver -p 8080:80 nginx
# You should now be able to hit ANY node on port 8080 and get a response, it MAY take a few minutes, just chill.

# Scale it up or down:
docker service scale webserver=3
docker service scale webserver=1

# See that it is working:
docker service ls

# Delete it:
docker service rm webserver

NFS Mounts for data

There's a way for Docker to handle NFS mounts itself; this is not it (yet), the shares just get mounted on each node the old-fashioned way. (A docker-native sketch is at the end of this section.)

Debian:
apt update
apt -y install nfs-common
vim /etc/fstab
192.168.0.253:/mnt/pool_alpha/vm_storage /mnt/pool_alpha/vm_storage nfs defaults 0 0
192.168.0.253:/mnt/pool_alpha/video/ /mnt/pool_alpha/video nfs defaults 0 0
(etc)
mkdir -p /mnt/pool_alpha/vm_storage
mkdir -p /mnt/pool_alpha/video

* Match up the NFS owners (UID/GID) with a working node, this can be squirrelly! You don't want to just open it all up, but your NFS shouldn't be exposed to the internet anyway, so it's your call. Sometimes you have to launch a service with no mount points defined, shell into it, cd to the mount point and ls -lanrt to see what the numeric UID and GID are, then use those to chown the mount root, then try again with the NFS mount.
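
In practice that hunt looks something like this (the container name, mount path, and 1020:1020 numbers are just examples, use whatever you actually see):

# On whichever node the service landed on:
docker ps | grep kuma
docker exec -it <container_id> sh
ls -lanrt /app/data      # note the numeric UID and GID it runs/writes as
exit

# Then, on a box with the NFS share mounted, chown the data directory to match:
sudo chown -R 1020:1020 /mnt/pool_alpha/vm_storage/kuma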

# Mount the shares
mount -a

# if they don't mount at boot, create this file:
/etc/network/if-up.d/fstab

#!/bin/sh
mount -a

# Make it executable 
sudo chmod +x /etc/network/if-up.d/fstab

# reboot the box and make sure the mounts work as expected before getting much further!
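
For completeness, the docker-native way mentioned above is to declare an NFS-backed volume right in the stack file, so the node mounts it on demand. A rough sketch using the same NFS server and path as above; I stuck with fstab, so treat this as untested:

version: '3.7'
services:
  someapp:
    image: nginx:latest
    volumes:
      - vm_storage:/data

volumes:
  vm_storage:
    driver: local
    driver_opts:
      type: nfs
      o: addr=192.168.0.253,rw
      device: ":/mnt/pool_alpha/vm_storage"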

Firewall

Your firewall plumbing will vary, I’m assuming you are putting this behind a home firewall.

All you need to forward is TCP/80 and TCP/443 to your master node. If you can, and you understand what you're doing, you can forward 80 and 443 to ALL of the docker swarm boxes, since swarm will handle the networking no matter which box you land on. That is not always possible with some firewalls, so the safe way is to just point them at the master node. Even though nothing really lives on port 80, Traefik uses it to issue the redirects to the https site(s), and it also has to be open for the Let's Encrypt challenge responses to make it to where they need to go. You have to have port 80 open! Very little traffic will use it, but it has to be there for things to work properly.

Traefik

While swarm will advertise ports across all members, it doesn't handle much else (certs, authentication, etc.), so setting up Traefik to do all the plumbing automatically is the way to go.
I set mine up according to https://dockerswarm.rocks/traefik/
It can be a little funky getting it to play nice with Let's Encrypt and wildcard domains, so my sanitized configs for Traefik + Let's Encrypt + DigitalOcean wildcard DNS are in the git repo.
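
The pieces that usually cause the funkiness are the DNS-challenge bits. Roughly, the Traefik service ends up with something like this (a trimmed sketch, not the full working file; grab that from the repo, and swap in your own token, email and domain; the router name here is the one from the dockerswarm.rocks config):

    environment:
      # DigitalOcean API token so Traefik can answer the DNS challenge
      - DO_AUTH_TOKEN=your-digitalocean-api-token
    command:
      # (other flags trimmed)
      - --certificatesresolvers.le.acme.email=you@mydomain.com
      - --certificatesresolvers.le.acme.storage=/certificates/acme.json
      - --certificatesresolvers.le.acme.dnschallenge=true
      - --certificatesresolvers.le.acme.dnschallenge.provider=digitalocean
    deploy:
      labels:
        # ask for the wildcard cert
        - traefik.http.routers.traefik-public-https.tls.certresolver=le
        - traefik.http.routers.traefik-public-https.tls.domains[0].main=mydomain.com
        - traefik.http.routers.traefik-public-https.tls.domains[0].sans=*.mydomain.com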

Configure Deployments

There are dozens of ways to do this, but this is for home, and I wanted it simple.

Simple way:

Use a directory on the NFS shares to keep all of your docker compose files and start/stop scripts. Ideally, only the master will run them, but if it all falls apart, rebuilding a new cluster takes a few minutes: mount the NFS, init the swarm, run the scripts, and you're back in business.

I set up:

/mnt/pool_alpha/vm_storage/docker-compose/traefik
├── start.sh
├── stop.sh
└── traefik.yml

Then each service stack has its own directory, configs and scripts under docker-compose:

audioserve
compression
cura
homeserver
kuma
traefik
etc.

Docker-compose, you say? Yes, well, Docker swarm and docker-compose are very closely related. You can start with docker-compose.yml files, make a minor addition, and deploy them into the swarm.
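
The "minor addition" is basically the deploy: block (labels, replicas, placement constraints). The same file then gets launched with docker stack deploy instead of docker-compose up, for example (myapp is just a placeholder name):

# Standalone docker-compose, single host:
docker-compose -f myapp.yml up -d

# Same file deployed into the swarm as a stack (this is what reads the deploy: section):
docker stack deploy -c myapp.yml myapp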

I made a wrapper script to start and stop all the services, plus scripts to drain a node and bring it back so you can evacuate all the services from it. Docker cleanup scripts, log viewer scripts, etc. all go here so any node can run them. (Scripts at end of post)
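
A start-everything wrapper can be as dumb as looping over the stack directories. This is just a sketch of the idea (the real scripts in the repo do a bit more):

#!/bin/bash
# start_all.sh (example sketch): run each stack's start.sh in turn
BASE=/mnt/pool_alpha/vm_storage/docker-compose
for dir in "$BASE"/*/; do
  if [ -x "${dir}start.sh" ]; then
    echo "Starting stack in ${dir}"
    (cd "${dir}" && ./start.sh)
  fi
done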

Swarmpit

Stupid name, nice utility to manage your swarm. Also easy to set up. I followed this:

https://dockerswarm.rocks/swarmpit/

My scripts (again) are at the end.

Swarmprom

Add Grafana and Prometheus to your swarm, all the workers report in, easy peasy!

https://dockerswarm.rocks/swarmprom/

Hey look! Grafana and Prometheus! I almost have a Bingo…

Portainer

You can use scripts, Swarmpit or Portainer to launch services. I like using the scripts because I can clone them out of a git repo (and back in) and run them all to build a new cluster with everything in minutes. You can paste configs into Swarmpit and roll things back with the GUI. Portainer likes to take control, so I just use it as a GUI for troubleshooting, but YMMV:

https://dockerswarm.rocks/portainer/

Apps and apps and apps

Ok, so by now you have the Swarm running on multiple nodes, Swarmpit and Grafana and Portainer and Traefik and working certificates and everything! So, let’s add an application to hang on the internet.

Setting up Uptime Kuma as an externally-facing example application:

mkdir /mnt/pool_alpha/vm_storage/docker-compose/kuma
cd /mnt/pool_alpha/vm_storage/docker-compose/kuma
touch start.sh

touch stop.sh
touch kuma.yml

chmod +x *.sh

kuma.yml:

version: '3.7' 
services: 
  kuma: 
    image: louislam/uptime-kuma:latest
    environment: 
      - PUID=1020
      - PGID=1020
      - TZ=America/New_York
    networks:
      - net
      - traefik-public
    volumes: 
      - /mnt/pool_alpha/vm_storage/kuma:/app/data
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=traefik-public
        - traefik.constraint-label=traefik-public
        - traefik.http.routers.kuma-http.rule=Host(`${DOMAIN?Variable not set}`)
        - traefik.http.routers.kuma-http.entrypoints=http
        - traefik.http.routers.kuma-http.middlewares=https-redirect
        - traefik.http.routers.kuma-https.rule=Host(`${DOMAIN?Variable not set}`)
        - traefik.http.routers.kuma-https.entrypoints=https
        - traefik.http.routers.kuma-https.tls=true
        - traefik.http.routers.kuma-https.tls.certresolver=le
        - traefik.http.services.kuma.loadbalancer.server.port=3001

networks:
  net:
    driver: overlay
    attachable: true
  traefik-public:
    external: true

start.sh:

#Connect via SSH to your Docker Swarm manager node.
#Create an environment variable with the domain where you want to access your kuma instance, e.g.:
export DOMAIN=kuma.mydomain.com

#Make sure that your DNS records point that domain to your public IPs
#Make sure your firewall allows port 80 and 443 to (at least) one of the IPs of the Docker Swarm mode cluster.

# Deploy the app:
docker stack deploy -c kuma.yml kuma

# Below this is just information, you don't *need* any of it
echo "Access at: https://${DOMAIN}"
sleep 10 
docker stack ps kuma
sleep 10 
docker stack ps kuma
docker service logs kuma_kuma

stop.sh:

docker stack rm kuma


On your master node, cd into /mnt/pool_alpha/vm_storage/docker-compose/kuma and run ./start.sh
Sit back and wait, and you should be able to hit that service, by name, with real certs, in a minute or two.
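
A quick check from anything that can resolve the name (use whatever you exported as DOMAIN):

curl -I https://kuma.mydomain.com
# You want a sane HTTP response and a valid Let's Encrypt certificate, not a certificate warning.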

Go look at my git repo for more examples:

https://github.com/8layer8/swarm-public/tree/main/docker-compose

Proxied services outside of the swarm

Hanging a non-docker application off of Traefik 1.x used to be complex; Traefik 2.x basically makes it so difficult that I gave up after 2 days. The *easy* way to do this is to set up another app that is just an Nginx image with a reverse proxy config on it. This gets your app available to the outside and does all the Let's Encrypt heavy lifting for you, and Traefik doesn't fuss about it at all. Making a config is pretty straightforward on nginxconfig.io, and then your compose.yml file just maps the files into the proxy container. Once you set up one, you can clone it and tweak the few things that change between them. (A bare-bones sketch of the site config itself is at the end of this section.)

---
version: '3.7'
services:
  openvas:
    image: nginx:latest
    environment:
      - PUID=1020
      - PGID=1000
      - TZ=America/New_York
    volumes:
      - /mnt/pool_alpha/vm_storage/proxies/openvas/nginx.conf:/etc/nginx/nginx.conf:ro
      - /mnt/pool_alpha/vm_storage/proxies/openvas/sites-enabled/openvas.8layer8.com.conf:/etc/nginx/sites-enabled/openvas.8layer8.com.conf
      - /mnt/pool_alpha/vm_storage/proxies/openvas/sites-available/openvas.8layer8.com.conf:/etc/nginx/sites-available/openvas.8layer8.com.conf
      - /mnt/pool_alpha/vm_storage/proxies/openvas/nginxconfig.io/general.conf:/etc/nginx/nginxconfig.io/general.conf
      - /mnt/pool_alpha/vm_storage/proxies/openvas/nginxconfig.io/security.conf:/etc/nginx/nginxconfig.io/security.conf
      - /mnt/pool_alpha/vm_storage/proxies/openvas/nginxconfig.io/proxy.conf:/etc/nginx/nginxconfig.io/proxy.conf
      - /mnt/pool_alpha/vm_storage/proxies/server.crt:/etc/nginx/ssl/server.crt
      - /mnt/pool_alpha/vm_storage/proxies/server.key:/etc/nginx/ssl/server.key
      - /mnt/pool_alpha/vm_storage/proxies/dhparam.pem:/etc/nginx/dhparam.pem
      #- /mnt/pool_alpha/vm_storage/proxies/openvas/static:/var/www
    networks:
      - net
      - traefik-public
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=traefik-public
        - traefik.constraint-label=traefik-public
        - traefik.http.routers.openvas-proxy-http.rule=Host(`openvas.8layer8.com`)
        - traefik.http.routers.openvas-proxy-http.entrypoints=http
        - traefik.http.routers.openvas-proxy-http.middlewares=https-redirect
        - traefik.http.routers.openvas-proxy-https.rule=Host(`openvas.8layer8.com`)
        - traefik.http.routers.openvas-proxy-https.entrypoints=https
        - traefik.http.routers.openvas-proxy-https.tls=true
        - traefik.http.routers.openvas-proxy-https.tls.certresolver=le
        - traefik.http.services.openvas-proxy.loadbalancer.server.port=80
        - traefik.http.middlewares.openvas-auth.basicauth.users=brad:$$apr1$$vyr.UUVe$$iVBZogF6TZPx3LMR4BKuV1
        - traefik.http.routers.openvas-proxy-https.middlewares=openvas-auth

networks:
  net:
    driver: overlay
    attachable: true
  traefik-public:
    external: true


    
start.sh:
#Connect via SSH to a Docker Swarm manager node.
docker stack deploy -c proxies.yml proxies

sleep 10 
docker stack ps proxies
sleep 10 
docker stack ps proxies
docker service logs proxies_openvas

stop.sh:
docker stack rm proxies
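
For reference, the site config that gets mounted in (openvas.8layer8.com.conf above) is just a plain reverse proxy server block. nginxconfig.io generates a much fuller version with the security/general/proxy includes, but stripped down it amounts to roughly this (the backend IP and port here are made up; point them at your real non-docker box):

server {
    listen 80;
    server_name openvas.8layer8.com;

    location / {
        # Traefik already terminated TLS, so plain HTTP to the backend is fine here
        proxy_pass http://192.168.0.50:9392;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}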

Add basic authentication to an app that doesn’t have any

Generate a password hash and escape it properly (the $ signs get doubled so the compose file doesn't treat them as variables):
sudo apt install apache2-utils
echo $(htpasswd -nb brad mypassword) | sed -e s/\\$/\\$\\$/g
brad:$$apr1$$yLCU9Fxl$$V1G.kbqrTKLpXilRYkqeT/

Add/Edit to your deploy: labels: section
- traefik.http.middlewares.test-auth.basicauth.users=brad:$$apr1$$yLCU9Fxl$$V1G.kbqrTKLpXilRYkqeT/
- traefik.http.routers.catapp.middlewares=test-auth

#Redeploy your app with stop.sh and start.sh

Add apps that are docker but NOT public

For INTERNAL ONLY stuff, the easiest way is to just not put it on Traefik and expose the port instead. You can hit that port on any of the hosts (thanks to swarm's routing mesh) and it will work. I left the Traefik stuff in but commented out so you can see the difference; it is not needed in general:

# cura.yml   internal only
version: '3.7' 
services: 
  cura: 
    image: mindcrime30/docker-cura:4.12.0 
    environment: 
      - PUID=1020
      - PGID=1020
      - TZ=America/New_York
    networks:
      - net
#      - traefik-public
    ports: 
      - 5800:5800 
    volumes: 
      - /mnt/pool_alpha/vm_storage/cura/config:/config 
      - /mnt/pool_alpha/shared:/storage 
      - /mnt/pool_alpha/vm_storage/cura/output:/output
    deploy:
      labels:
        - needs.something.to.deploy=true
        # - traefik.enable=true
        # - traefik.docker.network=traefik-public
        # - traefik.constraint-label=traefik-public
        # - traefik.http.routers.cura-http.rule=Host(`${DOMAIN?Variable not set}`)
        # - traefik.http.routers.cura-http.entrypoints=http
        # - traefik.http.routers.cura-http.middlewares=https-redirect
        # - traefik.http.routers.cura-https.rule=Host(`${DOMAIN?Variable not set}`)
        # - traefik.http.routers.cura-https.entrypoints=https
        # - traefik.http.routers.cura-https.tls=true
        # - traefik.http.routers.cura-https.tls.certresolver=le
        # - traefik.http.services.cura.loadbalancer.server.port=5800

networks:
  net:
    driver: overlay
    attachable: true
#  traefik-public:
#    external: true

Then you can just hit http://any.swarm.box.ip:5800/ and it will get you there.

Code and scripts are all here!

https://github.com/8layer8/swarm-public/tree/main

While you don’t need all of this, it does help to have something to start with.