Integration with this build¶
How Ansible fits with the VirtualBox/Multipass lab in scripts/lab/ and how the same playbooks will run against the real MS-S1 MAX once it's provisioned.
Two halves of automation¶
+--------------------------------------------+
| Python + VBoxManage / Multipass |
| Provisions: VM exists, has disks, has SSH |
+--------------------------------------------+
|
| SSH (with cloud-init pubkey)
v
+--------------------------------------------+
| Ansible playbooks |
| Configures: packages, ssh, ufw, zfs, |
| docker, services |
+--------------------------------------------+
The split is deliberate:
- Python's strengths: ISO download, VM lifecycle, snapshotting, structured state. Things you'd reach for
bashfor, but Python is cleaner. - Ansible's strengths: idempotent in-host configuration. Whether you ran the play once or a hundred times, the host ends in the same state.
The boundary is "the host accepts SSH". Below that line: Python. Above: Ansible.
The lab path, end to end¶
# 0. once: install prerequisites on the control node (your Mac)
brew install ansible # or pipx install ansible
brew install --cask multipass # for Apple Silicon
# OR for x86_64:
# brew install --cask virtualbox
# 1. provision the lab VM
# Apple Silicon: use Multipass
python3 scripts/lab/01_provision_multipass.py
# x86_64: use VirtualBox
python3 scripts/lab/01_provision.py
# 2. apply playbooks (default chain: bootstrap, ssh-hardening, ufw)
python3 scripts/lab/02_apply.py
# 3. apply individual playbooks for the ZFS / Docker exercises
python3 scripts/lab/02_apply.py zfs
python3 scripts/lab/02_apply.py zfs -e topology=mirror
# 4. snapshot before risky changes
multipass snapshot ms-s1-max-lab --name pre-zfs-experiment
# or for VirtualBox:
# VBoxManage snapshot ms-s1-max-lab take pre-zfs-experiment --pause
# 5. tear down when done
multipass delete --purge ms-s1-max-lab
# or:
# VBoxManage unregistervm ms-s1-max-lab --delete
What lives where¶
scripts/lab/
_config.py # LabConfig dataclass (env-driven defaults)
_vbox.py # VBoxManage subprocess wrapper
_multipass.py # Multipass subprocess wrapper
_iso.py # Ubuntu ISO download + SHA256 verify
_ssh.py # SSH wait + key push helpers
_state.py # JSON state for "this phase already ran"
01_provision.py # VirtualBox provisioner
01_provision_multipass.py # Multipass provisioner
02_apply.py # Run ansible-playbook against the VM
ansible/
ansible.cfg # lab-tuned defaults
requirements.yml # ansible-galaxy collections
playbooks/
bootstrap.yml # apt baseline, timezone, sudoers, journald
ssh-hardening.yml # matches docs/ssh/server/hardening.md
ufw.yml # default-deny + OpenSSH
zfs.yml # install ZFS, ARC cap, pool + datasets
docker.yml # (planned)
services.yml # (planned)
State is JSON in target/<vm-name>-state.json. Each phase marks itself complete; re-running is safe.
How Multipass vs VirtualBox differ in this design¶
| Concern | VirtualBox | Multipass |
|---|---|---|
| Where it runs well | x86_64 Macs, Linux, Windows | Apple Silicon Macs, x86_64 Macs, Linux |
| Provisioning | 01_provision.py (full unattended via VBoxManage) | 01_provision_multipass.py (uses cloud-init) |
| Multiple lab disks | 6 separate virtual disks | 1 disk; ZFS playbook makes loopback files |
| SSH endpoint | host:2222 via NAT port-forward | VM's IP on the LAN-like Multipass network |
| Snapshots | VBoxManage snapshot take | multipass snapshot |
| Headless | yes (--type headless) | yes (default) |
The Ansible side is identical — 02_apply.py reads _state.json to figure out which endpoint to put in the inventory, and the playbooks themselves don't care which provisioner ran first. zfs.yml does adapt (loopback files instead of /dev/sd*) when no real lab disks are present.
How this maps to the real MS-S1 MAX¶
After installing Ubuntu Server 26.04 on the actual MS-S1 MAX (manually, since that's how the bare-metal install works):
# scripts/lab/ansible/inventory.yml (or move out of lab/ for production)
all:
children:
production:
hosts:
ms-s1-max:
ansible_host: 192.168.1.10 # or its Tailscale name
ansible_user: morten
ansible_become: true
# no password — passwordless sudo configured by bootstrap.yml
Then:
# Push your SSH key (one-time)
ssh-copy-id morten@ms-s1-max
# Apply the same playbooks the lab uses
cd scripts/lab/ansible
ansible-playbook -i inventory.yml playbooks/bootstrap.yml -l production
ansible-playbook -i inventory.yml playbooks/ssh-hardening.yml -l production
ansible-playbook -i inventory.yml playbooks/ufw.yml -l production
ansible-playbook -i inventory.yml playbooks/zfs.yml -l production \
-e topology=stripe # production layout: single stripe over 2 disks
# Then Docker, services...
Same playbooks. Different inventory. Different host. That's the point.
Where the playbooks intentionally don't cover¶
This is what scripts/lab/ansible/playbooks/ deliberately does NOT do:
- Bare-metal Ubuntu install. That's a one-time manual step on the real hardware (you boot from a USB, click through Subiquity). The lab automation simulates it but doesn't replace it.
- GPU / ROCm install. Strix-Halo-specific kernel/firmware shenanigans don't fit Ansible's model cleanly. See
docs/ai/gpu/rocm-installation.mdfor the manual procedure (which can be wrapped in Ansible later if you want). - Filesystem-level disaster recovery. When the pool fails, Ansible can't help. See
docs/zfs/troubleshooting.md.
The boundaries are deliberate. Ansible does what it's good at; other tools do the rest.
Secrets handling end-to-end¶
In the lab the password is changeme from _config.py. For production:
# 1. Set a real password and store in 1Password (not in plaintext anywhere)
op item create --category=password --title='ms-s1-max sudo' password='real-secret'
# 2. Vault file references it (committed to git, encrypted)
ansible-vault create scripts/lab/ansible/group_vars/production/vault.yml
# Contents:
# vault_become_password: "{{ lookup('community.general.onepassword', 'ms-s1-max sudo') }}"
# 3. main.yml in group_vars/production/ pulls the value
# scripts/lab/ansible/group_vars/production/main.yml:
# ansible_become_password: "{{ vault_become_password }}"
# 4. Ansible vault password itself comes from 1Password too
echo '#!/bin/bash
op read "op://Private/ansible-vault/password"' > ~/.bin/ansible-vault-pass
chmod +x ~/.bin/ansible-vault-pass
# 5. Ansible looks it all up at runtime — no plaintext on disk
ansible-playbook -i inventory.yml playbook.yml \
--vault-password-file ~/.bin/ansible-vault-pass
See Vault for the full pattern.
Why not Terraform?¶
Terraform's strength is "I describe cloud resources; here's their state". For a single mini-PC running Ubuntu, Terraform's state-file overhead doesn't pay back. Provisioning a VM with Python + VBoxManage or Multipass is simpler and direct. If you ever scale to "10 boxes in AWS plus the homelab", Terraform makes sense for the cloud half — at that point pair it with Ansible as I'm doing here.
Why not Salt / Chef / Puppet?¶
Ansible's agentless SSH model is a much better fit for a homelab than any of those. Salt-master, Chef-server, etc. are designed for larger fleets and assume an agent on every node, which is overkill for a few hosts.
A few more useful commands¶
# Run a playbook against ALL hosts (lab + production)
ansible-playbook -i inventory.yml playbook.yml
# Limit to specific hosts
ansible-playbook -i inventory.yml playbook.yml -l production
ansible-playbook -i inventory.yml playbook.yml -l 'lab:!ms-s1-max-lab-old'
# Tag-based partial runs
ansible-playbook -i inventory.yml playbook.yml --tags ssh
ansible-playbook -i inventory.yml playbook.yml --skip-tags slow
# See what would change
ansible-playbook -i inventory.yml playbook.yml --check --diff
# Time how long each task takes
ANSIBLE_CALLBACKS_ENABLED=profile_tasks ansible-playbook -i inventory.yml playbook.yml
Where to go next¶
- Connection — SSH and become specifics.
- Vault — full secrets workflow.
- Troubleshooting — when something doesn't work.
- The actual code:
scripts/lab/in the repo.