Backup & Recovery¶

This page covers backup strategies, recovery testing, and disaster recovery procedures.

Backup Strategy¶

Two Layers of Protection¶

Layer	Protects Against	Tool
Snapshots	Accidental deletion, bad upgrades	ZFS snapshots
Backups	Disk failure, host loss	zfs send/receive

ZFS Snapshots¶

This project uses sanoid (automatic local snapshot retention) plus syncoid (incremental replication to a remote ZFS host) — both from the sanoid package. zfs-auto-snapshot is upstream-abandoned and is not used here.

Install sanoid¶

sudo apt install -y sanoid

Sanoid configuration lives at /etc/sanoid/sanoid.conf. Template-based retention is the clean pattern:

# /etc/sanoid/sanoid.conf

[template_data]
    frequently = 0
    hourly = 24
    daily = 30
    weekly = 8
    monthly = 6
    yearly = 0
    autosnap = yes
    autoprune = yes

[template_db]
    frequently = 6
    hourly = 48
    daily = 30
    weekly = 4
    monthly = 3
    autosnap = yes
    autoprune = yes

[template_disposable]
    autosnap = no
    autoprune = yes
    daily = 7

# Apply templates to datasets
[tank/nextcloud-data]
    use_template = data

[tank/db]
    use_template = db
    recursive = yes

[tank/containers]
    use_template = disposable
    recursive = yes

Sanoid's systemd timers run automatically:

systemctl status sanoid.timer
systemctl list-timers sanoid
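
An active timer does not by itself prove that snapshots are being created. The following is a minimal sketch for checking the newest sanoid snapshot on a dataset; the helper name `newest_autosnap` is hypothetical, and it assumes sanoid's default `autosnap_` snapshot prefix.

```shell
# Print the newest sanoid-created snapshot from a creation-sorted list on stdin.
# Assumes sanoid's default "autosnap_" prefix in snapshot names.
newest_autosnap() {
    grep '@autosnap_' | tail -n 1
}

# Intended use (needs a real pool, so it is left commented out):
#   zfs list -H -t snapshot -o name -s creation -d 1 tank/nextcloud-data | newest_autosnap
```

If the timestamp embedded in the printed name is more than an hour or so old for a dataset on an hourly template, the timer or the configuration probably needs a closer look.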

Manual snapshots¶

Before major changes:

sudo zfs snapshot -r tank@pre-upgrade-$(date +%F)

Remote Backups¶

Replicate with syncoid¶

syncoid wraps zfs send | zfs receive with sensible defaults (resumable transfers, automatic incremental detection) and can optionally create bookmarks. Set up SSH keys to the backup host first, then:

# Initial transfer (recursive)
syncoid -r tank/nextcloud-data backup-host:backup/nextcloud-data

# Subsequent runs use the same command; syncoid finds the newest common snapshot and sends only the changes since then
syncoid -r tank/nextcloud-data backup-host:backup/nextcloud-data

Schedule via systemd timer or cron. Example daily cron:

#!/bin/sh
# /etc/cron.daily/syncoid-replicate (must be executable; the shebang has to be the first line)
/usr/sbin/syncoid -r tank/nextcloud-data backup-host:backup/nextcloud-data
/usr/sbin/syncoid -r tank/db              backup-host:backup/db
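
A plain list of commands gives no signal when one dataset fails to replicate. As a sketch only, the same job could loop over datasets and return a non-zero status on any failure. The function name `replicate_all` and the `SYNCOID` and `TARGET` variables are assumptions introduced here so the loop can be dry-run.

```shell
#!/bin/sh
# Sketch of a replication wrapper. SYNCOID and TARGET are overridable so the
# loop can be dry-run (for example SYNCOID=echo) without touching any pool.
SYNCOID="${SYNCOID:-/usr/sbin/syncoid}"
TARGET="${TARGET:-backup-host:backup}"

replicate_all() {
    failed=0
    for ds in "$@"; do
        # Mirror tank/<name> to <target>/<name>; keep going if one dataset fails.
        if ! "$SYNCOID" -r "tank/$ds" "$TARGET/$ds"; then
            echo "replication failed: tank/$ds" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Calling `replicate_all nextcloud-data db` at the end of the script reproduces the two syncoid runs above, and the non-zero return value lets cron or a systemd unit report the failure.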

Off-site target¶

On-site replication protects against disk failure on the primary host. It does not protect against site loss (theft, fire, full-pool corruption). Add an off-site target:

restic to B2 / S3 / Wasabi — encrypted, deduplicated, cross-platform restore. Good for the file-level layer (Nextcloud data, photos).
rclone crypt to B2 / S3 — encrypted object-storage sync, no client-side dedup but simple.
syncoid over Tailscale to a remote ZFS host (a friend's box, a VPS with attached storage) — keeps the ZFS abstraction end-to-end.

For this build, the recommended split is:

Layer	Tool	Target
Local snapshots	sanoid	`tank` itself
On-site replica	syncoid	Always-on ZFS host on LAN
Off-site (block)	syncoid over Tailscale	Remote ZFS host (e.g. friend's homelab)
Off-site (file)	restic	B2/S3 (encrypted, for Nextcloud + photos)
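
For the file-level row, a restic run might look like the sketch below. It assumes the repository location and password are supplied through restic's standard `RESTIC_REPOSITORY` and `RESTIC_PASSWORD_FILE` environment variables; the function name `offsite_backup`, the `nextcloud` tag and the retention numbers are illustrative choices, not values taken from this build.

```shell
# Sketch: file-level off-site backup with restic. RESTIC is overridable so the
# sequence can be dry-run with RESTIC=echo. Retention values are examples only.
RESTIC="${RESTIC:-restic}"

offsite_backup() {
    # $1 = directory to back up
    "$RESTIC" backup "$1" --tag nextcloud &&
    "$RESTIC" forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune &&
    "$RESTIC" check
}
```

A call such as `offsite_backup /mnt/tank/nextcloud-data` uploads new data, applies the retention policy, and then verifies the repository structure. The repository has to be initialised once with `restic init` beforehand.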

Database Backups¶

MariaDB Dump¶

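# Assumes MYSQL_ROOT_PASSWORD is set in the calling shell; a bare -p cannot prompt without a TTY
docker exec nextcloud-db mysqldump -u root -p"${MYSQL_ROOT_PASSWORD}" \
    --single-transaction nextcloud > \
    /mnt/tank/backups/nextcloud-db-$(date +%Y%m%d).sql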
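
A dump that was cut short (container stopped, disk full) still leaves a file behind. One cheap sanity check, sketched here with a hypothetical helper name, relies on the `-- Dump completed` comment that mysqldump writes as its last line when it finishes cleanly. That trailer is absent if comments are disabled, so this check assumes default comment settings.

```shell
# Return success only if the dump file is non-empty and ends with the
# "-- Dump completed" trailer that mysqldump writes on a clean finish.
dump_is_complete() {
    [ -s "$1" ] && tail -n 1 "$1" | grep -q '^-- Dump completed'
}
```

For example, `dump_is_complete /mnt/tank/backups/nextcloud-db-20240115.sql || echo "dump looks truncated"` flags an incomplete file before it is relied on for a restore. It does not prove the dump restores correctly; only a test restore does that.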

Before Container Updates¶

Always snapshot before updating:

sudo zfs snapshot tank/db/nextcloud@pre-update-$(date +%F)
docker compose pull
docker compose up -d
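
The snapshot-then-update sequence can be wrapped so the snapshot name is unique per run and a failed update prints the matching rollback command. This is a sketch under assumptions: the function name `update_with_snapshot` is hypothetical, and it must be run from the directory containing the compose file.

```shell
# Sketch: take a timestamped snapshot, then update; on failure, print the
# rollback command instead of running it, so the operator decides.
update_with_snapshot() {
    # $1 = dataset to protect, e.g. tank/db/nextcloud
    snap="$1@pre-update-$(date +%Y%m%d-%H%M%S)"
    sudo zfs snapshot "$snap" || return 1
    if docker compose pull && docker compose up -d; then
        echo "updated; safety snapshot: $snap"
    else
        echo "update failed; to revert: sudo zfs rollback $snap" >&2
        return 1
    fi
}
```

Printing the rollback command rather than executing it is deliberate: a rollback discards everything written since the snapshot, so it is better left as an explicit decision after the containers are stopped.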

Backup Verification¶

Test Restore Regularly¶

Clone snapshot to temporary dataset
Start service against clone
Verify data integrity
Destroy clone

# Clone
sudo zfs clone tank/nextcloud-data@backup tank/test-restore

# Verify
ls /mnt/tank/test-restore

# Cleanup
sudo zfs destroy tank/test-restore
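
Listing a directory only shows that it mounted. For a stronger "verify data integrity" step, two directory trees can be compared by a single digest over all file contents and relative paths. The helper below is a sketch with a hypothetical name, and it relies on GNU `find`, `sort` and `xargs` options.

```shell
# Print one digest summarising every regular file under a directory
# (relative paths and contents). Uses GNU find/sort/xargs options.
tree_digest() {
    (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1)
}
```

Two trees match when `[ "$(tree_digest /path/a)" = "$(tree_digest /path/b)" ]` succeeds. A local clone matches its snapshot by construction, so the comparison is most informative when one side is a copy restored from the backup host or from restic.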

Backup Schedule¶

Data	Snapshot Frequency	Remote Backup
nextcloud-data	Hourly	Daily
nextcloud-app	Daily	Weekly
db	Hourly	Daily
media	Weekly	Monthly
vm	Manual (pre-change)	Weekly

Recovery Testing¶

Regular recovery testing validates that backups are actually restorable.

Test Schedule¶

Test	Frequency	Duration
File restore from snapshot	Monthly	15 min
Clone dataset and verify service	Quarterly	1 hour
Full rebuild on test hardware	Yearly	4+ hours

Monthly: File Restore Test¶

Verify you can restore individual files from snapshots:

# List available snapshots
ls /mnt/tank/nextcloud-data/.zfs/snapshot/

# Pick the newest hourly snapshot (sanoid names them autosnap_<date>_<time>_hourly)
SNAP=$(ls /mnt/tank/nextcloud-data/.zfs/snapshot/ | grep '_hourly$' | tail -n 1)
ls "/mnt/tank/nextcloud-data/.zfs/snapshot/${SNAP}/"

# Copy a file to verify
cp "/mnt/tank/nextcloud-data/.zfs/snapshot/${SNAP}/test-file.txt" /tmp/

# Log the test
echo "$(date +%F): Monthly file restore - PASSED" | sudo tee -a /var/log/recovery-tests.log > /dev/null

Quarterly: Service Restore Test¶

Clone a dataset and verify the service starts:

# Stop services to avoid conflicts
cd ~/docker/nextcloud && docker compose down

# Snapshot the current state, then clone it
sudo zfs snapshot tank/nextcloud-data@restore-test
sudo zfs clone tank/nextcloud-data@restore-test tank/restore-test

# Verify data integrity
ls /mnt/tank/restore-test

# Start service with cloned data (modify compose to use clone path)
# ... verify service works ...

# Cleanup
sudo zfs destroy tank/restore-test
sudo zfs destroy tank/nextcloud-data@restore-test

# Restart production
docker compose up -d

# Log the test
echo "$(date +%F): Quarterly service clone - PASSED" | sudo tee -a /var/log/recovery-tests.log > /dev/null

Yearly: Full Rebuild Test¶

On spare or test hardware:

Install fresh Ubuntu Server
Install ZFS and import pool (or restore from backup)
Follow Rebuild Checklist
Verify all services operational
Document any issues encountered

Test Result Logging¶

Maintain a log of all recovery tests:

# /var/log/recovery-tests.log format
# DATE: Test type - PASSED/FAILED - Notes

2024-01-15: Monthly file restore - PASSED
2024-02-15: Monthly file restore - PASSED
2024-03-01: Quarterly service clone - PASSED
2024-03-15: Monthly file restore - PASSED
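
To keep entries in exactly this format, logging can go through a small helper. The sketch below uses hypothetical function names (`log_test`, `last_pass`) and assumes the calling user can write to the log file; `LOG` is overridable for testing.

```shell
# Sketch: append and query entries in the "DATE: Test type - RESULT - Notes" format.
LOG="${LOG:-/var/log/recovery-tests.log}"

log_test() {
    # usage: log_test "<test type>" PASSED|FAILED ["notes"]
    line="$(date +%F): $1 - $2"
    if [ -n "${3:-}" ]; then
        line="$line - $3"
    fi
    printf '%s\n' "$line" >> "$LOG"
}

last_pass() {
    # Print the date of the most recent PASSED entry for a test type, if any.
    grep -F ": $1 - PASSED" "$LOG" | tail -n 1 | cut -d: -f1
}
```

`last_pass "Monthly file restore"` then gives a quick answer to whether a scheduled test is overdue.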

Disaster Scenarios¶

Scenario 1: Single Disk Failure¶

Symptoms: ZFS pool shows degraded or faulted disk

Recovery:

Identify failed disk:
```
zpool status tank
```

If mirrored, replace the disk:

sudo zpool replace tank /dev/old-disk /dev/new-disk
sudo zpool status  # Monitor resilver

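If no redundancy, restore from backup:

# Replace disk, create new pool mounted where the rest of this page expects it
sudo zpool create -m /mnt/tank tank /dev/new-disk

# Pull each replicated dataset back from the backup host
sudo syncoid -r backup-host:backup/nextcloud-data tank/nextcloud-data
sudo syncoid -r backup-host:backup/db              tank/db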

Estimated Recovery Time: 1-4 hours (mirror) or 4-24 hours (full restore)
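
A degraded pool is easier to recover when it is noticed early. As a sketch, a periodic check could compare the pool's health property against `ONLINE`; the function name `pool_is_healthy` is hypothetical.

```shell
# Succeeds only if the pool reports ONLINE. "zpool list -H -o health" prints
# a single word such as ONLINE, DEGRADED or FAULTED.
pool_is_healthy() {
    [ "$(zpool list -H -o health "$1")" = "ONLINE" ]
}
```

Run from cron or a timer, `pool_is_healthy tank || logger -p user.err "ZFS pool tank is not ONLINE"` writes a syslog entry that existing alerting can pick up.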

Scenario 2: Complete Host Loss¶

Symptoms: Hardware failure, cannot boot, physical damage

Recovery:

If disks are intact:
Install fresh Ubuntu on new hardware
Import existing pool: sudo zpool import tank (add -f if the pool was not exported before the old host failed)
Follow Rebuild Checklist
If disks are lost:
Restore from offsite backup
Follow full recovery procedure below

Estimated Recovery Time: 4-8 hours (disks intact) or 24+ hours (full restore)

Scenario 3: Data Corruption Discovery¶

Symptoms: Files unreadable, checksum errors, application errors

Recovery:

Run a ZFS scrub to identify the extent of the damage:

sudo zpool scrub tank
zpool status -v tank  # Shows scrub progress; -v lists files with unrecoverable errors

For corrupted files, restore from snapshot:

# Find snapshot before corruption
ls /mnt/tank/data/.zfs/snapshot/

# Copy clean version (sanoid names snapshots autosnap_<date>_<time>_<type>)
cp /mnt/tank/data/.zfs/snapshot/autosnap_2024-01-14_00:00:01_daily/corrupted-file \
   /mnt/tank/data/corrupted-file

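For widespread corruption, roll back the dataset. zfs rollback only accepts the most recent snapshot unless -r is given, and -r destroys every snapshot newer than the target:

sudo zfs rollback -r tank/data@last-good-snapshot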

Estimated Recovery Time: 1-4 hours
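
Finding the snapshot "before corruption" usually means finding where a file's content last changed. The sketch below prints a checksum of one file for every snapshot that contains it; the function name `file_versions` is hypothetical, and it assumes the file can still be read in each snapshot.

```shell
# Print "<sha256>  <snapshot name>" for each snapshot that contains the file.
file_versions() {
    # usage: file_versions <dataset mountpoint> <path relative to it>
    for snap in "$1"/.zfs/snapshot/*/; do
        f="${snap}$2"
        [ -f "$f" ] || continue
        printf '%s  %s\n' "$(sha256sum < "$f" | cut -d' ' -f1)" "$(basename "$snap")"
    done
}
```

With sanoid's date-based names the output is in chronological order within each snapshot type, so the last line before the checksum changes points at the newest snapshot holding the earlier version, for example `file_versions /mnt/tank/data corrupted-file`.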

Scenario 4: Ransomware Recovery¶

Symptoms: Files encrypted, ransom notes present

Recovery:

Immediately: Disconnect from the network (eth0 is an example; check ip link for the real interface name)
```
sudo ip link set eth0 down
```
Do NOT pay ransom
Assess damage:
Which datasets are affected?
When did encryption start?

Rollback to pre-infection snapshot:

# Find clean snapshot (before infection date), oldest first
zfs list -t snapshot -o name,creation -s creation

# Roll back affected datasets (-r destroys all snapshots newer than the target)
sudo zfs rollback -r tank/data@pre-infection

Before reconnecting:
Investigate infection vector
Patch vulnerabilities
Change all credentials

Estimated Recovery Time: 4-24 hours
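
Once the time of infection is known, choosing the rollback target can be made mechanical. The sketch below reads snapshot names with creation times in epoch seconds and prints the newest one created before a cutoff; the function name `last_snapshot_before` is hypothetical.

```shell
# Read "name<TAB>creation-epoch" lines sorted by creation (oldest first) and
# print the newest snapshot created strictly before the cutoff epoch.
last_snapshot_before() {
    awk -F '\t' -v cutoff="$1" '$2 + 0 < cutoff + 0 { name = $1 } END { if (name != "") print name }'
}

# Intended use (needs a real pool; date -d is a GNU extension):
#   zfs list -H -p -t snapshot -o name,creation -s creation -d 1 tank/data \
#       | last_snapshot_before "$(date -d '2024-01-14 03:00' +%s)"
```

It is worth choosing a cutoff comfortably earlier than the first observed symptom, since encryption may have started before anyone noticed.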

Recovery Runbooks¶

Runbook: Nextcloud Restore¶

Prerequisites: SSH access to server, ZFS snapshots available

Stop Nextcloud:

cd ~/docker/nextcloud
docker compose down

Roll back datasets (add -r if the target is not the newest snapshot; this destroys the newer ones):

sudo zfs rollback tank/nextcloud-data@backup
sudo zfs rollback tank/nextcloud-app@backup
sudo zfs rollback tank/db/nextcloud@backup

Start Nextcloud:
```
docker compose up -d
```
Verify:
Access web UI
Check file listing
Verify recent files exist
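
The web UI check can be complemented by a scripted one. Nextcloud serves a small JSON document at `/status.php`; the sketch below inspects that response for the `installed` and `maintenance` fields. The function name is hypothetical, the URL in the usage note is a placeholder, and the match assumes the compact JSON (no spaces after colons) that Nextcloud normally emits.

```shell
# Succeeds if a Nextcloud status.php response (JSON on stdin) reports an
# installed instance that is not in maintenance mode.
nextcloud_status_ok() {
    status=$(cat)
    case "$status" in
        *'"installed":true'*) ;;
        *) return 1 ;;
    esac
    case "$status" in
        *'"maintenance":false'*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `curl -fsS https://cloud.example.com/status.php | nextcloud_status_ok && echo "Nextcloud is up"`. This confirms the application started, not that the restored files are the expected ones, so the manual checks above still apply.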

Runbook: Database Restore from Dump¶

Prerequisites: SQL dump file available

Stop application:
```
docker compose stop nextcloud
```

Restore database:

docker exec -i nextcloud-db mysql -u root -p"${MYSQL_ROOT_PASSWORD}" nextcloud < backup.sql

Restart application:
```
docker compose start nextcloud
```

Run maintenance:

docker exec -u www-data nextcloud php occ maintenance:repair
docker exec -u www-data nextcloud php occ files:scan --all

Runbook: Full System Recovery¶

Prerequisites: Backup server accessible, new hardware ready

Install Ubuntu Server (minimal)

Install ZFS:

sudo apt update && sudo apt install -y zfsutils-linux

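Create the pool (only when restoring from backup; adjust the device name to the new hardware, and -m keeps the /mnt/tank paths used on this page):
```
sudo zpool create -m /mnt/tank tank /dev/nvme0n1p3
```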

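Restore data (install the sanoid package first, which provides syncoid):

# Pull each replicated dataset back from the backup host
sudo syncoid -r backup-host:backup/nextcloud-data tank/nextcloud-data
sudo syncoid -r backup-host:backup/db              tank/db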

Follow Rebuild Checklist for:
Docker installation
Container configuration
Network setup
Service verification

Required Access and Credentials¶

Keep secure, offline copies of:

Item	Location
SSH keys	Password manager, printed
Root/admin passwords	Password manager
Backup server credentials	Password manager
Docker .env files	In ZFS snapshot, password manager
BIOS password	Printed, secure location
Encryption keys (if applicable)	Multiple secure locations

Credential Storage

Recovery is impossible without access credentials. Store them in multiple secure locations.