Backup & Recovery

This page covers backup strategies, recovery testing, and disaster recovery procedures.

Backup Strategy

Two Layers of Protection

| Layer | Protects Against | Tool |
|-------|------------------|------|
| Snapshots | Accidental deletion, bad upgrades | ZFS snapshots |
| Backups | Disk failure, host loss | zfs send/receive |

ZFS Snapshots

This project uses sanoid (automatic local snapshot retention) plus syncoid (incremental replication to a remote ZFS host) — both from the sanoid package. zfs-auto-snapshot is upstream-abandoned and is not used here.

Install sanoid

sudo apt install -y sanoid

Sanoid configuration lives at /etc/sanoid/sanoid.conf. Template-based retention is the clean pattern:

# /etc/sanoid/sanoid.conf

[template_data]
    frequently = 0
    hourly = 24
    daily = 30
    weekly = 8
    monthly = 6
    yearly = 0
    autosnap = yes
    autoprune = yes

[template_db]
    frequently = 6
    hourly = 48
    daily = 30
    weekly = 4
    monthly = 3
    autosnap = yes
    autoprune = yes

[template_disposable]
    autosnap = no
    autoprune = yes
    daily = 7

# Apply templates to datasets
[tank/nextcloud-data]
    use_template = data

[tank/db]
    use_template = db
    recursive = yes

[tank/containers]
    use_template = disposable
    recursive = yes

Sanoid's systemd timers run automatically:

systemctl status sanoid.timer
systemctl list-timers sanoid.timer
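An active timer does not prove that snapshots are being created, so a freshness check on the newest snapshot can be a useful complement. The sketch below assumes the tab-separated output of `zfs list -Hp`; the function name `newest_snapshot_age` and the 2-hour threshold in the usage comment are illustrative and not part of sanoid.

```shell
#!/bin/sh
# Print the age in seconds of the newest snapshot read from stdin.
# Input: lines of "<snapshot-name><TAB><creation-epoch>", as produced by
#   zfs list -Hp -d 1 -t snapshot -o name,creation <dataset>
# $1: current time as a Unix epoch (pass "$(date +%s)" in real use).
newest_snapshot_age() {
    now=$1
    awk -F '\t' -v now="$now" '
        $2 > newest { newest = $2 }
        END {
            if (NR == 0) exit 1      # no snapshots at all
            print now - newest
        }'
}

# Example (hypothetical 2-hour threshold for an hourly dataset):
#   age=$(zfs list -Hp -d 1 -t snapshot -o name,creation tank/nextcloud-data \
#         | newest_snapshot_age "$(date +%s)") || echo "no snapshots found"
#   [ "$age" -le 7200 ] || echo "WARNING: newest snapshot is ${age}s old"
```

Taking the current time as an argument keeps the function easy to test without a real pool.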

Manual snapshots

Before major changes:

sudo zfs snapshot -r tank@pre-upgrade-$(date +%F)

Remote Backups

Replicate with syncoid

syncoid wraps zfs send|receive with sensible defaults (resumable transfers, automatic incremental detection) and can optionally create bookmarks. Set up SSH keys to the backup host first, then:

# Initial transfer (recursive)
syncoid -r tank/nextcloud-data backup-host:backup/nextcloud-data

# Subsequent incremental runs
syncoid -r tank/nextcloud-data backup-host:backup/nextcloud-data

Schedule via systemd timer or cron. Example daily cron:

#!/bin/sh
# /etc/cron.daily/syncoid-replicate (the shebang must be the first line; make the file executable)
/usr/sbin/syncoid -r tank/nextcloud-data backup-host:backup/nextcloud-data
/usr/sbin/syncoid -r tank/db              backup-host:backup/db
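A plain list of commands gives no summary of which dataset failed. One possible refinement is a small wrapper that attempts every pair and reports failures at the end. This is only a sketch: `replicate_all` and the `SYNCOID` override variable are hypothetical names, and the dataset pairs in the usage comment are the ones from the cron example.

```shell
#!/bin/sh
# Replicate each "source target" pair and report failures instead of
# stopping silently. SYNCOID can be overridden, e.g. for a dry run.
SYNCOID=${SYNCOID:-/usr/sbin/syncoid}

replicate_all() {
    failed=0
    # Pairs arrive on stdin, one "source target" per line.
    while read -r src dst; do
        [ -n "$src" ] || continue          # skip blank lines
        # </dev/null stops ssh (run by syncoid) from eating the pair list.
        if ! "$SYNCOID" -r "$src" "$dst" </dev/null; then
            echo "replication failed: $src -> $dst" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]                    # non-zero exit if anything failed
}

# Usage:
#   replicate_all <<'EOF'
#   tank/nextcloud-data backup-host:backup/nextcloud-data
#   tank/db             backup-host:backup/db
#   EOF
```

Because the function exits non-zero on any failure, cron mails the error output when a mail transport is configured.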

Off-site target

On-site replication protects against disk failure on the primary host. It does not protect against site loss (theft, fire, full-pool corruption). Add an off-site target:

  • restic to B2 / S3 / Wasabi — encrypted, deduplicated, cross-platform restore. Good for the file-level layer (Nextcloud data, photos).
  • rclone crypt to B2 / S3 — encrypted object-storage sync, no client-side dedup but simple.
  • syncoid over Tailscale to a remote ZFS host (a friend's box, a VPS with attached storage) — keeps the ZFS abstraction end-to-end.

For this build, the recommended split is:

| Layer | Tool | Target |
|-------|------|--------|
| Local snapshots | sanoid | tank itself |
| On-site replica | syncoid | Always-on ZFS host on LAN |
| Off-site (block) | syncoid over Tailscale | Remote ZFS host (e.g. friend's homelab) |
| Off-site (file) | restic | B2/S3 (encrypted, for Nextcloud + photos) |
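For the file-level off-site row, a restic run usually has three steps: back up, apply retention, and check the repository. The sketch below assumes `RESTIC_REPOSITORY` and `RESTIC_PASSWORD_FILE` are already exported and that the repository was created once with `restic init`. The function name `offsite_backup`, the `nextcloud` tag, and the retention counts are placeholders, not recommendations from this build.

```shell
#!/bin/sh
# Hypothetical file-level off-site backup of one directory with restic.
# RESTIC can be overridden to point at a specific binary.
RESTIC=${RESTIC:-restic}

offsite_backup() {
    src=$1
    # Upload new and changed files; restic encrypts and deduplicates.
    "$RESTIC" backup "$src" --tag nextcloud || return 1
    # Apply retention and remove unreferenced data (placeholder counts).
    "$RESTIC" forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune || return 1
    # Verify repository structure after pruning.
    "$RESTIC" check
}

# Usage (path is the dataset mountpoint used elsewhere on this page):
#   offsite_backup /mnt/tank/nextcloud-data
```

Pointing `src` at a directory under `.zfs/snapshot/` rather than the live dataset gives restic a consistent point-in-time view.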

Database Backups

MariaDB Dump

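# A bare -p prompts for a password, which fails without a TTY; expand the variable inside the container instead
docker exec nextcloud-db sh -c \
    'exec mysqldump -u root -p"$MYSQL_ROOT_PASSWORD" --single-transaction nextcloud' > \
    /mnt/tank/backups/nextcloud-db-$(date +%Y%m%d).sql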
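Dated dump files accumulate indefinitely unless something prunes them. A minimal pruning sketch follows; `prune_old_dumps` is a hypothetical helper, the file pattern matches the naming used in the dump command, and the 14-day retention in the usage comment is an assumption.

```shell
#!/bin/sh
# Delete SQL dumps older than a given number of days.
# $1: directory holding the dumps, $2: retention in days.
# -mtime +N matches files last modified more than N full days ago.
# Only files named nextcloud-db-*.sql directly inside $1 are touched.
prune_old_dumps() {
    dir=$1
    days=$2
    find "$dir" -maxdepth 1 -type f -name 'nextcloud-db-*.sql' \
        -mtime +"$days" -delete
}

# Usage (hypothetical 14-day retention):
#   prune_old_dumps /mnt/tank/backups 14
```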

Before Container Updates

Always snapshot before updating:

sudo zfs snapshot tank/db/nextcloud@pre-update
docker compose pull
docker compose up -d

Backup Verification

Test Restore Regularly

  1. Clone snapshot to temporary dataset
  2. Start service against clone
  3. Verify data integrity
  4. Destroy clone

# Clone (replace @backup with the name of an existing snapshot)
sudo zfs clone tank/nextcloud-data@backup tank/test-restore

# Verify
ls /mnt/tank/test-restore

# Cleanup
sudo zfs destroy tank/test-restore
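Listing the clone only shows that it mounts. A checksum manifest gives a stronger integrity check for step 3, especially when the restored copy comes from a remote backup rather than a local clone. The sketch below assumes GNU coreutils and findutils; `write_manifest`, `verify_manifest`, and the manifest path are hypothetical.

```shell
#!/bin/sh
# Record SHA-256 checksums for every file under a directory.
# $1: directory to scan, $2: absolute path of the manifest to write.
write_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# Check a directory against a manifest written by write_manifest.
# Returns non-zero if any listed file is missing or differs.
verify_manifest() {
    ( cd "$1" && sha256sum --quiet -c "$2" )
}

# Usage (paths are illustrative):
#   write_manifest  /mnt/tank/nextcloud-data /tmp/nextcloud.sha256
#   verify_manifest /mnt/tank/test-restore   /tmp/nextcloud.sha256
```

Relative paths in the manifest are what allow the same file to be verified against a different mountpoint.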

Backup Schedule

| Data | Snapshot Frequency | Remote Backup |
|------|--------------------|---------------|
| nextcloud-data | Hourly | Daily |
| nextcloud-app | Daily | Weekly |
| db | Hourly | Daily |
| media | Weekly | Monthly |
| vm | Manual (pre-change) | Weekly |

Recovery Testing

Regular recovery testing validates that backups are actually restorable.

Test Schedule

| Test | Frequency | Duration |
|------|-----------|----------|
| File restore from snapshot | Monthly | 15 min |
| Clone dataset and verify service | Quarterly | 1 hour |
| Full rebuild on test hardware | Yearly | 4+ hours |

Monthly: File Restore Test

Verify you can restore individual files from snapshots:

# List available snapshots
ls /mnt/tank/nextcloud-data/.zfs/snapshot/

# Pick the newest hourly snapshot (sanoid names them autosnap_<date>_<time>_hourly)
SNAP=$(ls /mnt/tank/nextcloud-data/.zfs/snapshot/ | grep '_hourly$' | sort | tail -n 1)
ls "/mnt/tank/nextcloud-data/.zfs/snapshot/$SNAP/"

# Copy a file to verify
cp "/mnt/tank/nextcloud-data/.zfs/snapshot/$SNAP/test-file.txt" /tmp/

# Log the test (the log lives under /var/log, so append as root)
echo "$(date +%F): Monthly file restore - PASSED" | sudo tee -a /var/log/recovery-tests.log

Quarterly: Service Restore Test

Clone a dataset and verify the service starts:

# Stop services to avoid conflicts
cd ~/docker/nextcloud && docker compose down

# Clone the current snapshot
sudo zfs snapshot tank/nextcloud-data@restore-test
sudo zfs clone tank/nextcloud-data@restore-test tank/restore-test

# Verify data integrity
ls /mnt/tank/restore-test

# Start service with cloned data (modify compose to use clone path)
# ... verify service works ...

# Cleanup
sudo zfs destroy tank/restore-test
sudo zfs destroy tank/nextcloud-data@restore-test

# Restart production
docker compose up -d

# Log the test (the log lives under /var/log, so append as root)
echo "$(date +%F): Quarterly service clone - PASSED" | sudo tee -a /var/log/recovery-tests.log

Yearly: Full Rebuild Test

On spare or test hardware:

  1. Install fresh Ubuntu Server
  2. Install ZFS and import pool (or restore from backup)
  3. Follow Rebuild Checklist
  4. Verify all services operational
  5. Document any issues encountered

Test Result Logging

Maintain a log of all recovery tests:

# /var/log/recovery-tests.log format
# DATE: Test type - PASSED/FAILED - Notes

2024-01-15: Monthly file restore - PASSED
2024-02-15: Monthly file restore - PASSED
2024-03-01: Quarterly service clone - PASSED
2024-03-15: Monthly file restore - PASSED
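To keep entries in this format consistent, the logging step can be wrapped in a small helper. This is a sketch: `log_recovery_test` and the `RECOVERY_LOG` override are hypothetical names, and the default path is the log file named above.

```shell
#!/bin/sh
# Append one line in the "DATE: Test type - PASSED/FAILED - Notes" format.
RECOVERY_LOG=${RECOVERY_LOG:-/var/log/recovery-tests.log}

# $1: test type, $2: PASSED or FAILED, $3: optional notes.
log_recovery_test() {
    test_type=$1
    result=$2
    notes=${3:-}
    case $result in
        PASSED|FAILED) ;;
        *) echo "result must be PASSED or FAILED" >&2; return 2 ;;
    esac
    line="$(date +%F): $test_type - $result"
    # Notes are optional; omit the trailing separator when there are none.
    [ -z "$notes" ] || line="$line - $notes"
    printf '%s\n' "$line" >> "$RECOVERY_LOG"
}

# Usage (run as a user that can write to the log file):
#   log_recovery_test "Monthly file restore" PASSED
#   log_recovery_test "Quarterly service clone" FAILED "clone did not mount"
```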

Disaster Scenarios

Scenario 1: Single Disk Failure

Symptoms: ZFS pool shows degraded or faulted disk

Recovery:

  1. Identify failed disk:

    zpool status tank
    

  2. If mirrored, replace the disk:

    sudo zpool replace tank /dev/old-disk /dev/new-disk
    sudo zpool status  # Monitor resilver
    

  3. If no redundancy, restore from backup:

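    # Replace disk, create new pool
    sudo zpool create tank /dev/new-disk
    
    # Restore each replicated dataset from the backup host
    # (replace @latest with the newest snapshot name present on the backup)
    ssh backup-host "zfs send -R backup/nextcloud-data@latest" | sudo zfs receive -F tank/nextcloud-data
    ssh backup-host "zfs send -R backup/db@latest" | sudo zfs receive -F tank/db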
    

Estimated Recovery Time: 1-4 hours (mirror) or 4-24 hours (full restore)
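A degraded pool is easier to handle when it is noticed early, so a periodic health check can complement this scenario. The sketch below relies on `zpool list -H -o health`, which prints a single state such as ONLINE or DEGRADED; `pool_health_ok` and the `ZPOOL` override are hypothetical names.

```shell
#!/bin/sh
# Succeed only if the named pool reports ONLINE.
# ZPOOL can be overridden to point at a specific binary.
ZPOOL=${ZPOOL:-zpool}

# $1: pool name. Returns 0 if ONLINE, 1 for any other state,
# 2 if the state could not be read at all.
pool_health_ok() {
    state=$("$ZPOOL" list -H -o health "$1") || return 2
    if [ "$state" = "ONLINE" ]; then
        return 0
    fi
    echo "pool $1 is $state" >&2
    return 1
}

# Usage (for example from a daily cron job):
#   pool_health_ok tank || echo "check 'zpool status tank'"
```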

Scenario 2: Complete Host Loss

Symptoms: Hardware failure, cannot boot, physical damage

Recovery:

  1. If disks are intact:

       • Install fresh Ubuntu on new hardware
       • Import existing pool: sudo zpool import tank (add -f if the pool was not exported before the host failed)
       • Follow Rebuild Checklist

  2. If disks are lost:

       • Restore from offsite backup
       • Follow the Full System Recovery runbook below

Estimated Recovery Time: 4-8 hours (disks intact) or 24+ hours (full restore)

Scenario 3: Data Corruption Discovery

Symptoms: Files unreadable, checksum errors, application errors

Recovery:

  1. Run ZFS scrub to identify extent:

    sudo zpool scrub tank
    zpool status -v tank  # Re-run until the scrub finishes; -v lists files with errors
    

  2. For corrupted files, restore from snapshot:

    # Find snapshot before corruption
    ls /mnt/tank/data/.zfs/snapshot/
    
    # Copy clean version (the snapshot name is an example in sanoid's naming scheme)
    cp /mnt/tank/data/.zfs/snapshot/autosnap_2024-01-14_00:00:01_daily/corrupted-file \
       /mnt/tank/data/corrupted-file
    

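  3. For widespread corruption, roll back the dataset. Rolling back to anything other than the newest snapshot requires -r, which destroys every snapshot taken after the target:

    sudo zfs rollback -r tank/data@last-good-snapshot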
    

Estimated Recovery Time: 1-4 hours

Scenario 4: Ransomware Recovery

Symptoms: Files encrypted, ransom notes present

Recovery:

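  1. Immediately: Disconnect from network. Replace eth0 with the actual interface name (see ip -br link). This also drops any SSH session, so run it from the console:

    sudo ip link set eth0 down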
    

  2. Do NOT pay ransom

  3. Assess damage:

       • Which datasets are affected?
       • When did encryption start?

  4. Roll back to a pre-infection snapshot:

    # Find clean snapshot (before infection date), oldest first
    zfs list -t snapshot -o name,creation -s creation
    
    # Roll back affected datasets (-r destroys all snapshots newer than the target)
    sudo zfs rollback -r tank/data@pre-infection
    

  5. Before reconnecting:

       • Investigate infection vector
       • Patch vulnerabilities
       • Change all credentials

Estimated Recovery Time: 4-24 hours
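Once the start of the encryption is known, choosing the rollback target can be automated: take the newest snapshot created strictly before that time. The sketch below reads the tab-separated output of `zfs list -Hp`; `last_snapshot_before` is a hypothetical helper, and the date in the usage comment is an example.

```shell
#!/bin/sh
# Print the name of the newest snapshot created before a cutoff time.
# stdin: lines of "<snapshot-name><TAB><creation-epoch>", as produced by
#   zfs list -Hp -d 1 -t snapshot -o name,creation <dataset>
# $1: cutoff as a Unix epoch. Exits non-zero if no snapshot qualifies.
last_snapshot_before() {
    awk -F '\t' -v cutoff="$1" '
        $2 < cutoff && $2 > best { best = $2; name = $1 }
        END {
            if (name == "") exit 1
            print name
        }'
}

# Usage (GNU date; the timestamp is an example infection time):
#   zfs list -Hp -d 1 -t snapshot -o name,creation tank/data \
#     | last_snapshot_before "$(date -d '2024-01-14 09:00' +%s)"
```

The result is only a candidate. Inspect its contents under `.zfs/snapshot/` before rolling back, since the true infection time may be earlier than first observed.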

Recovery Runbooks

Runbook: Nextcloud Restore

Prerequisites: SSH access to server, ZFS snapshots available

  1. Stop Nextcloud:

    cd ~/docker/nextcloud
    docker compose down
    

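  2. Roll back datasets (replace @backup with the chosen snapshot; add -r if it is not the newest one, which destroys all later snapshots):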

    sudo zfs rollback tank/nextcloud-data@backup
    sudo zfs rollback tank/nextcloud-app@backup
    sudo zfs rollback tank/db/nextcloud@backup
    

  3. Start Nextcloud:

    docker compose up -d
    

  4. Verify:

       • Access web UI
       • Check file listing
       • Verify recent files exist
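The first verification step can be partly scripted against Nextcloud's `status.php` endpoint, which returns a compact JSON document with `installed`, `maintenance`, and `needsDbUpgrade` fields. The sketch below checks those three fields with grep rather than a JSON parser; `nextcloud_status_ok` is a hypothetical helper and the URL in the usage comment is a placeholder.

```shell
#!/bin/sh
# Read status.php JSON on stdin and succeed only if the instance is
# installed, out of maintenance mode, and needs no database upgrade.
# Assumes compact JSON without spaces around the colons.
nextcloud_status_ok() {
    json=$(cat)
    printf '%s' "$json" | grep -q '"installed":true' || return 1
    printf '%s' "$json" | grep -q '"maintenance":false' || return 1
    printf '%s' "$json" | grep -q '"needsDbUpgrade":false'
}

# Usage (placeholder hostname):
#   curl -fsS https://cloud.example.com/status.php | nextcloud_status_ok \
#     && echo "Nextcloud reports healthy"
```

This does not replace the manual checks on file listings and recent files; it only confirms the application came back up.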

Runbook: Database Restore from Dump

Prerequisites: SQL dump file available

  1. Stop application:

    docker compose stop nextcloud
    

  2. Restore database:

    docker exec -i nextcloud-db sh -c 'exec mysql -u root -p"$MYSQL_ROOT_PASSWORD" nextcloud' < backup.sql
    

  3. Restart application:

    docker compose start nextcloud
    

  4. Run maintenance:

    docker exec -u www-data nextcloud php occ maintenance:repair
    docker exec -u www-data nextcloud php occ files:scan --all
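Before step 2 it is worth confirming that the dump is not truncated, since restoring a partial dump leaves the database in a worse state. mysqldump normally ends its output with a `-- Dump completed` comment, which the sketch below checks for. It assumes the dump was made without `--skip-comments`; `dump_is_complete` is a hypothetical helper.

```shell
#!/bin/sh
# Succeed only if the file is non-empty and ends with mysqldump's
# "-- Dump completed" trailer (absent when the dump was interrupted).
# $1: path to the SQL dump.
dump_is_complete() {
    [ -s "$1" ] || return 1
    tail -n 5 "$1" | grep -q '^-- Dump completed'
}

# Usage:
#   dump_is_complete backup.sql || echo "dump looks truncated; do not restore"
```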
    

Runbook: Full System Recovery

Prerequisites: Backup server accessible, new hardware ready

  1. Install Ubuntu Server (minimal)

  2. Install ZFS:

    sudo apt update && sudo apt install -y zfsutils-linux
    

  3. Create pool (if restoring from backup):

    sudo zpool create tank /dev/nvme0n1p3
    

  4. Restore data:

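    # Recursive restore of each replicated dataset
    # (replace @latest with the newest snapshot name present on the backup)
    ssh backup-host "zfs send -R backup/nextcloud-data@latest" | sudo zfs receive -F tank/nextcloud-data
    ssh backup-host "zfs send -R backup/db@latest" | sudo zfs receive -F tank/db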
    

  5. Follow Rebuild Checklist for:

       • Docker installation
       • Container configuration
       • Network setup
       • Service verification

Required Access and Credentials

Keep secure, offline copies of:

| Item | Location |
|------|----------|
| SSH keys | Password manager, printed |
| Root/admin passwords | Password manager |
| Backup server credentials | Password manager |
| Docker .env files | In ZFS snapshot, password manager |
| BIOS password | Printed, secure location |
| Encryption keys (if applicable) | Multiple secure locations |

Credential Storage

Recovery is impossible without access credentials. Store them in multiple secure locations.