Backup & Recovery¶
This page covers backup strategies, recovery testing, and disaster recovery procedures.
Backup Strategy¶
Two Layers of Protection¶
| Layer | Protects Against | Tool |
|---|---|---|
| Snapshots | Accidental deletion, bad upgrades | ZFS snapshots |
| Backups | Disk failure, host loss | zfs send/receive |
ZFS Snapshots¶
Automated Snapshots¶
Install zfs-auto-snapshot:
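On Ubuntu the package is available in the standard repositories, so a plain apt install is usually enough:
sudo apt install zfs-auto-snapshot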
Default retention:
- Frequent: 4 (every 15 min)
- Hourly: 24
- Daily: 31
- Weekly: 8
- Monthly: 12
Disable for Disposable Data¶
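zfs-auto-snapshot honours the com.sun:auto-snapshot dataset property, so snapshots can be switched off for data you do not care about. The dataset name below is an example:
sudo zfs set com.sun:auto-snapshot=false tank/scratch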
Manual Snapshots¶
Before major changes:
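A typical pre-change snapshot (the snapshot name is illustrative):
sudo zfs snapshot tank/nextcloud-data@pre-upgrade-$(date +%Y%m%d)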
Remote Backups¶
Send to Remote Server¶
Initial full send:
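A sketch of the first full replication, assuming a snapshot named @backup and the backup-server host and backup pool used below:
zfs send tank/nextcloud-data@backup | \
  ssh backup-server zfs receive backup/nextcloud-data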
Incremental sends:
zfs send -i @previous tank/nextcloud-data@latest | \
  ssh backup-server zfs receive backup/nextcloud-data
Automated with Sanoid/Syncoid¶
Install sanoid:
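Sanoid is packaged for Debian and Ubuntu, so installation is normally a single apt command:
sudo apt install sanoid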
Configure /etc/sanoid/sanoid.conf:
[tank/nextcloud-data]
use_template = production
[template_production]
frequently = 0
hourly = 24
daily = 30
monthly = 6
yearly = 0
autosnap = yes
autoprune = yes
Run syncoid for replication:
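A minimal replication command, reusing the backup-server host and backup pool from the earlier examples:
sudo syncoid tank/nextcloud-data backup-server:backup/nextcloud-data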
Database Backups¶
MariaDB Dump¶
# Assumes the db container exposes the root password as MYSQL_ROOT_PASSWORD in its environment
docker exec nextcloud-db sh -c 'exec mysqldump -u root -p"$MYSQL_ROOT_PASSWORD" nextcloud' > \
  /mnt/tank/backups/nextcloud-db-$(date +%Y%m%d).sql
Before Container Updates¶
Always snapshot before updating:
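For example, snapshotting the app and database datasets together (dataset names follow the schedule table below and may differ on your pool):
sudo zfs snapshot tank/nextcloud-app@pre-update-$(date +%Y%m%d)
sudo zfs snapshot tank/db@pre-update-$(date +%Y%m%d)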
Backup Verification¶
Test Restore Regularly¶
- Clone snapshot to temporary dataset
- Start service against clone
- Verify data integrity
- Destroy clone
# Clone
zfs clone tank/nextcloud-data@backup tank/test-restore
# Verify
ls /mnt/tank/test-restore
# Cleanup
zfs destroy tank/test-restore
Backup Schedule¶
| Data | Snapshot Frequency | Remote Backup |
|---|---|---|
| nextcloud-data | Hourly | Daily |
| nextcloud-app | Daily | Weekly |
| db | Hourly | Daily |
| media | Weekly | Monthly |
| vm | Manual (pre-change) | Weekly |
Recovery Testing¶
Regular recovery testing validates that backups are actually restorable.
Test Schedule¶
| Test | Frequency | Duration |
|---|---|---|
| File restore from snapshot | Monthly | 15 min |
| Clone dataset and verify service | Quarterly | 1 hour |
| Full rebuild on test hardware | Yearly | 4+ hours |
Monthly: File Restore Test¶
Verify you can restore individual files from snapshots:
# List available snapshots
ls /mnt/tank/nextcloud-data/.zfs/snapshot/
# Pick a recent snapshot and verify file access
ls /mnt/tank/nextcloud-data/.zfs/snapshot/hourly-$(date +%Y-%m-%d)/
# Copy a file to verify
cp /mnt/tank/nextcloud-data/.zfs/snapshot/hourly-$(date +%Y-%m-%d)/test-file.txt /tmp/
# Log the test
echo "$(date): File restore test PASSED" >> /var/log/recovery-tests.log
Quarterly: Service Restore Test¶
Clone a dataset and verify the service starts:
# Stop services to avoid conflicts
cd ~/docker/nextcloud && docker compose down
# Clone the current snapshot
sudo zfs snapshot tank/nextcloud-data@restore-test
sudo zfs clone tank/nextcloud-data@restore-test tank/restore-test
# Verify data integrity
ls /mnt/tank/restore-test
# Start service with cloned data (modify compose to use clone path)
# ... verify service works ...
# Cleanup
sudo zfs destroy tank/restore-test
sudo zfs destroy tank/nextcloud-data@restore-test
# Restart production
docker compose up -d
# Log the test
echo "$(date): Quarterly service restore test PASSED" >> /var/log/recovery-tests.log
Yearly: Full Rebuild Test¶
On spare or test hardware:
- Install fresh Ubuntu Server
- Install ZFS and import pool (or restore from backup)
- Follow Rebuild Checklist
- Verify all services operational
- Document any issues encountered
Test Result Logging¶
Maintain a log of all recovery tests:
# /var/log/recovery-tests.log format
# DATE: Test type - PASSED/FAILED - Notes
2024-01-15: Monthly file restore - PASSED
2024-02-15: Monthly file restore - PASSED
2024-03-01: Quarterly service clone - PASSED
2024-03-15: Monthly file restore - PASSED
Disaster Scenarios¶
Scenario 1: Single Disk Failure¶
Symptoms: ZFS pool shows degraded or faulted disk
Recovery:
- Identify failed disk (commands sketched below):
- If mirrored, replace the disk:
- If no redundancy, restore from backup:
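A sketch of the corresponding commands; the pool name tank matches the rest of this page, while the device and snapshot names are placeholders:
# Identify the failed disk
sudo zpool status -v tank
# Mirrored pool: replace the failed device with the new one
sudo zpool replace tank /dev/sdX /dev/sdY
# No redundancy: pull the dataset back from the remote backup
ssh backup-server zfs send backup/nextcloud-data@latest | \
  sudo zfs receive -F tank/nextcloud-data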
Estimated Recovery Time: 1-4 hours (mirror) or 4-24 hours (full restore)
Scenario 2: Complete Host Loss¶
Symptoms: Hardware failure, cannot boot, physical damage
Recovery:
- If disks are intact:
    - Install fresh Ubuntu on new hardware
    - Import existing pool:
      sudo zpool import tank
    - Follow Rebuild Checklist
- If disks are lost:
    - Restore from offsite backup
    - Follow full recovery procedure below
Estimated Recovery Time: 4-8 hours (disks intact) or 24+ hours (full restore)
Scenario 3: Data Corruption Discovery¶
Symptoms: Files unreadable, checksum errors, application errors
Recovery:
- Run ZFS scrub to identify extent (commands sketched below):
- For corrupted files, restore from snapshot:
- For widespread corruption, rollback dataset:
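A sketch of each step; dataset, snapshot, and file names are examples consistent with the rest of this page:
# Scrub the pool and list any files with permanent errors
sudo zpool scrub tank
sudo zpool status -v tank
# Restore a single corrupted file from a snapshot
cp /mnt/tank/nextcloud-data/.zfs/snapshot/<snapshot>/path/to/file \
  /mnt/tank/nextcloud-data/path/to/file
# Widespread corruption: roll the dataset back to a known-good snapshot
sudo zfs rollback -r tank/nextcloud-data@<known-good>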
Estimated Recovery Time: 1-4 hours
Scenario 4: Ransomware Recovery¶
Symptoms: Files encrypted, ransom notes present
Recovery:
- Immediately: Disconnect from network
- Do NOT pay ransom
- Assess damage:
    - Which datasets are affected?
    - When did encryption start?
- Rollback to pre-infection snapshot (see the sketch below):
- Before reconnecting:
    - Investigate infection vector
    - Patch vulnerabilities
    - Change all credentials
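A sketch of locating and rolling back to the last clean snapshot; the dataset name is an example:
# List snapshots with creation times to find the last pre-infection snapshot
zfs list -t snapshot -o name,creation -s creation tank/nextcloud-data
# Roll back (discards everything written after that snapshot)
sudo zfs rollback -r tank/nextcloud-data@<last-clean-snapshot>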
Estimated Recovery Time: 4-24 hours
Recovery Runbooks¶
Runbook: Nextcloud Restore¶
Prerequisites: SSH access to server, ZFS snapshots available
- Stop Nextcloud (a combined sketch of the commands follows this list):
- Rollback datasets:
- Start Nextcloud:
- Verify:
    - Access web UI
    - Check file listing
    - Verify recent files exist
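A combined sketch using the paths and dataset names from earlier on this page; the snapshot name is a placeholder:
# Stop Nextcloud
cd ~/docker/nextcloud && docker compose down
# Roll back the data and app datasets
sudo zfs rollback tank/nextcloud-data@<snapshot>
sudo zfs rollback tank/nextcloud-app@<snapshot>
# Start Nextcloud
docker compose up -d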
Runbook: Database Restore from Dump¶
Prerequisites: SQL dump file available
- Stop application (a combined sketch of the commands follows this list):
- Restore database:
- Restart application:
- Run maintenance:
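A sketch that assumes the app container is named nextcloud-app, the db container sets MYSQL_ROOT_PASSWORD in its environment, and the dump file name is a placeholder:
# Stop the application so nothing writes during the restore
docker stop nextcloud-app
# Load the dump into the database
docker exec -i nextcloud-db sh -c 'exec mysql -u root -p"$MYSQL_ROOT_PASSWORD" nextcloud' < \
  /mnt/tank/backups/nextcloud-db-<date>.sql
# Restart the application
docker start nextcloud-app
# Run Nextcloud maintenance to repair any inconsistencies
docker exec -u www-data nextcloud-app php occ maintenance:repair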
Runbook: Full System Recovery¶
Prerequisites: Backup server accessible, new hardware ready
- Install Ubuntu Server (minimal)
- Install ZFS (the ZFS steps are sketched after this list):
- Create pool (if restoring from backup):
- Restore data:
- Follow Rebuild Checklist for:
    - Docker installation
    - Container configuration
    - Network setup
    - Service verification
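A sketch of the ZFS portion; the device names are placeholders and the two-disk mirror layout is an example:
# Install ZFS
sudo apt install zfsutils-linux
# Create the pool
sudo zpool create tank mirror /dev/disk/by-id/<disk1> /dev/disk/by-id/<disk2>
# Pull the datasets back from the backup server
ssh backup-server zfs send backup/nextcloud-data@latest | \
  sudo zfs receive tank/nextcloud-data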
Required Access and Credentials¶
Keep secure, offline copies of:
| Item | Location |
|---|---|
| SSH keys | Password manager, printed |
| Root/admin passwords | Password manager |
| Backup server credentials | Password manager |
| Docker .env files | In ZFS snapshot, password manager |
| BIOS password | Printed, secure location |
| Encryption keys (if applicable) | Multiple secure locations |
Credential Storage
Recovery is impossible without access credentials. Store them in multiple secure locations.