Pool Creation¶

Turning identified, prepped devices into a working pool. This page covers the actual zpool create for the MS-S1 MAX build, plus the design choices behind every flag.

Prerequisites¶

ZFS userland installed (sudo apt install -y zfsutils-linux).
Devices identified and prepped per Partitioning: primary's 4th partition is empty, secondary disk is wiped clean.
ARC cap planned (see below).
Decision made on encryption (see Encryption) — easier to set at create time than to retrofit.

The actual `zpool create` for this build¶

PRIMARY_PART=/dev/disk/by-id/nvme-Samsung_SSD_990_PRO_2TB_<serial>-part4
SECONDARY=/dev/disk/by-id/nvme-Samsung_SSD_990_PRO_4TB_<serial>

sudo zpool create \
    -o ashift=12 \
    -o autotrim=on \
    -O compression=lz4 \
    -O atime=off \
    -O xattr=sa \
    -O acltype=posixacl \
    -O dnodesize=auto \
    -O mountpoint=/mnt/tank \
    tank \
    "$PRIMARY_PART" \
    "$SECONDARY"

That's it. One pool, two single-disk top-level vdevs (effectively a stripe), with sensible defaults set at create time so they're inherited by all child datasets.

Verify:

sudo zpool status tank
sudo zpool list tank
zfs get compression,recordsize,atime,xattr,acltype tank

The rest of this page is "why those flags, what to do differently if your situation differs".

Pool-level options (`-o`)¶

These attach to the pool, not to datasets. Set at create time; many can be changed later but some can't.

`ashift=12` — locked at create time¶

The log2 of the smallest IO unit ZFS will use to a vdev. ashift=12 = 4 KiB blocks, the right setting for any 512e / 4Kn / NVMe disk you'll encounter today.

Cannot be changed. Set wrong -> recreate the pool. Always set it explicitly even though current OpenZFS picks 12 by default — being explicit means you'll spot mistakes faster.

`autotrim=on`¶

Periodic background TRIM on SSDs and NVMe — tells the underlying device which blocks are free so its garbage collection can run efficiently.

on: ZFS issues TRIM requests in the background as space is freed. Light overhead.
off: explicit manual TRIM via zpool trim tank only.

On NVMe, autotrim=on is the right answer for this build. On HDD-only pools, it doesn't do anything.

`autoreplace=on` (optional)¶

When set, ZFS automatically uses a spare disk to replace a failed pool member. Requires spare vdevs (zpool add tank spare …). Not relevant here (no spares on a 2-disk box).

`autoexpand=on` (optional)¶

When a pool's underlying device(s) grow (e.g. you replaced a 4 TB disk with an 8 TB disk in a mirror), autoexpand=on makes the pool see the new capacity automatically. Without it, you'd run zpool online -e tank <device> manually after the resilver.

`cachefile=/etc/zfs/zpool.cache` (default)¶

Where ZFS records the list of imported pools. Stored on the boot disk so systemd's zfs-import-cache.service can re-import the pool at boot. Don't override unless you're doing root-on-ZFS or using zfs-import-scan.service instead.

Filesystem-level options (`-O`) — inherited by every dataset¶

These set the defaults for the root dataset (tank); child datasets inherit unless overridden. Setting them at create time avoids having to remember to set them later.

`compression=lz4`¶

See Concepts -> Compression. lz4 is the right default; safe everywhere, cheap on Zen 5. Override per-dataset only when you know better (e.g. compression=off for tank/ai where GGUF files are already compressed; compression=zstd-3 for tank/backups).

`atime=off`¶

atime=on (the default for compatibility) updates the access timestamp on every file read, which generates a write for every read. Turning it off saves a lot of IOPS for very little practical loss — atime is used by almost nothing in practice. If you have a tool that needs it (some mail-spool implementations), set relatime=on instead, which updates atime only when it would otherwise have been older than mtime.

`xattr=sa`¶

Where extended attributes are stored:

xattr=on (legacy default): stored in a hidden directory entry per file. Two seeks per xattr access.
xattr=sa (recommended): stored in the inode (system attribute). One seek. Better performance.

xattr=sa is what every modern guide recommends; it's only "not the default" for historical compatibility.

`acltype=posixacl`¶

Enables POSIX ACLs (setfacl/getfacl). Off by default for historical reasons. Turn it on — many services (Samba, container runtimes) expect them.

`dnodesize=auto`¶

The size of a directory-node (inode equivalent). auto lets ZFS choose larger dnodes when needed for things like SA-stored xattrs. Pairs naturally with xattr=sa.

`mountpoint=/mnt/tank`¶

Where the pool's root dataset is mounted. /mnt/tank is the convention in this build (per Disk Partitioning and START.md). Alternatives:

mountpoint=/tank — keeps paths shorter (/tank/media vs /mnt/tank/media); some prefer this.
mountpoint=none — pool root is unmounted; you mount only specific datasets. Useful for "pool of zvols" setups.
mountpoint=legacy — don't auto-mount; use /etc/fstab. Mostly for root-on-ZFS.

If you change the convention to /tank, sweep the rest of the docs (every /mnt/tank/... path).

Options to consider per-dataset, not pool-wide¶

These are commonly not set at the pool level because they're workload-specific:

recordsize — block size; default 128 K. Tune per dataset.
sync — sync write behaviour. Default is fine for the pool root.
primarycache / secondarycache — what ARC/L2ARC caches for this dataset.
redundant_metadata=most (default all) — keep all metadata triple-copied; most keeps fewer copies of some metadata for ~5-10% space savings. Worth setting on truly bulk datasets if you want it.

See Datasets for the full property reference.

Cap the ARC size¶

The Adaptive Replacement Cache defaults to ~50% of system RAM. On a 128 GB box that's 64 GB silently consumed by cache — which collides with the memory budget for VMs and Ollama/llama.cpp.

Set a hard cap before the pool gets heavy use:

# /etc/modprobe.d/zfs.conf
echo 'options zfs zfs_arc_max=17179869184' | sudo tee /etc/modprobe.d/zfs.conf

# Apply on next boot (initramfs needs updating because zfs is in there)
sudo update-initramfs -u

# Set it live without rebooting (matches modprobe.d on next boot)
echo 17179869184 | sudo tee /sys/module/zfs/parameters/zfs_arc_max
cat /sys/module/zfs/parameters/zfs_arc_max

Sane caps for this hardware:

Workload mix	ARC cap	Bytes
Heavy LLM (Ollama / llama.cpp hot, minimal disk I/O on tank)	8 GiB	`8589934592`
Mixed: VMs + AI + services (recommended default)	16 GiB	`17179869184`
Mostly cold storage, ZFS-heavy reads, modest VM/LLM	32 GiB	`34359738368`

You can also set a minimum (zfs_arc_min) and force a more aggressive shrink target if you observe lots of memory pressure under inference. Most users don't need to.

See Tuning for ARC internals (primarycache, prefetch tuning, arc_meta_limit).

Verify the pool¶

sudo zpool status -v tank
sudo zpool list -v tank
zfs list -o name,used,available,referenced,mountpoint
zfs get all tank | head -40

You should see:

pool state ONLINE
two leaf devices, both ONLINE with no errors
ARC stats sane (cat /proc/spl/kstat/zfs/arcstats | grep '^size')
pool mounted at /mnt/tank

Pool features (compatibility level)¶

ZFS pools have features that you can enable. They're additive: enabling a feature on a pool means the on-disk format starts using it, and the pool can no longer be imported by ZFS implementations that don't know that feature.

sudo zpool get all tank | grep feature@

Each row is feature@<name> <state>, where state is disabled, enabled, or active. New pools have most features enabled (announced but not yet used) and start using them as datasets/data are created.

For a single-host pool that you're not moving between ZFS implementations, leave them at the defaults. If you're planning to send/receive across implementations (e.g. to a FreeBSD backup host), check compatibility profiles:

ls /etc/zfs/compatibility.d/
sudo zpool create -o compatibility=openzfs-2.1-linux ...

Common pitfalls:

A pool created on the newest OpenZFS can't always be imported on an older one. Check the version and feature flags before moving pools.
A pool with crypto features active can't be imported by a build without encryption support.

Pool features worth knowing about¶

Feature	What it does	When you care
`async_destroy`	Async dataset destroy in the background	Default; harmless.
`large_blocks`	Allow recordsize up to 1 MiB	You set `recordsize=1M`.
`large_dnode`	Allow larger dnodes (matches `dnodesize=auto`)	Pairs with `xattr=sa`.
`lz4_compress`, `zstd_compress`	The compression algorithms	Activated when you set the property.
`encryption`	Native dataset encryption	See Encryption.
`raidz_expansion` (2.3+)	Add a disk to an existing raidz	Useful if you ever go raidz.
`head_errlog`	Better tracking of corrupted files across snapshots	Default in 2.2+.
`device_removal`	Allow removing a top-level vdev (non-raidz)	Lets you shrink a pool.

Stripping vs adding redundancy after the fact¶

You can attach a mirror partner to a single-disk vdev:

sudo zpool attach tank "$PRIMARY_PART" /dev/disk/by-id/<new-disk>

This converts the single-disk vdev into a 2-way mirror. The new disk resilvers from the existing one.

You cannot turn an existing stripe of two top-level vdevs into a single mirror — that's a different topology. The closest you can do is zpool replace plus juggling, which is risky and not worth it on a tiny home pool.

Setting the pool to auto-import at boot¶

Ubuntu's zfs-import-cache.service + zfs-mount.service units handle this automatically once the pool is created. Verify:

systemctl status zfs-import-cache.service zfs-mount.service zfs.target
sudo zpool export tank && sudo zpool import tank   # round-trip to confirm the import works

If you ever boot and the pool isn't imported, see Troubleshooting -> Pool Import Failures.

What this pool is not¶

It's worth re-stating the trade-offs explicitly:

Not redundant. A failed primary partition or the secondary NVMe = a dead pool. Mitigated by snapshots + off-host replication.
Asymmetric IO. The 4 TB drive is on a PCIe 4.0 x1 link (~2 GB/s ceiling). For VM disks and hot databases, prefer the partition on the primary drive (PCIe 4.0 x4). See Datasets for per-dataset placement hints.
Two top-level vdevs of different sizes. ZFS will allocate proportionally; the larger device will see more writes. Fine for this workload but worth knowing.

What to do next¶

Encryption — set this up before creating sensitive datasets if you want it.
Datasets — carve up the pool into per-workload datasets with proper properties.
Tuning — runtime knobs for ARC, prefetch, write throttle.
Operations — scrubs, replace, expand, monitor.

Pool Creation¶

Prerequisites¶

The actual zpool create for this build¶

Pool-level options (-o)¶

ashift=12 — locked at create time¶

autotrim=on¶

autoreplace=on (optional)¶

autoexpand=on (optional)¶

cachefile=/etc/zfs/zpool.cache (default)¶

Filesystem-level options (-O) — inherited by every dataset¶

compression=lz4¶

atime=off¶

xattr=sa¶

acltype=posixacl¶

dnodesize=auto¶

mountpoint=/mnt/tank¶