ZFS Encryption and Organization
To learn Unix system administration and to facilitate backups, I have been maintaining a home server. I chose to avoid options like TrueNAS that put the user at too high a level of abstraction to understand the system’s individual components, like ZFS. Everything becomes a checkbox on a webpage. This is great for ease of use, but it would restrict my learning.
I gathered three one-terabyte hard drives (one 2.5" drive from an old laptop, one 3.5" from an old desktop, and one from Craigslist) and one 128-gigabyte solid state drive (SSD). While I cannot recommend that you reproduce this setup, it does offer good data resiliency. The drives are unlikely to fail at the same time, which can be a risk with drives bought in a single batch. In a raidz1 configuration, the data will survive one drive failure.
Overview of previous config
I installed Debian Buster onto the SSD. The base install had only an ext4 filesystem. I had not initially considered that booting Linux from ZFS is not well supported, but the server’s layout became more resilient because of this. The server can boot without any of the ZFS drives, and the ZFS drives can be mounted on any other machine with the correct filesystem packages. In this way, the ZFS storage is agnostic to the operating system and only has to concern itself with my media and backups.
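Because the data pool is separate from the boot drive, it could in principle be exported on one machine and imported on another that has the ZFS packages installed. A minimal sketch, assuming a pool named mypool:
$ zpool export mypool    # run before moving the drives
$ zpool import mypool    # run on the new machine once the drives are attached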
I had an incomplete understanding of ZFS when I first deployed it on the server, and the implementation was sloppy. For one, I was so concerned with getting everything to work that I overlooked encryption until after the storage was in use, complicating migration to an encrypted setup.
Organizational problems arose from the misconception that my group of drives could only be mounted in one place. This is not at all the case. I began by using only the default mountpoint, which is just a directory at the root of your filesystem. For me, this was /mainpool. Of course, this caused a mess with permissions. By default, all files within the directory were owned by root, meaning that the system user for Jellyfin could not read my media.
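As a small illustration of that failure mode (assuming the old pool was named mainpool, matching its mountpoint, and that Jellyfin runs as a jellyfin system user), the default mountpoint and the resulting access problem could be inspected with something like:
$ zfs get mountpoint mainpool
$ sudo -u jellyfin ls /mainpool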
How ZFS actually works
Now I will explain how ZFS works in my environment so that you will know enough to avoid the mistakes explained above. On Linux, block devices are referred to as files like /dev/sda or /dev/disk/by-uuid/.... These disks (or even just one) can be grouped into a “virtual device” or “vdev” by ZFS. Redundancy is set at the vdev level, either as a mirror or as RAID-Z. A mirror copies the same data to every drive in the vdev, whereas RAID-Z reserves a specific number of drives’ worth of space for parity. For instance, raidz1 is one drive of parity. One or more virtual devices can be added to a zpool. Data will be distributed across all of a pool’s vdevs. Bigger vdevs will receive a larger proportion of this data, so it can be more efficient to have vdevs of similar sizes.
I have transcribed an example config that captures the properties covered so far. Here, the zpool is example, and the vdevs are raidz1-0 and mirror-1:
$ zpool status example
  pool: example
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        example       ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            ada1      ONLINE       0     0     0
            ada2      ONLINE       0     0     0
            ada3      ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            ada4      ONLINE       0     0     0
            ada5      ONLINE       0     0     0

errors: No known data errors
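For reference, a pool with this exact layout could be created with a command along the lines of the one below; the device names are illustrative, not drives I actually own.
$ zpool create example raidz1 ada1 ada2 ada3 mirror ada4 ada5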
Once the zpool has been created, its storage can be exposed at multiple locations via filesystem datasets. You can create as many of these datasets as you like, and they can be mounted just like other filesystems. You can have a dataset for every media folder and every user’s home directory; there is effectively no cost to having many. Encryption is usually applied per dataset.
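As a sketch of that idea, assuming a pool named tank, the commands below create a couple of datasets with their own mountpoints and then list them; the dataset names and paths are only examples.
$ zfs create -o mountpoint=/srv/media/music tank/music
$ zfs create -o mountpoint=/srv/media/photos tank/photos
$ zfs list -o name,mountpoint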
Better Implementation
In order to reconfigure ZFS without losing data, I copied everything to an external hard drive, then destroyed the filesystem using zfs destroy.
Starting from scratch, I ran:
$ zpool create -O mountpoint=none tank raidz1 sdb sdc sdd
Where sdb, sdc, and sdd are the hard drives.
Here is what the config looks like:
        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            sdb       ONLINE       0     0     0
            sdc       ONLINE       0     0     0
            sdd       ONLINE       0     0     0
Very nice. Next I will create an encrypted dataset mounted at my Jellyfin media directory.
$ zfs create -o encryption=aes-256-gcm -o keyformat=passphrase -o keylocation=prompt -o mountpoint=/var/lib/jellyfin/media tank/jelly
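To double-check that a newly created dataset picked up the intended settings, its properties can be queried; this is just a verification sketch, not part of the original steps.
$ zfs get encryption,keyformat,keylocation,mountpoint tank/jelly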
Now I can begin copying the media back over from the external hard drive.
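As a rough sketch of that copy, assuming the external drive is mounted at /mnt/external with the media in a media directory (both hypothetical paths):
$ rsync -avh --progress /mnt/external/media/ /var/lib/jellyfin/media/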
In order to mount the encrypted dataset in the future, its key must be loaded beforehand. This is achieved using zfs load-key as follows:
$ zfs load-key tank/jelly
$ zfs mount tank/jelly
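If more encrypted datasets are added later, the keys can also be loaded and the filesystems mounted in one pass; this is a general sketch rather than part of my setup.
$ zfs load-key -a
$ zfs mount -a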
Key Takeaways
In my previous configuration, reliance on the default mountpoint at the root of the Linux filesystem led to poor organization. It is better to create and mount several datasets for your specific needs, and to use options that fit each need, like encryption. Additionally, be aware of the immutability of settings like encryption. Encryption cannot be easily enabled on existing, unencrypted datasets. So, make sure that datasets are configured correctly before extensive use.
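If an unencrypted dataset does need to be migrated after the fact, one approach is to snapshot it and receive the stream into a child of an already-encrypted dataset, which should then inherit the parent’s encryption; the dataset names below are hypothetical.
$ zfs snapshot tank/plain@migrate
$ zfs send tank/plain@migrate | zfs receive tank/jelly/plain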