For those in the know, ZFS has long been the file system of choice for bit-rot-resistant, large-scale storage. But its snapshot capabilities and the ability to easily send entire file systems between physical storage arrays also hold some attraction for hypervisors and mission-critical data center applications.
Whatever your motivation, this guide will show you how to get a pure ZFS system running on a machine that you only have console access to. We are basing this guide on Ubuntu 18.04 Server, but it should mostly also work on 20.04 Server. Note that on 20.04 Desktop, Zsys (Ubuntu's clever but in this instance not entirely helpful ZFS management layer) will get in the way. So you'll be on your own there, although why are you running a desktop install on a dedicated server machine anyway?
The usual disclaimer applies: use this guide at your own risk.
Prerequisites:
- Basic understanding of ZFS concepts.
Software:
- To start with, any type of live Linux booted into RAM on your server. Ubuntu 18.04 or Debian-based distros are preferred, but you can get debootstrap to work on pretty much any distro.
- root – can’t do this without it.
Hardware:
- Preferably a server with ECC RAM. Yes, ZFS also works without it, but it trusts the RAM contents unconditionally, so a corrupted DRAM cell could corrupt your stored data if you go without ECC. And since space weather is looking to get rougher, it might not be a bad investment.
- One or more drives, preferably whole ones. Yes, ZFS can work on a single partition, but giving it a whole drive is better for reasons that are out of scope here. Ideally you'd want more than one drive, because a single-drive setup will only let you spot bit rot, not recover from it.
- Something fast, durable and low-latency as a read cache (L2ARC), and something decently fast and durable as a SLOG for the ZIL. Think of the SLOG as a bucket your synchronous writes get dumped into, which can be replayed in case your machine is reset before the array has caught up with those writes. The read caching is hopefully self-explanatory.
- If you are running multiple VMs on a hypervisor with just 3-4 drives, don't even think about doing this without an L2ARC and SLOG. ZFS can handle petabytes of data, but a speed demon on small arrays it ain't. Incidentally, Intel XPoint storage is great for both L2ARC and SLOG, chiefly because of its low latency (see the example after this list for how to check they are being used).
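Once the pools exist (later in this guide), you can get a rough idea of whether the cache and log devices are actually being exercised using stock ZFS tooling; a minimal example, assuming your root pool is called rpool:
zpool iostat -v rpool 5
This prints per-vdev read/write statistics every 5 seconds, with the cache and log devices listed in their own sections.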
Alright let’s get into it.
Preparing the drives:
This guide was adapted from the existing OpenZFS material. You will need to refer to it in part if you are configuring a UEFI boot system.
This part assumes we had to boot into a live rescue system based on Debian Buster. We will enable ZFS support in the rescue image, then debootstrap Ubuntu Bionic from it onto the ZFS pools we create.
This creates a system with separate boot and root pools. The former has fewer features turned on, since GRUB does not support some of the more advanced ones. While not applicable here, the newer version of ZFS that ships with Ubuntu 20.04 supports native ZFS encryption, which upsets GRUB even when it is only used on the root pool. At least based on the version of GRUB out around May 2020.
This guide has some customizations regarding disk quotas and the checksum algorithm.
We already have root, and SSH is installed (obviously); otherwise take care of that first.
apt-get update
apt install --yes debootstrap gdisk dkms dpkg-dev linux-headers-$(uname -r)
apt install --yes -t buster-backports --no-install-recommends zfs-dkms
modprobe zfs
apt install --yes -t buster-backports zfsutils-linux
Formatting and partitioning the disks. Create some variables that will make the rest of the process easier, so you don't need to keep referring to massive path names. Be sure to use the ATA or NVMe names.
ls /dev/disk/by-id/
DISK1=/dev/disk/by-id/ata-WDC_WD3000FYYZ-01UL1B2_WD-WCC138XFVD4
DISK2= ….
Whenever $DISK is used below, assume you may need to repeat the command for $DISK1 .. $DISKn.
If the disk was previously used in an MD array:
apt install --yes mdadm
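To see whether any arrays are currently assembled (a quick check; if nothing is listed there is nothing to stop):
cat /proc/mdstat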
If so, stop them (replace “md0“ as required):
mdadm --stop /dev/md0
# For an array using the whole disk:
mdadm --zero-superblock --force $DISK
# For an array using a partition:
mdadm --zero-superblock --force ${DISK}-part2
Clear the partition table:
sgdisk --zap-all $DISK
Run this if you need legacy (BIOS) booting:
sgdisk -a1 -n1:24K:+2000K -t1:EF02 $DISK
Run this for UEFI booting (for use now or in the future), or repurpose and resize it as a swap partition if you like (-t2:8200):
sgdisk -n2:2M:+512M -t2:EF00 $DISK
Run this for the boot pool:
sgdisk -n3:0:+1G -t3:BF01 $DISK
For the root pool:
sgdisk -n4:0:0 -t4:BF00 $DISK
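If you want to sanity-check the resulting layout before creating any pools, sgdisk can print the partition table (same tool we installed above):
sgdisk -p $DISK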
We used the ZFS "reserved" partition types for the cache and log devices, which are (obviously) on separate NVMe drives.
Now create the root pool. Note that ashift is drive dependent. Use smartctl to determine the physical and logical sector sizes of your drives. For true 512-byte-sector drives use ashift=9; for true 4K drives use ashift=12. ashift=12 will work on 512-byte drives as well, but you will lose a lot of capacity for certain file types; in my tests up to 25 percent. Also note that lower-case o and upper-case O matter: -o sets pool properties and features, while -O sets file system properties on the pool's root dataset. Use the -f flag when a disk might still be part of old pools. Or better still, delete those pools first by importing them and then using zpool destroy.
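A quick way to check the sector sizes (lsblk is built in; smartctl comes from the smartmontools package, which you may need to install in the rescue image first):
lsblk -d -o NAME,PHY-SEC,LOG-SEC
smartctl -i $DISK1 | grep -i 'sector size'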
Creating pools:
Now let’s set this up for Ubuntu. First the boot pool:
zpool create -f -o ashift=9 -d \
-o feature@async_destroy=enabled \
-o feature@bookmarks=enabled \
-o feature@embedded_data=enabled \
-o feature@empty_bpobj=enabled \
-o feature@enabled_txg=enabled \
-o feature@extensible_dataset=enabled \
-o feature@filesystem_limits=enabled \
-o feature@hole_birth=enabled \
-o feature@large_blocks=enabled \
-o feature@lz4_compress=enabled \
-o feature@spacemap_histogram=enabled \
-O acltype=posixacl -O canmount=off -O compression=lz4 -O devices=off \
-O normalization=formD -O relatime=on -O xattr=sa \
-O mountpoint=/ -R /mnt bpool raidz ${DISK1}-part3 ${DISK2}-part3 ${DISK3}-part3
Now for the root pool:
zpool create -f -o ashift=9 \
-O acltype=posixacl -O canmount=off -O compression=lz4 \
-O dnodesize=auto -O normalization=formD -O relatime=on -O xattr=sa \
-O mountpoint=/ -R /mnt rpool raidz ${DISK1}-part4 ${DISK2}-part4 ${DISK3}-part4
To add a separate ZIL (SLOG) and L2ARC to the root pool, do this, pointing at the relevant SSD or NVMe device or partition:
# for zil (SLOG):
sudo zpool add -f rpool log /dev/disk/by-id/nvme...
# for L2ARC:
sudo zpool add rpool cache /dev/disk/by-id/nvme...
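If you added them, verify that the log and cache devices show up in their own sections of the pool layout:
zpool status rpool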
Create filesystem datasets to act as containers:
zfs create -o checksum=sha256 -o canmount=off -o mountpoint=none rpool/ROOT
zfs create -o checksum=sha256 -o canmount=off -o mountpoint=none bpool/BOOT
Create filesystem datasets for the root and boot filesystems:
zfs create -o quota=20G -o canmount=noauto -o mountpoint=/ rpool/ROOT/ubuntu
zfs mount rpool/ROOT/ubuntu
zfs create -o quota=1.5G -o canmount=noauto -o mountpoint=/boot bpool/BOOT/ubuntu
zfs mount bpool/BOOT/ubuntu
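Optionally confirm that both datasets report as mounted (the mounted column should read yes for both):
zfs list -o name,mountpoint,mounted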
Create datasets (adjust layout to suit your needs):
zfs set checksum=sha256 rpool
zfs create -o quota=20G rpool/home
zfs create -o quota=5G -o mountpoint=/root rpool/home/root
zfs create -o quota=6G -o canmount=off rpool/var
zfs create -o quota=4.5G -o canmount=off rpool/var/lib
zfs create -o quota=5G rpool/var/log
zfs create -o quota=1512M rpool/var/spool
If you wish to exclude these from snapshots:
zfs create -o quota=1G -o com.sun:auto-snapshot=false rpool/var/cache
zfs create -o quota=1G -o com.sun:auto-snapshot=false rpool/var/tmp
chmod 1777 /mnt/var/tmp
For a disk-based /tmp (instead of tmpfs):
zfs create -o quota=2G -o com.sun:auto-snapshot=false rpool/tmp
chmod 1777 /mnt/tmp
Use zfs get checksum path/to/dataset to check that sha256 was inherited correctly; the same goes for compression. We are using sha256 rather than sha512 for compatibility reasons.
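For example, checked recursively across the whole pool (adjust the pool name to yours):
zfs get -r checksum,compression rpool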
Install the minimal system:
First make debootstrap work so it can install Ubuntu from the Debian based system we are in.
wget http://at.archive.ubuntu.com/ubuntu/pool/main/u/ubuntu-keyring/ubuntu-keyring_2018.09.18.1~18.04.0.tar.gz
tar xzf ubuntu-keyring_2018.09.18.1~18.04.0.tar.gz
cd ubuntu-keyring-2018.09.18.1/
cp -f keyrings/* /usr/share/keyrings/
debootstrap bionic /mnt
zfs set devices=off rpool
Replace HOSTNAME with the desired hostname:
echo HOSTNAME > /mnt/etc/hostname
vi /mnt/etc/hosts
# Add a line:
127.0.1.1 HOSTNAME
# or if the system has a real name in DNS:
127.0.1.1 FQDN HOSTNAME
Find the interface name:
ip addr show
Adjust NAME below to match your interface name:
vi /mnt/etc/netplan/01-netcfg.yaml
network:
  version: 2
  ethernets:
    NAME:
      dhcp4: true
Your ISP / data center may require you to hard-code your IP, so use whatever method of configuring netcfg.yaml will ensure you can connect to your machine after reboot.
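As a rough sketch only, a static configuration could look something like this (the addresses are placeholders; substitute the values your provider gave you):
network:
  version: 2
  ethernets:
    NAME:
      addresses: [192.0.2.10/24]
      gateway4: 192.0.2.1
      nameservers:
        addresses: [192.0.2.1]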
Configure the package sources:
vi /mnt/etc/apt/sources.list
deb http://archive.ubuntu.com/ubuntu bionic main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu bionic-updates main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu bionic-backports main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu bionic-security main restricted universe multiverse
“Enter” the new system:
Bind the virtual filesystems from the rescue system environment into the new system and chroot into it. Adjust the DISK variable handover as appropriate and separate multiple variables with spaces:
mount --rbind /dev /mnt/dev
mount --rbind /proc /mnt/proc
mount --rbind /sys /mnt/sys
chroot /mnt /usr/bin/env DISK=$DISK bash --login
Note: This is using --rbind, not --bind.
Configure a basic system environment:
ln -s /proc/self/mounts /etc/mtab
apt update
dpkg-reconfigure locales
Even if you prefer a non-English system language, always ensure that en_US.UTF-8 is available:
dpkg-reconfigure tzdata
apt install --yes vim openssh-server
Configure sshd_config to ensure it permits root login for now (temporarily).
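One way to do that (and revert it once you have a regular user and keys set up):
vi /etc/ssh/sshd_config
# Set:
PermitRootLogin yes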
Install ZFS in the chroot environment for the new system:
apt install --yes --no-install-recommends linux-image-generic
# OR see below:
apt install --yes --no-install-recommends linux-image-generic-hwe-18.04
apt install --yes zfs-initramfs
Hint: For the HWE kernel, install linux-image-generic-hwe-18.04 instead of linux-image-generic.
We'll need cryptsetup for the encrypted swap later:
apt install --yes cryptsetup
Install GRUB and select all disks that apply, i.e. the full array, anything bootable. This guide covers BIOS systems; refer to the guide linked at the start of the article to set up UEFI booting. You must use one or the other.
apt install --yes grub-pc
To get rid of annoying error messages:
dpkg --purge os-prober
Set root password for reboot. Confirm that sshd_config permits root login.
passwd
Enable importing bpool:
This ensures that bpool is always imported, regardless of whether /etc/zfs/zpool.cache exists, whether it is in the cachefile or not, or whether zfs-import-scan.service is enabled.
vi /etc/systemd/system/zfs-import-bpool.service
[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
[Install]
WantedBy=zfs-import.target
systemctl enable zfs-import-bpool.service
Verify that the ZFS boot filesystem is recognized:
grub-probe /boot
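It should simply print:
zfs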
Refresh the initrd files:
update-initramfs -c -k all
Workaround GRUB’s missing zpool-features support:
vi /etc/default/grub
# Set:
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/ubuntu"
# While you are there, disable memory zeroing:
vi /etc/default/grub
# Add init_on_alloc=0 to GRUB_CMDLINE_LINUX_DEFAULT
# Save and quit.
This addresses a performance regression. Since we are using the 5.3 kernel (because we installed linux-image-generic-hwe-18.04), we would be affected and should compensate for it.
Optional (but highly recommended): Make debugging GRUB easier:
vi /etc/default/grub
# Comment out: GRUB_TIMEOUT_STYLE=hidden
# Set: GRUB_TIMEOUT=5
# Below GRUB_TIMEOUT, add: GRUB_RECORDFAIL_TIMEOUT=5
# Remove quiet and splash from: GRUB_CMDLINE_LINUX_DEFAULT
# Uncomment: GRUB_TERMINAL=console
# Save and quit.
Update the boot configuration:
update-grub
For legacy (BIOS) booting, install GRUB to the MBR:
grub-install $DISK
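If you have several bootable disks, you can loop over them instead of running it once per disk by hand (adjust the variable list to match your setup):
for d in $DISK1 $DISK2 $DISK3; do grub-install $d; done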
Fix filesystem mount ordering (once again we are compensating for systemd fuckery):
zfs set mountpoint=legacy bpool/BOOT/ubuntu
echo bpool/BOOT/ubuntu /boot zfs \
nodev,relatime,x-systemd.requires=zfs-import-bpool.service 0 0 >> /etc/fstab
zfs set mountpoint=legacy rpool/var/log
echo rpool/var/log /var/log zfs nodev,relatime 0 0 >> /etc/fstab
zfs set mountpoint=legacy rpool/var/spool
echo rpool/var/spool /var/spool zfs nodev,relatime 0 0 >> /etc/fstab
# If you created a /var/tmp dataset:
zfs set mountpoint=legacy rpool/var/tmp
echo rpool/var/tmp /var/tmp zfs nodev,relatime 0 0 >> /etc/fstab
# If you created a /tmp dataset:
zfs set mountpoint=legacy rpool/tmp
echo rpool/tmp /tmp zfs nodev,relatime 0 0 >> /etc/fstab
Snapshot the initial installation:
zfs snapshot bpool/BOOT/ubuntu@install
zfs snapshot rpool/ROOT/ubuntu@install
Exit from the chroot
environment back to the LiveCD environment:
exit
Run these commands in the LiveCD environment to unmount all filesystems:
mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | xargs -i{} umount -lf {}
zpool export -a
Exporting the pools above IS CRUCIAL. If you reboot without doing it, the system won't come up, because the pools will not have been exported cleanly.
You did install the SSH server in the chroot and enable root login, right?
reboot
TUNING after reboot:
apt-get update
apt dist-upgrade --yes
Install a command-line environment only:
apt install --yes ubuntu-standard
Mirror GRUB
If you installed to multiple disks, install GRUB on the additional disks:
For legacy (BIOS) booting: THIS IS ONLY NEEDED if you didn’t install grub on all disks already. Otherwise ignore.
dpkg-reconfigure grub-pc
Hit enter until you get to the device selection screen.
Select (using the space bar) all of the disks (not partitions) in your pool.
Adjust the swap partition reference in the examples below as needed. Don't put swap onto ZFS, because this can lead to system lockups. Use mdadm if you need to stripe swap across disks.
MAKE SURE YOU ADJUST THE $DISK VARIABLE CORRECTLY.
cat /proc/swaps
It should list no active swap devices.
For an encrypted single-disk install:
apt install --yes cryptsetup
echo swap ${DISK}-part2 /dev/urandom \
swap,cipher=serpent-xts-plain64:sha256,size=512,noearly >> /etc/crypttab
echo /dev/mapper/swap none swap defaults,nofail,x-systemd.device-timeout=30 0 0 >> /etc/fstab
The nofail and x-systemd.device-timeout=30 options are there to stop systemd from waiting indefinitely at boot for the swap device.
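After the next reboot you can confirm that the encrypted swap actually came up:
swapon --show
cat /proc/swaps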
For an encrypted mirror or raidz topology:
apt install --yes cryptsetup mdadm
# Adjust the level (ZFS raidz = MD raid5, raidz2 = raid6) and
# raid-devices if necessary and specify the actual devices.
# Personally, I would stripe swap across disks but still avoid using it.
mdadm --create /dev/md0 --metadata=1.2 --level=mirror \
--raid-devices=2 ${DISK1}-part2 ${DISK2}-part2
echo swap /dev/md0 /dev/urandom \
swap,cipher=aes-xts-plain64:sha256,size=512 >> /etc/crypttab
echo /dev/mapper/swap none swap defaults 0 0 >> /etc/fstab
Optional: Disable log compression:
As /var/log is already compressed by ZFS, logrotate's compression is going to burn CPU and disk I/O for (in most cases) very little gain. Also, if you are making snapshots of /var/log, logrotate's compression will actually waste space, as the uncompressed data will live on in the snapshot. You can edit the files in /etc/logrotate.d by hand to comment out compress, or use this loop (copy-and-paste highly recommended):
for file in /etc/logrotate.d/* ; do
if grep -Eq "(^|[^#y])compress" "$file" ; then
sed -i -r "s/(^|[^#y])(compress)/\1#\2/" "$file"
fi
done
Then proceed through your standard OS configuration routine. At the end of it all, remember to take a snapshot of the boot and root datasets 😉 and turn off root login in the sshd config.
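For example, mirroring the install snapshots from earlier (the snapshot name is arbitrary):
zfs snapshot bpool/BOOT/ubuntu@configured
zfs snapshot rpool/ROOT/ubuntu@configured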
That’s it. Good luck.