Multidev bcachefs on NixOS

2024-05-06

Compared to building a standard PC platform, setting up a network-attached server with NixOS doesn't require additional effort.

I've transitioned from TrueNAS to NixOS over the past few months, and it appears to be working well.

Hardware Specifications

Motherboard: ASUS P9D-MX (C222)
CPU: Intel E3 1230-v3
DRAM: 16GB ECC
HDD: HGST 3TB x 4
SSD: nvme-INTEL_MEMPEK1J016GAH_PHBT82920C53016N & SanDisk drive

Core Configuration

Given the need for tiered storage, and the fact that zfs (previously used in TrueNAS) is overly complex and difficult to maintain, I decided to experiment with bcachefs. Coincidentally, bcachefs was integrated into the Linux kernel in version 6.7, which was released around the same time.

I'm using colmena for deployment, and I've created a colmena machine config:

{
lib,
inputs,
user,
...
}:
{
deployment = {
targetHost = "10.0.1.6";
targetPort = 22;
targetUser = user;
privilegeEscalationCommand = [
"doas"
"--"
];
};
imports = lib.sharedModules ++ [
./hardware.nix
./network.nix
./rekey.nix
./spec.nix
../../srv # not nas-specific config, could be emitted
../../age.nix
../../packages.nix
../../misc.nix
../../users.nix
inputs.disko.nixosModules.default
];
}

part of this config relying on entire structure.

hardware config

I'm using disko for disk partition management. However, due to an ongoing issue where disko does not handle bcachefs subvolumes effectively, we only define the SSD where the system is installed.

{
disko = {
devices = {
disk.main = {
device = "/dev/disk/by-id/ata-SanDisk_SD8SBAT032G_153873411000";
type = "disk";
content = {
type = "gpt";
partitions = {
boot = {
size = "1M";
type = "EF02"; # for grub MBR
priority = 0;
};
ESP = {
name = "ESP";
size = "512M";
type = "EF00";
priority = 1;
content = {
type = "filesystem";
format = "vfat";
mountpoint = "/efi";
mountOptions = [
"fmask=0077"
"dmask=0077"
];
};
};
root = {
size = "100%";
content = {
type = "btrfs";
extraArgs = [
"-f"
"--csum xxhash64"
];
subvolumes = {
"root" = {
mountpoint = "/";
mountOptions = [
"compress-force=zstd:1"
"noatime"
"discard=async"
"space_cache=v2"
"nosuid"
"nodev"
];
};
"home" = {
mountOptions = [
"compress-force=zstd:1"
"noatime"
"discard=async"
"space_cache=v2"
"nosuid"
"nodev"
];
mountpoint = "/home";
};
"nix" = {
mountOptions = [
"compress-force=zstd:1"
"noatime"
"discard=async"
"space_cache=v2"
"nosuid"
"nodev"
];
mountpoint = "/nix";
};
"var" = {
mountOptions = [
"compress-force=zstd:1"
"noatime"
"discard=async"
"space_cache=v2"
"nosuid"
"nodev"
];
mountpoint = "/var";
};
};
};
};
};
};
};
};
};
}

enable the kernel config for bcachefs:

{
boot.supportedFilesystems = [ "bcachefs" ];
}

while I'm using a SAS Card for connecting multiply HDD, appending the corresponding initrd kernel module config:

{
boot.initrd = {
compressor = "zstd";
compressorArgs = [
"-19"
"-T0"
];
systemd.enable = true;
availableKernelModules = [
"nvme"
"xhci_pci"
"ahci"
"usb_storage"
"usbhid"
"sd_mod"
"mpt3sas"
];
kernelModules = [
"tpm"
"tpm_tis"
"tpm_crb"
"mpt3sas" # IMPORTANT
];
};
kernelPackages = pkgs.linuxPackages_latest;
}

The configuration mentioned above should allow the machine to boot and recognize devices at startup. Now, we move on to the step of creating a bcachefs filesystem from multiple devices.

We'll take a relaxed approach with the tiered feature:

bcachefs format \
--label=ssd.ssd1 /dev/sdA \
--label=hdd.hdd1 /dev/sdC \
--label=hdd.hdd2 /dev/sdD \
--label=hdd.hdd3 /dev/sdE \
--label=hdd.hdd4 /dev/sdF \
--replicas=2 \
--foreground_target=ssd \
--promote_target=ssd \
--background_target=hdd

Command from Arch wiki

This will waste more space (avaliable 6T) than btrfs raid5 (9T) but far more safe maybe.

Sharp Edge

Systemd does not support bcachefs multidev mount syntax yet (05,2024), to implement multidevice automatic mounting at system startup we need a hand-written systemd service:

{
systemd.services.mount-three = {
description = "mount pool 3";
script =
let
diskId = map (n: "/dev/disk/by-id/" + n) [
"nvme-INTEL_MEMPEK1J016GAH_PHBT82920C53016N"
"wwn-0x5000cca05838bc98"
"wwn-0x5000cca0583a5e34"
"wwn-0x5000cca04608e534"
"wwn-0x5000cca0583880c4"
];
in
# chain call
toString (
lib.getExe (
pkgs.nuenv.writeScriptBin {
name = "mount";
script =
let
mount = "/run/current-system/sw/bin/mount --onlyonce -o noatime,nodev,nosuid -t bcachefs ${lib.concatStringsSep ":" diskId} /three";
in
''
do { ${mount} } | complete
'';
}
)
);
wantedBy = [ "multi-user.target" ];
};
}

And for storage services required the fs mounted first, add extra unit config, e.g.:

systemd.services.minio.unitConfig.RequiresMountsFor = "LABEL=THREE";

finally simply enable some storage service:

{
services.minio = {
enable = true;
region = "ap-east-1";
rootCredentialsFile = config.age.secrets.minio.path;
dataDir = [ "/three/bucket/data" ];
};
# ...etc
}

All set.

©2018-2024 Secirian | CC BY-SA 4.0