Reproducible Arch images with mkosi

Posted on Sat 31 August 2024 in systemd

In the previous article I investigated how to create a reproducible image, but only managed to create two identical image directories. In this article we'll end up with a fully bit-for-bit reproducible filesystem image!

Some things have changed since the last post: mkosi no longer creates a random-seed file, which was unreproducible, and aux-cache is now removed from the initrd by default. With those changes in place, let's focus on making a reproducible filesystem. The idea was to create a btrfs image, so let's try to make one reproducible:

export SOURCE_DATE_EPOCH=0
fallocate -l 500M test1.img
fallocate -l 500M test2.img

mkfs.btrfs -U 588114f7-e142-40a1-8b99-30db4519183e test1.img
mkfs.btrfs -U 588114f7-e142-40a1-8b99-30db4519183e test2.img

md5sum test1.img test2.img
f4a8f407d97d56c4baf8fef3fa762bf3  test1.img
d6fb6681b3b66f88fa16cd9894072e4a  test2.img

So that's not great. Some research suggests that mkfs.btrfs can't easily be made reproducible. I've hacked up support for SOURCE_DATE_EPOCH, but even after that I'm left with roughly 3000 lines of binary diff.

The default filesystem for an Arch image in mkosi is ext4, so maybe we can make that reproducible instead? It turns out that mkfs.ext4 supports SOURCE_DATE_EPOCH, so we can run similar steps to create an ext4 image:

export SOURCE_DATE_EPOCH=0
fallocate -l 500M test1.img
fallocate -l 500M test2.img

mkfs.ext4 -U 588114f7-e142-40a1-8b99-30db4519183e test1.img
mkfs.ext4 -U 588114f7-e142-40a1-8b99-30db4519183e test2.img

md5sum test1.img test2.img
9eb56e5c4286b83fe44f504ec457a71a  test1.img
d7d7faf681300c2d13e3921d53a088a0  test2.img

Still not reproducible. Luckily ext4 has a tool to dump filesystem information, dumpe2fs:

dumpe2fs test1.img > dump1
dumpe2fs test2.img > dump2

Then diffoscope the results:

-Directory Hash Seed:      00f9d12b-b9f8-4eea-a632-ba4e05dd4b43
+Directory Hash Seed:      36d61985-866a-4e1e-ade7-66e013e1b6a1
 Journal backup:           inode blocks
 Checksum type:            crc32c
-Checksum:                 0xb5aa1d24
+Checksum:                 0x36da2897

That looks promising. It turns out that with -E hash_seed=a24031c1-fc68-453d-80fa-00ad057a5780 we have a reproducible filesystem! At the time of writing it is unclear to me whether using the same value for multiple images has a negative effect.
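The ext4 steps can be collected into a small script. This is a minimal sketch, not a definitive recipe: the function names make_ext4_image and images_identical are made up for illustration, the UUID and hash seed are the example values from this article, and it assumes an e2fsprogs version whose mkfs.ext4 honours SOURCE_DATE_EPOCH.

```shell
#!/bin/sh
# Sketch: create an ext4 image with the known sources of variation pinned.
# Assumption: mkfs.ext4 (e2fsprogs) honours SOURCE_DATE_EPOCH on this system.
# The UUID and hash seed are arbitrary example values from the article.

make_ext4_image() {
    img=$1
    rm -f "$img"
    fallocate -l 500M "$img"
    # -q: quiet, -F: do not prompt, -U: fixed filesystem UUID,
    # -E hash_seed: fixed seed for hashed directories.
    SOURCE_DATE_EPOCH=0 mkfs.ext4 -q -F \
        -U 588114f7-e142-40a1-8b99-30db4519183e \
        -E hash_seed=a24031c1-fc68-453d-80fa-00ad057a5780 \
        "$img"
}

# Succeeds only when both files are byte-for-byte identical.
images_identical() {
    cmp -s "$1" "$2"
}
```

Calling make_ext4_image for test1.img and test2.img and then images_identical test1.img test2.img should succeed if the filesystem is reproducible. cmp -s compares the raw bytes, which is the same property the md5sum check above verifies.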

With this information let's build a mkosi image:

mkosi --debug -d arch -p systemd --seed 0e9a6fe0-68f6-408c-bbeb-136054d20445 --source-date-epoch 1662046009 -m https://archive.archlinux.org/repos/2024/06/30/ --force -o foo --env-file env  --remove-files var/cache/ldconfig/aux-cache --environment=SYSTEMD_LOG_LEVEL=debug

With the following env file, named env:

SYSTEMD_REPART_MKFS_OPTIONS_EXT4=-E hash_seed=a24031c1-fc68-453d-80fa-00ad057a5780 -U 6de632fe-7638-44c4-917c-ecf4170af3b4

This was fully reproducible, but adding -p linux made it unreproducible again. With the kernel added as an installed package, mkosi sets up an EFI partition and installs a bootloader. So our next challenge is getting the FAT partition reproducible.
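Whether a build is reproducible can be checked by building twice into separate output directories and comparing every file. The following is a rough sketch under that assumption; the function names output_manifest and same_outputs, and the directory names out1 and out2 in the usage note, are hypothetical and not part of mkosi.

```shell
#!/bin/sh
# Sketch: compare the files produced by two independent builds.

# Print "sha256  ./relative/path" for every regular file under directory $1,
# sorted by path so that two manifests can be compared directly.
output_manifest() {
    (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2)
}

# Succeeds when both directories contain the same files with the same content.
same_outputs() {
    [ "$(output_manifest "$1")" = "$(output_manifest "$2")" ]
}
```

After two builds, same_outputs out1 out2 succeeds only if every output file matches. When it fails, diffing the two manifests shows which file differs, which is how adding the kernel shows up as a change in the image.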

I created two FAT filesystems with the same steps I had used for btrfs and ext4, and they turned out to be unreproducible. Luckily dosfstools master already contains a commit which makes mkfs.vfat respect SOURCE_DATE_EPOCH. After patching the Arch package, the built image was still not reproducible. Interestingly, the EFI and loader directories in the FAT partition had recent timestamps instead of SOURCE_DATE_EPOCH's value. (losetup --find --show -P $image.raw is useful for mounting the EFI partition read-only without having to dd it out of the image.)

mkosi creates partitions and filesystems with systemd-repart, so I had to read up on how it creates a FAT partition. Back in the day, creating a filesystem image required you to run mkfs.$fs on an image file and then mount the image to put content on it. Mounting requires root privileges and, more importantly, changes the on-disk metadata (last access time etc.). For filesystems which support it, systemd-repart uses --rootdir (btrfs) or a similar option to create a filesystem and populate it in one go. For FAT no such option exists, so systemd-repart instead runs mkfs.vfat and then copies the data over with mcopy (from mtools).

systemd-repart reads ini-style files containing the partition and filesystem definitions. The default definition for the ESP is:

[Partition]
Type=esp
Format=vfat
CopyFiles=/boot:/
CopyFiles=/efi:/
SizeMinBytes={"1G" if bios else "512M"}
SizeMaxBytes={"1G" if bios else "512M"}

Modifying the default ini file by commenting out CopyFiles=/efi:/ made the image reproducible! So my first theory was that copying two directories with mcopy was somehow unreproducible. After a week of on-and-off debugging it turned out that this wasn't the issue at all. Replacing mcopy with a simple shell script revealed that the directories passed to it (boot and efi) had recent timestamps rather than the SOURCE_DATE_EPOCH value!
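A check along these lines would have found the stray timestamps sooner. This is a minimal sketch, assuming GNU find and stat; the function name newer_than_epoch is made up for illustration and is not part of mkosi or systemd-repart.

```shell
#!/bin/sh
# Sketch: list every file and directory under $1 whose modification time is
# later than the epoch in $2 (defaults to SOURCE_DATE_EPOCH, or 0 if unset).
# Assumption: GNU stat, for the -c '%Y %n' format option.
newer_than_epoch() {
    dir=$1
    epoch=${2:-${SOURCE_DATE_EPOCH:-0}}
    find "$dir" -exec stat -c '%Y %n' {} + |
        awk -v epoch="$epoch" '$1 > epoch { sub(/^[0-9]+ /, ""); print }'
}
```

Running newer_than_epoch on the directory tree that is about to be copied into the FAT partition prints nothing when all timestamps are at or before the epoch. Any path it prints, such as a freshly created directory, is a candidate for the kind of difference described above.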

After uncovering this issue, Daan quickly wrote a pull request to keep directory timestamps intact. After building a patched systemd-repart we have a reproducible image! I'm super excited to see this all work out, and will be investigating whether more filesystems can be made reproducible, such as f2fs, xfs etc.

If you want to see a full overview of the mkosi configuration used, see this GitHub repository.