Investigating creating reproducible images with mkosi
Posted on Sun 18 August 2024 in systemd
I've blogged before about creating Vagrant images using mkosi, as part of an investigation into moving image creation to mkosi. I will also be giving a talk at All Systems Go about Arch Linux images, mkosi and reproducibility.
In this article, a reproducible image means that anyone can re-create the official Arch cloud image bit-for-bit identically on their own machine, as per the Reproducible Builds definition.
About 90% of Arch Linux packages are already reproducible, but we have not yet investigated making our image artifacts reproducible.
Edgelesssys and Foxboron have already investigated and done some work on creating reproducible mkosi images. mkosi also already supports several settings that are essential for making an image reproducible:
- SourceDateEpoch configuration / CLI setting: clamps the timestamps of all files to the given value
- Mirror configuration / CLI setting: a way to pin the repository to a fixed state (Arch Linux provides archived repositories frozen at a point in time)
- Seed configuration / CLI setting: overrides the seed that systemd-repart uses when building an image (allowing reproducible UUIDs)
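To make "clamping" concrete: every file whose modification time is newer than the given timestamp gets that timestamp, while older files keep theirs. The C sketch below applies that idea to a directory tree. It is only an illustration and not mkosi's own implementation; the function name clamp_tree_mtimes is hypothetical, and the sketch only changes modification times, leaving access times alone.

```c
#define _XOPEN_SOURCE 700       /* nftw(), utimensat(), UTIME_OMIT */
#include <fcntl.h>              /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <ftw.h>                /* nftw() */
#include <sys/stat.h>           /* utimensat() */
#include <time.h>

/* nftw() callbacks take no user data, so the target timestamp lives here. */
static time_t clamp_epoch;

/* Called for every entry: lower its mtime to clamp_epoch if it is newer. */
static int clamp_entry(const char *path, const struct stat *sb, int typeflag,
                       struct FTW *ftwbuf)
{
    (void)ftwbuf;
    if (typeflag == FTW_NS)
        return -1;                  /* could not stat the entry */
    if (sb->st_mtime <= clamp_epoch)
        return 0;                   /* already old enough: leave untouched */

    struct timespec times[2] = {
        { .tv_sec = 0, .tv_nsec = UTIME_OMIT },  /* keep atime as it is */
        { .tv_sec = clamp_epoch, .tv_nsec = 0 }, /* mtime := epoch */
    };
    /* For a symlink, change the link's own timestamp, not its target's. */
    return utimensat(AT_FDCWD, path, times, AT_SYMLINK_NOFOLLOW) == 0 ? 0 : -1;
}

/* Hypothetical helper: clamp all mtimes under root to epoch.
 * Returns 0 on success, -1 on error. */
int clamp_tree_mtimes(const char *root, time_t epoch)
{
    clamp_epoch = epoch;
    /* FTW_PHYS: do not follow symlinks while walking the tree. */
    return nftw(root, clamp_entry, 16, FTW_PHYS);
}
```

Called as clamp_tree_mtimes("mkosi.output/foo", 1662046009), it would give every newer file the timestamp used in the commands below.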
So with that knowledge, let's build two minimal images:
mkosi --distribution arch --package systemd -o bar \
--mirror https://archive.archlinux.org/repos/2024/06/30/ \
--source-date-epoch 1662046009 \
--seed 0e9a6fe0-68f6-408c-bbeb-136054d20445
mkosi --distribution arch --package systemd -o foo \
--mirror https://archive.archlinux.org/repos/2024/06/30/ \
--source-date-epoch 1662046009 \
--seed 0e9a6fe0-68f6-408c-bbeb-136054d20445
This creates an ext4 image containing just systemd, but the result is not reproducible, as hashing the two created images shows:
e3e9c0b1b4a91d39da96e4378eb423c9 mkosi.output/foo.raw
6b257b2db8bdf29bf67aa75f534b86b6 mkosi.output/bar.raw
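A hash only tells us that two images differ, not where. As a cheap complement, a small helper can report the first byte offset at which two raw images diverge. A difference near the very start of the disk, where the partition table lives, points somewhere else than one deep inside the filesystem. The helper first_difference is a hypothetical sketch, not part of mkosi or diffoscope.

```c
#include <stdio.h>

/* Hypothetical helper: return the offset of the first byte at which the
 * files at paths a and b differ, -1 if they are identical, or -2 if either
 * file cannot be opened or read. If one file is a prefix of the other, the
 * difference is reported at the end of the shorter file. */
long long first_difference(const char *a, const char *b)
{
    unsigned char buf_a[8192], buf_b[8192];
    long long offset = 0, result = -2;
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");

    while (fa != NULL && fb != NULL) {
        size_t na = fread(buf_a, 1, sizeof buf_a, fa);
        size_t nb = fread(buf_b, 1, sizeof buf_b, fb);
        if (ferror(fa) || ferror(fb))
            break;                       /* read error: result stays -2 */

        size_t n = na < nb ? na : nb;    /* bytes present in both chunks */
        for (size_t i = 0; i < n; i++) {
            if (buf_a[i] != buf_b[i]) {
                result = offset + (long long)i;
                goto done;
            }
        }
        if (na != nb) {                  /* one file ended before the other */
            result = offset + (long long)n;
            break;
        }
        if (na == 0) {                   /* both files ended together */
            result = -1;
            break;
        }
        offset += (long long)na;
    }
done:
    if (fa != NULL)
        fclose(fa);
    if (fb != NULL)
        fclose(fb);
    return result;
}
```

For example, first_difference("mkosi.output/foo.raw", "mkosi.output/bar.raw") returns -1 only when the two images are bit-for-bit identical.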
Running diffoscope (a diff on steroids) on these images takes a long time and eventually got the process OOM-killed. Let's simplify our approach by first outputting the contents to a directory instead of an image, so we end up running:
mkosi --distribution arch --package systemd -o foo \
--mirror https://archive.archlinux.org/repos/2024/06/30/ \
--source-date-epoch 1662046009 \
--seed 0e9a6fe0-68f6-408c-bbeb-136054d20445 \
--format directory
After running this for both foo and bar, we can run diffoscope on the created directories mkosi.output/{foo,bar}:
sudo diffoscope --html-dir output mkosi.output/foo mkosi.output/bar
The files in the directories are owned by root, so sudo is required to read and diff them.
When diffoscope is done, we can view the results in a browser using xdg-open output/index.html. Two things show up as unreproducible:
var/lib/pacman/local/acl-2.3.2-1/desc
20 %INSTALLDATE% 20 %INSTALLDATE%
21 1723974712 21 1723974743
This is the pacman local database, where Arch Linux's package manager stores the state of installed packages. A quick look at the C code makes the reproducibility issue easy to spot:
lib/libalpm/add.c
/* make an install date (in UTC) */
newpkg->installdate = time(NULL);
The install date is taken from the current time at the moment the package is installed. Making pacman respect SOURCE_DATE_EPOCH during installation was fairly easy and was done in this merge request.
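The idea behind the fix can be sketched as follows. This is a hypothetical helper and not the code from the pacman merge request. It uses SOURCE_DATE_EPOCH as the install date when the variable is set and otherwise keeps the old time(NULL) behaviour. Following the SOURCE_DATE_EPOCH specification, a malformed value is reported as an error instead of being silently ignored. Treating an empty variable as unset is an assumption of this sketch.

```c
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not pacman's actual code): pick the install date to
 * record for a package. Returns 0 and stores the date in *out, or returns -1
 * if SOURCE_DATE_EPOCH is set to something other than a plain decimal
 * integer that fits in a long long. */
int install_date(time_t *out)
{
    const char *sde = getenv("SOURCE_DATE_EPOCH");

    /* Unset (or, by assumption here, empty): keep the old behaviour. */
    if (sde == NULL || *sde == '\0') {
        *out = time(NULL);
        return 0;
    }

    /* Require plain decimal digits: no sign, no leading whitespace. */
    if (*sde < '0' || *sde > '9')
        return -1;

    char *end;
    errno = 0;
    long long value = strtoll(sde, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;   /* trailing garbage or out of range: let the caller fail */

    *out = (time_t)value;
    return 0;
}
```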
The other issue is var/cache/ldconfig/aux-cache, a binary file that differs between builds. It is generated by glibc's ldconfig, the tool that maintains the run-time linker's cache, and only serves to speed up later ldconfig runs, so we can simply leave it out of our built image.
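How exactly mkosi was hacked up is not shown here. Conceptually, the step only deletes one path from the image tree before it is packed; with mkosi this would more naturally be a one-line rm in one of the scripts it can run during the build. As a minimal C sketch, with remove_from_tree as a hypothetical name, a missing file is deliberately treated as success so the step is safe to run on every build.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: delete relpath (e.g. "var/cache/ldconfig/aux-cache")
 * inside the image tree at root. A file that is already absent counts as
 * success. Returns 0 on success, -1 on failure with errno set. */
int remove_from_tree(const char *root, const char *relpath)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", root, relpath);
    if (n < 0 || (size_t)n >= sizeof path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    if (unlink(path) == 0 || errno == ENOENT)
        return 0;
    return -1;
}
```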
So with a custom-patched pacman and a hacked-up mkosi that removes var/cache/ldconfig/aux-cache, we can continue our journey, now with a more realistic scenario:
mkosi --distribution arch -o foo \
--package systemd \
--package grub \
--package base \
--package openssh \
--package sudo \
--package reflector \
--package btrfs-progs \
--package udev \
--mirror https://archive.archlinux.org/repos/2024/06/30/ \
--source-date-epoch 1662046009 \
--seed 0e9a6fe0-68f6-408c-bbeb-136054d20445 \
--format directory
This creates a directory with the packages that will also be on the cloud image. Running diffoscope now takes a bit longer, as the directory is 1 GB and diffoscope analyses it using a single process.
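Since diffoscope works through the tree in a single process, it can help to first narrow the comparison down to the files that actually differ and then point diffoscope at just those. The sketch below walks one tree with nftw() and compares each regular file byte by byte with its counterpart in the other tree. It is a hypothetical helper with several stated limitations: it assumes both root paths are given without a trailing slash, it only reports files that are missing from or differ in the second tree (not files that exist only there), and it ignores metadata such as permissions and timestamps.

```c
#define _XOPEN_SOURCE 700   /* nftw() */
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* nftw() callbacks take no user data, so the walk's state lives in statics. */
static const char *other_root;  /* tree to compare against */
static size_t walk_root_len;    /* length of the root path being walked */
static int differing;           /* files found to be missing or different */

/* Return 1 if both files can be read and have identical contents, else 0. */
static int same_contents(const char *pa, const char *pb)
{
    FILE *fa = fopen(pa, "rb");
    FILE *fb = fopen(pb, "rb");
    int same = fa != NULL && fb != NULL;
    char buf_a[8192], buf_b[8192];

    while (same) {
        size_t na = fread(buf_a, 1, sizeof buf_a, fa);
        size_t nb = fread(buf_b, 1, sizeof buf_b, fb);
        if (ferror(fa) || ferror(fb) || na != nb || memcmp(buf_a, buf_b, na) != 0)
            same = 0;
        else if (na == 0)
            break;                      /* both files ended together */
    }
    if (fa != NULL)
        fclose(fa);
    if (fb != NULL)
        fclose(fb);
    return same;
}

/* nftw() callback: compare each regular file with its counterpart. */
static int visit(const char *path, const struct stat *sb, int typeflag,
                 struct FTW *ftwbuf)
{
    (void)ftwbuf;
    if (typeflag != FTW_F || !S_ISREG(sb->st_mode))
        return 0;                       /* directories, symlinks, devices: skipped */

    const char *rel = path + walk_root_len;   /* e.g. "/var/lib/..." */
    char counterpart[4096];
    int n = snprintf(counterpart, sizeof counterpart, "%s%s", other_root, rel);
    if (n < 0 || (size_t)n >= sizeof counterpart)
        return -1;                      /* path too long: abort the walk */

    if (!same_contents(path, counterpart)) {
        printf("differs or missing: %s\n", rel + 1);
        differing++;
    }
    return 0;
}

/* Hypothetical helper: print regular files under a that are missing from b
 * or differ in content. Returns how many were found, or -1 on error. */
int list_differing_files(const char *a, const char *b)
{
    other_root = b;
    walk_root_len = strlen(a);
    differing = 0;
    if (nftw(a, visit, 16, FTW_PHYS) != 0)
        return -1;
    return differing;
}
```

For example, list_differing_files("mkosi.output/foo", "mkosi.output/bar"), run as root like diffoscope above, prints one line per differing file.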
The only remaining difference is efi/loader/random-seed, a random seed file used by systemd-boot. The file is documented on freedesktop.org and on systemd.io.
There does not currently seem to be an option for creating a deterministic seed. However, mkosi supports hooking into the build process at several points, so this file could be overwritten after bootctl install to be the same for all images.
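Such a post-bootctl step could look like the following sketch. It keeps the file's length but replaces its contents with bytes from a small deterministic generator (xorshift64), so every build writes the same bytes. The function name and the seed value are hypothetical, and this only illustrates "make the file identical across builds". A seed shared by every image is, by construction, no longer unique to any of them.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: overwrite the file at path with deterministic bytes
 * derived from seed, keeping its original length.
 * Returns 0 on success, -1 on error. */
int overwrite_deterministically(const char *path, uint64_t seed)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;

    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd < 0)
        return -1;

    /* xorshift64: tiny and fully deterministic, NOT cryptographically secure. */
    uint64_t x = seed != 0 ? seed : 1;   /* xorshift must not start at 0 */
    for (off_t written = 0; written < st.st_size; ) {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;

        /* Serialise byte by byte so the output does not depend on endianness. */
        unsigned char chunk[8];
        for (int i = 0; i < 8; i++)
            chunk[i] = (unsigned char)(x >> (8 * i));

        size_t left = (size_t)(st.st_size - written);
        size_t n = left < sizeof chunk ? left : sizeof chunk;
        if (write(fd, chunk, n) != (ssize_t)n) {
            close(fd);
            return -1;
        }
        written += (off_t)n;
    }
    return close(fd);
}
```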
One remaining aspect of creating a reproducible image is the filesystem, the raw image and the partition table. I hope to tackle this in a future post.