Investigating creating reproducible images with mkosi

Posted on Sun 18 August 2024 in systemd

I've blogged before about creating Vagrant images using mkosi. That was part of an investigation into moving our image creation to mkosi, and I will also be giving a talk at All Systems Go about Arch Linux images, mkosi and reproducibility.

By reproducible images I mean that anyone should be able to recreate the official Arch cloud image, bit-for-bit identical, on their own machine, as per the Reproducible Builds definition.

Arch Linux packages are already 90% reproducible, but we have not yet investigated making our image artifacts reproducible.

Edgelesssys and Foxboron have already investigated and done some work on creating reproducible mkosi images. mkosi itself also already supports some of the essential pieces needed to make an image reproducible:

  • SourceDateEpoch configuration / cli setting - clamps the timestamps of all files to the given value
  • Mirror configuration / cli setting - a way to pin the repository to a fixed version (Arch Linux provides archived, fixed-in-time repositories)
  • Seed configuration / cli setting - overrides the seed that systemd-repart uses when building an image (allows creating reproducible UUIDs)

So with that knowledge, let's build two minimal images:

mkosi --distribution arch --package systemd -o bar \
      --mirror https://archive.archlinux.org/repos/2024/06/30/ \
      --source-date-epoch 1662046009 \
      --seed 0e9a6fe0-68f6-408c-bbeb-136054d20445

mkosi --distribution arch --package systemd -o foo \
      --mirror https://archive.archlinux.org/repos/2024/06/30/ \
      --source-date-epoch 1662046009 \
      --seed 0e9a6fe0-68f6-408c-bbeb-136054d20445

Each command creates an ext4 image with just systemd, but the result is not reproducible, which we can see by hashing the created images:

e3e9c0b1b4a91d39da96e4378eb423c9  mkosi.output/foo.raw
6b257b2db8bdf29bf67aa75f534b86b6  mkosi.output/bar.raw
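The exact command used for the checksums above is not shown; at 32 hex digits they look like MD5 sums. A minimal sketch of such a check, assuming GNU coreutils is available. The helper names `image_hashes` and `images_identical` are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print one checksum per image for side-by-side comparison.
image_hashes() {
    md5sum "$@"
}

# Hypothetical helper: succeeds only if both files are byte-for-byte identical.
images_identical() {
    cmp -s "$1" "$2"
}
```

For example, `image_hashes mkosi.output/foo.raw mkosi.output/bar.raw` prints both sums, and `images_identical mkosi.output/foo.raw mkosi.output/bar.raw && echo reproducible` only prints when the images match.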

Running diffoscope (a diff on steroids) on the images takes a long time and the process ended up being OOM-killed. Let's simplify our approach by first outputting the contents to a directory instead of an image, so we end up running:

mkosi --distribution arch --package systemd -o foo \
      --mirror https://archive.archlinux.org/repos/2024/06/30/ \
      --source-date-epoch 1662046009 \
      --seed 0e9a6fe0-68f6-408c-bbeb-136054d20445 \
      --format directory

After running this for both foo and bar, we can run diffoscope on the created directories mkosi.output/{foo,bar}:

sudo diffoscope --html-dir output mkosi.output/foo mkosi.output/bar

The files in the directory are owned by root so sudo is required to read and diff them.

When diffoscope is done we can view the results in a browser using xdg-open output/index.html. Two things show up as unreproducible:

var/lib/pacman/local/acl-2.3.2-1/desc

20  %INSTALLDATE%   20  %INSTALLDATE%
21  1723974712  21  1723974743

This is the pacman local database, where Arch Linux's package manager stores the state of installed packages. Looking at the C code, it is easy to spot the reproducibility issue:

lib/libalpm/add.c

    /* make an install date (in UTC) */
    newpkg->installdate = time(NULL);

A new timestamp is created when the package is installed. Making pacman respect SOURCE_DATE_EPOCH when installing was rather easily done in this merge request.
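The usual SOURCE_DATE_EPOCH pattern is to use the variable when it is set and fall back to the current time otherwise. The shell sketch below only illustrates that pattern; it is not the actual pacman patch (which is in C), and `install_date` is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper illustrating the SOURCE_DATE_EPOCH pattern.
# Prints the timestamp a tool should record as its "install date".
install_date() {
    if [ -n "${SOURCE_DATE_EPOCH:-}" ]; then
        # Reproducible mode: use the pinned timestamp.
        printf '%s\n' "$SOURCE_DATE_EPOCH"
    else
        # Normal mode: fall back to the current time in seconds since the epoch.
        date +%s
    fi
}
```

With `SOURCE_DATE_EPOCH=1662046009` exported, as in the mkosi invocations above, every call yields the same value, so two installs would record the same %INSTALLDATE%.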

The other issue is var/cache/ldconfig/aux-cache, a binary file which differs between builds. It comes from glibc and is an auxiliary cache that ldconfig uses to speed up regenerating the run-time linker cache. Since it is only a cache, we can simply leave it out of the built image.
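Leaving the file out can be as small as a cleanup step run against the image tree. The sketch below assumes the root of the tree is passed as the first argument; the function name is hypothetical, and how it is wired into the mkosi build is deliberately left open:

```shell
#!/bin/sh
# Hypothetical cleanup step: drop ldconfig's auxiliary cache from an image tree.
# $1 is the root of the image tree (an assumption, e.g. mkosi.output/foo).
remove_aux_cache() {
    root=$1
    # -f: do not fail if the file is already absent.
    rm -f "$root/var/cache/ldconfig/aux-cache"
}
```

Because of `rm -f`, the step is safe to run more than once, and also on trees where the cache was never created.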

So with a custom-patched pacman and mkosi hacked up to remove var/cache/ldconfig/aux-cache, we can continue our journey, now with a more realistic scenario:

mkosi --distribution arch -o foo \
      --package systemd \
      --package grub \
      --package base \
      --package openssh \
      --package sudo \
      --package reflector \
      --package btrfs-progs \
      --package udev \
      --mirror https://archive.archlinux.org/repos/2024/06/30/ \
      --source-date-epoch 1662046009 \
      --seed 0e9a6fe0-68f6-408c-bbeb-136054d20445 \
      --format directory

This creates a directory containing the packages that will also be on the cloud image. Running diffoscope now takes a bit longer, as the directory is 1G and diffoscope analyses it with a single process.
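With a tree of this size, a cheap first pass can narrow down which files differ, so that only those need to go through diffoscope. A minimal sketch, assuming GNU coreutils and findutils; `tree_manifest` and `differing_files` are names invented here, and only the contents of regular files are compared (not ownership, modes, timestamps or symlinks):

```shell
#!/bin/sh
# Hypothetical helper: print "hash  ./relative/path" for every regular file, sorted by path.
tree_manifest() {
    (cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2)
}

# Hypothetical helper: list relative paths whose content differs,
# or that exist in only one of the two trees.
differing_files() {
    # Manifest lines that appear exactly once belong to changed or missing files.
    { tree_manifest "$1"; tree_manifest "$2"; } \
        | sort | uniq -u \
        | sed 's/^[0-9a-f]* [ *]//' \
        | sort -u
}
```

Since the files are owned by root, this would need to run with the same privileges as diffoscope, for example from a root shell: `differing_files mkosi.output/foo mkosi.output/bar`.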

The only leftover difference is efi/loader/random-seed, which is a random seed file from systemd-boot. The file is documented on freedesktop.org and on systemd.io.

There does not seem to be an option for creating a deterministic seed as of now. However, mkosi supports hooking into the build process at any point, so this file could be overwritten after bootctl install to be the same for all images.
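As an illustration of that idea, the sketch below overwrites the file with bytes derived from a fixed string, so every build ends up with the same content. The function name, the 32-byte length and the choice of input string are all assumptions made for this example, and where exactly the step hooks into mkosi is left open:

```shell
#!/bin/sh
# Hypothetical step: replace the systemd-boot random seed with deterministic content.
# $1: root of the image tree, $2: fixed string to derive the content from.
write_fixed_random_seed() {
    seed_file="$1/efi/loader/random-seed"
    mkdir -p "$(dirname "$seed_file")"
    # First 32 hex characters of the SHA-256 digest: 32 bytes, identical on every run.
    printf '%s' "$2" | sha256sum | cut -c 1-32 | tr -d '\n' > "$seed_file"
}
```

One possible input is the same UUID that is passed to --seed, for example `write_fixed_random_seed mkosi.output/foo 0e9a6fe0-68f6-408c-bbeb-136054d20445`. A fixed file of course gives up the randomness the seed is meant to provide, which is the trade-off of this approach.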

The remaining aspects of creating a reproducible image are the filesystem, the raw image and the partition table. I hope to tackle these in a future post.