Arch Linux Reproducible Builds Progress 2020

Posted on Tue 05 January 2021 in Arch Linux, Reproducible Builds

A lot has happened since the last Reproducible Builds Summit in Marrakesh in 2019. This blog post summarizes the progress made in 2020 on everything related to reproducible builds in Arch Linux.

archlinux-repro

Also known as repro, this tool rebuilds a package and checks whether it is reproducible, given a built package such as $foo.pkg.tar.zst. It sets up a build root, downloads the PKGBUILD and sources, rebuilds the package, and then checks whether the result is reproducible. During the year the tool improved a lot and can now rebuild all the packages in our repository without any known side effects. Notable changes are:

  • Adding an option to skip running tests, as they are not required to determine reproducibility.
  • Support for a shared cache directory, so multiple repro instances can share the same cache of packages.
  • Keyring related fixes.
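
The final check described above, comparing the rebuilt package against the one that was provided, can be sketched as a checksum comparison. This is a minimal illustration and not repro's actual implementation: the function names are hypothetical, and it assumes that "reproducible" means the two package files are bit-for-bit identical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(original: Path, rebuilt: Path) -> bool:
    """Hypothetical helper: treat a rebuild as reproducible only if
    both package files have the same SHA-256 digest."""
    return sha256_of(original) == sha256_of(rebuilt)
```

When the digests differ, a tool such as diffoscope can be used to inspect where the two packages diverge.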

Rebuilderd

During the Reproducible Builds Summit in Marrakesh, kpcyrd started working on rebuilderd, a new distribution-agnostic tool to rebuild and verify repository packages. The rebuilderd daemon syncs repository state from a distribution mirror and queues packages to be rebuilt. rebuilderd-worker instances query the rebuilderd daemon for packages to build and report back whether the build was successful.
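
The daemon and worker split can be pictured with a toy in-memory model. This is only a sketch of the flow described above, not rebuilderd's code or API: the class, the method names and the status strings are all made up for illustration, and a real worker talks to the daemon over the network.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyDaemon:
    """Hypothetical stand-in for the daemon: a queue plus recorded results."""
    queue: deque = field(default_factory=deque)
    results: dict = field(default_factory=dict)

    def sync(self, mirror_packages):
        """Queue every package from the mirror that has no result yet."""
        for pkg in mirror_packages:
            if pkg not in self.results and pkg not in self.queue:
                self.queue.append(pkg)

    def next_job(self):
        """Hand the oldest queued package to a worker, or None when idle."""
        return self.queue.popleft() if self.queue else None

    def report(self, pkg, reproducible):
        """Record the outcome a worker reports back (illustrative labels)."""
        self.results[pkg] = "GOOD" if reproducible else "BAD"


def run_worker(daemon, rebuild):
    """Poll the daemon until the queue is empty, rebuilding each package.

    `rebuild` is any callable that takes a package name and returns
    True if the rebuild matched the original package.
    """
    while (pkg := daemon.next_job()) is not None:
        daemon.report(pkg, rebuild(pkg))
```

Several workers can call next_job() on the same daemon, which is why the queue lives on the daemon side rather than in the workers.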

rebuilderd had its first release on 13 April 2020 and has seen 13 releases during the year, with some notable improvements:

  • Storing diffoscope and log output.
  • Re-scheduling failed builds with a delay.
  • API enhancements, for example a new endpoint /api/v0/dashboard for getting instance statistics.
  • Automatically aborting builds after 24 hours.

Infrastructure

A week after the first release of rebuilderd and its packaging in the Arch Linux [community] repository, a rebuilderd instance was set up on 23 April. Soon after, a simple React website was created which shows an overview of the reproducibility statistics and the per-package status. The source can be found in the rebuilderd-website repository.

The setup has matured during the year with the creation of Ansible roles for rebuilderd and rebuilderd-worker, a Prometheus exporter to monitor the queue length, statuses and online workers, and a Grafana dashboard. Kape has sponsored dedicated servers for Arch Linux: one of them (AMD EPYC 7702P) now hosts multiple rebuilderd-workers for rebuilding our packages, and three others serve as Archive mirrors in America, Asia and Europe.

The TU/Developer dashboard provided by archweb displays unreproducible packages for the logged-in packager. Packagers can now also opt in to an email notification that warns them when a package has become 'unreproducible'.

Packaging

To get more reproducible packages, individual packages have to be patched or adjusted. Various packages have been fixed in our repository; some notable fixes are:

  • A Perl PATH being appended multiple times during the build made Haskell packages unreproducible. This was fixed after a filesystem update and a change in the Perl package.
  • Various Python packages now set PYTHONHASHSEED so that Python pyc files are generated reproducibly.
  • Removing .doctrees directories from packages. They are not required, as they are build cache files generated by Sphinx, and they are unreproducible.
  • Removing the use of go get in packages, as it is often unreproducible because dependencies are not pinned.
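
To illustrate the PYTHONHASHSEED fix listed above: Python randomizes string hashes per process by default, and the iteration order of a set of strings depends on those hashes, which is one way nondeterminism can end up in generated files. The sketch below starts fresh interpreters with a chosen seed and compares the order they print. The snippet and the helper name are made up for this example, and the exact way this affects a given package's pyc files is not shown here.

```python
import os
import subprocess
import sys

# Code run in a fresh interpreter: print a set of strings in iteration order.
SNIPPET = "print(list({'alpha', 'beta', 'gamma', 'delta'}))"


def set_order(seed: str) -> str:
    """Run SNIPPET in a new interpreter with PYTHONHASHSEED set to `seed`
    and return the printed iteration order."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # With a fixed seed, two separate processes agree on the order.
    print(set_order("0") == set_order("0"))
    # With PYTHONHASHSEED=random, the order may change between runs.
    print(set_order("random"))
```

In a PKGBUILD this amounts to exporting a fixed PYTHONHASHSEED value before the build step runs.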