Arch Reproducible Progress July 2021

Posted on Sun 01 August 2021 in arch linux

At the end of July, I had a few days off and more time to focus on unreproducible packages in Arch Linux and get some of the issues resolved. This post goes through the resolved issues by category.

gzipped man pages

By default, gzip embeds a timestamp in the compressed file, so a package that ships gzipped man pages is not reproducible when rebuilt. Passing the -n flag to gzip stops it from recording the timestamp, but preferably we should not have to change the upstream build system of every package. Arch Linux builds packages with makepkg, which already compresses man pages using gzip with the -9 -n -f flags. So packages like hyperfine can be fixed by letting makepkg take care of the compression. profile-cleaner no longer gzips its man pages, and qiv now allows the packager to set the compression program for man pages.

These fixes were a small subset of the many packages that still compress man pages with a recorded timestamp and need fixing.
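The effect of the embedded timestamp can be illustrated with a minimal sketch. It uses Python's gzip module rather than the gzip command line tool, and the man page content and the compress_man_page helper are made up for illustration. Passing mtime=0 is assumed here to play the same role as gzip -n, which leaves the timestamp field in the header zeroed.

```python
import gzip


def compress_man_page(data: bytes, mtime: int = 0) -> bytes:
    """Compress at the highest level with an explicit header timestamp.

    mtime=0 leaves the 4-byte timestamp field of the gzip header zeroed,
    which is comparable to what `gzip -n` produces.
    """
    return gzip.compress(data, compresslevel=9, mtime=mtime)


# Hypothetical man page content, only used for this illustration.
page = b".TH EXAMPLE 1\n.SH NAME\nexample \\- a hypothetical man page\n"

# Two builds at different times: same input, different archives.
with_ts_a = compress_man_page(page, mtime=1627776000)
with_ts_b = compress_man_page(page, mtime=1627776001)
print(with_ts_a == with_ts_b)

# Two builds with the timestamp zeroed: identical archives.
fixed_a = compress_man_page(page)
fixed_b = compress_man_page(page)
print(fixed_a == fixed_b)
```

The compressed payload is the same in all four cases; only the header bytes that hold the timestamp differ.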

Embedded build date

Various projects embed the build date in their binaries, which makes them unreproducible. For skaffold and percona-toolkit, a PR has been opened upstream to support SOURCE_DATE_EPOCH. For aqbanking, the PKGBUILD now passes a date that is reproducible when SOURCE_DATE_EPOCH is set.
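As a rough sketch of what supporting SOURCE_DATE_EPOCH usually looks like in a build script: read the variable if it is set and only fall back to the current time otherwise. The function names below are hypothetical and are not taken from any of the upstream PRs mentioned above.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp() -> int:
    """Return SOURCE_DATE_EPOCH when set, otherwise the current time."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        return int(epoch)
    return int(time.time())


def build_date_string() -> str:
    """Format the build date in UTC so the builder's timezone does not leak."""
    stamp = datetime.fromtimestamp(build_timestamp(), tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(build_date_string())
```

Formatting in UTC matters as much as the timestamp itself, since two builders in different timezones could otherwise embed different dates for the same epoch value.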

Embedded hostname

The inn package records the hostname, which differs per host. The PKGBUILD now sets the hostname to archlinux for every build.

Miscellaneous issues

adapta-gtk-theme was unreproducible because the old package was built with a pacman version that incorrectly recorded the size of the package. A simple rebuild was enough to make it reproducible.

ctemplate was another package that could be fixed by a rebuild. The package was appending the Perl path, which was an issue in Arch Linux because the profile.d script never checked whether the path was already in PATH and kept appending it forever.

deepin-clone has the following unreproducible file, which at a glance looks like a simple ordering issue:

│ ├── usr/share/polkit-1/actions/com.deepin.pkexec.deepin-clone.policy
│ │┄ Ordering differences only
│ │         <annotate key="org.freedesktop.policykit.exec.allow_gui">true</annotate>
│ │ -       <description xml:lang="zh_TW">Deepin 時光機需要在區塊裝置做些動作,例如寫、讀、取得資訊等等。</description>
│ │ -       <message xml:lang="zh_TW">執行 Deepin 時光機需要身份驗證</message>
│ │ -       <description xml:lang="zh_CN">深度备份还原工具需要对块设备进行读写和获取信息等操作</description>
│ │ -       <message xml:lang="zh_CN">使用深度备份还原工具需要认证</message>
│ │ -       <description xml:lang="uk">Deepin Clone потрібно виконувати операції на блочному пристрої, наприклад, писати і читати, отримувати інформацію тощо.</description>
│ │ -       <message xml:lang="uk">Для запуску Deepin Clone потрібна аутентифікація</message>
│ │ -       <description xml:lang="tr">Deepin Klon, blok aygıtı üzerinde yazma ve okuma, bilgi alma ve benzeri işlemleri yapmak zorundadır.</description>
│ │ -       <message xml:lang="tr">Deepin Klon uygulamasını çalıştırmak için kimlik doğrulaması gerekli</message>
│ │ -       <description xml:lang="sr">Дипин Клонирање извршава операције на блок уређајима, као што су уписивање и читање, прикупљање података итд.</description>
│ │ -       <message xml:lang="sr">Аутентификација је неопходна за покретање Дипин Клонирања</message>
│ │ -       <description xml:lang="sl">Klonirnik Deepin mora na napravini datoteki izvajati nekatere operacije, kot so zapisovanje, branje, pridobivanje informacij itd.</description>
│ │ +       <description xml:lang="zh_TW">Deepin 時光機需要在區塊裝置做些動作,例如寫、讀、取得資訊等等。</description>
│ │ +       <message xml:lang="zh_TW">執行 Deepin 時光機需要身份驗證</message>

According to the CMakeLists.txt build recipe, the policy file is generated using deepin-policy-ts-convert. The script reads the translation files, records them in a dictionary and then generates a policy file. To reliably test a potential fix, I made a reproducer script:

#!/bin/sh

~/projects/deepin-gettext-tools/src/policy_ts_convert.py ts2policy com.deepin.pkexec.deepin-clone.policy.tmp ./translations test
~/projects/deepin-gettext-tools/src/policy_ts_convert.py ts2policy com.deepin.pkexec.deepin-clone.policy.tmp ./translations test2

diffoscope test test2

However, this does not reproduce the issue, most likely because both runs of the script read from the same filesystem, whereas the rebuilders use ext4 and the build system most likely uses btrfs. Luckily, the Reproducible Builds project has a tool for this, disorderfs, which can present a directory with its entries in a different order.

disorderfs --sort-dirents=yes --reverse-dirents=no  translations translationssorted
disorderfs --sort-dirents=yes --reverse-dirents=yes translations translationsdisorderd

After adjusting the reproducer script, we now get a diff from diffoscope. The deepin-policy-ts-convert script reads the translation files using glob, which is not deterministic because it relies on readdir, and then creates a Python dictionary which it iterates over. Iterating over the dictionary is deterministic, but the dictionary is not always filled in the same order, so adding a simple sorted() on tr_dict before iterating over it makes the output XML file reproducible.
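The ordering problem and the fix can be shown in isolation with a small sketch. This is not the actual deepin-policy-ts-convert code: fill_in_order, render_unsorted and render_sorted are hypothetical helpers, the message texts are placeholders, and only the name tr_dict is borrowed from the script.

```python
# Placeholder translations keyed by language code.
TRANSLATIONS = {
    "sl": "placeholder message (sl)",
    "sr": "placeholder message (sr)",
    "tr": "placeholder message (tr)",
    "uk": "placeholder message (uk)",
}


def fill_in_order(order):
    """Build tr_dict in the order the files would be returned by readdir."""
    tr_dict = {}
    for lang in order:
        tr_dict[lang] = TRANSLATIONS[lang]
    return tr_dict


def render_unsorted(tr_dict):
    # Iterates in insertion order, so the output depends on readdir order.
    return "\n".join(
        f'<message xml:lang="{lang}">{text}</message>'
        for lang, text in tr_dict.items()
    )


def render_sorted(tr_dict):
    # Sorting the keys first makes the output independent of fill order.
    return "\n".join(
        f'<message xml:lang="{lang}">{tr_dict[lang]}</message>'
        for lang in sorted(tr_dict)
    )


forward = fill_in_order(["sl", "sr", "tr", "uk"])
reverse = fill_in_order(["uk", "tr", "sr", "sl"])

print(render_unsorted(forward) == render_unsorted(reverse))
print(render_sorted(forward) == render_sorted(reverse))
```

Sorting the result of glob would work as well, but sorting at the point of iteration keeps the output stable no matter how the dictionary ends up being populated.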

In total, eleven packages were fixed, and Arch Linux still has 1654 unreproducible packages left in its repositories.