While discussing uv tests with Fedora developers, it occurred to me how different your average Gentoo testing environment is, not only from those used upstream, but also from those used by other Linux distributions. This article is dedicated to exactly that: pointing out how it’s different, what that implies, and why I think it’s not a bad thing.
Gentoo as a source-first distro
The first important thing about Gentoo is that it is a source-first distribution. The best way to explain this is to compare it with your average “binary” distribution.
In a “binary” distribution, source and binary packages are somewhat isolated from one another. Developers work with source packages (recipes, specs) and use them to build binary packages, either directly or via automation. Then the binary packages hit repositories. The end users usually do not interface with sources at all, and may well not even be aware that such a thing exists.
In Gentoo, on the other hand, source packages are first-class citizens. All users use source repositories, and can optionally use local or remote binary package repositories. I think the best way of thinking about binary packages is as a form of “cache”.
If the package manager is configured to use binary packages, it attempts to find a package that matches the build parameters — the package version, USE flags, dependencies. If it finds a match, it can use it. If it doesn’t, it just proceeds with building from source. If configured to do so, it may write a binary package as a side effect of that — almost literally cache it. It can also be set to create a binary package without installing it (pre-fill the “cache”). It should hardly surprise anyone at this point that the default local binary packages repository is under the /var/cache tree.
A side implication of this is that the binary packages provided by Gentoo are a subset of all packages available — and on top of that, only a small number of viable package configurations are covered by the official packages.
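To make the “cache” analogy a bit more concrete, here is a toy model of the lookup described above. It is only a sketch: `BuildKey`, `BinaryPackageCache` and `build_from_source` are hypothetical names made up for illustration, not part of Portage or any real package manager, and the real matching rules are more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildKey:
    """Hypothetical cache key: the parameters a binary package must match."""
    name: str
    version: str
    use_flags: frozenset
    dependencies: frozenset


def build_from_source(key):
    """Placeholder for the real source build; returns a fake artifact name."""
    flags = ",".join(sorted(key.use_flags))
    return f"{key.name}-{key.version}[{flags}]"


class BinaryPackageCache:
    """Toy stand-in for a binary package repository (not a real API)."""

    def __init__(self, write_on_build=False):
        self.write_on_build = write_on_build
        self._store = {}

    def prefill(self, key):
        """Create a binary package without installing it."""
        self._store[key] = build_from_source(key)

    def install(self, key):
        """Return (artifact, origin), where origin is 'binary' or 'source'."""
        if key in self._store:
            return self._store[key], "binary"
        # No matching binary package: fall back to building from source.
        artifact = build_from_source(key)
        if self.write_on_build:
            # Side effect of the build: remember the result for next time.
            self._store[key] = artifact
        return artifact, "source"


def demo_cache():
    cache = BinaryPackageCache(write_on_build=True)
    plain = BuildKey("app-misc/example", "1.0", frozenset({"ssl"}), frozenset())
    debug = BuildKey("app-misc/example", "1.0", frozenset({"ssl", "debug"}), frozenset())
    # First install misses, second hits, different USE flags miss again.
    return [cache.install(k)[1] for k in (plain, plain, debug)]
```

Note how changing a single USE flag yields a different key, which is why only a small number of configurations can realistically be covered by prebuilt packages.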
The build phases
The source build in Gentoo is split into a few phases. The central phases that are of interest here are largely inspired by how autotools-based packages were built. These are:
- src_configure — meant to pass input parameters to the build system, and get it to perform necessary platform checks. Usually involves invoking a configure script, or an equivalent action of a build system such as CMake, Meson or another.
- src_compile — meant to execute the bulk of compilation, and leave the artifacts in the build tree. Usually involves invoking a builder such as make or ninja.
- src_test — meant to run the test suite, if the user wishes testing to be done. Usually involves invoking the check or test target.
- src_install — meant to install the artifacts and other files from the work directory into a staging directory (not the live system). The files can be afterwards transferred to the live system and/or packed into a binary package. Usually involves invoking the install target.
Clearly, it’s very similar to how you’d compile and install software yourself: configure, build, optionally test before installing, and then install.
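The ordering of the phases listed above can be sketched as a tiny driver. This is a simplified model, not how the package manager is actually implemented: `run_phases` and `make_toy_package` are hypothetical helpers, and the phases here only record what they would do.

```python
PHASE_ORDER = ["src_configure", "src_compile", "src_test", "src_install"]


def run_phases(phases, run_tests=False):
    """Run the phases in order; src_test runs only if the user asked for it.

    An exception in any phase aborts the build, so a failing test suite
    prevents src_install from ever running.
    """
    executed = []
    for name in PHASE_ORDER:
        if name == "src_test" and not run_tests:
            continue
        phases[name]()
        executed.append(name)
    return executed


def make_toy_package(log):
    """Hypothetical package whose phases only record what they would do."""
    return {
        "src_configure": lambda: log.append("configure: platform checks"),
        "src_compile": lambda: log.append("compile: artifacts in build tree"),
        "src_test": lambda: log.append("test: run the test suite"),
        "src_install": lambda: log.append("install: copy into staging dir"),
    }


if __name__ == "__main__":
    log = []
    print(run_phases(make_toy_package(log), run_tests=True))
    print("\n".join(log))
```

The point to take away is that the test phase sits between compile and install: tests run against what was just built, before anything is staged or merged.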
Of course, this process is not really one-size-fits-all. For example, modern Python packages no longer even try to fit into it. Instead, we build the wheel in the PEP 517 black-box manner, and install it to a temporary directory straight in the compile phase. As a result, the test phase is run with a locally-installed package (relying on the logic from virtual environments), and the install phase merely moves files around for the package manager to pick them up.
The implications for testing
The key takeaways of the process are these:
- The test phase is run inside the working tree, against the package that was just built but not installed into the live system.
- All the package’s build-time dependencies should be installed into the live system.
- However, the system may contain any other packages, including packages that could affect the just-built package or its test suite in unpredictable ways.
- As a corollary, the live system may or may not contain a copy of the package in question already installed. And if it does, it may be a different version, and/or a different build configuration.
All of these mean trouble. Sometimes random packages will cause the tests to fail as false positives, and sometimes they also make them wrongly pass or get ignored. Sometimes packages already installed will prevent developers from seeing that they’ve missed some dependency. Often mismatches between installed packages will make reproducing issues hard. On top of that, sometimes an earlier installed copy of the package will leak into the test environment, causing confusing problems.
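The last problem, an installed copy leaking into the tests, is easy to reproduce in miniature. The sketch below shows one way it can happen for a Python package, assuming the installed location ends up ahead of the build tree on the import path. The module name `leaky_example_pkg` and the helper functions are made up for the demonstration; real-world leaks come in many other shapes too.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE = "leaky_example_pkg"  # hypothetical module name, for illustration only


def write_copy(directory, version):
    """Create a one-line module that only reports its version."""
    Path(directory, f"{MODULE}.py").write_text(f"VERSION = {version!r}\n")


def version_seen_by_tests(search_path):
    """Import MODULE with the given directories placed first on sys.path."""
    saved = list(sys.path)
    sys.modules.pop(MODULE, None)
    importlib.invalidate_caches()
    sys.path[:0] = search_path
    try:
        return importlib.import_module(MODULE).VERSION
    finally:
        # Restore the interpreter state so repeated calls are independent.
        sys.path[:] = saved
        sys.modules.pop(MODULE, None)


def demo_leak():
    with tempfile.TemporaryDirectory() as installed, \
            tempfile.TemporaryDirectory() as built:
        write_copy(installed, "1.0")  # older copy already on the "live system"
        write_copy(built, "2.0")      # the copy that was just built
        leaked = version_seen_by_tests([installed, built])
        intended = version_seen_by_tests([built, installed])
    return leaked, intended
```

In the first lookup the tests silently exercise the old installed version rather than the freshly built one, which is exactly the kind of confusing result described above.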
If there are so many negatives, why do we do it then? Because there is also a very important positive: the packages are being tested as close to the production environment as possible (short of actually installing them, but we want to test before that happens). The presence of a certain package may cause tests to fail as a false positive, but it may also uncover an actual runtime issue, one that would not otherwise be caught until it actually broke production. And I’m not talking theoretically here. While I don’t have any links handy right now, we have hit real issues over and over again: either those that hadn’t been caught by upstream CI setups yet, or those that simply couldn’t have been caught in an idealized test environment.
So yeah, testing stuff this way may be quite a pain, and a source of huge frustration with the constant stream of false positives. But it’s also an important strength that no idealized — not to say “lazy” — test environment can bring. Add to that the fact that a fair number of Gentoo users are actually installing their packages with tests enabled, and you get testing on a huge variety of systems, with different architectures, dependency versions and USE flags, configuration files… and on top of that, a knack for hacking. Yeah, people hate us for finding all these bugs they’d rather not hear about.
If you have a sufficiently large “cache” for packages here, can’t you make the testing happen in a virtual environment where, rather than using installed packages, you copy-from-cache all the dependencies into the virtual environment, run the tests, then drop the virtual environment?
That would uncover all missing dependencies with no false positives, at the cost of needing to cache more or less everything. Likely no fun for a personal user, but for an automated binary package builder that **already** has as the express purpose to build and cache packages, it might make more sense.
Not saying I know what’s actually going on here, just a thought I had while passing by.
Thank you all for the work done on Gentoo, I truly appreciate it.
We have people doing exactly that and that’s useful. However, as I’ve explained, this doesn’t reveal issues resulting from additional packages being installed, which is a scenario rarely tested elsewhere and closest to what the users will experience in production.