Gentoo: profiles and keywords rather than releases

Different distributions have different approaches to releases. For example, Debian simultaneously maintains multiple releases (branches). The “stable” branch is recommended for production use, “testing” for more recent software versions. Every two years or so, the branches “shift” (i.e. the previous “testing” becomes the new “stable”, and so on) and users are asked to upgrade to the next release.

Fedora releases aren’t really branched like Debian. Instead, they make a new release (with potentially major changes for an upgrade) every half a year, and maintain old releases for 13 months. You generally start with the newest release, and periodically upgrade.

Arch Linux follows a rolling release model instead. There is just one branch that all Arch users use, and releases are made periodically only for the purpose of installation media. Major upgrades are done in-place (and I have to say, they don’t always go well).

Now, Gentoo is something of a hybrid, as it combines the best of both worlds. It is a rolling release distribution with a single shared repository that is available to all users. However, within this repository we use a keywording system to provide a choice between stable and testing packages, to facilitate both production and development systems (with some extra flexibility), and versioned profiles to tackle major lock-step upgrades.

Architectures

Before we enter any details, we need to clarify what an architecture (though I suppose platform might be a better term) is in Gentoo. In Gentoo, architectures provide a coarse (and rather arbitrary) way of classifying different supported processor families.

For example, the amd64 architecture is indicates 64-bit x86 processors (also called x86-64) running 64-bit userland, while x86 indicates 32-bit userland for x86 processors (both 32-bit and 64-bit in capability). Similarly, 64-bit AArch64 (ARMv8) userland is covered by arm64, while the 32-bit userland on all ARM architecture versions is covered by the arm. This is best seen in the ARM stage downloads — a single architecture is split into subarchitectures there.

For some architectures, the split is even coarser. For example, mips and riscv (at least for the moment) cover both 32-bit and 64-bit variations of the architecture. ppc64 covers both big-endian and little-endian (PPC64LE) variations — and the default big-endian variation tends to cause more issues with software.

Why does the split matter? Primarily because architectures define keywords, and keywords indicate whether the package works. A coarser split means that a single keyword may be used to cover a wide variety of platforms — not of which are equally working. But more on that further on.

By the way, I’ve mentioned “platforms” earlier. Why? Because besides the usual architectures, we are using names such as amd64-linux and x64-macos for Prefix — i.e. running Gentoo inside another operating system (or Linux distribution). Historically, we also had a Gentoo/FreeBSD variation.

Profiles

The simplest way of thinking of profiles would be as different Gentoo configurations. Gentoo provides a number of profiles for every supported architecture. Profiles serve multiple purposes.

The most obvious purpose is providing suitable defaults for different, well, profiles of Gentoo usage. So we have base profiles that are better suited for headless systems, and desktop profiles that are optimized for desktop use. Within desktop profiles, we have subprofiles for GNOME and Plasma desktops. We have base profiles for OpenRC, and subprofiles for systemd; base profiles for the GNU toolchain and subprofiles for the LLVM toolchain. Of course, these merely control defaults — you aren’t actually required to use a specific subprofile to use the relevant software; you can adjust your configuration directly instead. However, using a right fit of a profile makes things easier, and increases the chances of finding Gentoo binary packages that match your setup.

But there’s more to profiles than that. Profiles also control non-trivial system configuration aspects that cannot be easily changed. We have separate profiles for systems that have undergone the “/usr merge”, and for systems that haven’t — and you can’t switch between the two without actually migrating your system first. On some architectures we have profiles with and without multilib; this is e.g. necessary to run 32-bit executables on amd64. On ARM, separate profiles are provided for different architecture versions. The implication of all that is that profiles also control which packages can actually be installed on a system. You can’t install 32-bit software on an amd64 non-multilib system, or packages requiring newer ARM instructions on a system using a profile for older processors.

Finally, profiles are versioned to carry out major changes in Gentoo. This is akin to how Debian or Fedora do releases. When we introduce major changes that require some kind of migration, we do that via a new profile version. Users are provided with upgrade instructions, and are asked to migrate their systems. And we do support both old and new profiles for some time. To list two examples:

  1. 17.1 amd64 profiles changed the multilib layout from using lib64 + lib32 (+ a compatibility lib symlink) to lib64 + lib.
  2. 23.0 profiles featured hardening- and optimization-related toolchain changes.

Every available profile has one of three stability levels: stable, dev or exp. As you can guess, “stable” profiles are the ones that are currently considered safe to use on production systems. “Dev” profiles should be good too, but they’re not as well tested yet. Then, “exp” profiles come with no guarantees, not even of dependency graph integrity (to be explained further on).

Keywords

While profiles can control which packages can be installed to some degree, keywords are at the core of that. Keywords are specified for every package version separately (inside the ebuild), and are specified (or not) for every architecture.

A keyword can effectively have one of four states:

  1. stable (e.g. amd64), indicating that the package should be good to be used on production;
  2. testing (often called ~arch, e.g. ~amd64), indicating that the package should work, but we don’t give strong guarantees;
  3. unkeyworded (i.e. no keyword for given architecture is present), usually indicating that the package has not been tested yet;
  4. disabled (e.g. -amd64), indicating that the package can’t work on given architecture. This is rarely used, usually for prebuilt software.

Now, the key point is that users have control over which keywords their package managers accepts. If you’re running a production system, you may want to set it to accept stable keywords only — in which case only stable packages will normally be allowed to be installed, and your packages will only be upgraded once the next version is marked stable. Or you may set your system to accept both stable and testing keywords, and help us test them.

Of course, this is not just a binary global switch. At the cost of increased risk and reduced chances of getting support, you can adjust allowed keywords for packages, and run a mix of stable and testing. Or you can install some packages that has no keywords at all, including live packages built straight from a VCS repository. Or you can even set your system to follow keywords for another architecture — the sky is the limit!

Note that not all Gentoo architectures use stable keywords at a time. There are so called “pure ~arch arches” that use testing keywords only. An examples of such architectures are alpha, loong and riscv.

Bad terminology: stable and stable

Here’s a time for a short intermezzo: as you may have noticed, we have used the term “stable” twice already: one time for profiles, and the other time for the keywords. Combined with the fact that not all architectures actually use stable keywords, this can get really confusing. Unfortunately, it’s a historical legacy that we have to live with.

So to clarify. A stable profile is a profile that should be good to use on production systems. A stable package (i.e. a package [version] with stable keywords) is a package version that should be good to use on production systems.

However, the two aren’t necessarily linked. You can use a dev or even exp profile, but only accept stable keywords, and the other way around. Furthermore, architectures that don’t use stable keywords at all, do have stable profiles.

Visibility and dependency graph integrity

Equipped with all that information, now we can introduce the concept of package visibility. Long story short, a package (version) is visible if it is installable on a given system. The primary reasons why a package couldn’t be installed are insufficient keywords, or an explicit mask. Let’s consider these cases in detail.

As I’ve mentioned earlier, a particular system can be configured to accept either stable, or both stable and testing keywords. Therefore, on a system set to accept stable keywords, only packages featuring stable keywords can be visible (the remaining packages are masked by “missing keyword”). On a system set to accept both stable and testing keywords, all packages featuring either stable or testing keywords can be visible.

Additionally, packages can be explicitly masked either globally in the repository, or in profiles. These masks are used for a variety of reasons: when a particular package is incompatible with the configuration of a given profile (say, 32-bit packages on a non-multilib 64-bit profile), when it turns out to be broken or when we believe that it needs more testing before we let users install it (even on testing-keyword systems).

The considerations of package visibility here are limited to the package itself. However, in order for the package to be installable, all its dependencies need to be installable as well. For packages with stable keywords, this means that all their dependencies (including optional dependencies conditional to USE flags that can be enabled on a stable system) have a matching version with stable keywords as well. Conversely, for packages with testing keywords, this means that all dependencies need to have either stable or testing keywords. Furthermore, said dependency versions must not be masked on any profile, on which the package in question is visible.

This is precisely what dependency graph integrity checks are all about. They are performed for all profiles that are either stable or dev (i.e. exp profiles are excluded, and don’t guarantee integrity), for all package versions with stable or testing keywords — and for each of these kind of keywords separately. And when integrity is not maintained, we get automated reports about it, and deployment pipeline is blocked, so ideally users don’t have to experience the problem firsthand.

The life of a keyword

Now that we have all the fundamental ideas covered, we can start discussing how packages get their keywords in the first place.

The default state for a keyword is “unspecified”. For a package to gain a testing keyword, it needs to be tested on the architecture in question. This can either be done by a developer directly, or via a keywording request filed on Gentoo Bugzilla, that will be processed by an arch tester. Usually, only the newest version of the package is handled, but in special circumstances testing keywords can be added to older versions as well (e.g. when required to satisfy a dependency). Any dependencies that are lacking a matching keyword need to be tested as well.

And what does happen if the package does not pass testing? Ideally, we file a bug upstream and get it fixed. But realistically, we can’t always manage that. Sometimes the bug remains open for quite some time, waiting for someone to take action or for a new release that might happen to start working. Sometimes we decide that keywording a particular package at the time is not worth the effort — and if it is required as an optional dependency of something else, we instead mask the relevant USE flags in the profiles corresponding to the given architecture. In extreme cases, we may actually add a negative -arch flag, to indicate that the package can’t work on given architecture. However, this is really rare and we generally do it only as a hint if people spend their time trying to keyword it over and over again.

Once a package gains a testing keyword, it “sticks”. Whenever a new version is added, all the keywords from the previous version are copied into it, and stable keywords are lowered into testing keywords. This is done even though the developer only tested it on one of the architectures. Packages generally lose testing keywords only if we either have a justified suspicion that they have stopped working, or if they gained new dependencies that are lacking the keywords in question. Most of the time, we request readding the testing keywords (rekeywording) immediately afterwards.

Now, stable requests follow a stricter routine. The maintainer must decide that a particular package version is ready to become stable first. A rule of thumb is that it’s been in testing for a month, and no major regressions have been reported. However, the exact details differ. For example, some projects make separate “stable branch” and “testing branch” releases, and we mark only the former stable. And when vulnerabilities are found in software, we tend to proceed with adding stable keywords to the fixed versions immediately.

Then, a stabilization request is filed, and then the package is tested on every architecture before the respective stable keyword is added. Testing is generally done on a system that is set only to accept stable keywords, therefore it may provide a slightly different environment that the original testing done when the package was added. Note that there is an exception to that rule — if we believe that particular packages are unlikely to exhibit different behavior across different architectures, we do ALLARCHES stabilization and add all the requested stable keywords after testing on one system.

Unlike with testing keywords, stable keywords need to be added to every version separately. When a new package version is added, all stable keywords in it are replaced by the corresponding testing keywords.

This process pretty much explains the difference between the guarantees given by testing and stable keywords. The testing keywords indicate that some version of the package has been tested on the given architecture at some point, and that we have good reasons to believe that it still works. The stable keywords indicate that this particular version has been tested on a system running stable keywords, and therefore it is less likely to turn out broken. Unfortunately, whether it actually is free of bugs is largely dependent on the quality of test suites, dependencies and so on. So yeah, it’s a mess.

The cost of keywords

I suppose that from user’s perspective it would be best if all packages that work on a given architecture had keywords for it; and ideally, all versions suitable for it would have stable keywords on all relevant architectures. However, every keyword comes with a cost. And that’s not only the cost of actual testing, but also a long-term maintenance cost.

For the most important architectures, Gentoo developers have access to one or more dedicated machines. These machines are used to various purposes: arch testing (i.e. processing keywording and stabilization requests, usually semi-automated), building stage archives, building binary packages, and last but not least: providing development environments that are needed to debug and fix bugs. For other architectures, we are entirely dependent on volunteers doing the testing — a few prominent volunteers worthy of the highest praise, I must add.

The cost incurred by testing keywords is comparatively small, but contrary to what you might think, it’s not a one time cost. Once a package gains a testing keyword, we generally want to keep it going forward. This means that if it gains new dependencies, we’re going to have to retest it — and its new dependencies. However, that’s the easy part.

The hard part is that stuff can actually break over time. The package itself can start exhibiting test failures, or stop working entirely. Its new dependencies may turn out to be broken on the architecture in question. In these cases, it’s not just the cost of testing — but actually reporting bugs, and possibly debugging and writing patches when upstream authors don’t have access to the relevant hardware (and/or don’t care). Sometimes you even learn that the author never intended to support given architecture, and is unwilling to accept well-written patches.

And if it turns out that it really isn’t feasible to keep the keyword going forward anymore, sometimes removing it may also turn out to be a lot of effort — especially if multiple packages depending on this one have been keyworded as well.

Of course, the cost for stable keywords is much higher. After all, it’s no longer a case of one time testing, but we actually have to test every single version that’s going stable. This is somewhat amortized by ALLARCHES packages that need to be tested on a single architecture only (and therefore usually are tested on one of the “fast” architectures), but still it’s a lot. On top of that, frequent testing is more likely to reveal problems, and therefore require immediate fixes. This is actually a good thing, but also a future cost to consider. And removing keywords from packages that used to be stable is likely to have greater impact than from these that never were.

Struggling architectures

All the costs considered, it shouldn’t come as a surprise that we sometimes find ourselves struggling with some of the less popular architectures. We may have limited access to hardware, the hardware itself may not be very performant, the hardware and the operating system may be susceptible to breakage. So if we keyword too much, then the arch teams can no longer keep up, the queue is getting long, and requests aren’t handled timely. In the extreme case, we may lose the last machine for a given architecture and become stuck, unable to go forward. These are all things to consider.

For these reasons, we periodically discuss the state of architectures in Gentoo. If we determine that some of them are finding it hard to cope, we look for solutions. Of course, one possibility to weigh in is getting more hardware — but that’s not always justified, or even possible. Sometimes we need to actually reduce the workload.

For architectures that use stable keywords, the obvious possibility is to reduce the number of packages using them — i.e. destabilize packages. Ordinarily, the best targets for this effort would be packages that are old, particularly problematic or unpopular, as they can reduce our effective maintenance cost while minimizing the potential discomfort to users. However, we might need to go deeper than that. In extreme cases, we can go as far as to reduce the stable package set to core system packages. At some point, this kind of reduction forces users to run a mixed stable-testing keyword system, but that at least permits them to limit risk of regressions in the most important packages.

If even that is insufficient, there are more options at our disposal. We can look into removing keywords entirely from packages, particularly packages that require further rekeywording work. We can decide to remove stable keywords from an architecture entirely. In the worst case, we can decide to mark all profiles exp, effectively abandoning dependency graph integrity (at this point, some dependencies may start missing keywords and packages may not be trivially installable), or we can decide to remove the support for a given architecture entirely.

Summary

Gentoo uses a combined profile and keyword system to facilitate user needs on top of a single ebuild repository. This is in contrast with many other distributions that use multiple repositories, make releases, sometimes maintain multiple release branches simultaneously. In fact, some distributions actually split into multiple versions to facilitate different user profiles. Gentoo does all that in a single, coherent product with rolling releases and profile upgrade paths.

The system of keywords is aimed at providing good user experience while keeping the maintenance affordable. On most of the supported architectures, we provide stable keywords to help keeping production systems on reasonably tested software. Before packages becomes stable, we offer them to more adventurous users via testing keywords. Gentoo also offers great flexibility — users can mix stable and testing keywords freely (though at the risk of hitting unexpected issues), or run experimental packages that aren’t ready to get testing keywords yet.

Unfortunately, there are limits to how much support for various architectures we can provide. We are largely reliant on either having appropriate machines available, or volunteers with the hardware to test stuff for us, not to mention developers having skills and energy to debug and fix architecture-specific problems. Sometimes this turns out to be insufficient to cope with all the work, and we need to give up on some of the architecture support.

Still, I think the system works pretty well here, and it is one of Gentoo’s strong suits. Sure, it occasionally needs a push here and there, or a policy change, but it’s been one of Gentoo’s foundations for years, and it doesn’t look as if it’s going to be replaced anytime soon.

Leave a Reply

Your email address will not be published.