The LINGUAS environment variable serves two purposes in Gentoo. On one hand, it’s the USE_EXPAND flag group for USE flags controlling installation of localizations. On the other, it’s a gettext-specfic environment variable controlling installation of localizations in some of build systems supporting gettext. Fun fact is, both uses are simply wrong.
Why LINGUAS as an environment variable is wrong?
Let’s start with the upstream-blessed LINGUAS environment variable. If set, it limits localization files installed by autotools+gettext-based build systems (and some more) to the subset matching specified locales. At first, it may sound like a useful feature. However, it is an implicit feature, and therefore one causing a lot of confusion for the package manager.
Long story short, in this context the package manager does not know anything about LINGUAS. It’s just a random environment variable, that has some value and possibly may be saved somewhere in package metadata. However, this value can actually affect the installed files in a hardly predictable way. So, even if package managers actually added some special meaning to LINGUAS (which would be non-PMS compliant), it still would not be good enough.
What does this practically mean? It means that if I set LINGUAS to some value on my system, then most of the binary packages produced by it suddenly have files stripped, as compared to non-LINGUAS builds. If I installed the binary package on some other system, it would match the LINGUAS of build host rather than the install host. And this means the binary packages are simply incorrect.
Even worse, any change to LINGUAS can not be propagated correctly. Even if the package manager decided to rebuild packages based on changes in LINGUAS, it has no way of knowing which locales were supported by a package, and if LINGUAS was used at all. So you end up rebuilding all installed packages, just in case.
Why LINGUAS USE flags are wrong?
So, how do we solve all those problems? Of course, we introduce explicit LINGUAS flags. This way, the developer is expected to list all supported locales in IUSE, the package manager can determine the enabled localizations and match binary packages correctly. All seems fine. Except, there are two problems.
The first problem is that it is cumbersome. Figuring out supported localizations and adding a dozen flags on a number of packages is time-consuming. What’s even worse, those flags need to be maintained once added. Which means you have to check supported localizations for changes on every version bump. Not all developers do that.
The second problem is that it is… a QA violation, most of the time. We already have quite a clear policy that USE flags are not supposed to control installation of small files with no explicit dependencies — and most of the localization files are exactly that!
Let me remind you why we have that policy. There are two reasons: rebuilds and binary packages.
Rebuilds are bad because every time you change LINGUAS, you end up rebuilding relevant packages, and those can be huge. You may think it uncommon — but just imagine you’ve finally finished building your new shiny Gentoo install, and noticed that you forgot to enable the localization. And guess what! You have to build a number of packages, again.
Binary packages are even worse since they are tied to a specific USE flag combination. If you build a binary package with specific LINGUAS, it can only be installed on hosts with exactly the same LINGUAS. While it would be trivial to strip localizations from installed binary package, you have to build a fresh one. And with dozen lingua-flags… you end up having thousands of possible binary package variants, if not more.
Why EAPI 5 makes things worse… or better?
Reusing the LINGUAS name for the USE_EXPAND group looked like a good idea. After all, the value would end up in ebuild environment for use by the build system, and in most of the affected packages, LINGUAS worked out of the box with no ebuild changes! Except that… it wasn’t really guaranteed to before EAPI 5.
In earlier EAPIs, LINGUAS could contain pretty much anything, since no special behavior was reserved for it. However, starting with EAPI 5 the package manager guarantees that it will only contain those values that correspond to enabled flags. This is a good thing, after all, since it finally makes LINGUAS work reliably. It has one side effect though.
Since LINGUAS is reduced to enabled USE flags, and enabled USE flags can only contain defined USE flags… it means that in any ebuild missing LINGUAS flags, LINGUAS should be effectively empty (yes, I know Portage does not do that currently, and it is a bug in Portage). To make things worse, this means set to an empty value rather than unset. In other words, disabling localization completely.
This way, a small implicit QA issue of implicitly affecting installed localization files turned out into a bigger issue of suddenly stopping to install localizations. Which in turn can’t be fixed without introducing proper set of LINGUAS everywhere, causing other kind of QA issues and additional maintenance burden.
What would be the good solution, again?
First of all, kill LINGUAS. Either make it unset for good (and this won’t be easy since PMS kinda implies making all PM-defined variables read-only), or disable any special behavior associated with it. Make the build system compile and install all localizations.
Then, use INSTALL_MASK. It’s designed to handle this. It strips files from installed systems while preserving them in binary packages. Which means your binary packages are now more portable, and every system you install them to will get correct localizations stripped. Isn’t that way better than rebuilding things?
Now, is that going to happen? I doubt it. People are rather going to focus on claiming that buggy Portage behavior was good, that QA issues are fine as long as I can strip some small files from my system in the ‘obvious’ way, that the specification should be changed to allow a corner case…