How LINGUAS are thrice wrong!

The LINGUAS environment variable serves two purposes in Gentoo. On one hand, it’s the USE_EXPAND flag group for USE flags controlling installation of localizations. On the other, it’s a gettext-specific environment variable controlling installation of localizations in some build systems supporting gettext. The fun fact is that both uses are simply wrong.

Why is LINGUAS as an environment variable wrong?

Let’s start with the upstream-blessed LINGUAS environment variable. If set, it limits the localization files installed by autotools+gettext-based build systems (and some others) to the subset matching the specified locales. At first glance, it may sound like a useful feature. However, it is an implicit feature, and therefore one causing a lot of confusion for the package manager.
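To make the implicit behavior concrete, here is a simplified bash model of how a gettext-based build decides which catalogs to install. It is a sketch loosely modeled on the logic in gettext's `po.m4`, not the real macro, and the function name `select_catalogs` is hypothetical. It keeps the two rules that matter for this post: an unset LINGUAS means "install everything upstream ships", while a set LINGUAS keeps only the catalogs whose name is a prefix of a requested locale.

```shell
#!/usr/bin/env bash
# Hypothetical helper: a simplified model of how a gettext-based build
# picks the catalogs to install (loosely modeled on gettext's po.m4;
# not the real macro).
#   $1      = catalogs shipped upstream (e.g. the contents of po/LINGUAS)
#   LINGUAS = read from the environment; "unset" and "empty" differ!
select_catalogs() {
    local available=$1 desired present want
    local -a result=()
    if [[ -n ${LINGUAS+set} ]]; then
        desired=$LINGUAS    # set, possibly empty: honor it literally
    else
        desired=$available  # unset: install everything upstream ships
    fi
    for present in $available; do
        for want in $desired; do
            # Keep "de" for a request of "de_AT": the base catalog
            # serves as a fallback for the regional variant.
            if [[ $want == "$present"* ]]; then
                result+=("$present")
                break
            fi
        done
    done
    echo "${result[*]}"
}
```

With upstream shipping `de fr pl pt_BR`, leaving LINGUAS unset yields all four catalogs, `LINGUAS="de_AT pl"` yields `de pl`, and `LINGUAS=""` yields nothing at all. None of this is visible to the package manager, which is exactly the problem.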

Long story short, in this context the package manager does not know anything about LINGUAS. It’s just a random environment variable that has some value and may be saved somewhere in package metadata. However, this value can actually affect the installed files in a way that is hard to predict. So, even if package managers added some special meaning to LINGUAS (which would not be PMS-compliant), it still would not be good enough.

What does this practically mean? It means that if I set LINGUAS to some value on my system, most of the binary packages built there suddenly have files stripped, compared to builds without LINGUAS. If I installed such a binary package on some other system, its localizations would match the LINGUAS of the build host rather than that of the install host. And this means the binary packages are simply incorrect.

Even worse, any change to LINGUAS cannot be propagated correctly. Even if the package manager decided to rebuild packages based on changes in LINGUAS, it would have no way of knowing which locales a package supported, or whether LINGUAS was used at all. So you end up rebuilding all installed packages, just in case.

Why are LINGUAS USE flags wrong?

So, how do we solve all those problems? Of course, we introduce explicit LINGUAS flags. This way, the developer is expected to list all supported locales in IUSE, and the package manager can determine the enabled localizations and match binary packages correctly. All seems fine, except that there are two problems.
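On the ebuild side, this typically looks something like the hypothetical excerpt below: the maintainer hard-codes the locales upstream ships and turns each into a `linguas_*` flag. The variable name `LANGS` is illustrative only; the package manager attaches no meaning to it.

```shell
#!/usr/bin/env bash
# Hypothetical ebuild excerpt: the maintainer hard-codes the locales
# upstream ships and turns each one into a linguas_* USE flag.
# LANGS is an illustrative name only, not something the PM knows about.
LANGS="de fr pl pt_BR"
IUSE="nls"
for lang in ${LANGS}; do
    IUSE+=" linguas_${lang}"
done
unset lang
echo "${IUSE}"
```

With `LINGUAS="de pl"` in make.conf, the USE_EXPAND mechanism then enables `linguas_de` and `linguas_pl` for this package, and the ebuild can query them with `use linguas_de`.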

The first problem is that it is cumbersome. Figuring out supported localizations and adding a dozen flags to a number of packages is time-consuming. What’s even worse, those flags need to be maintained once added, which means checking supported localizations for changes on every version bump. Not all developers do that.
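The per-bump check amounts to a set difference between two lists. Below is a minimal sketch of such a check, assuming the maintainer can obtain both lists as space-separated strings. `compare_locales` is a hypothetical helper, not part of any Gentoo tool.

```shell
#!/usr/bin/env bash
# Hypothetical version-bump helper: compare the locales hard-coded in the
# ebuild with the ones the new upstream release ships.
#   $1 = locales currently listed in the ebuild
#   $2 = locales shipped upstream
compare_locales() {
    local ebuild_list=" $1 " upstream_list=" $2 " l
    for l in $2; do
        # New upstream locale the ebuild does not list yet.
        [[ $ebuild_list == *" $l "* ]] || echo "added: $l"
    done
    for l in $1; do
        # Locale the ebuild still lists but upstream no longer ships.
        [[ $upstream_list == *" $l "* ]] || echo "dropped: $l"
    done
}
```

For packages following the gettext convention, the upstream list could come from something like `sed 's/#.*//' po/LINGUAS`, since that file may contain comments. Other build systems need their own extraction, which is part of why this maintenance tends to be skipped.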

The second problem is that it is… a QA violation, most of the time. We already have quite a clear policy that USE flags are not supposed to control installation of small files with no explicit dependencies — and most of the localization files are exactly that!

Let me remind you why we have that policy. There are two reasons: rebuilds and binary packages.

Rebuilds are bad because every time you change LINGUAS, you end up rebuilding the relevant packages, and those can be huge. You may think this is uncommon, but just imagine you’ve finally finished building your shiny new Gentoo install and noticed that you forgot to enable your localization. And guess what! You have to rebuild a number of packages.

Binary packages are even worse, since they are tied to a specific USE flag combination. If you build a binary package with specific LINGUAS, it can only be installed on hosts with exactly the same LINGUAS. Even though it would be trivial to strip localizations from an installed binary package, you have to build a fresh one instead. And with a dozen LINGUAS flags… you end up with thousands of possible binary package variants, if not more.
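The arithmetic behind “thousands” is straightforward. Each LINGUAS flag is independently enabled or disabled, so for n such flags the number of distinct USE combinations, and hence of potentially distinct binary packages, is 2^n. Taking a dozen as n = 12 and ignoring every other USE flag, each of which doubles the count again:

```latex
N = 2^{n}, \qquad n = 12 \;\Rightarrow\; N = 2^{12} = 4096
```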

Why does EAPI 5 make things worse… or better?

Reusing the LINGUAS name for the USE_EXPAND group looked like a good idea. After all, the value would end up in the ebuild environment for use by the build system, and in most of the affected packages, LINGUAS worked out of the box with no ebuild changes! Except that… this wasn’t really guaranteed to work before EAPI 5.

In earlier EAPIs, LINGUAS could contain pretty much anything, since no special behavior was reserved for it. However, starting with EAPI 5 the package manager guarantees that it will only contain those values that correspond to enabled flags. This is a good thing, after all, since it finally makes LINGUAS work reliably. It has one side effect though.

Since LINGUAS is reduced to enabled USE flags, and enabled USE flags can only include flags defined in IUSE… it means that in any ebuild missing LINGUAS flags, LINGUAS should be effectively empty (yes, I know Portage does not do that currently, and it is a bug in Portage). To make things worse, this means it is set to an empty value rather than unset. In other words, localization is disabled completely.
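The following is a hypothetical model of the EAPI 5 rule described above, not Portage's actual code. The function name `eapi5_linguas` is illustrative, and the user's LINGUAS stands in for the set of enabled flags, so package.use overrides are ignored. It shows how a package without any `linguas_*` flags ends up with LINGUAS set but empty.

```shell
#!/usr/bin/env bash
# Hypothetical model of the EAPI 5 rule: the LINGUAS exported to the
# ebuild keeps only enabled values whose linguas_* flag is in IUSE.
# The user's setting stands in for "enabled" here; package.use
# overrides are ignored. Not Portage's actual code.
#   $1 = the user's LINGUAS, e.g. from make.conf
#   $2 = the package's IUSE (a leading "+" default marker is accepted)
eapi5_linguas() {
    local user=$1 iuse=" $2 " lang
    local -a out=()
    for lang in $user; do
        if [[ $iuse == *" linguas_$lang "* || $iuse == *" +linguas_$lang "* ]]; then
            out+=("$lang")
        fi
    done
    echo "${out[*]}"
}

# A package with no linguas_* flags at all:
LINGUAS=$(eapi5_linguas "de pl" "gtk nls")
# LINGUAS is now *set* but empty, which gettext-style builds read as
# "install no catalogs" rather than "install all of them".
echo "set=${LINGUAS+yes} value='${LINGUAS}'"
```

The distinction between `${LINGUAS+set}` being non-empty and `$LINGUAS` being empty is the whole story here. An unset variable would have meant "everything", while an empty one means "nothing".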

This way, a small QA issue of implicitly affecting installed localization files turned into a bigger issue: packages suddenly stopped installing localizations. This, in turn, can’t be fixed without introducing a proper set of LINGUAS flags everywhere, causing other kinds of QA issues and additional maintenance burden.

What would be the good solution, again?

First of all, kill LINGUAS. Either make it unset for good (and this won’t be easy since PMS kinda implies making all PM-defined variables read-only), or disable any special behavior associated with it. Make the build system compile and install all localizations.

Then, use INSTALL_MASK. It’s designed to handle this: it strips files from installed systems while preserving them in binary packages. This means your binary packages become more portable, and every system you install them on gets exactly its own unwanted localizations stripped. Isn’t that way better than rebuilding things?
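For instance, a host that wants no French catalogs could put `INSTALL_MASK="/usr/share/locale/fr"` in `/etc/portage/make.conf`. The sketch below is a hypothetical, simplified model of what merge-time masking does with such entries. Portage's real matching rules are richer, and `apply_install_mask` is not a Portage function. The point is that the package image stays complete, and the filtering happens on each target host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: a simplified model of merge-time masking in the
# spirit of INSTALL_MASK. Portage's real matching rules are richer; this
# only shows that the package image stays complete and the filtering
# happens on the target host.
#   $1    = space-separated mask entries (plain paths or glob patterns)
#   $2... = file paths from the (unmodified) binary package
apply_install_mask() {
    local -a masks
    local path m
    read -ra masks <<< "$1"   # read does no globbing, unlike $1 unquoted
    shift
    for path in "$@"; do
        for m in "${masks[@]}"; do
            # Drop the path if it matches the entry as a glob, or lies
            # below it when the entry names a directory.
            if [[ $path == $m || $path == "$m"/* ]]; then
                continue 2
            fi
        done
        echo "$path"   # not masked: this file gets installed
    done
}
```

Given the masks `/usr/share/locale/fr /usr/share/locale/pt*` and a package containing `/usr/bin/foo` plus `de`, `fr` and `pt_BR` catalogs, only `/usr/bin/foo` and the `de` catalog are installed. The same binary package installed on another host with a different mask keeps a different subset.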

Now, is that going to happen? I doubt it. People are rather going to focus on claiming that the buggy Portage behavior was good, that QA issues are fine as long as they can strip some small files from their systems in the ‘obvious’ way, that the specification should be changed to allow a corner case…

7 thoughts on “How LINGUAS are thrice wrong!”

  1. Instead of recompiling, how can I preserve the LINGUAS settings just for the currently installed packages in package.use?

    1. Sorry, I meant the above as a hint to add an example for that, e.g.:
      echo "=www-client/firefox-45.1.0 linguas_en_GB" >> /etc/portage/package.use/linguas

  2. If INSTALL_MASK can be used to say “don’t install locales except X, Y, Z”, then fine, but as far as I know INSTALL_MASK is a list of items to strip, without the ability to whitelist a few files within a larger blacklisted set.

    Otherwise the end user has to monitor installed locales and update INSTALL_MASK every time.

    If LINGUAS were converted to a proper INSTALL_MASK by the package manager, the user could have a positive list of locales to install, and the package manager would do the right thing on install without stripping locales from binary packages.

    Note that a few ebuilds do fail in subtle ways depending on files/subtrees removed by INSTALL_MASK.
    If some kind of install_flags existed that operated like a maintainer-defined INSTALL_MASK, it would mostly eliminate the need for split packages, or for excessive runtime dependencies due to some installed but unused helper tool or example script.
    A few such flags that would make embedded installs easier would be things like `libs tools/utils server client devel debug`, similar to what binary distributions do with -libs, -devel, … “subpackages”.

    1. This was already proposed. In fact, I even tried writing a spec for it but found no interest among developers. If someone is willing to implement it in Portage, I would be glad to help.

        1. That covers part of it.

          The part not covered by your INSTALL_MASK GLEP is tying runtime dependencies to such file-mask groups.

          There are a few packages that pull in whole trees of dependencies just for some helper scripts (Pear, Python, …). It would be nice to handle those, so that INSTALL_MASKing those files also removes the runtime dependency (yes, this would need explicit support from the respective ebuilds).

          Package-local INSTALL_MASK path groups, and references to them from DEPEND and RDEPEND (as with USE flags), should be possible.

          I could imagine having ${CATEGORY}/${PN}/install-mask.conf, where the group names would only be available for that specific package and be referred to in the context of the package as ‘pkg-${GROUP}’ (that is, the “pkg-” prefix would be prohibited for global groups and represent a namespace).
          That way a package could define /path/to/perl-script as the “perl” group and refer to it from the ebuild with RDEPEND="… @pkg-perl? ( dev-lang/perl ) …" (proper syntax to be thought about).
          The user could list those in a per-package INSTALL_MASK in the package manager configuration.

  3. It would be nice to actually support it properly. Why not use a LINGUAS-like variable to provide a package-informed install mask? Let the maintainer, instead of the user, worry about where language-related files are. Of course, sane defaults for “well-behaved” packages are important.

    Of course, in some cases (such as LibreOffice) a lingua means downloading quite a lot of translation data (altogether hundreds of megabytes or even gigabytes). In those cases a USE flag would be the right thing. But an ebuild-controlled default package mask (overridable with the usual tricks, of course) would be worthwhile, not only for this case.
