The dead weight of packages in Gentoo

You’ve probably noticed it already: Gentoo developers are overwhelmed. There are a lot of unresolved bugs. There are a lot of unmaintained packages. There are a lot of open pull requests. This is all true, but it’s all part of a larger problem, and a problem that doesn’t affect Gentoo alone.

It’s a problem that any major project is going to face sooner or later, and especially a project that relies almost entirely on volunteer work. It’s a problem of bitrot, of different focus, of energy deficit. And it is a very hard problem to solve.

A lot of packages — a lot of effort

Packages are at the core of a Linux distribution. After all, what would any of Gentoo’s advantages be worth if people couldn’t actually use them to realize their goals? Some people even go as far as to say: the more packages, the better. Gentoo needs to have popular packages, because many users will want them. Gentoo needs to have unique packages, because that gives it an edge over other distributions.

However, having a lot of packages is also a curse. All packages require at least some maintenance effort. Some packages require very little of it, others require a lot. When packages aren’t being maintained properly, they stop serving users well. They become outdated, they accumulate bugs. Users spend time building dependencies just to discover that the package itself has been failing to build for months. Users try different alternatives only to discover that half of them don’t work at all, or perhaps are so outdated that they don’t actually have the functions upstream advertises, or even have data loss bugs.

Sometimes a maintenance deficit is not that bad, but it usually is. Skipping every few releases of a frequently released package may have no ill effects, and save some work. Or it could mean that instead of dealing with trivial diffs (especially if upstream cared to make the changes atomic), you end up having to untangle a complex backlog. Or bisect bugs introduced a few releases ago. Or deal with an urgent security bump combined with major API changes.

If the demand for maintenance isn’t met for a long time, bitrot accumulates. And getting things going straight again becomes harder and harder. On top of that, if we can’t handle our current workload, how are we supposed to find energy to deal with all the backlog? Things quickly spiral out of control.

People want to do what they want to do

We all have packages that we find important. Sometimes, these packages require little maintenance, sometimes they are a pain in the ass. Sometimes, they end up being unmaintained, and we really wish someone would take care of them. Sometimes, we may end up going as far as to be angry that people are taking care of less important stuff, or that they keep adding new stuff while the existing packages rot.

The thing is, in a project that’s almost entirely driven by volunteer work, you can’t expect people to do what you want. The best you can achieve with that attitude is alienating them, and actively stopping them from doing anything. I’m not saying that there aren’t cases when this is actually preferable, but that’s beside the point. If you want something done, you either have to convince people to do it, do it yourself, or pay someone to do it. But even that might not suffice. People may agree with you, but not have the energy, time, or skills to do the work, or to review your work.

On top of that, there will always be an inevitable push towards adding new packages rather than dealing with abandoned ones. Users expect new software too. They don’t want to learn that Gentoo can’t have a single Matrix client, because we’re too busy keeping 20 old IRC clients alive. Or that they can’t have Fediverse software, because we’re overwhelmed with 30 minor window managers. And while this push is justified, it also means that the pile of unmaintained packages will still be there, and at the same time people will put effort into creating even more packages that may eventually end up on that pile.

The job market really sucks today

Perhaps it’s the nostalgia talking, but the situation in the job market is getting worse and worse. As I’ve mentioned before, the vast majority of Gentoo developers and contributors are volunteers. They are people who generally need to work full-time to keep themselves alive. Perhaps they work overtime. Perhaps they work in toxic workplaces. Perhaps they are drained of their energy by problems. And they need to find time and energy to do Gentoo on top of that.

There are a handful of developers hired to do Gentoo. However, they are hired by corporations, and this obviously limits what they can do for Gentoo. To the best of my knowledge, there is no longer such a thing as “time to do random stuff in work time”. Their work can be beneficial to Gentoo users. Or it may not be. They may maintain important and useful packages, or they may end up adding lots of packages that they aren’t allowed to properly maintain afterwards, and that create extra work for others in the end.

Perhaps an option would be for Gentoo to actually pay someone to do stuff. However, this is a huge mess. Even provided that we’d be able to afford it, how do we choose what to pay for? And whom to pay? In the end, the necessary proceedings also require a lot of effort and energy, and the inevitable bikeshed is quite likely to drain that energy from anyone daring enough to try.

Proxy maintenance is not a long-term solution

Let’s be honest: proxy maintenance was supposed to make things better, but there’s only so much that it can do. In the end, someone needs to review stuff, and while it pays back greatly, it is more effort than “just doing it”. And there’s no guarantee that the contributor will respond in a timely manner, especially if we weren’t able to review stuff in a timely manner ourselves. Things can easily extend over time, or get stalled entirely, and that’s just one problem.

We stopped accepting new packages via proxy-maint a long time ago, because we weren’t able to cope with it. I’ve created GURU to let people work together without being blocked by developers, but that’s not a perfect solution either.

And proxy-maint is just one facet of pull requests. Many pull requests affect packages maintained by a variety of developers, and handling them is even harder, as they require getting the developer to review or acknowledge the change.

So what is the long-term solution? Treecleaning?

I’m afraid it’s time to come to an unfortunate conclusion: the only real long-term solution is to keep removing packages. There are only so many packages that we can maintain, and we need to make hard decisions. Keeping unmaintained and broken packages is bad for users. Spending effort fixing them ends up biting us back.

The joke is, most of the time it’s actually less effort to fix the immediate problem than to last rite and remove a package. Especially when someone already provided a fix. However, fixing the immediate issue doesn’t resolve the larger problem of the package being unmaintained. There will be another issue, and then another, and you will keep pouring energy into it.

Of course, things can get worse. You can actually pour all that energy into last rites, just to have someone “rescue” the package last minute. Just to leave it unmaintained afterwards, and then you end up going through the whole effort again. And don’t forget that in the end you’re the “villain” who wants to take away a precious package from the users, and they were the “hero” who saved it, and now the users have to deal with a back-and-forth. It’s a thankless job.

However, there’s one advantage to removing packages: they can be moved to GURU afterwards. They can have another shot at finding an active maintainer there. There, they can actually be easily made available to users without adding to developers’ workload. Of course, I’m not saying that GURU should be a dump for packages removed from Gentoo — but it’s a good choice if someone actually wants to maintain it afterwards.

So there is hope — but it is also a lot of effort. But perhaps that’s a better way to spend our energy than trying to deal with an endless influx of pull requests, and with developers adding tons of new packages that nobody will be able to take over afterwards.

12 thoughts on “The dead weight of packages in Gentoo”

  1. Michał,

    In real life, we called it the Zen effect: “the only real long-term solution is to keep removing packages.”

    OR,
    To quote Ken Thompson, famously: throwing stuff away is more productive than blatantly adding it.

    Kudos to all the fellas who make it possible behind the scenes to let us ordinary mortals consume all the good stuff.

    One thing is very clear from the outset, and there is certainly no confusion about it: it is a damn niche distribution, so why the heck try to be a general one? It was never meant to be one, nor will it ever be. Period.

    So, the blatant effort to make it available to the general public does not make any sense. Is it for everyone? No, it is not.
    No, I am NOT plagued by the stupid “elitism” of open source, but we have to have the facts clear.

    Please, the thought process has to be clear, and unfortunately, it is not. Why???

    Are we in a race, ever??? No, not a single second.

    PS: A schmuck’s (i.e. my) 2 cents, can’t resist.

  2. In some respects I think Gentoo is a victim of its own success – not in terms of numbers of users, but in terms of making portage more powerful. It appears to me that over the years function has migrated from the ebuilds to the eclasses. This is great for, say, the KDE devs, ‘cos most of their packages have almost nothing left in the ebuild apart from a name. But adding that to portage’s increasing power (such as dependencies on packages AND use flags AND slots) means that writing and maintaining ebuilds has a potentially steeper learning curve for the non-trivial cases. There’s a lot of (too much) good, detailed information for devs, but I find it daunting.

    Guru is certainly good. It reflects the “core”, “extra” and “AUR” split of packages in Arch. I happened to be looking at libyui (the YAST User Interface library that gives a uniform API to ncurses and Qt), noticed it’s been last-rited, and thought perhaps it’s in Guru. So where’s the web page listing the packages in Guru? guru.gentoo.org doesn’t exist, and AFAICT there’s no info about it on the home page.

    An area that I think adds disproportionate dev work is the prevalence of OO languages with large libraries of small functions – Rust, Go, and to a lesser extent Java (Java packages tend to cover an entire subsystem). I long ago gave up the idea of installing Java packages from source, though as my main Java use is for developing my own code, I just download and run Eclipse. I get the feeling portage needs a different approach for these fine-grained package languages, based in some way on trusted object (or intermediate) code repositories, much as Eclipse has its own repositories for plugins. That way Gentoo devs get to duck some of the work. Ultimately we all have to trust some code repositories outside Gentoo (which Gentoo dev has read ALL the Linux kernel code?) for source, so why not for this sort of repository?

    1. Handle language package repositories directly within portage. That would be awesome, and cut down on the necessary work.
      portage hooking directly into CPAN, pip, cargo, whatever without needing to create an ebuild for every package out there.

      That’s my wet dream…

      Another one is using Gentoo to build the entire Debian .deb repository (yes: make portage spit out deb packages), but that’s borderline insane.

  3. I think making it easier for people to become Gentoo developers would definitely be a step in the right direction.

  4. These are all very good points, and that is why there are so many distributions out there. Is Debian any better than RedHat or Gentoo or Funtoo or FreeBSD? This, as you say, is not specific to Gentoo, and the same is true in everyday life. Why are towns and cities putting new buildings up when there are vacant buildings that were recently leased, now empty, and still in great, like-new condition? Perhaps it is the shiny new thing and our short attention span as human beings (is it 7 seconds now?).

    How do you allocate resources in a project? How come Linux distributions don’t have a standard package format and instead every one has their own?

    Perhaps when we were younger, we had more free time to devote to learning and making something cool and independent of a large company and now we want to do other things.

    I’ve experienced this problem before with Gentoo, Funtoo, and most recently FreeBSD.

    My thoughts outlined directly:
    1. define a standard package format for as many Linux distributions as you can, share / pool resources
    2. automate where possible. For meta distributions that automatically create packages, you also need to define mechanisms to remove packages. Define what the automation looks like; perhaps it also means leveraging the crowd or community. In this day and age of containers and virtualization, we should be able to test more and automate more.
    3. define the entire lifecycle for the OS, where is automation, where is manual dev labor. In this day and age, instead of us cutting the grass with a mower, we’re instead servicing the mower to make sure it runs. Is that truly better than just cutting the grass in the first place or did we just make it a different problem we have to solve?

    1. Regarding point 1: how do you think we ended up with so many package formats in the first place? XKCD 927 is in full force here.

  5. Michał,

    I’m with goverp but possibly a bit more brutal.
    Once the last rite notice has been published, the package will be removed from the ::gentoo repo. The last rite notice tells of open bugs that nobody cared about, so last minute rescues are no longer allowed.

    In the event of a last minute rescue, it can go to ::guru. Whatever, it does not stay in ::gentoo.
    If the new maintainer is responsive the package can go back to ::gentoo. If not, it dies.
    This avoids the last minute rescues and subsequent last rites, in ::gentoo.

  6. Na koniec, ale nie mniej ważne, naprawdę doceniam twoją pracę w Gentoo! [Last but not least, I really appreciate your work in Gentoo!]
    (I don’t speak Polish, but I can guess “twoją” means “your” and “naprawdę” means “in truth” :)

  7. One possible solution for the proxy-maintainer problem is to let trusted people merge their own PRs.
    I maintain a few packages (and I am thankful for every interaction we had so far). I would like to do more, but it seems that the barrier to me becoming an official developer is still too high.
    Likewise, I can imagine that some proxy-maintainers are in the same limbo. So how about giving us the ability to merge our own stuff? Sure, it might hurt QA, but most want to keep good quality.
    This can also be reverted.
    GitHub allows fine-grained control over who can merge PRs to what part of the tree.
    Alternatively, you can consider training and trusting more developers. Not every developer should have all permissions. Maybe ~100 or so developers is just too few, and the Gentoo project should actively recruit more people.
