The future of Python build systems and Gentoo

Anyone following me on Twitter has probably seen me complaining frequently about what is happening around Python build systems. The recent changes feel like people in the Python packaging ecosystem have been strongly focused on building a new infrastructure around Python-specific package managers such as pip and flit. Unfortunately, there seems to be very little concern for distribution packagers or backwards compatibility in this process.

In this post, I’d like to discuss how the Python packaging changes are going to affect Gentoo, and what plan I suggest for dealing with them. In particular, I’d like to focus on three important changes:

  1. Python upstream deprecating the distutils module (and build system), and planning to remove it in Python 3.12.
  2. The overall rise of PEP 517-based build systems and the potential for setuptools dropping its CLI entirely.
  3. Setuptools upstream deprecating the setup.py install command, and potentially removing it in the future.

distutils deprecation

Over the years, the distutils stdlib module has been used to build setup.py scripts for Python packages. In addition to the baseline functions providing a build system CLI for the package, it provided the ability to easily extend the build system. This led both to growth of heavily customized setup.py scripts as part of some packages, as well as third-party build systems based on distutils, most notably setuptools.

This eventually led to the deprecation of distutils itself (see: PEP 632). Python 3.10 already warns about the distutils deprecation, and the current plan is to remove it in Python 3.12. Ahead of that, development has moved to a dedicated pypa/distutils repository, and a copy of it is bundled within setuptools.

setuptools still uses the stdlib distutils by default. However, some packages have already switched to the bundled copy, and upstream plans on using it by default in the future (see: Porting from Distutils).

At this point, I don’t think there is an explicit need for Gentoo to act here. However, it seems reasonable to avoid using distutils as the build system for Gentoo projects. Since the setuptools copy of distutils is different from the one included in CPython (and PyPy) and at the moment it does not carry the full set of historical Gentoo patches, it probably makes sense to test package compatibility with it nevertheless.

The use of bundled distutils copy can be forced using the following environment variable:

SETUPTOOLS_USE_DISTUTILS=local

This can be set either in a specific ebuild or globally in make.conf. However, please note that you can’t change the variable in place without a version bump (a revision bump is insufficient). This is because switching to the local variant replaces the .egg-info file with a directory, and replacing a file with a directory is not supported by the PMS and not handled well by Portage.

Presuming that upstream is going to change the default sooner rather than later (and therefore unleash the breakage upon us), I think the cleanest way forward is to:

  1. Perform some initial testing (via tinderboxes).
  2. Enable SETUPTOOLS_USE_DISTUTILS=local when DISTUTILS_USE_SETUPTOOLS!=no (variable name similarity is coincidental) via eclass.
  3. Deprecate DISTUTILS_USE_SETUPTOOLS=no, requesting maintainers to switch when bumping packages to new versions.

The purpose of this plan is to have a good chance of testing the new default and migrating as many packages as possible before upstream forces it in place. The change of distutils provider in packages already using setuptools should be relatively safe. On the other hand, for packages using pure distutils it should happen through version bumps, in order to avoid the file-directory collisions mentioned before. At the same time, the DISTUTILS_USE_SETUPTOOLS value will have to change, since a setuptools dependency will then be needed to provide the distutils override.

I have requested the initial tinderbox testing already. If everything goes well and we decide to go ahead with the plan, I will provide detailed instructions later. Please do not update the ebuilds yet.

The rise of PEP 517

PEP 517 (and a few more related PEPs) define a new infrastructure for installing Python packages. Long story short, they define a consistent API that can be exposed by an arbitrary build system to support using it from any package manager. Sounds great, right? Well, I’m not that enthusiastic.

Before I get to my reasons, let’s briefly summarize how building packages is supposed to work in the PEP 517 world. Every project supplies at least a minimal pyproject.toml file that specifies the package providing the build system and the path to a module providing its entry points. You read that file, install the necessary packages, then call the appropriate entry point to get a wheel. Then you install the wheel. Roughly.

Firstly, TOML. This is something I’ve been repeating for quite some time already, so I’ll just quickly go over it. I like TOML, I think it’s a reasonable choice for markup. However, without a TOML parser in stdlib (and there’s no progress in providing one), every single build system now depends on tomli, which creates a circular dependency. A few months back, every single build system depended on toml instead, but that package became unmaintained. Does that make you feel confident?

Secondly, customization. We do pretty heavy customization of distutils/setuptools behavior at this point — build paths, install paths, the toolchain. It is understandable that PEP 517 utilizes the black box approach and doesn’t attempt to do it all. Unfortunately, the build systems built on top of PEP 517 so far seem to focus on providing an all-in-one package manager rather than a good build tool with customization support.

Thirdly, wheels. PEP 517 pretty much forces everyone into using the wheel package format, completely ignoring the fact that it’s neither the simplest solution, nor a good fit for distributions. What we lack is a trivial “put all files into a directory” entry point. What we get instead is “pack everything into a zip, and then use the next tool to unzip them”. Sure, that’s not a big deal for most packages, but I just hate the idea of wasting electricity and users’ time to compress something just so it gets uncompressed again afterwards.
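To make the round trip concrete, here is a minimal sketch that uses only the stdlib zipfile module. The archive and file names are made up, and a real installer additionally has to handle the .dist-info metadata, RECORD and scripts. The point is merely that the files get compressed on one side and uncompressed on the other.

```python
import os
import tempfile
import zipfile


def pack(files, archive_path, compression=zipfile.ZIP_DEFLATED):
    """Write a mapping of archive name -> bytes into a wheel-like zip file."""
    with zipfile.ZipFile(archive_path, "w", compression=compression) as archive:
        for name, data in files.items():
            archive.writestr(name, data)


def unpack(archive_path, target_dir):
    """Extract everything into target_dir and return the sorted member names."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())


# Hypothetical package contents, for illustration only.
files = {"demo/__init__.py": b"VERSION = '1.0'\n" * 100}

workdir = tempfile.mkdtemp()
wheel_path = os.path.join(workdir, "demo-1.0-py3-none-any.whl")
pack(files, wheel_path)  # what the build backend does
installed = unpack(wheel_path, os.path.join(workdir, "site-packages"))  # what the installer does
```

Passing compression=zipfile.ZIP_STORED to pack() would keep the zip container but skip the compression step.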

PEP 660 gives some hope of avoiding that by providing “editable install” support. Unfortunately, it’s so vague that it practically doesn’t specify anything. In practice, a PEP 660 editable install is usually a .dist-info directory plus a .pth file that adds the source directory to sys.path. This means that no files are actually installed, and it does not make it any easier for us to find the right files to install. In other words, it’s completely useless.
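The .pth mechanism itself can be demonstrated with the stdlib site module. The sketch below uses temporary directories as stand-ins for a source checkout and a site-packages directory, and the demo_editable names are hypothetical. It is not what any particular backend generates, only an illustration of why nothing ends up installed.

```python
import os
import site
import sys
import tempfile

# Stand-ins for a project checkout and a site-packages directory.
source_dir = tempfile.mkdtemp(prefix="checkout-")
site_dir = tempfile.mkdtemp(prefix="site-packages-")

# The importable code lives only in the checkout.
with open(os.path.join(source_dir, "demo_editable.py"), "w") as module_file:
    module_file.write("WHERE = 'source tree'\n")

# A .pth file is a list of directories that get appended to sys.path.
with open(os.path.join(site_dir, "demo_editable.pth"), "w") as pth_file:
    pth_file.write(source_dir + "\n")

# Process site_dir the way Python processes site-packages at startup.
site.addsitedir(site_dir)

import demo_editable  # resolved through the path added by the .pth file

# The "installed" tree contains the .pth file and no Python modules at all.
installed_files = os.listdir(site_dir)
```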

I have spent significant time looking for a good solution and found none so far. Back in the day, I wrote pyproject2setuppy as a stop-gap solution to install PEP 517-based packages via setuptools without having to package the new build systems (including their NIH dependencies) and figure out how to make them work sanely within our package framework. As of today, I still don’t see a better solution.

Given that setuptools seems to be aiming towards removing the CLI entirely and distutils is no longer maintained, I suspect that it is inevitable that at some point we’re going to have to bite the bullet one way or another. However, I don’t plan on making any changes for the time being — as long as setup.py install continues working, that is. When this is no longer feasible, we can research our options again.

setup.py install deprecation

At last, the final event that puts everything else into perspective: the setuptools upstream has deprecated the install command. While normally I would say “it’s not going to be removed anytime soon”, the indiscriminate use_2to3 removal suggests otherwise.

Just a quick recap: setuptools removed the use_2to3 support after it had been deprecated for some time, summarizing it with “projects should port to a unified codebase or pin to an older version of Setuptools”. Surely, nose, a project that hasn’t seen a single commit (or accepted user patches) since 2016 is going to suddenly make a release to fix this. In the end, all the breakage is dumped on distribution packagers.

The install command removal is a bigger deal than that. It’s not just a few old packages being broken, it’s whole workflows. I’ve been considering switching Gentoo to a different workflow for some time, without much effect. Even if we bite the bullet and go full PEP 517, there’s another major problem: there are projects that override the install command.

I mean, if we indiscriminately switched to installing without the install command, some packages would effectively be broken silently — they would e.g. stop installing some files. The biggest issue is that it’s non-trivial to find such packages. One I know about is called Portage.
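To illustrate why such packages are hard to find: about the most one can do cheaply is scan setup.py for a literal cmdclass override. The sketch below is a hypothetical heuristic, not an existing tool, and the sample setup.py is made up. It only catches the case where the dict is written inline in a call, and misses any cmdclass that is built dynamically or imported from elsewhere.

```python
import ast


def overrides_install(setup_py_source):
    """Heuristically report whether a call passes cmdclass={'install': ...}."""
    tree = ast.parse(setup_py_source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        for keyword in node.keywords:
            # Only a dict literal can be inspected without running the script.
            if keyword.arg != "cmdclass" or not isinstance(keyword.value, ast.Dict):
                continue
            for key in keyword.value.keys:
                if isinstance(key, ast.Constant) and key.value == "install":
                    return True
    return False


# Made-up setup.py that installs extra files from a custom install command.
sample = """
from setuptools import setup
from setuptools.command.install import install

class custom_install(install):
    def run(self):
        install.run(self)
        # ... copy extra files here ...

setup(name="demo", cmdclass={"install": custom_install})
"""

found = overrides_install(sample)
```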

At this point, I don’t think it’s worthwhile to put our effort into finding a replacement for setup.py install. We can cross that bridge when we get to it. Until then, it seems like unnecessary work with a fair breakage potential.

In the end, it’s still unclear what the best solution would be. It is possible we’re going to continue converting flit and poetry projects to setuptools to avoid having to maintain support for multiple build processes. It is possible we’re going to hack on top of existing PEP 517 tooling, or build something of our own. It’s quite probable that if I find no other solution, I’m going to try monkey-patching the build system to copy files instead of zipping them, or at least disable compression.

Summary

The Python ecosystem is changing constantly, and the packaging aspect of it is no different. The original distutils build system has eventually evolved into setuptools, and is now being subsumed by it. Setuptools seems to be moving in the direction of becoming yet another PEP 517 build backend and indiscriminately removing features.

Unfortunately, this is all happening without much of a concern for backwards compatibility or feature parity. The Python developers are focused on building their own packaging infrastructure and have no interest in providing a single good workflow for distribution packagers. It is really unfortunate given that many of them rely on our work to build the environments they use to work.

At this point, our immediate goal is to get ready for distutils removal and the setuptools switch to the bundled distutils copy. This switch has real breakage potential for Gentoo users (because of the egg-info file/directory collision), and we need to handle the migration gracefully ahead of time. The other issues, notably the setup.py install removal, will also need to be handled in the future, but right now the gain does not justify the effort.

Update (2021-11-10): data file support

While writing this post, I have missed an important limitation of PEP 517 builds. Distutils and setuptools both have a data_files feature that can be used to install arbitrary files into the system — either into subdirectories of sys.prefix (i.e. /usr) or via absolute paths. This was often used to install data files for the package but also to install manpages, .desktop files and so on.
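As a rough illustration of the path rule described above (relative directories land under the prefix, absolute ones are used as-is), consider the sketch below. The data_files value and file names are hypothetical, and the helper is not the distutils implementation: it ignores options such as an alternate install root and assumes POSIX paths.

```python
import os
import sys


def data_file_targets(data_files, prefix=sys.prefix):
    """Map each source file to the path it would be installed to."""
    targets = {}
    for directory, sources in data_files:
        # os.path.join keeps an absolute directory and anchors a relative one.
        target_dir = os.path.join(prefix, directory)
        for source in sources:
            targets[source] = os.path.join(target_dir, os.path.basename(source))
    return targets


# Hypothetical data_files value, in the form it would be passed to setup().
data_files = [
    ("share/man/man1", ["doc/demo.1"]),
    ("/etc/demo", ["conf/demo.conf"]),
]
targets = data_file_targets(data_files, prefix="/usr")
```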

The wheel specification as of today simply doesn’t support installing files outside the few Python-specific directories. Setuptools/wheel/pip seem to include them in wheels but it’s outside the specification and therefore likely to suffer from portability problems.

Unfortunately, there doesn’t seem to be any interest in actually resolving this. Unless I’m mistaken, neither flit nor poetry supports installing files outside the standard Python directories.

Update (2022-01-24): PEP 517 in Gentoo

Just a small update: due to the uncontrolled multiplication of new build systems, with every single one of them aiming to be a proper XKCD#927 standard, I have decided to discontinue pyproject2setuppy. Having to replicate all the hacks and add new test cases for them was a humongous amount of work. Gentoo is now switching to building its packages using PEP 517 entry points.

8 thoughts on “The future of Python build systems and Gentoo”

  1. The quick change of versions in python is a problem.
    Packages cannot follow all those changes, Users get the creeps of it.
    Instead of rapidly changing from one version to another, wouldn’t it be better to wait until a stable version has been available for some time?
    We changed in a year from 2.6 to 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, and now again to 3.12? Why not wait until version 4.0 instead? At this rate, it would be here by the end of the year ;-)
    Focus on getting packages working instead of versions of Python, please! A lot of packages are in yellow or masked.
    I constantly have packages that work, then don’t anymore, then work again when manually tweaked.

  2. As a user, it angers me that many programming languages try to sidestep distro maintainers with their own distribution systems (cargo, pip, npm) instead of just playing nice with the (traditionally C/C++) package managers like perl and others do.
    https://imgs.xkcd.com/comics/standards.png

  3. I’m an Arch Linux TU and package some python stuff in the community repository. I’m not primarily a Python developer so I have mostly just observed, but I’ve seen discussions about all the same problems and felt the same exasperation as things get deprecated before they have feasible replacements. I know there is at least one Arch guy working on new install tooling for PEP517, but not sure what the stage is at. It might be good to get some cross-distro collaboration going outside of the Python upstream issue trackers where I have noticed this is pretty much discarded every time it comes up. I’m sure *most* distros are affected and re-inventing the wheel\* at this point.

    \* Sorry about the pun.

  4. It’s a disappointment to see Python evolve into this beast it’s becoming. One of the most accessible languages, with easy to read syntax and nice community support has one of the worst packaging systems I’ve seen.

    As previously pointed out, it seems that it’s evolving for the sake of evolving, not considering backwards compatibility. Python 2 -> 3 was already a painful experience, albeit a necessary one as the language matured. But now every version is bringing breaking features. Every script you write you have to consider the target Python version.

    And then packaging seems to have an expiration date now. It saddens me. Suddenly Bash and Perl don’t seem that ugly anymore.

  5. Thanks for your clear, logical thoughts. That’s very nice to read from the perspective of a Python library developer struggling with the same problems and having considerable doubts about the rationale behind these packaging changes.
    I’m particularly concerned about the rise of PEP 517 and the deprecation of the setup.py commands. It seems like Python upstream is not really aware what consequences these compatibility-breaking changes will have for packagers and certain library developers; how much work it will mean to restructure sensitive setup architectures, if possible at all.
    The missing flexibility of the new setup structure is a major issue for my purposes, in conjunction with the lack of coherent documentation. For some background, I’m maintaining `pypdfium2`, which basically is a ctypes-based Python binding to PDFium, Chromium’s PDF rendering library. The problem is that PDFium is incredibly hard to build from source since it requires static linking and pinned dependencies, so I’m using a foreign binary provider (pdfium-binaries) for the wheels I upload to PyPI. This means that my packaging infrastructure is inherently complex, as I need multiple setup files for different platforms, each calling a shared method to build a wheel for a certain platform. It seems that this is not possible anymore with the new `pyproject.toml`-based structure, at least not in the nice way I can do it currently, which makes me somewhat worried about the future of the project, should upstream decide to suddenly remove the setup.py commands and enforce use of a PEP 517-compliant build system.

    1. By the way, if you’d like to package pypdfium2 in Gentoo, I’ll be happy to provide support for doing so. It would require packaging PDFium, changing the Chromium build to dynamically link against the distribution-provided PDFium, and then add a custom setup file to pypdfium2 for including bindings created from the system-provided headers.
