The story of distutils build directory in Gentoo

The Python distutils build system, as well as setuptools (that it was later merged into), used a two-stage build: first, a build command would prepare a built package version (usually just copy the .py files, sometimes compile Python extensions) into a build directory, then an install command would copy them to the live filesystem, or a staging directory. Curious enough, distutils were an early adopter of out-of-source builds — when used right (which often enough wasn’t the case), no writes would occur in the source directory and all modifications would be done directly in the build directory.

Today, in the PEP517 era, two-stage builds aren’t really relevant anymore. Build systems were turned into black boxes that spew wheels. However, setuptools still internally uses the two-stage build and the build directory, and therefore it still remains relevant to Gentoo eclasses. In this post, I’d like to shortly tell how we dealt with it over the years.

Act 1: The first overrides

Normally, distutils would use a build directory of build/lib*, optionally suffixed for platform and Python version. This was reasonably good most of the time, but not good enough for us. On one hand, it didn’t properly distinguish CPython and PyPy (and it wouldn’t for a long time, until Use cache_tag in default build_platlib dir PR). On the other, the directory name would be hard to get, if ebuilds ever needed to do something about it (and we surely did).

Therefore, the eclass would start overriding build directories quite early on. We would start by passing --build-base to the build command, then add --build-lib to make the lib subdirectory path simpler, then replace it with separate --build-platlib and --build-purelib to workaround build systems overriding one of them (wxPython, if I recall correctly).

The eclass would class this mode “out-of-source build” and use a dedicated BUILD_DIR variable to refer to the dedicated build directory. Confusingly, “in-source build” would actually indicate a distutils-style out-of-source build in the default build subdirectory, and the eclass would create a separate copy of the sources for every Python target (effectively permitting in-source modifications).

The last version of code passing --build* options.

Act 2: .pydistutils.cfg

The big problem with the earlier approach is that you’d have to pass the options every time is invoked. Given the design of option passing in distutils, this effectively meant that you needed to repeatedly invoke the build commands (otherwise you couldn’t pass options to it).

The next step would be to replace this logic by using .pydistutils.cfg configuration file. The file, placed in HOME (also overridden in eclass) would allow us to set option values without actually having to pass specific commands on the command-line. The relevant logic, added in September 2013 (commit: Use pydistutils.cfg to set build-dirs instead of passing commands explicitly…), remains in the eclass even today. However, since the PEP517 build mode stopped using this file, it is used only in legacy mode.

The latest version of the code writing .pydistutils.cfg.

Act 3: Messy PEP517 mode

One of the changes caused by building in PEP517 mode was that .pydistutils.cfg started being ignored. This implied that setuptools were using the default build directory again. It wasn’t such a big deal anymore — since we no longer used proper separation between the two build stages, and we no longer needed to have any awareness of the intermediate build directory, the path didn’t matter per se. However, it meant CPython and PyPy started sharing the same build directory again — and since setuptools install stage picks everything up from that directory, it meant that extensions built for PyPy3.10 would be installed to CPython3.10 directory!

How did we deal with that? Well, at first I’ve tried calling clean -a. It was kinda ugly, especially that it meant combining calls with PEP517 invocations — but then, we were already calling build to take advantage of parallel build jobs when building extensions, and it worked. For a time.

Unfortunately, it turned out that some packages override the clean command and break our code, or even literally block calling it. So the next step was to stop being fancy and literally call rm -rf build. Well, this was ugly, but — again — it worked.

Act 4: Back to the config files

As I’ve mentioned before, we continued to call the build command in PEP517 mode, in order to enable building C extensions in parallel via the -j option. Over time, this code grew in complexity — we’ve replaced the call with more specific build_ext, then started adding heuristics to avoid calling it when unnecessary (a no-op build_ext call slowed pure Python package builds substantially).

Eventually, Eli Schwartz came up with a great alternative — using DIST_EXTRA_CONFIG to provide a configuration file. This meant that we could replace both invocations — by using the configuration file both to specify the job count for extension builds, and to use a dedicated build directory.

The change originally was done only for the explicit use of setuptools build backend. As a result, we’ve missed a bunch of “indirect” setuptools uses — other setuptools-backed PEP517 backends (jupyter-builder, pbr), backends using setuptools conditionally (pdm-backend), custom wrappers over setuptools and… dev-python/setuptools package itself (“standalone” backend). We’ve learned about it the hard way when setuptools stopped implicitly ignoring the build directory as a package name — and effectively a subsequent build collected a copy of the previous build as a build package. Yep, we’ve ended up with a monster of /usr/lib/python3.12/site-packages/build/lib/build/lib/setuptools.

So we approach the most recent change: enabling the config for all backends. After all, we’re just setting an environment variable, so others build backends will just ignore it.

And so, we’ve came full circle. We’ve enabled configuration files early on, switched to other hacks when PEP517 builds broke that and eventually returned to unconditionally using configuration files.

Leave a Reply

Your email address will not be published.