Optimizing distutils-r1.eclass via wheel reuse

Yesterday I enabled a new distutils-r1.eclass optimization: wheel reuse. Without this optimization, the eclass would build a separate wheel for every enabled Python implementation, and then install each of these wheels. In many cases, this meant repeatedly building the same thing. With the optimization enabled, under some circumstances the eclass is able to build one (or two) wheels and install them for all implementations.

This change brings the eclass behavior closer to that of package managers such as pip. While it changes nothing for users who build packages for a single Python version only, it can bring a nice speedup when building for multiple interpreters. In particular, pure Python packages using setuptools will no longer incur the penalty of starting setuptools multiple times (which is quite slow), and packages using the stable ABI won’t have to build roughly identical extensions multiple times.

In this post, I’m going to briefly go over a few design considerations behind the new feature.

Pure Python wheels, and partial C extension compatibility

The obvious candidates for wheel reuse are pure Python wheels, i.e. wheels with the *-py3-none-any.whl (or *-py2.py3-none-any.whl) suffix. Therefore, the algorithm would be roughly this: build a wheel; if you get a pure Python wheel, use it for all implementations.

[Well, to be more precise, the eclass works more like this: check if any of the previously built wheels can be used; if one can, use it; otherwise build a new wheel, add it to the list and use that.]
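To make the “pure Python wheel” test concrete, here is a minimal Python sketch (not the eclass code, which is bash) that extracts the compatibility tags from a wheel filename and recognizes pure Python wheels. It assumes well-formed names of the form {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl; the names WheelTags, parse_wheel_tags and is_pure_python are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class WheelTags(NamedTuple):
    """Compatibility tags found at the end of a wheel filename."""

    python: frozenset[str]    # e.g. {"py2", "py3"} or {"cp38"}
    abi: frozenset[str]       # e.g. {"none"} or {"abi3"}
    platform: frozenset[str]  # e.g. {"any"} or {"linux_x86_64"}


def parse_wheel_tags(filename: str) -> WheelTags:
    """Return the python/abi/platform tags of a wheel filename.

    The last three dash-separated fields are always the tags; each field
    may contain several dot-separated values (e.g. "py2.py3").
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"malformed wheel filename: {filename}")
    python, abi, platform = (frozenset(p.split(".")) for p in parts[-3:])
    return WheelTags(python, abi, platform)


def is_pure_python(filename: str) -> bool:
    """True for *-py3-none-any.whl and *-py2.py3-none-any.whl wheels."""
    tags = parse_wheel_tags(filename)
    return "py3" in tags.python and tags.abi == {"none"} and tags.platform == {"any"}
```

For example, is_pure_python("foo-1.0-py2.py3-none-any.whl") is true, while a *-cp38-abi3-linux_x86_64.whl wheel is not pure. Splitting from the right is safe because dashes in the project name are escaped as underscores in wheel filenames.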

However, there is a problem with that approach: some packages feature extensions that aren’t used by all supported implementations. In particular, some packages don’t enable extensions for PyPy (often simply because pure Python code running under a JIT tends to be faster than calling into a C/Rust extension). Since we’re building for PyPy3 first, the pure Python wheel built for PyPy3 would end up being reused for all implementations, and CPython would silently lose its extensions!

Fortunately, a simple way around the problem was already available — for multiple reasons, we already expect DISTUTILS_EXT to be set for all ebuilds featuring (at least optional) compiled extensions. Therefore, I’ve modified the logic to reuse pure Python wheels only if we don’t expect extensions. If we do, then pure Python wheels are ignored.

Of course, this is not a perfect solution. If a package supports more than one implementation that uses the pure Python version, the wheel won’t be reused between them. In fact, if a package features a native-extensions USE flag and it’s disabled, so that no extensions are built at all, pure Python wheel reuse is disabled as well! But that’s just a missed optimization, and it’s better to stay on the safe side here.

Still, some risks remain. In particular, if a developer overlooks a CPython-only extension and includes PyPy3 from day one, wheel reuse will prevent the eclass from immediately reporting the missing DISTUTILS_EXT, since the CPython targets will reuse the pure Python wheel built for PyPy3 and never install the extension. Fortunately, I think we can reasonably expect that someone will build the package with the PyPy3 target disabled and report the problem. In fact, I’m pretty sure our CI will catch that very quickly.
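The DISTUTILS_EXT guard can be illustrated with a small Python simulation. This is a sketch under simplifying assumptions, not the eclass logic: implementations are processed in the given order with PyPy3 first, the expect_extensions flag stands in for DISTUTILS_EXT, and fake_build describes a hypothetical package whose C extension is skipped on PyPy3.

```python
from __future__ import annotations

from typing import Callable


def is_pure_python(wheel: str) -> bool:
    # The last three dash-separated fields are the python, abi and platform tags.
    py, abi, plat = wheel[: -len(".whl")].split("-")[-3:]
    return "py3" in py.split(".") and abi == "none" and plat == "any"


def install_all(
    impls: list[str],
    build: Callable[[str], str],
    expect_extensions: bool,
) -> dict[str, str]:
    """Return the wheel that would be installed for each implementation.

    expect_extensions plays the role of DISTUTILS_EXT: when it is set, pure
    Python wheels are never reused, because such a wheel may be pure only
    because the implementation that built it skipped the extensions.
    """
    built: list[str] = []
    installed: dict[str, str] = {}
    for impl in impls:
        reusable = [] if expect_extensions else [w for w in built if is_pure_python(w)]
        if reusable:
            wheel = reusable[0]
        else:
            wheel = build(impl)
            built.append(wheel)
        installed[impl] = wheel
    return installed


def fake_build(impl: str) -> str:
    # Hypothetical package whose C extension is skipped on PyPy3 only.
    if impl.startswith("pypy"):
        return "foo-1.0-py3-none-any.whl"
    minor = impl.split("_")[1]  # "python3_12" -> "12"
    return f"foo-1.0-cp3{minor}-cp3{minor}-linux_x86_64.whl"
```

With install_all(["pypy3", "python3_12"], fake_build, expect_extensions=True), python3_12 gets its own cp312 wheel. With expect_extensions=False, it receives the pure Python wheel built for PyPy3, which is exactly the failure the guard prevents.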

Stable ABI wheels

The second candidates for wheel reuse are stable ABI wheels. Long story short, Python extensions are normally only guaranteed to be compatible with the single Python version they were built for. However, if one uses the so-called limited API, the resulting extensions will be forward-compatible with all CPython versions starting with the specified minimum version. The benefit of reusing stable ABI wheels is much greater than that of reusing pure Python wheels, since we avoid repeatedly building the same C or Rust code, which can be quite resource-consuming.

Normally, reusing stable ABI wheels requires determining whether a particular ABI/platform tag is compatible with the implementation in question. For example, a stable ABI wheel could have the suffix *-cp38-abi3-linux_x86_64.whl. This means that the wheel is compatible with CPython 3.8 and newer, on the Linux x86_64 platform. Unfortunately, these tags can get quite complex, and the packaging library features quite extensive code for determining tag compatibility.
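For contrast, here is a rough sketch of what even a restricted “proper” check involves for the plain abi3 case. It is hypothetical and deliberately incomplete: real code, such as the packaging library’s packaging.tags module, also has to handle manylinux, musllinux and macOS platform rules, and more.

```python
from __future__ import annotations


def parse_cpython_tag(tag: str) -> tuple[int, int]:
    """Turn a CPython 3 tag such as "cp38" or "cp312" into (3, 8) or (3, 12)."""
    if not tag.startswith("cp3") or not tag[3:].isdigit():
        raise ValueError(f"not a CPython 3 tag: {tag}")
    return (3, int(tag[3:]))


def abi3_compatible(
    wheel: str, impl: str, version: tuple[int, int], platform: str
) -> bool:
    """Rough check whether a stable ABI wheel can be used by an interpreter.

    impl is "cpython" or "pypy", version is (major, minor) and platform is a
    wheel platform tag such as "linux_x86_64".  Only an exact platform match
    is accepted; manylinux/musllinux/macOS compatibility rules are ignored.
    """
    py, abi, plat = wheel[: -len(".whl")].split("-")[-3:]
    if impl != "cpython" or "abi3" not in abi.split("."):
        return False  # the stable ABI is CPython-only
    if platform not in plat.split("."):
        return False
    # cp38-abi3 means "CPython 3.8 or newer".
    return any(
        version >= parse_cpython_tag(t)
        for t in py.split(".")
        if t.startswith("cp3")
    )
```

Here abi3_compatible("foo-1.0-cp38-abi3-linux_x86_64.whl", "cpython", (3, 12), "linux_x86_64") is true, while the same wheel is rejected for CPython 3.7, for PyPy, or on another platform.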

The good news is that we don’t really need to do that. Since we’re building wheels locally, we don’t need to be concerned about the platform tag at all. Furthermore, since we are building from the oldest to the newest Python version, we can also ignore the Python tag (e.g. cp38), check only that the ABI tag is abi3, and assume that a wheel built for an earlier CPython version will be compatible with newer versions. That said, we need to take into account that the stable ABI is supported only by CPython, not by PyPy.
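Under these assumptions (wheels built locally, targets processed from oldest to newest), the check collapses to “is this CPython, and is the ABI tag abi3?”. The following Python sketch simulates that. The (name, is_cpython) pairs, the target order and fake_build are hypothetical, chosen to mirror the PyPy3-first, oldest-to-newest order described in this post.

```python
from __future__ import annotations

from typing import Callable


def abi_tags(wheel: str) -> set[str]:
    # The ABI tag is the second-to-last dash-separated field of a wheel name.
    return set(wheel[: -len(".whl")].split("-")[-2].split("."))


def install_in_order(
    targets: list[tuple[str, bool]],
    build: Callable[[str], str],
) -> dict[str, str]:
    """Pick a wheel for each (name, is_cpython) target, in the given order.

    Targets are assumed to be ordered from oldest to newest CPython, and
    all wheels are assumed to be built locally.  Under those assumptions an
    earlier abi3 wheel works for every later CPython target, so neither the
    python tag (e.g. cp311) nor the platform tag is checked.
    """
    built: list[str] = []
    chosen: dict[str, str] = {}
    for name, is_cpython in targets:
        wheel = None
        if is_cpython:  # the stable ABI is not supported by PyPy
            wheel = next((w for w in built if "abi3" in abi_tags(w)), None)
        if wheel is None:
            wheel = build(name)
            built.append(wheel)
        chosen[name] = wheel
    return chosen


builds: list[str] = []  # records which targets actually triggered a build


def fake_build(name: str) -> str:
    # Hypothetical limited-API package whose minimum version is CPython 3.11.
    builds.append(name)
    if name.startswith("pypy"):
        return "foo-1.0-pp310-pypy310_pp73-linux_x86_64.whl"
    return "foo-1.0-cp311-abi3-linux_x86_64.whl"
```

Calling install_in_order([("pypy3", False), ("python3_11", True), ("python3_12", True), ("python3_13", True)], fake_build) builds only two wheels: one for PyPy3 and one cp311-abi3 wheel that is then reused for 3.12 and 3.13. The python tag is never compared against the interpreter version; the build order is what guarantees compatibility.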

Multiple wheels per package

One final problem with wheel reuse is that a single Gentoo package may build multiple wheels. For example, dev-python/sqlglot builds a main Python package and a Rust extension. A “dumb” implementation of wheel reuse would use the first wheel built for all subsequent calls, even if these were supposed to build completely different packages!

To resolve this issue, I’ve converted the DISTUTILS_WHEELS variable into an associative array mapping wheel paths to source directory paths. For every wheel built, we record both the wheel path and the source directory, and reuse the wheel only if the directory matches.
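In the eclass, this mapping is a bash associative array; the Python sketch below only illustrates the lookup with a dict. The class name, method names and paths are hypothetical.

```python
from __future__ import annotations


class WheelRegistry:
    """Stand-in for the wheel path -> source directory associative array."""

    def __init__(self) -> None:
        self.wheels: dict[str, str] = {}

    def record(self, wheel: str, source_dir: str) -> None:
        # Remember which source directory produced this wheel.
        self.wheels[wheel] = source_dir

    def candidates(self, source_dir: str) -> list[str]:
        # Only wheels built from the same directory are eligible for reuse;
        # dicts keep insertion order, so older wheels come first.
        return [w for w, d in self.wheels.items() if d == source_dir]
```

As long as the main package and the extension are built from different source directories, as the mapping assumes, a lookup for one directory never returns the other’s wheel.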

Summary

The resulting code in distutils-r1.eclass implements everything described above. I had been using it for two months before enabling it by default, and found no issues. During this period, the eclass additionally verified that packages declaring universal wheels did not install files with different contents for different implementations.

I’m really proud of how simple the logic is. If wheel reuse is enabled, scan the list of recorded wheels for those matching the current directory. For each matching wheel, check its tags. If we do not expect extensions and we’ve got a pure Python wheel, use it. If we are installing for CPython and we’ve got a stable ABI wheel, use it. Otherwise (no matching wheel, or reuse disabled), build and install a new wheel (this is actually a call to the old function) and add it to the list.
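The whole procedure fits in a short function. The following is a hedged Python transliteration of these steps, not the actual bash code: registry mirrors the wheel-to-directory mapping, build is a hypothetical callback standing in for the old build function, and the keyword flags correspond to the reuse switch, DISTUTILS_EXT and whether the current target is CPython.

```python
from __future__ import annotations

from typing import Callable


def wheel_tags(wheel: str) -> tuple[set[str], set[str], set[str]]:
    # Last three dash-separated fields: python, ABI and platform tag sets.
    py, abi, plat = wheel[: -len(".whl")].split("-")[-3:]
    return set(py.split(".")), set(abi.split(".")), set(plat.split("."))


def find_reusable(
    registry: dict[str, str],
    source_dir: str,
    expect_extensions: bool,
    is_cpython: bool,
) -> str | None:
    """Return a recorded wheel usable for the current target, if any."""
    for wheel, wheel_dir in registry.items():
        if wheel_dir != source_dir:
            continue  # built from a different directory, i.e. another package
        py, abi, plat = wheel_tags(wheel)
        pure = "py3" in py and abi == {"none"} and plat == {"any"}
        if pure and not expect_extensions:
            return wheel
        if is_cpython and "abi3" in abi:
            return wheel
    return None


def build_or_reuse(
    registry: dict[str, str],
    source_dir: str,
    build: Callable[[], str],
    *,
    reuse_enabled: bool,
    expect_extensions: bool,
    is_cpython: bool,
) -> str:
    """Return the wheel to install for the current target."""
    if reuse_enabled:
        wheel = find_reusable(registry, source_dir, expect_extensions, is_cpython)
        if wheel is not None:
            return wheel
    # No usable wheel (or reuse disabled): build a new one and record it.
    wheel = build()
    registry[wheel] = source_dir
    return wheel
```

In this sketch, build only returns the path of the new wheel; installing the returned wheel, whether reused or freshly built, is left to the caller.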

I hope this helps you save some time and energy. I definitely don’t need the extra heating in this hell of a summer.
