Optimizing parallel extension builds in PEP517 builds

The distutils (and therefore setuptools) build system supports building C extensions in parallel, through the use of -j (--parallel) option, passed either to build_ext or build command. Gentoo distutils-r1.eclass has always passed these options to speed up builds of packages that feature multiple C files.

However, the switch to PEP517 build backend made this problematic. While the backend uses the respective commands internally, it doesn’t provide a way to pass options to them. In this post, I’d like to explore the different ways we attempted to resolve this problem, trying to find an optimal solution that would let us benefit from parallel extension builds while preserving minimal overhead for packages that wouldn’t benefit from it (e.g. pure Python packages). I will also include a fresh benchmark results to compare these methods.

The history

The legacy build mode utilized two ebuild phases: the compile phase during which the build command was invoked, and the install phase during which install command was invoked. An explicit command invocation made it possible to simply pass the -j option.

When we initially implemented the PEP517 mode, we simply continued calling esetup.py build, prior to calling the PEP517 backend. The former call built all the extensions in parallel, and the latter simply reused the existing build directory.

This was a bit ugly, but it worked most of the time. However, it suffered from a significant overhead from calling the build command. This meant significantly slower builds in the vast majority of packages that did not feature multiple C source files that could benefit from parallel builds.

The next optimization was to replace the build command invocation with more specific build_ext. While the former also involved copying all .py files to the build directory, the latter only built C extensions — and therefore could be pretty much a no-op if there were none. As a side effect, we’ve started hitting rare bugs when custom setup.py scripts assumed that build_ext is never called directly. For a relatively recent example, there is my pull request to fix build_ext -j… in pyzmq.

I’ve followed this immediately with another optimization: skipping the call if there were no source files. To be honest, the code started looking messy at this point, but it was an optimization nevertheless. For the no-extension case, the overhead of calling esetup.py build_ext was replaced by the overhead of calling find to scan the source tree. Of course, this still had some risk of false positives and false negatives.

The next optimization was to call build_ext only if there were at least two files to compile. This mostly addressed the added overhead for packages building only one C file — but of course it couldn’t resolve all false positives.

One more optimization was to make the call conditional to DISTUTILS_EXT variable. While the variable was introduced for another reason (to control adding debug flag), it provided a nice solution to avoid both most of the false positives (even if they were extremely rare) and the overhead of calling find.

The last step wasn’t mine. It was Eli Schwartz’s patch to pass build options via DIST_EXTRA_CONFIG. This provided the ultimate optimization — instead of trying to hack a build_ext call around, we were finally able to pass the necessary options to the PEP517 backend. Needless to say, it meant not only no false positives and no false negatives, but it effectively almost eliminated the overhead in all cases (except for the cost of writing the configuration file).

The timings

Table 1. Benchmark results for various modes of package builds
Django 5.0.3 Cython 3.0.9
Serial PEP517 build 5.4 s 46.7 s
build total 3.1 s 8.4 s 20.8 s 23.5 s
PEP517 5.3 s 2.7 s
build_ext total 0.6 s 6 s 20.8 s 23.5 s
PEP517 5.4 s 2.7 s
find + build_ext total 0.06 s 5.5 s 20.9 s 23.6 s
PEP517 5.4 s 2.7 s
Parallel PEP517 build 5.4 s 22.8 s
Plot of the results from the table
Figure 1. Benchmark times plot

For a pure Python package (django here), the table clearly shows how successive iterations have reduced the overhead from parallel build supports, from roughly 3 seconds in the earliest approach, resulting in 8.4 s total build time, to the same 5.4 s as the regular PEP517 build.

For Cython, all but the ultimate solution result in roughly 23.5 s total, half of the time needed for a serial build (46.7 s). The ultimate solution saves another 0.8 s on the double invocation overhead, giving the final result of 22.8 s.

Test data and methodology

The methods were tested against two packages:

  1. Django 5.0.3, representing a moderate size pure Python package, and
  2. Cython 3.0.9, representing a package with a moderate number of C extensions.

Python 3.12.2_p1 was used for testing. The timings were done using time command from bash. The results were averaged from 5 warm cache test runs. Testing was done on AMD Ryzen 5 3600, with pstates boost disabled.

The PEP517 builds were performed using the following command:

python3.12 -m build -nwx

The remaining commands and conditions were copied from the eclass. The test scripts, along with the results, spreadsheet and plot source can be found in the distutils-build-bench repository.

Leave a Reply

Your email address will not be published.