Michał Górny

Optimizing parallel extension builds in PEP517 builds

The distutils (and therefore setuptools) build system supports building C extensions in parallel, through the use of -j (--parallel) option, passed either to build_ext or build command. Gentoo distutils-r1.eclass has always passed these options to speed up builds of packages that feature multiple C files.

However, the switch to PEP517 build backend made this problematic. While the backend uses the respective commands internally, it doesn’t provide a way to pass options to them. In this post, I’d like to explore the different ways we attempted to resolve this problem, trying to find an optimal solution that would let us benefit from parallel extension builds while preserving minimal overhead for packages that wouldn’t benefit from it (e.g. pure Python packages). I will also include a fresh benchmark results to compare these methods.
Continue reading “Optimizing parallel extension builds in PEP517 builds”

The story of distutils build directory in Gentoo

The Python distutils build system, as well as setuptools (that it was later merged into), used a two-stage build: first, a build command would prepare a built package version (usually just copy the .py files, sometimes compile Python extensions) into a build directory, then an install command would copy them to the live filesystem, or a staging directory. Curious enough, distutils were an early adopter of out-of-source builds — when used right (which often enough wasn’t the case), no writes would occur in the source directory and all modifications would be done directly in the build directory.

Today, in the PEP517 era, two-stage builds aren’t really relevant anymore. Build systems were turned into black boxes that spew wheels. However, setuptools still internally uses the two-stage build and the build directory, and therefore it still remains relevant to Gentoo eclasses. In this post, I’d like to shortly tell how we dealt with it over the years.
Continue reading “The story of distutils build directory in Gentoo”

A format that does one thing well or one-size-fits-all?

The Unix philosophy states that we ought to design programs that “do one thing well”. Nevertheless, the current trend is to design huge monoliths with multiple unrelated functions, with web browsers at the peak of that horrifying journey. However, let’s consider something else.

Does the same philosophy hold for algorithms and file formats? Is it better to design formats that suit a single use case well, and swap between different formats as need arises? Or perhaps it is a better solution to actually design them so they could fit different needs?

Let’s consider this by exploring three areas: hash algorithms, compressed file formats and image formats.
Continue reading “A format that does one thing well or one-size-fits-all?”

A script to set file timestamps based on filenames

I’ve noticed that SyncThing didn’t seem to preserve modification timestamps for the photos and videos transferred from my phone. Since this messed up sorting by date, and the timestamps are present in filenames, I’ve made a trivial script to set mtimes appropriately.
Continue reading “A script to set file timestamps based on filenames”

My thin wrapper for emerge(1)

I’ve recently written a thin wrapper over emerge that I use in my development environment. It does the following:

set tmux pane title to the first package argument (so you can roughly see what’s emerging on every pane)
beep meaningfully when emerge finishes (two beeps for success, three for failure),
run pip check after successful run to check for mismatched Python dependencies.

Continue reading “My thin wrapper for emerge(1)”