{"id":1918,"date":"2024-03-13T11:18:29","date_gmt":"2024-03-13T10:18:29","guid":{"rendered":"https:\/\/blogs.gentoo.org\/mgorny\/?p=1918"},"modified":"2024-03-13T11:18:29","modified_gmt":"2024-03-13T10:18:29","slug":"the-story-of-distutils-build-directory-in-gentoo","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/mgorny\/2024\/03\/13\/the-story-of-distutils-build-directory-in-gentoo\/","title":{"rendered":"The story of distutils build directory in Gentoo"},"content":{"rendered":"<p>The Python distutils build system, as well as setuptools (into which it was later merged), used a two-stage build: first, a <kbd>build<\/kbd> command would prepare a built package version (usually just copy the <kbd>.py<\/kbd> files, sometimes compile Python extensions) into a build directory, then an <kbd>install<\/kbd> command would copy them to the live filesystem, or a staging directory.  Curiously enough, distutils was an early adopter of out-of-source builds \u2014 when used right (which often enough wasn&#8217;t the case), no writes would occur in the source directory and all modifications would be done directly in the build directory.<\/p>\n<p>Today, in the PEP517 era, two-stage builds aren&#8217;t really relevant anymore.  Build systems were turned into black boxes that spew wheels.  However, setuptools still uses the two-stage build and the build directory internally, so the latter remains relevant to Gentoo eclasses.  In this post, I&#8217;d like to briefly describe how we have dealt with it over the years.<br \/>\n<!--more--><\/p>\n<h2>Act 1: The first overrides<\/h2>\n<p>Normally, distutils would use a build directory of <kbd>build\/lib*<\/kbd>, optionally suffixed for platform and Python version.  This was reasonably good most of the time, but not good enough for us.  
On one hand, it didn&#8217;t properly distinguish CPython and PyPy (and it wouldn&#8217;t for a long time, until the <a rel=\"external\" href=\"https:\/\/github.com\/pypa\/distutils\/pull\/133\">Use cache_tag in default build_platlib dir<\/a> PR).  On the other hand, the directory name was hard to determine, should ebuilds ever need to do something with it (and we surely did).<\/p>\n<p>Therefore, the eclass started overriding build directories quite early on.  We started by passing <kbd>--build-base<\/kbd> to the <kbd>build<\/kbd> command, then added <kbd>--build-lib<\/kbd> to make the <kbd>lib<\/kbd> subdirectory path simpler, then replaced it with separate <kbd>--build-platlib<\/kbd> and <kbd>--build-purelib<\/kbd> options to work around build systems overriding one of them (wxPython, if I recall correctly).<\/p>\n<p>The eclass called this mode &#8220;out-of-source build&#8221; and used a <var>BUILD_DIR<\/var> variable to refer to the dedicated build directory.  Confusingly, &#8220;in-source build&#8221; actually indicated a distutils-style out-of-source build in the default <kbd>build<\/kbd> subdirectory, and the eclass would create a separate copy of the sources for every Python target (effectively permitting in-source modifications).<\/p>\n<p><a rel=\"external\" href=\"https:\/\/gitweb.gentoo.org\/repo\/gentoo\/historical.git\/tree\/eclass\/distutils-r1.eclass?id=e061eb09af0558661de8a3078655e25d06d905ee#n226\">The last version of the code passing <kbd>--build*<\/kbd> options.<\/a><\/p>\n<h2>Act 2: .pydistutils.cfg<\/h2>\n<p>The big problem with the earlier approach was that you had to pass the options every time <kbd>setup.py<\/kbd> was invoked.  Given the design of option passing in distutils, this effectively meant that you needed to invoke the <kbd>build<\/kbd> command repeatedly (otherwise you couldn&#8217;t pass options to it).<\/p>\n<p>The next step was to replace this logic with a <kbd>.pydistutils.cfg<\/kbd> configuration file.  
The file, placed in <var>HOME<\/var> (which the eclass also overrides), allowed us to set option values without having to pass specific commands on the command line.  The relevant logic, added in September 2013 (commit: <a rel=\"external\" href=\"https:\/\/gitweb.gentoo.org\/repo\/gentoo\/historical.git\/commit\/eclass\/distutils-r1.eclass?id=806966ab94abb49cc9a40c240a6aec03b0a995b4\">Use pydistutils.cfg to set build-dirs instead of passing commands explicitly\u2026<\/a>), remains in the eclass even today.  However, since PEP517 build mode ignores this file, it is used only in legacy mode.<\/p>\n<p><a rel=\"external\" href=\"https:\/\/gitweb.gentoo.org\/repo\/gentoo.git\/tree\/eclass\/distutils-r1.eclass?id=4c42c36f5fdea8612cb824fbb6eeeeaed0719531#n668\">The latest version of the code writing <kbd>.pydistutils.cfg<\/kbd>.<\/a><\/p>\n<h2>Act 3: Messy PEP517 mode<\/h2>\n<p>One of the consequences of building in PEP517 mode was that <kbd>.pydistutils.cfg<\/kbd> started being ignored.  This meant that setuptools went back to using the default <kbd>build<\/kbd> directory.  It wasn&#8217;t such a big deal anymore \u2014 since we no longer used proper separation between the two build stages, and we no longer needed to have any awareness of the intermediate build directory, the path didn&#8217;t matter per se.  However, it also meant that CPython and PyPy started sharing the same build directory again \u2014 and since the setuptools install stage picks up everything from that directory, extensions built for PyPy3.10 would end up installed into the CPython3.10 directory!<\/p>\n<p>How did we deal with that?  Well, at first I tried <a rel=\"external\" href=\"https:\/\/gitweb.gentoo.org\/repo\/gentoo.git\/commit\/eclass\/distutils-r1.eclass?id=44ea4a8c091afbee0b85443670eba504fb0e131e\">calling <kbd>setup.py clean -a<\/kbd><\/a>.  
It was kinda ugly, especially since it meant combining <kbd>setup.py<\/kbd> calls with PEP517 invocations \u2014 but then, we were already calling <kbd>setup.py build<\/kbd> to take advantage of parallel build jobs when building extensions, and it worked.  For a time.<\/p>\n<p>Unfortunately, it turned out that some packages overrode the <kbd>clean<\/kbd> command in ways that broke our code, or even blocked calling it outright.  So the next step was to stop being fancy and literally <a rel=\"external\" href=\"https:\/\/gitweb.gentoo.org\/repo\/gentoo.git\/commit\/eclass\/distutils-r1.eclass?id=50cf28d11e3908467f6c10030ce66f55eea1c23a\">call rm -rf build<\/a>.  Well, this was ugly, but \u2014 again \u2014 it worked.<\/p>\n<h2>Act 4: Back to the config files<\/h2>\n<p>As I mentioned before, we continued to call the <kbd>build<\/kbd> command in PEP517 mode, in order to build C extensions in parallel via the <kbd>-j<\/kbd> option.  Over time, this code grew in complexity \u2014 we replaced the call with the more specific <kbd>build_ext<\/kbd> command, then started adding heuristics to avoid calling it when unnecessary (a no-op <kbd>setup.py build_ext<\/kbd> call slowed down pure Python package builds substantially).<\/p>\n<p>Eventually, Eli Schwartz came up with a great alternative \u2014 <a rel=\"external\" href=\"https:\/\/gitweb.gentoo.org\/repo\/gentoo.git\/commit\/eclass\/distutils-r1.eclass?id=597762f0cd480abcd18792f519db370a6c249e25\">using <var>DIST_EXTRA_CONFIG<\/var> to provide a configuration file<\/a>.  This meant that we could replace both <kbd>setup.py<\/kbd> invocations, using the configuration file both to specify the job count for extension builds and to set a dedicated build directory.<\/p>\n<p>Originally, the change was applied only to packages explicitly using the <kbd>setuptools<\/kbd> build backend.  
As a result, we missed a bunch of &#8220;indirect&#8221; setuptools uses \u2014 other setuptools-backed PEP517 backends (jupyter-builder, pbr), backends using setuptools conditionally (pdm-backend), custom wrappers over setuptools and\u2026 the <kbd>dev-python\/setuptools<\/kbd> package itself (&#8220;standalone&#8221; backend).  We learned about it the hard way when setuptools stopped implicitly ignoring the <kbd>build<\/kbd> directory as a package name, and a subsequent build effectively collected a copy of the previous build as a <kbd>build<\/kbd> package.  Yep, we ended up with a monstrosity like <kbd>\/usr\/lib\/python3.12\/site-packages\/build\/lib\/build\/lib\/setuptools<\/kbd>.<\/p>\n<p>This brings us to the most recent change: <a rel=\"external\" href=\"https:\/\/gitweb.gentoo.org\/repo\/gentoo.git\/commit\/eclass\/distutils-r1.eclass?id=920edc504064fa38caa462b4d378114599f65925\">enabling the config for all backends<\/a>.  After all, we&#8217;re just setting an environment variable, so other build backends will simply ignore it.<\/p>\n<p>And so, we&#8217;ve come full circle.  
We started using configuration files early on, switched to other hacks when PEP517 builds broke them, and eventually returned to using configuration files unconditionally.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Python distutils build system, as well as setuptools (that it was later merged into), used a two-stage build: first, a build command would prepare a built package version (usually just copy the .py files, sometimes compile Python extensions) into a build directory, then an install command would copy them to the live filesystem, or &hellip; <a href=\"https:\/\/blogs.gentoo.org\/mgorny\/2024\/03\/13\/the-story-of-distutils-build-directory-in-gentoo\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;The story of distutils build directory in Gentoo&#8221;<\/span><\/a><\/p>\n","protected":false},"author":137,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[15],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/1918"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/users\/137"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/comments?post=1918"}],"version-history":[{"count":26,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/1918\/revisions"}],"predecessor-version":[{"id":1944,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/1918\/revisions\/1944"}],"wp:attachment":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2
\/media?parent=1918"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/categories?post=1918"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/tags?post=1918"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}