Speeding up emerge depgraph calculation using PyPy3

WARNING: Some of the respondents were not able to reproduce my results. It is possible that this depends on the hardware or even on a specific emerge state. Please do not rely on my claims that PyPy3 runs faster, and verify it on your system before switching permanently.

If you’ve used Gentoo for some time, you’ve probably noticed that emerge is getting slower and slower. Before I switched to SSD, my emerge could take as long as 10 minutes before it figured out what to do! Even now it’s pretty normal for the dependency calculation to take 2 minutes. Georgy Yakovlev recently tested PyPy3 on PPC64, and noticed a great speedup, apparently due to very poor optimization of CPython on that platform. I’ve attempted the same on amd64, and measured a 35% speedup nevertheless.

PyPy is an alternative implementation of Python that uses a JIT compiler to run Python code. JIT can achieve greater performance on computation-intensive tasks, at the cost of slower program startup. This means that it could be slower for some programs, and faster for others. In the case of emerge dependency calculation, it was clearly faster on my system. A quick benchmark done using dev-perl/Dumbbench (great tool, by the way) shows, for today’s @world upgrade:

  • Python 3.9.0: 111.42 s ± 0.87 s (0.8%)
  • PyPy3.7 7.3.2: 72.30 s ± 0.23 s (0.3%)
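
The 35% figure is simply the relative reduction in run time between these two means. A small shell sketch of the arithmetic follows; the function name speedup_percent is made up for illustration, and awk is used only because plain shell has no floating-point math:

```shell
# speedup_percent BASELINE CANDIDATE
# Prints how much shorter CANDIDATE's run time is, as a percentage of BASELINE.
# (Hypothetical helper name; awk does the floating-point arithmetic.)
speedup_percent() {
    awk -v base="$1" -v cand="$2" \
        'BEGIN { printf "%.1f\n", (base - cand) / base * 100 }'
}

# Mean times from the Dumbbench runs above: CPython 3.9.0 vs PyPy3.7 7.3.2
speedup_percent 111.42 72.30   # prints 35.1
```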

dev-python/pypy3 is supported on Gentoo on the amd64, arm64, ppc64 and x86 targets. The interpreter itself takes quite a while to build (35–45 minutes on a modern Ryzen), so you may want to tell emerge to grab the prebuilt dev-python/pypy3-exe-bin:

$ emerge -nv dev-python/pypy3 dev-python/pypy3-exe-bin

If you want to build it from source, it is recommended to grab dev-python/pypy first (possibly with dev-python/pypy-exe-bin for faster bootstrap), as building with PyPy itself is much faster:

# use prebuilt compiler for fast bootstrap
$ emerge -1v dev-python/pypy dev-python/pypy-exe-bin
# rebuild the interpreter
$ emerge -nv dev-python/pypy dev-python/pypy-exe
# build pypy3
$ emerge -nv dev-python/pypy3

Update 2020-10-07: Afterwards, you need to rebuild Portage and its dependencies with PyPy3 support enabled. The easiest way of doing it is to enable the PyPy3 target globally, and rebuild the relevant packages:

$ echo '*/* PYTHON_TARGETS: pypy3' >> /etc/portage/package.use
$ emerge -1vUD sys-apps/portage
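
One caveat with the echo command above: on many systems /etc/portage/package.use is a directory rather than a single file, in which case the redirect fails. Here is a hedged sketch that copes with both layouts. The function name and the pypy3 file name inside the directory are arbitrary choices for illustration, not something Portage requires:

```shell
# enable_pypy3_target [PATH]
# Appends the global PyPy3 target line to package.use, whether PATH is a
# plain file or a directory of snippet files.
# Both the function name and the "pypy3" snippet name are hypothetical.
enable_pypy3_target() {
    target=${1:-/etc/portage/package.use}
    if [ -d "$target" ]; then
        # Directory layout: write to a dedicated file inside it.
        target="$target/pypy3"
    fi
    echo '*/* PYTHON_TARGETS: pypy3' >> "$target"
}

# Example (as root, not run here):
#   enable_pypy3_target
#   emerge -1vUD sys-apps/portage
```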

Finally, you can use python-exec’s per-program configuration to use PyPy3 for emerge while continuing to use CPython for other programs:

$ echo pypy3 >> /etc/python-exec/emerge.conf
# yep, that's pypy3.7
$ emerge --info | head -1
Portage 3.0.7 (python 3.7.4-final-0, default/linux/amd64/17.1/desktop, gcc-9.3.0, glibc-2.32-r2, 5.8.12 x86_64)
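
Since results vary between systems, it may be worth timing both interpreters before committing to the emerge.conf change. Below is a rough sketch, with two assumptions to check on your side: that your python-exec version honours the EPYTHON environment variable ahead of the per-program file, and that python3.9 is the CPython target you have installed. The time_runs helper is made up for this example, and its whole-second resolution is enough for runs that take minutes:

```shell
# time_runs N CMD [ARGS...]
# Runs CMD N times and prints the wall-clock seconds of each run, one per line.
# (Hypothetical helper; output of CMD itself is discarded.)
time_runs() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        echo $((end - start))
        i=$((i + 1))
    done
}

# Example (not run here): pretend @world upgrade under each interpreter.
# EPYTHON values are assumptions; adjust them to the targets you have.
#   time_runs 3 env EPYTHON=python3.9 emerge -puDU @world
#   time_runs 3 env EPYTHON=pypy3 emerge -puDU @world
```

Several consecutive runs per interpreter help to rule out cold-cache effects; a dedicated tool such as Dumbbench gives proper error estimates on top of that.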

17 thoughts on “Speeding up emerge depgraph calculation using PyPy3”

  1. > Even now it’s pretty normal for the dependency calculation to take 2 minutes.

    Heh, I have tens of minutes for the world upgrade sometimes.

  2. Mh, doing this on a stable system I still get python 3.7.8-final-0 from emerge --info, am I missing something?

    1. Actually, I’ve missed one important thing — you need to rebuild Portage with PYTHON_TARGETS: pypy3 (which also means some deps).

      1. Looks like on a default amd64 profile python_targets_pypy3 needs to be unmasked as well.

        # tail -n 4 /var/db/repos/gentoo/profiles/arch/amd64/use.stable.mask
        # Michał Górny (2014-03-30)
        # PyPy is unstable on this arch.
        python_targets_pypy3
        python_single_target_pypy3

        1. > Looks like on a default amd64 profile python_targets_pypy3 needs to be unmasked as well.

          …and how do you go about doing that? I have -python_targets_pypy3 and -python_single_target_pypy3 in /etc/portage/use.mask, and it’s ignored. I’ve also tried it as /etc/portage/use.stable.mask, tried with the filename ending in “unmask” instead of “mask,” and removed the “-“s in the files. Nothing works; emerge -1vUD portage rebuilds nothing.

        2. Never mind…a bit more digging revealed that use.mask needs to be stored under /etc/portage/profile, not /etc/portage. It should contain the following:

          -python_targets_pypy3
          -python_single_target_pypy3

  3. pypy3 takes more than twice as long for me:

    # emerge --info|head -1
    Portage 3.0.8 (python 3.7.9-final-0, default/linux/amd64/17.0/desktop/plasma, gcc-9.3.0, glibc-2.31-r6, 5.4.65 x86_64)
    # time emerge -puDU world
    […]
    real 3m11.827s
    user 3m10.699s
    sys 0m1.002s

    # echo pypy3 >> /etc/python-exec/emerge.conf
    # emerge --info|head -1
    Portage 3.0.8 (python 3.6.9-final-0, default/linux/amd64/17.0/desktop/plasma, gcc-9.3.0, glibc-2.31-r6, 5.4.65 x86_64)
    # time emerge -puDU world
    […]
    real 7m52.081s
    user 7m50.541s
    sys 0m1.395s

    # equery l python
    * Searching for python …
    [IP-] [ ] dev-lang/python-2.7.18-r3:2.7
    [IP-] [ ] dev-lang/python-3.6.12:3.6/3.6m
    [IP-] [ ] dev-lang/python-3.7.9:3.7/3.7m
    # equery l pypy3
    * Searching for pypy3 …
    [IP-] [ ] dev-python/pypy3-7.3.2:0/pypy36-pp73

    did I miss something obvious?

    1. That’s curious. Maybe try with dumbbench (or at least get a few consecutive runs) to eliminate cold cache.

  4. No speed up on my system.
    Before (python 3.7):
    # emerge -puDU @world
    real 2m52,580s
    user 2m36,446s
    sys 0m3,182s

    After (pypy3-7.3.1-r3):
    First run
    # emerge -puDU @world
    real 3m10,972s
    user 2m36,699s
    sys 0m3,439s

    Second run
    # emerge -puDU @world
    real 3m43,964s
    user 2m43,164s
    sys 0m3,504s

    # emerge --info | head -1
    Portage 3.0.8 (python 3.7.9-final-0, default/linux/amd64/17.1/desktop, gcc-9.3.0, glibc-2.32-r2, 5.4.72-gentoo-nouveau x86_64)

    $ cat /proc/cpuinfo | head -n12
    processor : 0
    vendor_id : AuthenticAMD
    cpu family : 15
    model : 47
    model name : AMD Athlon(tm) 64 Processor 3200+
    stepping : 2
    cpu MHz : 2159.960
    cache size : 512 KB
    physical id : 0
    siblings : 1
    core id : 0
    cpu cores : 1

    $ free -h | column -t
    total used free shared buff/cache available
    Mem: 1,9Gi 964Mi 128Mi 50Mi 830Mi 869Mi
    Swap: 3,7Gi 306Mi 3,4Gi

    Unfortunately I can’t build pypy3 from sources, because it needs at least 6G of RAM, so I use precompiled binaries.

  5. Unscientifically, this does seem to have provided a speedup for me, but I didn’t get a time beforehand. Enabling testing for all packages and backtracking resolves in quite a reasonable timeframe – it certainly doesn’t appear to be any worse than it was on regular old Python 3.8.

    I did compile pypy3 from source, not sure if that makes a difference overall though.

  6. Here’s my results! Granted my system doesn’t have a lot of packages atm. Pypy was slower by 2 seconds, I’m assuming that’s the startup overhead as mentioned. I’ll just yolo with pypy for now, not like the time will matter once I start pgo and LTO-izing the system.

    pypy3
    cmd: Ran 21 iterations (1 outliers).
    cmd: Rounded run time per iteration: 2.4650e+01 +/- 7.4e-02 (0.3%)

    python3.9
    cmd: Ran 21 iterations (1 outliers).
    cmd: Rounded run time per iteration: 2.2582e+01 +/- 5.9e-02 (0.3%)

  7. python 3.9.12 (lto pgo):
    emerge -p firefox 3,660 total
    emerge -puDN world -q 25,043 total
    emerge -pe world -q 6,935 total

    pypy3 7.3.9:
    emerge -p firefox 7,338 total
    emerge -puDN world -q 25,504 total
    emerge -pe world -q 11,840 total

  8. Results on amd64 musl libc (compiled with Clang -flto) for first run:
    Pypy3 (python 3.10.13-final-0):
    real 3m4.508s
    user 2m59.969s
    sys 0m3.383s
    CPython 3.12 (python 3.12.0-final-0):
    real 4m10.098s
    user 0m0.022s
    sys 0m0.052s
