Week 1 – Modernization of Portage
Coding period starts
So, it’s the first week of the official coding period and I wanted to write some code and get it merged into the master branch (I understand it’s a bit overambitious of me, but a man can wish). As I said in the first blog post, portage is relied upon by many people for different use cases, and if something were a simple fix, the Gentoo developers would have done it already. I can’t just storm in, make changes, and expect things to work.
So, we tried to find a place that has very little impact on how portage runs and ended up at emerge --version.
The problem.
emerge’s --version command takes a bit longer than most programs. For example,
$ time gcc --version
Executed in 1.07 millis
$ time python --version
Executed in 2.70 millis
$ time black --version
Executed in 84.64 millis
Guess how much time emerge --version takes.
$ time emerge --version
Executed in 709.19 millis
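The same comparison can be scripted so that it is repeatable. Below is a minimal sketch, assuming only the Python standard library; the helper name time_command is made up for illustration, and the best-of-N approach is one reasonable choice, not what the shell's time builtin does.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=5):
    """Return the best wall-clock time in milliseconds over `repeats` runs.

    Hypothetical helper for illustration; not part of portage.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the command's output; only the elapsed time matters here.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


if __name__ == "__main__":
    # sys.executable stands in for any command; on a Gentoo system
    # one could pass ["emerge", "--version"] instead.
    print(f"{time_command([sys.executable, '--version']):.2f} ms")
```

Taking the minimum over a few runs reduces noise from disk caches and other processes, which matters when the numbers being compared range from 1 to 700 milliseconds.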
There is something sinister going on, so we decided to profile the code. The live profile results can be found here.
Here is the image version
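For readers who want to try something similar, here is a minimal sketch of profiling a single call with the standard library's cProfile and pstats. This is an assumption about tooling: the post does not say which profiler produced the results above, and slow_sum is only a placeholder workload, not portage code.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report_text).

    Hypothetical helper for illustration.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_sum(n):
    # Placeholder workload standing in for a real entry point.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, report = profile_call(slow_sum, 100_000)
    print(report)
```

Sorting by cumulative time is what makes a breakdown like the one below possible: each function's figure includes everything it calls.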
Diving into the profile results
Here, the total run takes 750 milliseconds.
- Up to emerge_main, it takes 71 milliseconds. That covers imports, command-line argument parsing and so on, and cannot be avoided.
- From emerge_main to run_action takes 167 milliseconds. This is mainly due to the creation of the emerge_config object, which is needed to read various variables (emerge --version outputs more information than just a version number). This could be reduced, but only with a lot of code restructuring and code duplication. We still have about 500 milliseconds to account for, so let’s look into that.
- Notice that there is practically no difference in time between run_action and getportageversion. This means that getting the portage version takes around 1 millisecond. That makes sense, because PORTAGE_VERSION is just a variable defined in lib/portage/__init__.py.
- getgccversion() takes 460 milliseconds. That’s concerning, because we just saw that gcc --version takes around 1 millisecond.
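The breakdown above can be checked with a little arithmetic. The figures are the approximate ones read off the profile; the variable names are only for illustration.

```python
# Approximate figures from the profile, in milliseconds.
total_ms = 750
before_emerge_main_ms = 71   # imports, argument parsing
emerge_config_ms = 167       # emerge_main to run_action
getgccversion_ms = 460

# Time left after the first two stages (the "about 500 ms" still to explain).
unaccounted_ms = total_ms - before_emerge_main_ms - emerge_config_ms

# What remains once getgccversion() is subtracted as well.
remainder_ms = unaccounted_ms - getgccversion_ms

# Share of the whole run spent in getgccversion().
gcc_share = getgccversion_ms / total_ms

print(unaccounted_ms)       # 512
print(remainder_ms)         # 52
print(f"{gcc_share:.0%}")   # 61%
```

So roughly three fifths of the run is spent finding out the gcc version.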
Diving into getgccversion()
Turns out getgccversion gets gcc’s version in one of two ways: gcc --dumpversion or gcc-config -c. If the former code path is taken, getgccversion takes a couple of milliseconds, but if the latter code path is taken, it takes 450 milliseconds. I tried to find out when the former code path is taken, but it almost never is, and the gcc-config -c call seemed unnecessary. So, I avoided the call and created a new code path just for --version (so that no other part of portage is affected).
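To make the idea of a fast path concrete, here is a minimal sketch that asks gcc directly and never calls gcc-config. It is not the code from the pull request: fast_gcc_version and its run parameter are hypothetical names, the "gcc-" prefix on the result is an assumption about the desired output format, and the sketch uses the single-dash -dumpversion spelling that GCC documents.

```python
import subprocess


def fast_gcc_version(run=subprocess.run):
    """Return a string such as 'gcc-13' from `gcc -dumpversion`, or None.

    Hypothetical helper for illustration; not portage's getgccversion().
    The `run` parameter exists only so the subprocess call can be
    replaced in tests.
    """
    try:
        proc = run(
            ["gcc", "-dumpversion"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # gcc is not installed or not on PATH.
        return None
    if proc.returncode != 0:
        return None
    version = proc.stdout.strip()
    # Assumed output format: prefix the bare version with "gcc-".
    return f"gcc-{version}" if version else None


if __name__ == "__main__":
    print(fast_gcc_version())
```

Keeping this in a separate function mirrors the choice described above: the existing lookup stays untouched for every other caller, and only --version uses the cheaper path.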
Patch
Together with a new code path for --version, with the help of my mentor Sam James, we also refactored a few big functions into smaller ones and added a few quality-of-life changes like f-strings. The pull request can be found here. With this change, emerge --version goes from 750ms to 240ms.
But we did not consider the edge case where the CHOST of the system where packages are compiled differs from that of the system where the binpkgs would be used. Though it does not affect the functionality of portage in any way, it could give wrong information to the end user when they type emerge --version. So, the pull request is not merged yet and we are working on solutions to the problem. One of the main reasons for the delay is that I don’t completely understand what exactly gcc-config does. It is a bash script and I have very little knowledge of bash. Once we have a fix, we will try to get the changes merged into master.
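One possible direction, shown here only as a sketch and not as the solution we settled on, is to prefer a CHOST-prefixed compiler when one exists and fall back to plain gcc otherwise. The function name pick_gcc_command is hypothetical, and the assumption that the prefixed binary is named CHOST followed by "-gcc" should be checked against the actual system.

```python
import shutil


def pick_gcc_command(chost=None, which=shutil.which):
    """Return the first candidate argv whose executable is found, or None.

    Hypothetical helper for illustration. The `which` parameter exists
    only so the PATH lookup can be replaced in tests.
    """
    candidates = []
    if chost:
        # Assumed naming: e.g. x86_64-pc-linux-gnu-gcc.
        candidates.append([f"{chost}-gcc", "-dumpversion"])
    # Plain gcc is the fallback when no prefixed compiler is found.
    candidates.append(["gcc", "-dumpversion"])
    for argv in candidates:
        if which(argv[0]):
            return argv
    return None


if __name__ == "__main__":
    print(pick_gcc_command("x86_64-pc-linux-gnu"))
```

Whether this reports the same compiler that gcc-config -c would select is exactly the open question, so the sketch only illustrates the lookup order.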
Side benefit
Studying the portage codebase for emerge --version has indeed been fruitful. I was able to find an unreachable code block (duplicated logic), so I submitted a pull request and it was merged. Sam commented that I am getting familiar with the codebase. That felt good.
Next week
I need a better understanding of portage’s internals. Sam suggested that I add docstrings and type annotations to the codebase. That’ll help new developers as well as help me understand the codebase more. So, next week will most probably be spent adding type annotations and docstrings. I’ll also spend a bit of time learning bash so that I can work on gcc-config and other tools, as portage/Gentoo relies a lot on bash.
Conclusion
Overall, the first week was a productive one. Though the pull request is not merged yet, it has good changes with respect to refactoring. If the new code path is not successful, we’d drop those commits and merge in the rest, but hopefully we can fix the patch to work on all CHOSTs. See you next week! Have a good one!