Week 1 – Modernization of Portage
Coding period starts
So, it’s the first week of the official coding period and I wanted to write some code and get it merged into the master branch (I understand it’s a bit overambitious of me, but a man can wish). As I said in the first blog post, portage is relied upon by many people for different use cases, and if something were a simple fix, the gentoo developers would have done it already. I can’t just storm in, make changes, and expect things to work.
So, we tried to find a place that has very little impact on how portage runs and ended up at emerge --version.
emerge’s --version command takes a bit longer than most programs. For example,
$ time gcc --version
Executed in 1.07 millis
$ time python --version
Executed in 2.70 millis
$ time black --version
Executed in 84.64 millis
Guess how much time emerge --version takes.
$ time emerge --version
Executed in 709.19 millis
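The same comparison can be scripted so that it is easy to repeat. This is a minimal sketch, not something from portage: `time_command` is a hypothetical helper, and it assumes that taking the best of a few runs is a fair way to smooth out noise. Programs that are not installed are skipped.

```python
import shutil
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Return the best wall-clock time in milliseconds over `runs` runs.

    Returns None if the program in argv[0] cannot be found.
    """
    if shutil.which(argv[0]) is None:
        return None
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        best = min(best, (time.perf_counter() - start) * 1000)
    return best


if __name__ == "__main__":
    # sys.executable stands in for "python" so the script always finds one.
    for program in ("gcc", sys.executable, "black", "emerge"):
        elapsed = time_command([program, "--version"])
        if elapsed is None:
            print(f"{program}: not installed")
        else:
            print(f"{program} --version: {elapsed:.2f} ms")
```

The numbers will differ from machine to machine, so only the relative gap between the commands is meaningful.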
There is something sinister going on, so we decided to profile the code. The live profile results can be found here.
Here is the image version
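For readers who want to try this kind of profiling themselves, here is a minimal sketch using cProfile from Python’s standard library. The post does not say which profiler produced the results above, so this is an assumption. The functions `slow_lookup`, `fast_lookup` and `print_version` are hypothetical stand-ins rather than portage code, and the version strings are placeholders.

```python
import cProfile
import io
import pstats
import time


def slow_lookup():
    # Stands in for an expensive external call (placeholder value).
    time.sleep(0.05)
    return "0.0.0"


def fast_lookup():
    # Stands in for reading a module-level variable (placeholder value).
    return "1.2.3"


def print_version():
    return f"Portage {fast_lookup()} (gcc-{slow_lookup()})"


def profile_call(func):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the slowest call chains come first.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_call(print_version)
    print(report)
```

In the report, the cumulative time of `print_version` is dominated by `slow_lookup`, which is the same kind of signal the profile of emerge --version gives.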
Diving into the profile results
Here, the total run takes 750 milliseconds.
- emerge_main takes 71 milliseconds. That’s for some imports, command line argument parsing etc., and cannot be avoided.
- run_action takes 167 milliseconds. This is mainly due to the creation of the emerge_config object, which is needed to assess different variables (emerge --version outputs more information than only a version number). This could be reduced, but only with significant code restructuring and code duplication.
We still have about 500 milliseconds to account for, so let’s look into that.
- Notice that getportageversion makes no visible difference in the profile. This means that getting the portage version takes around 1 millisecond. That makes sense, because PORTAGE_VERSION is just a variable.
- getgccversion() takes 460 milliseconds. That’s concerning because we just saw that gcc --version takes around 1 millisecond.
getgccversion gets gcc’s version in one of two ways: gcc --dumpversion or gcc-config -c. If the former code path is taken, getgccversion takes a couple of milliseconds, but if the latter code path is taken, it takes 450 milliseconds. I tried to find out when the former code path is taken, but it almost never is, and gcc-config -c seemed unnecessary. So, I avoided the call and created a new code path just for --version (so that no other part of portage is affected).
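The idea of a version-only path that asks the compiler directly can be sketched as follows. This is not the code from the pull request: `run_first_line` and `gcc_version_fast` are hypothetical names, the "[unavailable]" fallback string is an assumption, and the flag is spelled `-dumpversion` as in GCC’s manual.

```python
import subprocess


def run_first_line(argv, runner=subprocess.run):
    """Run argv and return the first line of its stdout, or None on failure."""
    try:
        proc = runner(argv, capture_output=True, text=True, check=False)
    except OSError:
        # Covers a missing binary (FileNotFoundError) and similar errors.
        return None
    output = proc.stdout.strip()
    if proc.returncode != 0 or not output:
        return None
    return output.splitlines()[0]


def gcc_version_fast(runner=subprocess.run):
    """Ask gcc for its version directly, without going through gcc-config."""
    # GCC's manual spells this flag with a single dash.
    version = run_first_line(["gcc", "-dumpversion"], runner)
    return f"gcc-{version}" if version else "[unavailable]"


if __name__ == "__main__":
    print(gcc_version_fast())
```

The `runner` parameter exists only so the function can be exercised without a real compiler installed.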
Together with the new code path for --version, with the help of my mentor Sam James, we also refactored a few big functions into smaller ones and added a few quality-of-life changes like f-strings. The pull request can be found here. With this change, emerge --version goes from 750ms to 240ms.
But we did not consider the edge case where the CHOST of the system where packages are compiled differs from that of the system where the binpkgs would be used. Though it does not affect the functionality of portage in any way, it could give wrong information to the end user when he/she types emerge --version. So, the pull request is not merged yet and we are working on solutions to the problem. One of the main reasons for the delay is that I don’t completely understand what exactly gcc-config does. It is a bash script and I have very little knowledge of bash. We are working on a solution and will try to get the changes merged into master.
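To make the CHOST concern concrete, here is one possible direction, sketched under assumptions: prefer a compiler binary prefixed with the configured CHOST and fall back to plain gcc only if that is missing. This is not the solution being worked on for the pull request. `candidate_compilers` and `pick_compiler` are hypothetical helpers, and the CHOST value in the usage example is just an illustration.

```python
import shutil


def candidate_compilers(chost):
    """Return compiler names to try, most specific first."""
    candidates = []
    if chost:
        # A CHOST-prefixed binary, e.g. "<chost>-gcc", is tried first.
        candidates.append(f"{chost}-gcc")
    candidates.append("gcc")
    return candidates


def pick_compiler(chost, which=shutil.which):
    """Return the first candidate found on PATH, or None if none exists."""
    for name in candidate_compilers(chost):
        if which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    # Illustrative CHOST only; a real one comes from the portage config.
    print(pick_compiler("x86_64-pc-linux-gnu"))
```

Whether this ordering reports the right compiler in every cross-compilation setup is exactly the open question, so treat it as a starting point for discussion.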
Studying the portage codebase for emerge --version has indeed been fruitful. I am getting more familiar with the codebase, and I was able to find an unreachable code block (duplicated logic). I submitted a pull request and it was merged. Sam commented that I am getting familiar with the codebase. That felt good.
I need more understanding of portage’s internals. Sam suggested that I add docstrings and type annotations to the codebase. That’ll help new developers as well as help me understand the codebase more. So, the next week will most probably be spent adding type annotations and docstrings. I’ll also spend a bit of time learning bash so that I can work on gcc-config and more, as portage/gentoo relies a lot on bash.
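As an illustration of what adding docstrings and type annotations looks like, here is a small sketch. `format_gcc_version` is a hypothetical function, not one taken from portage, and the "[unavailable]" fallback is an assumption.

```python
def format_gcc_version(raw_output: str, prefix: str = "gcc-") -> str:
    """Turn raw compiler output into a display string.

    Args:
        raw_output: Text printed by the compiler, possibly with
            surrounding whitespace or a trailing newline.
        prefix: Label placed in front of the version number.

    Returns:
        The prefixed version, or "[unavailable]" if the output is empty.
    """
    version = raw_output.strip()
    if not version:
        return "[unavailable]"
    return f"{prefix}{version}"


if __name__ == "__main__":
    print(format_gcc_version("13\n"))
```

The annotations tell a reader (and a type checker) what goes in and what comes out without having to read the body, which is the main benefit for someone new to a codebase.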
Overall, the first week was a productive one. Though the pull request is not merged yet, it has good changes with respect to refactoring. If the new code path is not successful, we’d drop those commits and merge in the rest; hopefully we can fix the patch to work on all CHOSTs. See you next week! Have a good one!