Week 1 – Modernization of Portage

Coding period starts

So, it’s the first week of the official coding period, and I wanted to write some code and get it merged into the master branch (I understand that’s a bit overambitious of me, but a man can wish). As I said in the first blog post, Portage is relied upon by many people for different use cases, and if something were a simple fix, the Gentoo developers would have done it already. I can’t just storm in, make changes, and expect things to work.

So, we tried to find a place that has very little impact on how Portage runs, and ended up at emerge --version.

The problem.

emerge’s --version option takes quite a bit longer than the same option in most programs. For example,

$ time gcc --version
Executed in 1.07 millis

$ time python --version
Executed in 2.70 millis

$ time black --version
Executed in 84.64 millis

Guess how much time emerge --version takes.

$ time emerge --version
Executed in 709.19 millis
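
To repeat this comparison in a scripted way, here is a minimal sketch that times each command from Python. The helper name time_command and the choice to keep the best of several runs are my own assumptions for illustration, not part of Portage, and the numbers will differ from machine to machine.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=5):
    """Return the best wall-clock time in milliseconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the output; only the elapsed time matters here.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        elapsed_ms = (time.perf_counter() - start) * 1000
        best = min(best, elapsed_ms)
    return best


if __name__ == "__main__":
    # Commands from the post; any of them may be missing on a given machine.
    commands = (
        ["gcc", "--version"],
        [sys.executable, "--version"],
        ["emerge", "--version"],
    )
    for argv in commands:
        try:
            print(f"{' '.join(argv)}: {time_command(argv):.2f} ms")
        except OSError:
            print(f"{' '.join(argv)}: could not be run")
```

Taking the minimum rather than the average filters out one-off noise such as a cold disk cache, which matters when the programs being compared finish in a millisecond or two.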

There is something sinister going on, so we decided to profile the code. The live profile results can be found here.

Here is an image version of the profile.
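
This post does not say which profiler produced the results, so the following is only a sketch of the general idea, using cProfile and pstats from the Python standard library. The function slow_lookup is a hypothetical stand-in for an expensive step, and profile_call is a helper name invented for this example.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Cumulative time includes time spent in callees, which is what
    # exposes a slow helper hiding behind a cheap-looking caller.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_lookup():
    # Hypothetical stand-in for an expensive step such as getgccversion().
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    _, report = profile_call(slow_lookup)
    print(report)
```

Sorting by cumulative time is the useful view for this kind of question, because the interesting entry is the function whose whole call tree is expensive, not the one with the most time in its own body.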

Diving into the profile results

Here, the total run takes 750 milliseconds.

  • Until emerge_main, it takes 71 milliseconds. That’s for some imports, command-line argument parsing, etc., and cannot be avoided.
  • From emerge_main to run_action takes 167 milliseconds. This is mainly due to the creation of the emerge_config object, which is needed to access different variables (emerge --version outputs more information than only a version number). This could be reduced, but only with a lot of code restructuring and code duplication. We still have about 500 milliseconds to account for, so let’s look into that.
  • Notice that there is almost no difference between run_action and getportageversion. This means that getting the Portage version takes around 1 millisecond. That makes sense, because PORTAGE_VERSION is just a variable defined in lib/portage/__init__.py.
  • getgccversion() takes 460 milliseconds. That’s concerning, because we just saw that gcc --version takes around 1 millisecond.
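
The arithmetic behind this breakdown can be checked with a few lines of Python. The figures are the approximate ones quoted in the list above, and the helper names are made up for this sketch.

```python
# Approximate figures read off the profile in this post, in milliseconds.
TOTAL_MS = 750
STAGES_MS = {
    "startup until emerge_main": 71,
    "emerge_main to run_action": 167,
    "getportageversion": 1,
    "getgccversion": 460,
}


def remaining_after(total_ms, *spent_ms):
    """Time left once the given stages are subtracted from the total."""
    return total_ms - sum(spent_ms)


def share_of_total(stage_ms, total_ms=TOTAL_MS):
    """Percentage of the whole run spent in one stage."""
    return 100 * stage_ms / total_ms


if __name__ == "__main__":
    for name, ms in STAGES_MS.items():
        print(f"{name:<28} {ms:>4} ms  {share_of_total(ms):5.1f}%")
    # What is left once the unavoidable startup and config stages are removed.
    print("left after the first two stages:",
          remaining_after(TOTAL_MS, 71, 167), "ms")
    # Time the list above does not attribute to any stage.
    print("not attributed to a listed stage:",
          remaining_after(TOTAL_MS, *STAGES_MS.values()), "ms")
```

Subtracting the first two stages leaves 512 milliseconds, which is the "about 500" mentioned above, and getgccversion() alone accounts for roughly 61% of the whole run.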

Diving into getgccversion()

It turns out getgccversion gets gcc’s version in one of two ways: gcc --dumpversion or gcc-config -c. If the former code path is taken, getgccversion takes a couple of milliseconds, but if the latter is taken, it takes 450 milliseconds. I tried to find out when the former code path is taken, and it is almost never. The gcc-config -c call also seemed unnecessary. So, I avoided the call and created a new code path just for --version (so that no other part of Portage is affected).
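
To make the idea concrete, here is a rough sketch of what a fast path could look like. It is not the code from the pull request: gcc_version_fast and run_and_capture are hypothetical names, the "[unavailable]" fallback string is an assumption, and the sketch uses gcc’s documented -dumpversion spelling. It also deliberately ignores CHOST-prefixed compilers.

```python
import subprocess


def run_and_capture(argv):
    """Run argv and return its stripped stdout, or None if the command fails."""
    try:
        completed = subprocess.run(
            argv, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # Missing binary or non-zero exit status: report "no answer".
        return None
    return completed.stdout.strip() or None


def gcc_version_fast(runner=run_and_capture):
    """Hypothetical fast path: ask the compiler directly and skip gcc-config."""
    version = runner(["gcc", "-dumpversion"])
    return f"gcc-{version}" if version else "[unavailable]"


if __name__ == "__main__":
    print(gcc_version_fast())
```

Passing the runner in as a parameter keeps the subprocess call out of the decision logic, so the function can be exercised with a fake runner on machines that have no gcc installed.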

Patch

Together with the new code path for --version, and with the help of my mentor Sam James, we also refactored a few big functions into smaller ones and added a few quality-of-life changes such as f-strings. The pull request can be found here. With this change, emerge --version goes from 750ms to 240ms.

But we did not consider the edge case where the CHOST of the system on which packages are compiled differs from that of the system on which the binpkgs are used. Though this does not affect the functionality of Portage in any way, it could give wrong information to the end user who runs emerge --version. So, the pull request is not merged yet, and we are working on solutions to the problem. One of the main reasons for the delay is that I don’t completely understand what exactly gcc-config does. It is a bash script and I have very little knowledge of bash. Once we have a solution, we will try to get the changes merged into master.

Side benefit

Studying the Portage codebase for emerge --version has indeed been fruitful. I am getting more familiar with the codebase, and I was able to find an unreachable code block (duplicated logic). I submitted a pull request and it was merged. Sam commented that I am getting familiar with the codebase. That felt good.

Next week

I need a better understanding of Portage’s internals. Sam suggested that I add docstrings and type annotations to the codebase. That will help new developers as well as help me understand the codebase better. So, next week will most probably be spent adding type annotations and docstrings. I’ll also spend a bit of time learning bash so that I can work on gcc-config and more, as Portage and Gentoo rely a lot on bash.

Conclusion

Overall, the first week was a productive one. Though the pull request is not merged yet, it contains good refactoring changes. If the new code path is not successful, we’ll drop those commits and merge the rest, but hopefully we can fix the patch to work on all CHOSTs. See you next week! Have a good one!

This entry was posted in 2023 GSoC, Modernization of Portage with C++.
