{"id":475,"date":"2023-06-18T11:36:29","date_gmt":"2023-06-18T11:36:29","guid":{"rendered":"https:\/\/blogs.gentoo.org\/gsoc\/?p=475"},"modified":"2023-06-18T11:36:29","modified_gmt":"2023-06-18T11:36:29","slug":"week-1-modernization-of-portage","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/gsoc\/2023\/06\/18\/week-1-modernization-of-portage\/","title":{"rendered":"Week 1 &#8211; Modernization of Portage"},"content":{"rendered":"<h1 id=\"week-1---modernization-of-portage\">Week 1 &#8211; Modernization of Portage<\/h1>\n<h2 id=\"coding-period-starts\">Coding period starts<\/h2>\n<p>So, it\u2019s the first week of the official coding period and I wanted to write some code and get it merged into the master branch (I understand it\u2019s a bit overambitious of me, but a man can wish). As I said in the first blog post, Portage is relied upon by many people for different use cases, and if something were a simple fix, the Gentoo developers would have done it already. I can\u2019t just storm in, make changes, and expect things to work.<\/p>\n<p>So, we tried to find a place which has very little impact on how Portage runs and ended up at <code>emerge --version<\/code>.<\/p>\n<h2 id=\"the-problem.\">The problem.<\/h2>\n<p><code>emerge --version<\/code> takes noticeably longer than the same flag does for most programs. For example:<\/p>\n<pre><code>$ time gcc --version\r\nExecuted in 1.07 millis\r\n\r\n$ time python --version\r\nExecuted in 2.70 millis\r\n\r\n$ time black --version\r\nExecuted in 84.64 millis<\/code><\/pre>\n<p>Guess how much time <code>emerge --version<\/code> takes.<\/p>\n<pre><code>$ time emerge --version\r\nExecuted in 709.19 millis<\/code><\/pre>\n<p>There is something sinister going on. So we decided to profile the code.
The live profile results can be found <a href=\"https:\/\/profiling.berinaniesh.pp.ua\/snakeviz\/%2Fsrv%2Fprofiling%2Foutfile6\">here<\/a>.<\/p>\n<p>Here is the image version: <img src=\"https:\/\/gist.githubusercontent.com\/berinaniesh\/4805eaaec0f7ffd41766d864f13c582f\/raw\/c6237cb0d29c5282e89533c1d3605577d45a332f\/profile.png\" alt=\"profile\" \/><\/p>\n<h2 id=\"diving-into-the-profile-results\">Diving into the profile results<\/h2>\n<p>Here, the total run takes 750 milliseconds.<\/p>\n<ul>\n<li>Reaching <code>emerge_main<\/code> takes 71 milliseconds. That covers imports, command-line argument parsing and so on, and cannot be avoided.<\/li>\n<li>Going from <code>emerge_main<\/code> to <code>run_action<\/code> takes 167 milliseconds. This is mainly due to the creation of the <code>emerge_config<\/code> object, which is needed to access various variables (<code>emerge --version<\/code> prints more than just a version number). This could be reduced, but only with significant code restructuring and code duplication. That still leaves about 500 milliseconds to account for, so let\u2019s look into that.<\/li>\n<li>Notice that there is no visible difference between <code>run_action<\/code> and <code>getportageversion<\/code>: getting the Portage version takes around 1 millisecond. That makes sense, because <code>PORTAGE_VERSION<\/code> is just a variable defined in <code>lib\/portage\/__init__.py<\/code>.<\/li>\n<li><code>getgccversion()<\/code> takes 460 milliseconds. That\u2019s concerning, because we just saw that <code>gcc --version<\/code> takes around 1 millisecond.<\/li>\n<\/ul>\n<h2>Diving into <code>getgccversion()<\/code><\/h2>\n<p>It turns out that <code>getgccversion<\/code> can obtain gcc\u2019s version in two ways: <code>gcc --dumpversion<\/code> or <code>gcc-config -c<\/code>.
If the former code path is taken, <code>getgccversion<\/code> takes a couple of milliseconds, but if the latter is taken, it takes about 450 milliseconds. I tried to find out when the former code path is taken, and the answer is almost never; it also seemed that the <code>gcc-config -c<\/code> call is unnecessary. So I avoided the call and created a new code path just for <code>--version<\/code> (so that no other part of Portage is affected).<\/p>\n<h2 id=\"patch\">Patch<\/h2>\n<p>Along with the new code path for <code>--version<\/code>, and with the help of my mentor Sam James, we refactored a few big functions into smaller ones and added a few quality-of-life changes such as f-strings. The pull request can be found <a href=\"https:\/\/github.com\/gentoo\/portage\/pull\/1046\">here<\/a>. With this change, <code>emerge --version<\/code> goes from 750 ms to 240 ms.<\/p>\n<p>But we did not consider the edge case where the CHOST of the system on which packages are compiled differs from that of the system on which the binpkgs are used. Though this does not affect the functionality of Portage in any way, it <em>could<\/em> give wrong information to an end user who runs <code>emerge --version<\/code>. So the pull request is not merged yet. One of the main reasons for the delay is that I don\u2019t completely understand what exactly <code>gcc-config<\/code> does. It is a bash script and I have very little knowledge of bash. We are working on a solution and will try to get the changes merged into master.<\/p>\n<h2 id=\"side-benefit\">Side benefit<\/h2>\n<p>Studying the Portage codebase for <code>emerge --version<\/code> has indeed been fruitful. I am getting more familiar with the codebase, and I was able to find an unreachable code block (duplicated logic).
I submitted a <a href=\"https:\/\/github.com\/gentoo\/portage\/pull\/1044#issuecomment-1558432166\">pull request<\/a> and it was merged. Sam <a href=\"https:\/\/github.com\/gentoo\/portage\/pull\/1044#issuecomment-1558432166\">commented<\/a> that I am getting familiar with the codebase. That felt good.<\/p>\n<h2 id=\"next-week\">Next week<\/h2>\n<p>I need a better understanding of Portage\u2019s internals. Sam suggested that I add docstrings and type annotations to the codebase. That will help new developers and also help me understand the codebase better. So next week will most probably be spent adding type annotations and docstrings. I\u2019ll also spend a bit of time learning bash so that I can work on gcc-config and more, since Portage and Gentoo rely heavily on bash.<\/p>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>Overall, the first week was a productive one. Though the pull request is not merged yet, it contains good refactoring changes. If the new code path does not work out, we will drop those commits and merge the rest, but hopefully we can fix the patch to work on all CHOSTs. See you next week!
Have a good one!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Week 1 &#8211; Modernization of Portage Coding period starts So, it\u2019s the first week of the official coding period and I wanted to write some code and get it\u00a0 merged into the master branch (I understand it\u2019s a bit over &hellip; <a href=\"https:\/\/blogs.gentoo.org\/gsoc\/2023\/06\/18\/week-1-modernization-of-portage\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":180,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[16,19],"tags":[3,24],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts\/475"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/users\/180"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/comments?post=475"}],"version-history":[{"count":2,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts\/475\/revisions"}],"predecessor-version":[{"id":477,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts\/475\/revisions\/477"}],"wp:attachment":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/media?parent=475"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/categories?post=475"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/tags?post=475"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}