{"id":470,"date":"2023-06-18T11:29:02","date_gmt":"2023-06-18T11:29:02","guid":{"rendered":"https:\/\/blogs.gentoo.org\/gsoc\/?p=470"},"modified":"2023-06-18T11:30:43","modified_gmt":"2023-06-18T11:30:43","slug":"470","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/gsoc\/2023\/06\/18\/470\/","title":{"rendered":"Bonding Period 2 &#8211; Modernization of Portage"},"content":{"rendered":"<h1 id=\"bonding-period-2---modernization-of-portage\">Bonding Period 2 &#8211; Modernization of Portage<\/h1>\n<h2 id=\"context\">Context<\/h2>\n<p>In order to get familiar with the portage codebase, we decided that I\u2019d fix a few bugs. This blog post talks about the second half of the community bonding period (weeks 3 and 4) where I try to do that.<\/p>\n<h2 id=\"bugs-bugs-and-more-bugs\">Bugs, bugs and more bugs<\/h2>\n<p>When it comes to bugs, the paradox of choice is real. There is a heap of them to choose from (1439 at the moment of writing). Most of the bugs are quality-of-life improvements, as the portage team has put in a lot of effort to make sure portage does its job without many errors. After searching, we decided to work on bug <a href=\"https:\/\/bugs.gentoo.org\/634576\">634576<\/a>.<\/p>\n<h2 id=\"section\">634576<\/h2>\n<p>Portage uses backtracking to calculate the dependencies of a package, which is a computationally intensive and time-consuming process. If a user were to <code>emerge<\/code> 10 packages, portage would calculate the dependencies one by one, and if the user misspelled a single package name, portage would calculate the dependencies of the other packages before recognizing that the name is wrong. It fails, but only after calculating the dependencies of the other packages. At this point, all the computation done is wasted. 
At the time this bug was filed, portage did not cache its calculations, so on the next run all the dependencies were calculated again. Ideally, portage should recognize that the package does not exist and \u201cfail faster\u201d.<\/p>\n<h3 id=\"reproducing-the-bug\">Reproducing the bug<\/h3>\n<p>The bug was marked as confirmed, which means the portage team had been able to reproduce it. So we tried<\/p>\n<pre><code># emerge www-client\/chromium \"&lt;cython-3\" libreoffice dev-lang\/ghc \\\r\n  dev-haskell\/doctest dev-ruby\/actionpack firefox tensorflow idonotexist<\/code><\/pre>\n<p>and to our surprise, emerge failed fast. We couldn\u2019t just close the bug without giving context, so we had to find the commit that fixed it.<\/p>\n<h2 id=\"git-bisect\">Git bisect<\/h2>\n<p>One of the mentors, Sam James, suggested we use git bisect. It is a clever feature of git, and I was very glad when I read about it: it was very cool to see binary search being used in the real world. Git bisect has an option for automated testing. We write a program (or script), and git bisect uses its exit code to classify each commit as \u201cgood\u201d or \u201cbad\u201d. We noticed that when portage fails fast, it fails within 1.8 seconds. 
So we wrote the following script.<\/p>\n<div id=\"cb2\" class=\"sourceCode\">\n<pre class=\"sourceCode py\"><code class=\"sourceCode python\"><span id=\"cb2-1\"><span class=\"co\">#!\/usr\/bin\/env python<\/span><\/span>\r\n<span id=\"cb2-2\"><\/span>\r\n<span id=\"cb2-3\"><span class=\"im\">import<\/span> subprocess<\/span>\r\n<span id=\"cb2-4\"><span class=\"im\">import<\/span> time<\/span>\r\n<span id=\"cb2-5\"><\/span>\r\n<span id=\"cb2-6\">a <span class=\"op\">=<\/span> time.time()<\/span>\r\n<span id=\"cb2-7\">subprocess.run(<\/span>\r\n<span id=\"cb2-8\">            [<\/span>\r\n<span id=\"cb2-9\">                <span class=\"st\">\"emerge\"<\/span>,<\/span>\r\n<span id=\"cb2-10\">                <span class=\"st\">\"www-client\/chromium\"<\/span>,<\/span>\r\n<span id=\"cb2-11\">                <span class=\"st\">\"libreoffice-dev\"<\/span>,<\/span>\r\n<span id=\"cb2-12\">                <span class=\"st\">\"dev-lang\/ghc\"<\/span>,<\/span>\r\n<span id=\"cb2-13\">                <span class=\"st\">\"dev-haskell\/doctest\"<\/span>,<\/span>\r\n<span id=\"cb2-14\">                <span class=\"st\">\"dev-ruby\/actionpack\"<\/span>,<\/span>\r\n<span id=\"cb2-15\">                <span class=\"st\">\"firefox\"<\/span>,<\/span>\r\n<span id=\"cb2-16\">                <span class=\"st\">\"tensorflow\"<\/span>,<\/span>\r\n<span id=\"cb2-17\">                <span class=\"st\">\"idonotexist\"<\/span><\/span>\r\n<span id=\"cb2-18\">            ]<\/span>\r\n<span id=\"cb2-19\">    )<\/span>\r\n<span id=\"cb2-20\">b <span class=\"op\">=<\/span> time.time()<\/span>\r\n<span id=\"cb2-21\">t <span class=\"op\">=<\/span> b <span class=\"op\">-<\/span> a<\/span>\r\n<span id=\"cb2-22\"><\/span>\r\n<span id=\"cb2-23\"><span class=\"cf\">if<\/span> t <span class=\"op\">&gt;<\/span> <span class=\"fl\">1.8<\/span>: <span class=\"co\"># If t goes above 1.8, it means dependencies are being calculated<\/span><\/span>\r\n<span id=\"cb2-24\">    exit(<span class=\"dv\">0<\/span>) 
<span class=\"co\"># Tells git bisect this commit is good (we want to find the bad commit)<\/span><\/span>\r\n<span id=\"cb2-25\"><span class=\"cf\">elif<\/span> t <span class=\"op\">&lt;<\/span> <span class=\"fl\">0.2<\/span>:<\/span>\r\n<span id=\"cb2-26\">    exit(<span class=\"dv\">125<\/span>) <span class=\"co\"># Tells git bisect to skip this commit and test a nearby one<\/span><\/span>\r\n<span id=\"cb2-27\"><span class=\"cf\">else<\/span>:<\/span>\r\n<span id=\"cb2-28\">    exit(<span class=\"dv\">1<\/span>) <span class=\"co\"># Bad commit (we want to find this)<\/span><\/span><\/code><\/pre>\n<\/div>\n<p>We need the <code>exit(125)<\/code> line because some of the commits leave the portage repo in an inconsistent state. Exit code 125 is the special value that tells git bisect to skip the current commit and check nearby commits instead; any other code from 1 to 127 marks the commit as bad. We ran the script with the following command.<\/p>\n<pre><code>git bisect run .\/script<\/code><\/pre>\n<p>The output of the run can be found <a href=\"https:\/\/gist.github.com\/berinaniesh\/2f5dc46419fff3dc88bd9471831b1cd9\">here<\/a>.<\/p>\n<p>Due to EAPI differences and the portage state being inconsistent between commits, we could not identify the exact commit that fixed it, but it was somewhere around commit \u201c0f3070198c56a8bc3b23e3965ab61136d3de76ae\u201d, from around 2021, when caching capabilities were added to portage. With this information, the bug was closed successfully.<\/p>\n<h2 id=\"summary\">Summary<\/h2>\n<p>To summarize, we looked at a few bugs and closed one using git bisect and a small Python script. I am also getting familiar with the codebase and have been looking for places to work on. <code>emerge --version<\/code> seems like a nice and simple corner to start. That marks the end of the \u201ccommunity bonding period\u201d. 
The \u201cofficial coding period\u201d starts next week and I can barely contain my excitement!!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Bonding Period 2 &#8211; Modernization of Portage Context In order to get familiar with the portage codebase, we decided that I\u2019d fix a few bugs. This blog post talks about the second half of the community bonding period (weeks 3 &hellip; <a href=\"https:\/\/blogs.gentoo.org\/gsoc\/2023\/06\/18\/470\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":180,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[16,19],"tags":[3,24],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts\/470"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/users\/180"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/comments?post=470"}],"version-history":[{"count":3,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts\/470\/revisions"}],"predecessor-version":[{"id":474,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts\/470\/revisions\/474"}],"wp:attachment":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/media?parent=470"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/categories?post=470"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/tags?post=470"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}