{"id":480,"date":"2023-06-18T11:45:27","date_gmt":"2023-06-18T11:45:27","guid":{"rendered":"https:\/\/blogs.gentoo.org\/gsoc\/?p=480"},"modified":"2023-06-18T11:45:27","modified_gmt":"2023-06-18T11:45:27","slug":"week-3-modernization-of-portage","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/gsoc\/2023\/06\/18\/week-3-modernization-of-portage\/","title":{"rendered":"Week 3 &#8211; Modernization of Portage"},"content":{"rendered":"<h1 id=\"week-3---modernization-of-portage\">Week 3 &#8211; Modernization of Portage<\/h1>\n<p>It is the third week of the coding period. It is mostly an uneventful week. Most part was spent on\u00a0 trying to understand the dependency resolution algorithm. In the second part of the week I also did\u00a0 some refactoring and some type hints.<\/p>\n<h2 id=\"update-on-the-blog-posts\">Update on the blog posts<\/h2>\n<p>I lost my password to access this blog and also had troubles resetting the password. That is why I\u00a0 have not been able to post per week. With help from BlueKnight, I got my access back. So, I am dumping the blog posts I have written, all at once. From next week, I expect posts to be at regular intervals (one per week). Sorry about the bulk posting, hope you don\u2019t mind.<\/p>\n<h2 id=\"portage-vs-pacman\">Portage vs Pacman<\/h2>\n<p>The most significant part of portage is it\u2019s dependency resolution system. It is very different from all other package managers due to the unique concept ot \u201cUSE\u201d flags.<\/p>\n<p>As many can guess, a graph data structure is used extensively to find the dependencies of a package. The process is somewhat trivial in most package managers. Arch Linux\u2019s pacman for example constructs a graph with package names as nodes and it\u2019s dependencies as it\u2019s children. Now it\u2019s a matter of a simple graph traversal. The detected dependencies are copied to a list and everything is downloaded from the a server. The implementation can be found <a href=\"https:\/\/gitlab.archlinux.org\/pacman\/pacman\/-\/blob\/master\/lib\/libalpm\/deps.c#L126-137\">here<\/a>.<\/p>\n<p>Portage uses a different algorithm called backtracking. Portage finds the package\u2019s use flags and backtracks the packages related to that use flag. It is recursive and the process repeats. The time complexity is O(N!) and I understood that\u2019s the reason portage is slow and not because of python. Even after hours of looking at the codebase it is hard to make sense of the exact workings. Due to the sheer number of features portage offers, the process is many folds complex than pacman. This can be understood by the fact that the dependency resolution part (not completely) of pacman is like <a href=\"https:\/\/gitlab.archlinux.org\/pacman\/pacman\/-\/blob\/master\/lib\/libalpm\/deps.c\">927 lines of c code<\/a> at the moment of writing this and the portage equivalent is about <a href=\"https:\/\/gitweb.gentoo.org\/proj\/portage.git\/tree\/lib\/_emerge\/depgraph.py\">11814 lines of python code<\/a>. I also had the code profiled and<code> depgraph.py<\/code> alone takes like 95% of the total runtime. So it is useful to learn about the algorithm. The profiling results can be found <a href=\"https:\/\/profiling.berinaniesh.pp.ua\/portage\">here<\/a>.<\/p>\n<p>Here is the image version. <img src=\"https:\/\/gist.githubusercontent.com\/berinaniesh\/3da9bae3f06030117e2ab0c2f34c2df8\/raw\/de895501424c5709c620ffc4c92c574c04d44606\/profile.png\" alt=\"profile\" \/><\/p>\n<p>Portage also uses granular, file level locks, whereas pacman uses one lock for the whole runtime.\u00a0 This allows multiple instances of portage to run at the same time.<\/p>\n<p>Studying portage\u2019s codebase made it much easier for me to understand pacman\u2019s codebase and in no time, even though I don\u2019t have much experience in C. I am glad I got to work on portage, but also sad that I cannot understand the codebase enough (yet) to contribute in a deeper way. Portage is years of work of many smart minds and it is unreasonable of me to expect to understand things fast.<\/p>\n<h2 id=\"refactoring-tidying-up-portage\">Refactoring \/ Tidying up portage<\/h2>\n<p>The second half of the week was towards finding something to improve on portage. I added a few type hints and refactored some big functions into smaller ones for better readability. Also found some unreachable code and removed them as well. I think they are nice improvements. I sent a pull request which can be found <a href=\"https:\/\/github.com\/gentoo\/portage\/pull\/1058\">here<\/a>. It is not merged yet, I think it will be soon.<\/p>\n<h2 id=\"summary\">Summary<\/h2>\n<p>To summarize, the week was spent delving more into the cProfile results of portage and comparing it with pacman. I also refactored the codebase a little bit, which makes it a tiny bit better.<\/p>\n<p>I want to look at portage\u2019s ability to resolve circular dependencies but I am not sure if I can do it (if it is a simple task). If it were simple, portage developers would have done it already, but I still want to give it a try. I\u2019ll keep you posted. That marks the end of week three. See you in week four\u2019s post!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Week 3 &#8211; Modernization of Portage It is the third week of the coding period. It is mostly an uneventful week. Most part was spent on\u00a0 trying to understand the dependency resolution algorithm. In the second part of the week &hellip; <a href=\"https:\/\/blogs.gentoo.org\/gsoc\/2023\/06\/18\/week-3-modernization-of-portage\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":180,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[16,19],"tags":[3,24],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts\/480"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/users\/180"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/comments?post=480"}],"version-history":[{"count":1,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts\/480\/revisions"}],"predecessor-version":[{"id":481,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts\/480\/revisions\/481"}],"wp:attachment":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/media?parent=480"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/categories?post=480"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/tags?post=480"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}