catcream

Hello! This is the main page for my Google Summer of Code work.
For more information about me, see my about me page.

2023

During this summer I have worked on getting LLVM-libc running on Gentoo. LLVM-libc is a standard C library, like glibc and musl, which Gentoo already supports. It is not yet complete enough to be used as the system libc for day-to-day work. However, it is now possible to use Crossdev to compile and test anything in the Gentoo tree against it.

My work

My work this summer has consisted of upstream fixes to LLVM-libc, LLVM/Clang support for Crossdev, and an LLVM-libc ebuild.

Started with a simple sysroot

I started out by testing LLVM-libc in a sysroot, as described in the full host build page of the LLVM-libc documentation (https://libc.llvm.org/full_host_build.html). There I started by trying to build Python. I ran into a lot of build issues, but ended up upstreaming fixes to LLVM-libc quite early on. I made several simple fixes, such as typos, wrong argument types, and bad usage of platform macros, which was a good way to poke around in the code base. To work around missing libm functions I used Julia’s openlibm, which, to my surprise, just worked out of the box.

Went on to work on LLVM/Clang Crossdev

After I had done some work in the sysroot, I went on to make LLVM-libc work with Gentoo tooling like Crossdev and Portage. To do this I needed to make Crossdev able to use LLVM/Clang as a cross compiler. The steps to bootstrap a pure LLVM cross toolchain (for hosted environments, with a libc) are: first install the libc headers, then compile compiler-rt (the LLVM builtins, similar to libgcc) against these headers, and finally compile the libc using the just-built builtins and startup routines. I first tried doing all of this manually with musl libc to make sure the method worked, which it did!

When planning this project I thought it would be super simple to just add a flag to Crossdev that made it compile compiler-rt and disabled building the GCC stages. However, there was a lot to consider that I had not thought about, which made it take much longer. Instead of a side project, this turned into a very big part of my project. Seemingly basic things, like making sure Clang was used with the correct flags to compile packages, turned out to be a whole lot of work, and I will use this as an example here.

Firstly, I needed to make sure Clang used the right options, like --sysroot and --target, and also disabled any host configuration (--no-default-config). So I made Crossdev spew out a config to ${EROOT}/etc/clang/cross/${CTARGET}.cfg with these options. Great, now I just needed to make sure packages used this config, either through CC and CXX, or through CFLAGS and CXXFLAGS.

Secondly, I needed to make sure Crossdev ebuilds knew that they were created by LLVM/Clang Crossdev. To differentiate between GCC Crossdev and LLVM/Clang Crossdev, I decided to use a new category name, cross_llvm-${CTARGET}. Now ebuilds can check whether the CATEGORY variable matches cross_llvm-* or cross-*. This turned out to introduce large amounts of repeated boilerplate into ebuilds, so instead I opted to create an eclass, crossdev.eclass, where this code can live.

At this point I decided to check for LLVM/Clang Crossdev in crossdev.eclass and automatically export variables like CC, CXX, and the LLVM binutils equivalents. However, this meant those variables could no longer be overridden. So instead I opted to create package.env entries (make.defaults for the sysroot) and set the variables there. This worked, but some packages did not like spaces in CC, and it did not handle Clang upgrades well either.

To solve this I created an ebuild called clang-crossdev-wrappers that installs ${CTARGET}-clang{,++,-cpp}{,-${LLVM_MAJOR}} binaries and symlinks into the LLVM bindir (e.g. aarch64-gentoo-linux-musl-clang-cpp-18). This has the advantage that these binaries get updated together with Clang, because the package manager updates both at the same time. I also needed these binaries to be versioned, because llvm.eclass appends -${LLVM_MAJOR} to any Clang invocation.
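The naming scheme can be made explicit with a short C sketch that expands the names one by one. It assumes the intended pattern is `${CTARGET}-clang{,++,-cpp}{,-${LLVM_MAJOR}}`, which yields names like the aarch64-gentoo-linux-musl-clang-cpp-18 example. `wrapper_names()` is a hypothetical helper, not code from the clang-crossdev-wrappers ebuild.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WRAPPER_COUNT 6
#define NAME_MAX_LEN 128

/* Expands ${CTARGET}-clang{,++,-cpp}{,-${LLVM_MAJOR}} into out[].
   Like shell brace expansion, the leftmost group varies slowest:
   clang, clang-N, clang++, clang++-N, clang-cpp, clang-cpp-N. */
void wrapper_names(const char *ctarget, int llvm_major,
                   char out[WRAPPER_COUNT][NAME_MAX_LEN]) {
    static const char *const tools[] = { "clang", "clang++", "clang-cpp" };
    assert(ctarget != NULL && out != NULL);

    int i = 0;
    for (size_t t = 0; t < sizeof tools / sizeof tools[0]; t++) {
        /* Unversioned name, e.g. aarch64-gentoo-linux-musl-clang */
        snprintf(out[i++], NAME_MAX_LEN, "%s-%s", ctarget, tools[t]);
        /* Versioned name, matching the -${LLVM_MAJOR} suffix llvm.eclass appends */
        snprintf(out[i++], NAME_MAX_LEN, "%s-%s-%d", ctarget, tools[t], llvm_major);
    }
}
```

In the actual package these names become binaries and symlinks in the LLVM bindir. The sketch only computes them.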

Playing with things like this, running into issues, and finally figuring out a solution took a lot longer than I had initially planned. In this case, all I ended up with was the ability to robustly run Clang with the right options, but it was definitely worth it 😀

LLVM-libc with LLVM/Clang Crossdev

To cross compile LLVM-libc I needed to build the build tools for CBUILD and the libc for CHOST separately. For this I decided to use a “runtimes” build for LLVM-libc, where runtimes/ is used as the root source directory of the build. There wasn’t a lot of documentation about this apart from a build example in the docs and libc++ people telling you to use “runtimes/”. I quickly ran into an issue: SCUDO, the allocator used by LLVM-libc, could not be baked into LLVM-libc because compiler-rt was not in LLVM_ENABLED_PROJECTS, and this variable is sadly not used in runtimes builds :/. I didn’t know if this was a technical limitation or just a build system mistake. So I compiled LLVM-libc as usual, then manually compiled all the SCUDO source files to object files, and finally appended them to the final libc.a static library by running llvm-ar rcs libc.a ${i} in a loop. Cursed, but it worked. Once I knew it worked manually, I simply patched the CMake source to also check for compiler-rt in LLVM_ENABLED_RUNTIMES, and it just worked!

I then moved on to create an ebuild for LLVM-libc. My plan was to make separate ebuilds for the build tools (currently just libc-hdrgen) and for LLVM-libc itself. I did not want the build tools to be cross compiled. The simplest way to achieve that was to make them normal packages, rather than adding logic to the LLVM-libc ebuild to pick a non-cross compiler for them. This ended up working nicely together with the -DLIBC_HDRGEN_EXE flag for LLVM-libc.

Another thing I needed to do was patch gnuconfig to add an entry for -llvm. Without it, all autotools-based projects fail, because config.sub from gnuconfig does not know about LLVM-libc. This is something that needs to be addressed upstream.

Once LLVM/Clang Crossdev was usable, I started using it instead of the sysroot for my work. This allowed me to simply run ‘x86_64-gentoo-linux-llvm-emerge’ to compile any package from the Gentoo tree targeting LLVM-libc. At this point I went back to working more on LLVM-libc itself, on missing pieces like fileno, fdopen, and limits.h. I also found some more build system bugs, and I got commit access to LLVM!
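fileno and fdopen are the POSIX functions that convert between FILE streams and file descriptors. The round trip below is the kind of check a test for them might perform. It is a hypothetical helper written against the standard interfaces, not taken from LLVM-libc's test suite. It wraps the write end of a pipe with fdopen(), confirms that fileno() returns the same descriptor, and reads the data back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical round-trip check: wrap the write end of a pipe in a FILE
   with fdopen(), confirm fileno() returns the same descriptor, and read
   the written bytes back. Returns 1 on success, 0 on any failure. */
int fd_stream_roundtrip(void) {
    int fds[2];
    if (pipe(fds) != 0)
        return 0;

    FILE *out = fdopen(fds[1], "w");
    if (out == NULL) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }

    /* From here on the stream owns fds[1]; fclose() will close it. */
    int ok = fileno(out) == fds[1];
    ok = ok && fputs("hi", out) >= 0 && fflush(out) == 0;

    char buf[2] = { 0 };
    ok = ok && read(fds[0], buf, sizeof buf) == (ssize_t)sizeof buf
            && buf[0] == 'h' && buf[1] == 'i';

    fclose(out);
    close(fds[0]);
    return ok;
}
```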

Future work

Crossdev

LLVM/Clang Crossdev is currently not able to compile C++ programs or use unwinding. However, I have verified that this works manually, and I am in the process of making the libcxx, libcxxabi, and llvm-libunwind ebuilds cross-aware.

LLVM-libc

I have some work on LLVM-libc that is not yet upstreamed or fully done. This includes the limits.h header and the fdopen function. I did briefly have limits.h added to LLVM-libc, but I ended up reverting it because I relied on the #include_next GNU extension, which triggered a warning. In normal cases the freestanding compiler headers are preferred over libc headers. After providing the freestanding definitions, they include_next the system libc header for hosted environments.
The problem here is that the LLVM-libc build includes its own headers first, so internal code that uses limits.h picks up LLVM-libc’s limits.h. This could be avoided by defining everything from the freestanding headers in LLVM-libc, but we want to avoid that and rely on the compiler headers instead. My first solution was to use include_next, but that triggers a GNU extension warning. My second thought was to force the compiler headers to come first. However, because we are compiling in freestanding mode (-ffreestanding), __STDC_HOSTED__ will be false, and neither Clang nor GCC will include system headers in that case. We are currently discussing this in #libc. One seemingly promising solution is to use --sysroot/-isysroot, but this will not work when using external headers such as linux-headers on Linux.
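The hosted/freestanding switch discussed above comes down to the predefined macro __STDC_HOSTED__. The sketch below uses a hypothetical helper name, `compiled_hosted()`. It reports 1 in a normal build and 0 under -ffreestanding. Freestanding limits such as CHAR_BIT and INT_MAX stay available from the compiler's own <limits.h> either way. Clang's <limits.h>, for instance, only reaches for the C library's header via #include_next when __STDC_HOSTED__ is set. That is why forcing compiler headers first does not help inside a freestanding libc build.

```c
#include <assert.h>
#include <limits.h>

/* Freestanding limits always come from the compiler's <limits.h>. */
#if !defined(CHAR_BIT) || !defined(INT_MAX)
#error "freestanding limits should be provided by the compiler's <limits.h>"
#endif

/* Clang's <limits.h> guards its chaining roughly like this:
     #if __STDC_HOSTED__ && __has_include_next(<limits.h>)
     #include_next <limits.h>
     #endif
   so with -ffreestanding the C library's limits.h is never reached. */

/* 1 in an ordinary hosted build, 0 when compiled with -ffreestanding. */
int compiled_hosted(void) {
    return __STDC_HOSTED__;
}
```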

What I have learned

During this project I have mainly learned about Bash scripting, implementing libc functions, and CMake. I’ve also gotten better at using git.

When working on Crossdev I got more comfortable with writing Bash. I learned things like built-in string manipulation and how to use arrays, associative arrays (dictionaries), “for each” loops, and case (switch) statements. Before this I was a total beginner at shell scripting.

By working on LLVM-libc I have seen how libc functions are implemented on Linux by wrapping syscalls. I have also gotten a lot more comfortable writing and interacting with CMake source.
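On Linux the kernel reports errors by returning a negated errno value. Much of a syscall wrapper's job is translating that into the C convention of returning -1 and setting errno. The helper below is a generic sketch of that step, similar in spirit to musl's internal __syscall_ret. LLVM-libc structures its entrypoints differently, and the name `syscall_result_to_posix` is made up for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Raw Linux syscalls report failure as -errno, i.e. values in [-4095, -1].
   A libc wrapper turns that into the C convention: return -1, set errno. */
long syscall_result_to_posix(long raw) {
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;
        return -1;
    }
    return raw;  /* success: pass the value (fd, byte count, ...) through */
}
```

A wrapper such as close() would then issue the raw syscall and return `syscall_result_to_posix(raw)` to its caller.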

I’ve also learnt more about git, especially git-worktree, which I have used for everything since Sam showed it to me, and how to rebase commits more effectively.

Links

2022

This summer I have been working on getting the Plasma Desktop and KDE Gear applications to run and pass tests on Gentoo musl. Along the way I picked up some side projects and in the end I successfully met all my goals, and a bit more. It has been both fun and educational.

My work

The majority of my GSoC work in the first weeks was about fixing build-time issues in dependencies. This was generally pretty easy, as the compiler gives you a lot of help figuring out what went wrong. Many of these build-time issues were caused by GNU extensions, and writing standards-compliant replacements is often fun and not too hard. You also learn a lot about writing properly portable C code.
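A typical example of this kind of fix is canonicalize_file_name(), a glibc extension that musl does not provide. POSIX.1-2008 specifies that realpath() allocates the result when its second argument is NULL, which gives the same behaviour portably. The wrapper name below is hypothetical. In a real patch you would usually just call realpath() directly.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>

/* Portable stand-in for glibc's canonicalize_file_name(path).
   realpath() with a NULL buffer allocates the result (POSIX.1-2008);
   the caller must free() it. Returns NULL and sets errno on failure. */
char *portable_canonicalize(const char *path) {
    assert(path != NULL);
    return realpath(path, NULL);
}
```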

The second most common type of issue I ran into was runtime crashes, like segfaults. These were often a bit harder to figure out than the build-time ones, but as I got the hang of GDB and tools like strace, I could figure them out pretty efficiently too!

Another type of issue I worked on a lot in the last weeks of GSoC was failing tests. Figuring out what had gone wrong, and why, was often pretty hard compared to other issues. I spent a lot of time with both Konsole and Okular here.

I have also picked up some relevant side projects along the way, like getting my router and PinePhone Pro to run Gentoo musl. I also did some miscellaneous non-musl work, such as fixing issues in Portage and creating Gentoo development tools.

  • Here is a quick link to all my pull requests to the main gentoo tree: gentoo.git PRs by alfredfo.
  • My GitHub profile has some of my other Gentoo related work, such as ehide, libexecinfo-unw, and various -standalone packages: alfredfo on github
  • A few pull requests to the musl overlay can be seen here: musl.git PRs by alfredfo
  • I have done some work on the KDE GitLab, here’s my profile: catcream on invent.
  • Some work has been done in other places, such as the lvm-devel mailing list and plocate’s git forge, more details can be seen in the individual weekly reports.

What I have learned

During this project I have learned a lot regarding free software development, especially workflow. I have gotten a lot better at using git, and I am now comfortable with efficiently editing history. More on this, and an example, can be found in my weekly blog #10.

Regarding build-time failures, I have gotten very used to the workflow of observing a failure, debugging the problem, changing the source code, generating a patch, and finally submitting it both upstream and to the Gentoo ebuild repository. I feel this could be done a lot more comfortably with a Portage feature called ‘ebuildshell’, which is not yet implemented in mainline Portage. Getting it into mainline Portage is something I want to work on after GSoC.

Another thing I have gotten a lot better at through GSoC is C programming. I now clearly understand why it is important to follow the standards for the system you are targeting. Many of the packages I fixed were incorrectly using GNU extensions, even though the code was meant to run on any POSIX system, and that’s how I learned about it. Related to C programming, I’ve gotten a lot better at using build systems such as CMake and Meson. These were used extensively in the projects I worked on, and I often had to work with the build systems themselves. I used Meson for my personal projects in GSoC whenever I had the chance, like with ehide and libexecinfo-unw. It is, without a doubt, my favourite build system out of everything I have tried so far (CMake, Cargo, MSBuild/vcxproj, Go, Autotools, …).

Documentation I have written

This improves upon existing documentation from my mentor Sam. I have rewritten the page to be less like a list of errors with fixes, and more like a normal wiki page. I’ve also added several issues with fixes myself.

This is a guide for using Gentoo musl. It briefly describes the process of setting up a Gentoo musl system and what to do in case of build failures. Links to other resources, such as -standalone packages and my multimedia documentation for chroots, are available on the page too.

This was made for my Gentoo musl usage guide. Some users may need to work with graphical applications not yet ported to musl. In that case it is a good idea to set up a second glibc root and use it via chroot(1). Setting up sound and graphics clients to communicate with the host server is not trivial, so I documented it.

Weekly reports

Here I will quickly summarize important things that I’ve learned during each week. Please see the individual reports to see what I have actually done 🙂

I actually got the Plasma desktop running in the first week of GSoC after fixing some build-time dependency errors. I also fixed flatpak-builder so that I could run the glibc applications that I needed.

This week was similar to the first, meaning a lot of build-time issues were fixed for packages that used GNU extensions. I also investigated runtime issues in Baloo and lvm2. I managed to figure out the lvm2 one by compiling with debugsyms and installsources, then debugging with GDB (with GEF). In week two I also learned how to check for headers with CMake during the configure phase. I also figured out a bug in Flatpak that was caused by the system bubblewrap being built with suid.

During this week I started working on AccountsService. I ended up putting it off until week 11, because it involved a LOT of things, such as GLib, D-Bus, and testing with LDAP. Still, I played with it a lot this week. I also learned to use tatt, a tool for automatically testing ebuilds with various USE flags and so on. I also figured out why Rust was broken on musl: it defaulted to statically linking everything for musl targets. This broke packages like libkgapi, which ended up with the same library linked both statically and dynamically.

This week we discussed adding more standalone packages to Gentoo, and I wrote a blog post about it here. I also packaged libexecinfo for Gentoo and made a fork of it that used meson and had some general improvements. In the following weeks I reimplemented it using libunwind. This week I also spent an unreasonable amount of time debugging an lvm2 issue that caused segfaults at boot time. I learned to use strace effectively (-ff -o + grep -rsin), as well as how to set up sanitizers in Portage.

In this week I made sure that all tests for the Plasma desktop ran fine. I also spent some time getting my router to run Gentoo musl, which involved setting up crossdev, u-boot, and nftables. I got everything inside kde-apps-meta to build, except for packages that depended on qtwebengine. I worked on k3b, glib, fchroot, some standalone libraries, speech-dispatcher, and a proposed patch for GNU gettext. I reinstalled my computer to make sure that everything up to this point had been upstreamed or added to ::gentoo (for example, no local patches in /etc/portage/patches).

This week I worked on QtWebEngine, a monster of a program that bundles Chromium, which in turn bundles a bunch of libraries. Alpine had some patches for it which I partly used, but these were badly documented and sometimes broke the build for glibc, so I spent a lot of time fixing this properly.

In this week I PR:ed QtWebEngine to ::gentoo. I also reimplemented libexecinfo with libunwind, because it avoids using error-prone __builtin_* functions. I ran into a very annoying bug in Konsole which caused a test to fail when running it with Zsh instead of Bash. I have actually yet to solve it, and both Sam and I were pretty stuck here. By playing with the test source, I found that with Zsh it acted as if some text had been typed in.

In this week I got my PinePhone Pro and installed Gentoo musl on it. To my surprise it was very straightforward, and basically no packages failed to build! I played a lot with crossdev and distcc and ran into some issues there. For example, some Portage tools did not respect ROOT, so I sent patches upstream.

In this week I learned how to use tools like iwdevtools, lddtree, and scanelf to find out the dependencies of applications. I also created my own tool called ‘ehide’ to hide installed Portage packages from the filesystem. It works by creating a new mount namespace and bind-mounting /dev/null over every file installed by that package. ehide turned out to be EXTREMELY useful for figuring out whether a dependency actually needs to be installed for a package to run. I also learned about Linux capabilities, and how to set them in ebuilds using fcaps.eclass.
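The mount-namespace trick can be sketched in C with unshare(2) and mount(2). This is not ehide's source, only an illustration of the technique described above. `enter_private_mount_ns()` and `hide_files()` are hypothetical names. Both need CAP_SYS_ADMIN, and only regular files are handled. A real tool would then exec the command under test inside the namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mount.h>

/* Detach from the parent's mount namespace so later bind mounts are
   invisible to the rest of the system. Returns 0 on success, -1 on error. */
int enter_private_mount_ns(void) {
    if (unshare(CLONE_NEWNS) != 0)
        return -1;
    /* Stop mounts made here from propagating back to the parent namespace. */
    return mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL);
}

/* "Hide" each regular file by bind-mounting /dev/null over it, so the path
   refers to /dev/null inside this namespace. Returns how many succeeded. */
size_t hide_files(const char *const *paths, size_t n) {
    assert(paths != NULL || n == 0);
    size_t hidden = 0;
    for (size_t i = 0; i < n; i++)
        if (mount("/dev/null", paths[i], NULL, MS_BIND, NULL) == 0)
            hidden++;
    return hidden;
}
```

The bind mounts only exist in the new namespace. They go away when the last process in it exits, so nothing has to be cleaned up on the host.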

This week I created a new tool I called ‘gdepends’. It recursively prints out the dependencies of a package. This can be very useful for running “FEATURES=test emerge” on metapackages, or in this case meta-metapackages such as kde-apps-meta (kde-apps-meta depends on kde*-meta, which in turn depends on the actual packages). I fixed the rest of the kde-apps-meta test failures. This week I also PR:ed a bunch of Mauikit applications, and I learned to untangle my git history effectively with rebase.
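The recursive idea behind such a tool can be sketched as a depth-first walk. Any package without further dependencies is treated as an actual package. This is not gdepends itself. The dependency table is a tiny, hand-written and incomplete excerpt, where a real tool would ask the package manager, and the walk assumes the graph has no cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-written, incomplete excerpt of a dependency graph (illustration only). */
struct dep { const char *pkg; const char *needs; };

static const struct dep graph[] = {
    { "kde-apps/kde-apps-meta",    "kde-apps/kdegraphics-meta" },
    { "kde-apps/kde-apps-meta",    "kde-apps/kdeutils-meta" },
    { "kde-apps/kdegraphics-meta", "kde-apps/okular" },
    { "kde-apps/kdegraphics-meta", "kde-apps/gwenview" },
    { "kde-apps/kdeutils-meta",    "kde-apps/ark" },
    { "kde-apps/kdeutils-meta",    "kde-apps/kcalc" },
};
#define GRAPH_LEN (sizeof graph / sizeof graph[0])

/* A package is treated as "actual" (a leaf) if it depends on nothing here. */
static int has_deps(const char *pkg) {
    for (size_t i = 0; i < GRAPH_LEN; i++)
        if (strcmp(graph[i].pkg, pkg) == 0)
            return 1;
    return 0;
}

/* Depth-first walk; assumes the graph is acyclic. Leaves are appended to
   out[] once each, up to cap entries. */
static void walk(const char *pkg, const char **out, size_t *n, size_t cap) {
    if (!has_deps(pkg)) {
        for (size_t i = 0; i < *n; i++)
            if (strcmp(out[i], pkg) == 0)
                return;                  /* already listed via another path */
        if (*n < cap)
            out[(*n)++] = pkg;
        return;
    }
    for (size_t i = 0; i < GRAPH_LEN; i++)
        if (strcmp(graph[i].pkg, pkg) == 0)
            walk(graph[i].needs, out, n, cap);
}

/* Returns how many actual packages `root` pulls in, directly or via metas. */
size_t collect_leaves(const char *root, const char **out, size_t cap) {
    assert(root != NULL && (out != NULL || cap == 0));
    size_t n = 0;
    walk(root, out, &n, cap);
    return n;
}
```

The resulting list of real packages is what you would hand to “FEATURES=test emerge” instead of the metapackages.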

In this week I added a new variable to cmake.eclass, CMAKE_SKIP_TESTS: an array of tests that should be skipped in src_test. This is because the “myctestargs” syntax is very annoying to deal with when appending tests at multiple locations. I also continued my work on AccountsService here and learned a lot about it. As it turned out, it didn’t need a big rewrite after all. The behaviour I initially thought was an error (getting the list of users by enumerating /etc/passwd) was actually intentional. Next I solved a long-standing gettext issue relating to libintl and the function gl_get_setlocale_null_lock. This was a pretty hard bug to figure out and solve.

In this week I cleaned up some -standalone packages and wrote a lot of documentation which can be seen above.

Things I want to work on after GSoC

  • Ebuildshell allows you to stop at certain phases of an ebuild merge, just like a breakpoint in debugger-speak, and drops you into a shell with the same environment the build process has. This would be extremely useful for package testing (src_test), as tests can be very environment-dependent, and setting up things such as Xvfb(1) can be a little annoying. Ebuildshell has been implemented before in Prefix Portage, a version of Portage that was initially made to support Gentoo Prefix. Rebasing it and changing some things to make it compatible with the latest Portage master shouldn’t be too hard.
  • Crossdev is an easy way of setting up cross-compiling toolchains, and it lets you easily build packages from the Gentoo ebuild repository for other architectures. It is a lot better than anything else I have used, but it definitely has its quirks. I personally have a few devices running non-x86 architectures: my router and PinePhone Pro run aarch64-gentoo-linux-musl, and my VisionFive runs riscv64-gentoo-linux-musl. So it is something I have a big personal interest in making better!

My thoughts on the project

I think this project has been going great. I feel a lot more comfortable contributing to projects, and I would really like to continue contributing to Gentoo!

Gentoo musl is definitely something I’ll continue using on my computers.
