{"id":512,"date":"2023-07-04T01:37:23","date_gmt":"2023-07-04T01:37:23","guid":{"rendered":"https:\/\/blogs.gentoo.org\/gsoc\/?p=512"},"modified":"2023-07-04T01:45:53","modified_gmt":"2023-07-04T01:45:53","slug":"weekly-report-5-llvm-libc","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/gsoc\/2023\/07\/04\/weekly-report-5-llvm-libc\/","title":{"rendered":"Weekly report 5, LLVM libc"},"content":{"rendered":"<p>Hey! This week I&#8217;ve spent most of my time figuring out how to bootstrap<br \/>\nan LLVM cross compiler toolchain targeting a hosted Linux environment. I<br \/>\nhave also resolved the wint_t issue from last week. Both of these things<br \/>\ntook way longer than expected, but I also learned a lot more than<br \/>\nexpected so it was worth it.<\/p>\n<p>I&#8217;ll start with discussing the LLVM cross compiler setup. My initial<br \/>\nidea on how to bootstrap a toolchain was to simply specify <code>LLVM_TARGETS<\/code><br \/>\nfor the target architecture when building LLVM, then compile compiler-rt<br \/>\nfor the target triple, and then the libc. That order is indeed right, but the official<br \/>\ncross compilation instructions tell you to specify a sysroot where the<br \/>\nlibc is already built, and that&#8217;s not possible when bootstrapping from<br \/>\nscratch.<\/p>\n<p>As the compiler-rt cross compilation documentation only tells you to use<br \/>\nan already set up sysroot, which I didn&#8217;t have, I had to feel my way<br \/>\nforward. This actually took me a few days, and I did things like trying<br \/>\nto bootstrap with a barebones build of compiler-rt, mixing in some GCC<br \/>\nthings, and a lot of hacks. I then studied<br \/>\n<a href=\"https:\/\/github.com\/firasuke\/mussel\" rel=\"noopener\" target=\"_blank\">mussel<\/a> for a while until finding out about<br \/>\nheaders-only &#8220;builds&#8221; for glibc and musl. 
It turns out that the only<br \/>\nthing compiler-rt needs the sysroot for is libc headers, and those can<br \/>\nbe generated without a functioning compiler for both musl and<br \/>\nglibc. This is done by setting <code>CC=true<\/code> to pass all the configure tests<br \/>\nand then running &#8216;<code>make install-headers<\/code>&#8217; (for musl) into a temporary install<br \/>\ndirectory to generate the headers needed for bootstrapping<br \/>\ncompiler-rt.<\/p>\n<p><code>export CC=true<br \/>\n.\/configure \\<br \/>\n    --target=${CTARGET} \\<br \/>\n    --prefix=\"${MUSL_HEADERS}\/usr\" \\<br \/>\n    --syslibdir=\"${MUSL_HEADERS}\/lib\" \\<br \/>\n    --disable-gcc-wrapper<br \/>\nmake install-headers<\/code><\/p>\n<p>After this is done you can pass the following CFLAGS:<br \/>\n&#8216;<code>-nostdinc -I*path to temporary musl install dir*\/usr\/include<\/code>&#8217; to the<br \/>\ncompiler-rt build. <\/p>\n<p><code>-DCMAKE_ASM_COMPILER_TARGET=\"${CTARGET}\"<br \/>\n-DCMAKE_C_COMPILER_TARGET=\"${CTARGET}\"<br \/>\n-DCMAKE_C_COMPILER_WORKS=1<br \/>\n-DCMAKE_CXX_COMPILER_WORKS=1<br \/>\n-DCMAKE_C_FLAGS=\"--target=${CTARGET} -isystem ${MUSL_HEADERS}\/usr\/include -nostdinc -v\"<\/code><\/p>\n<p>Once compiler-rt is built you can export<br \/>\n<code>LIBCC=\"${COMPILER_RT_BUILDDIR}\"\/lib\/linux\/libclang_rt.builtins-aarch64.a<\/code><br \/>\nto the musl build to use the previously built compiler-rt builtins for<br \/>\nthe actual libc build.<\/p>\n<p>To then build actual binaries targeting the newly built libc you can do something like this:<\/p>\n<p><code>clang --target=\"${CTARGET}\" main.c -c -nostdinc -nostdlib -I\"${MUSL_HEADERS}\"\/usr\/include -v<\/p>\n<p>ld.lld -static main.o \\<br \/>\n       \"${COMPILER_RT_BUILDDIR}\"\/lib\/linux\/libclang_rt.builtins-aarch64.a \\<br \/>\n       \"${MUSLLIB}\"\/crti.o \"${MUSLLIB}\"\/crt1.o \"${MUSLLIB}\"\/crtn.o \"${MUSLLIB}\"\/libc.a<\/code><\/p>\n<p>Running the binary with qemu-user:<br \/>\n<code>$ cat 
\/etc\/portage\/package.use\/qemu<br \/>\n&gt; app-emulation\/qemu static-user QEMU_USER_TARGETS: aarch64<br \/>\n$ emerge qemu<br \/>\n$ qemu-aarch64 a.out<br \/>\n&gt; hello, world<\/code><\/p>\n<p>In hindsight it feels pretty obvious that the headers were needed, and I<br \/>\ncould&#8217;ve probably figured it out a lot sooner by, for example, examining<br \/>\ncrossdev a bit more closely. But I am happy I did play with this since I<br \/>\nlearned things like what the different runtime libraries do, what&#8217;s<br \/>\nneeded to link a binary, and a lot more. Here&#8217;s a complete script that<br \/>\ndoes everything:<br \/>\n<a href=\"https:\/\/gist.github.com\/alfredfo\/e6c65293eb210bcf58e7cbdc80db3d7c\" rel=\"noopener\" target=\"_blank\">gist<\/a>.<br \/>\nNext I will integrate this into crossdev. Another thing I need to think<br \/>\nabout is how to do a header-only install of LLVM libc. Currently the<br \/>\nheaders get generated with libc-hdrgen and installed with the<br \/>\ninstall-libc target. Probably this can be done by packaging a standalone<br \/>\nlibc-hdrgen binary and using that for bootstrapping. I could also<br \/>\ntemporarily &#8220;cheat&#8221; and do a compiler-rt+libc build to get going.<\/p>\n<p>I also figured out what the wint_t problem is, and why it occurs, when<br \/>\nbuilding LLVM libc in fullbuild mode on a musl system (see last week&#8217;s<br \/>\nreport). The problem here is that on a musl system, \/usr\/include will be<br \/>\nfirst in the include path, regardless of <code>CFLAGS=\"-ffreestanding\"<\/code>. (For<br \/>\nC++ the C headers come after the standard C++ headers and are then<br \/>\n<code>#include_next<\/code>&#8217;ed, so it makes no difference.) I thought at first that this was a<br \/>\nbug since you don&#8217;t want to target an environment where the libc is<br \/>\navailable (hosted environment) when building in freestanding<br \/>\nmode. 
However, after asking in the #musl IRC channel, it turns out this is actually fine, since the<br \/>\nmusl headers respect the <code>__STDC_HOSTED__<\/code> macro, which gets set to 0 when using<br \/>\n<code>-ffreestanding<\/code>, and there is a clear standard specifying what should be<br \/>\navailable in a freestanding environment.<\/p>\n<p>The problem arises because LLVM libc assumes that the Clang headers will<br \/>\nbe used when passing <code>-ffreestanding<\/code>, and therefore relies on Clang header<br \/>\ninternals. Specifically, it relies on the <code>__need_wint_t<\/code> macro for <code>stddef.h<\/code>, which is<br \/>\nin no way standardized and only an implementation detail. My thought<br \/>\nhere was that instead of relying on <code>CFLAGS=\"-ffreestanding\"<\/code> to get the<br \/>\nClang headers, we should figure out another way to force the Clang headers<br \/>\nthrough the build system. Another way to solve this would be<br \/>\nto also rely on musl internals (<code>__NEED_wint_t<\/code> for stddef.h).<\/p>\n<p>After discussing this we agreed to first actually get the libc built,<br \/>\nand then decide on a strategy once we know how many times similar issues<br \/>\npop up. If there are only a few instances of this then more #defines are<br \/>\nfine, otherwise we could do something like the gcc buildbot target. 
My only<br \/>\nworry with this is that it will keep biting us in the ass as more things<br \/>\nget added.<br \/>\n<a href=\"https:\/\/github.com\/llvm\/llvm-project\/issues\/63510\" rel=\"noopener\" target=\"_blank\">https:\/\/github.com\/llvm\/llvm-project\/issues\/63510<\/a><\/p>\n<p>Another thing worth noting is that my &#8216;USE=emacs llvm-common&#8217; PR inspired a<br \/>\nnew <code>elisp-common.eclass<\/code> function called <code>elisp-make-site-file<\/code><br \/>\n<a href=\"https:\/\/github.com\/gentoo\/gentoo\/commit\/a4e8704d22916a96725e0ef819d912ae82270d28\" rel=\"noopener\" target=\"_blank\">https:\/\/github.com\/gentoo\/gentoo\/commit\/a4e8704d22916a96725e0ef819d912ae82270d28<\/a> because mgorny thought that my sitefiles were a waste of inodes :D.<br \/>\n<a href=\"https:\/\/github.com\/gentoo\/gentoo\/pull\/31635\" rel=\"noopener\" target=\"_blank\">https:\/\/github.com\/gentoo\/gentoo\/pull\/31635<\/a>. I also got my<br \/>\n<code>__unix__<\/code>-&gt;<code>__linux__<\/code> CL merged into LLVM. I do however have some worries<br \/>\nthat this could&#8217;ve broken some things on macOS as seen in my comment:<\/p>\n<p><code>&gt; done! I think there should be something addressing pthread_once_t and<br \/>\n&gt; once_flag for other Unix platforms though. Both of these would've<br \/>\n&gt; previously, before this commit, been valid on macOS, as __unix__ is<br \/>\n&gt; defined and __futex_word is just an aligned 32 bit int. No internal<br \/>\n&gt; Linux headers were used here before that would've caused an error.<\/code><\/p>\n<p><a href=\"https:\/\/reviews.llvm.org\/D153729\" rel=\"noopener\" target=\"_blank\">https:\/\/reviews.llvm.org\/D153729<\/a><\/p>\n<p>Next week I will try to make crossdev able to use LLVM\/Clang by<br \/>\nintegrating the things I did this week.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hey! 
This week I&#8217;ve spent most of my time figuring out how to bootstrap a LLVM cross compiler toolchain targeting a hosted Linux environment. I have also resolved the wint_t issue from last week. Both of these things took way &hellip; <a href=\"https:\/\/blogs.gentoo.org\/gsoc\/2023\/07\/04\/weekly-report-5-llvm-libc\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":177,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[17],"tags":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts\/512"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/users\/177"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/comments?post=512"}],"version-history":[{"count":5,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts\/512\/revisions"}],"predecessor-version":[{"id":517,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/posts\/512\/revisions\/517"}],"wp:attachment":[{"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/media?parent=512"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/categories?post=512"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/gsoc\/wp-json\/wp\/v2\/tags?post=512"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}