Weekly report 5, LLVM libc

Hey! This week I’ve spent most of my time figuring out how to bootstrap
a LLVM cross compiler toolchain targeting a hosted Linux environment. I
have also resolved the wint_t issue from last week. Both of these things
took way longer than expected, but I also learned a lot more than
expected so it was worth it.

I’ll start with discussing the LLVM cross compiler setup. My initial
idea on how to bootstrap a toolchain was to simply specify LLVM_TARGETS
for the target architecture when building LLVM, then compile compiler-rt
for the target triple, and then the libc. This is indeed true, but the official
cross compilation instructions tells you to specify a sysroot where the
libc is already built, and that’s not possible when bootstrapping from
scratch.

As the compiler-rt cross compilation documentation only tells you to use
an already set up sysroot, which I didn’t have, I had to try my way
forward. This actually took me a few days, and I did things like trying
to bootstrap with a barebones build of compiler-rt, mixing in some GCC
things, and a lot of hacks. I then studied
mussel for a while until finding out about
headers-only “builds” for glibc and musl. It turns out that the only
thing compiler-rt needs the sysroot for is libc headers, and those can
be generated without a functioning compiler for both musl and
glibc. This is done by setting CC=true to pass all the configure tests
and then run ‘make headers-install‘ (for musl) into a temporary install
directory to generate the headers needed for bootstrapping
compiler-rt.

export CC=true
./configure \
--target=${CTARGET} \
--prefix="${MUSL_HEADERS}/usr" \
--syslibdir="${MUSL_HEADERS}/lib" \
--disable-gcc-wrapper
make install-headers

After this is done you can pass the following CFLAGS:
-nostdinc -I*path to temporary musl install dir*/usr/include‘ to the
compiler-rt build.

-DCMAKE_ASM_COMPILER_TARGET="${CTARGET}"
-DCMAKE_C_COMPILER_TARGET="${CTARGET}"
-DCMAKE_C_COMPILER_WORKS=1
-DCMAKE_CXX_COMPILER_WORKS=1
-DCMAKE_C_FLAGS="--target=${CTARGET} -isystem ${MUSL_HEADERS}/usr/include -nostdinc -v"

After this is done you can export
LIBCC="${COMPILER_RT_BUILDDIR}"/lib/linux/libclang_rt.builtins-aarch64.a
to the musl build to use the previously built compiler-rt builtins for
the actual libc build.

To then build actual binaries targeting the newly built libc you can do something like this:

clang --target="${CTARGET}" main.c -c -nostdinc -nostdlib -I"${MUSL_HEADERS}"/usr/include -v

ld.lld -static main.o \
"${COMPILER_RT_BUILDDIR}"/lib/linux/libclang_rt.builtins-aarch64.a \
"${MUSLLIB}"/crti.o "${MUSLLIB}"/crt1.o "${MUSLLIB}"/crtn.o "${MUSLLIB}"/libc.a

Running the binary with qemu-user:
$ cat /etc/portage/package.use/qemu
> app-emulation/qemu static-user QEMU_USER_TARGETS: aarch64
$ emerge qemu
$ qemu-aarch64 a.out
> hello, world

Afterwards it feels pretty obvious that the headers were needed, and I
could’ve probably figured it out a lot sooner by for example examining
crossdev a bit closer. But I am happy I did play with this since I
learned things like what the different runtime libraries did, what’s
needed to link a binary, and a lot more. Here’s a complete script that
does everything:
gist.
Next I will integrate this into crossdev. Another thing I need to think
about is how to do a header-only install of LLVM libc. Currently the
headers get generated with libc-hdrgen and installed with the
install-libc target. Probably this can be done by packaging a standalone
libc-hdrgen binary and using that for bootstrapping. I could also
temporarily “cheat” and do a compiler-rt+libc build to get going.

Next I also figured out what, and why, the wint_t problem occurs when
building LLVM libc in fullbuild mode on a musl system (see last week’s
report). The problem here is that on a musl system, /usr/include will be
first in the include path, regardless of CFLAGS="-ffreestanding". (for
C++ they will be after the standard C++ headers and then
#include_next‘ed, so no difference). I thought at first that this was a
bug since you don’t want to target an environment where the libc is
available (hosted environment) when building in freestanding
mode. However, after asking in #musl IRC this is actually fine since the
musl headers respect the __STDC_HOSTED__ variable that gets set when using
-ffreestanding, and there is a clear standard specifying what should be
available in a freestanding environment.

The problem arises because LLVM libc assumes that the Clang headers will
be used when passing -ffreestanding, and therefore relies on Clang header
internals. Specifically the __need_wint_t macro for stddef.h which is
in no way standardized and only an implementation detail. My thought
here was to instead of relying on CFLAGS="-ffreestanding" to use the
Clang headers, we should instead figure out another way using the build
system to force Clang headers. Another way to solve this would also just
be to also rely on musl internals (__NEED_wint_t for stddef.h).

After discussing this we agreed to first actually get the libc built,
and then decide on a strategy once we know how many times similar issues
pop up. If there are only a few instances of this then more #defines are
fine, else we could do something like the gcc buildbot target. My only
worry with this is that it will keep biting us in the ass as more things
get added.
https://github.com/llvm/llvm-project/issues/63510

Other things worth noting is that my ‘USE=emacs llvm-common’ PR inspired a
new elisp-common.eclass function called elisp-make-site-file
https://github.com/gentoo/gentoo/commit/a4e8704d22916a96725e0ef819d912ae82270d28because mgorny thought that my sitefiles were a waste of inodes :D.
https://github.com/gentoo/gentoo/pull/31635. I also got my
__unix__->__linux__ CL merged into LLVM. I do however have some worries
that this could’ve broken some things on macOS as seen in my comment:

> done! I think there should be something addressing pthread_once_t and
> once_flag for other Unix platforms though. Both of these would've
> previously, before this commit, been valid on macOS, as __unix__ is
> defined and __futex_word is just an aligned 32 bit int. No internal
> Linux headers were used here before that would've caused an error.

https://reviews.llvm.org/D153729

Next week I will try to make Crossdev be able to use LLVM/Clang by
integrating the things I did this week.

This entry was posted in Bootstapping LLVM. Bookmark the permalink.

Leave a Reply

Your email address will not be published.