May 2016 – GSoC 2016 project

Build GNU-free executables with clang

Previously we discussed how to build an LLVM C++ runtime stack with libc++, libc++abi and libunwind. Along with musl, we now have a GNU-free C/C++ runtime environment. Unfortunately, clang is used to living with GCC and glibc, and it takes some extra effort to make clang work with our new environment. In my last post, I demonstrated how to wrap clang in a script to build GNU-free executables. As the name of this project implies, we want a native clang, not an ugly wrapper. So in this post, I’m going to show you how to build a native clang that works “out of the box”.

Before getting into it, let’s first analyze what dependencies a program built by clang typically has. For C programs, it’s of course glibc; and for C++ programs there’s also libstdc++. Yet there’s a lesser-known library that every program relies on: libgcc. Sometimes it’s statically linked into the executable; other times it’s dynamically linked in the form of “libgcc_s”. libgcc is a low-level runtime library provided by GCC. More information can be found at https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html. LLVM has a replacement for it, compiler-rt, which we’ll need later.
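
To see these dependencies on an existing binary, the NEEDED entries of its dynamic section can be listed. The helper below is only a sketch: the function name needed_libs is made up for illustration, and it assumes the usual `readelf -d` output format with the library name in square brackets.

```shell
# Print the library names from the NEEDED entries of `readelf -d` output.
# Reads readelf's output on stdin, so it is meant to be used in a pipe:
#   readelf -d ./a.out | needed_libs
needed_libs() {
    # Keep only NEEDED lines and strip everything but the bracketed name.
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}
```

On a program built by a stock clang on a glibc host, one would expect names such as libstdc++.so.6, libgcc_s.so.1 and libc.so.6 to show up.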

The biggest obstacle preventing clang from working with musl is that musl has its own dynamic linker, which clang does not recognize. A naive workaround is to rename musl’s linker to the same name as glibc’s, but that would obviously mess up the whole system. We’ll have to take an alternative approach (which I’m personally reluctant to take): modify clang/LLVM’s source code.
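
Which dynamic linker an executable asks for is recorded in its program headers, so the mismatch is easy to observe. The following is a small sketch (the function name interp_of is hypothetical) that pulls the path out of `readelf -l` output, assuming the usual "Requesting program interpreter" line.

```shell
# Extract the dynamic linker (ELF interpreter) path from `readelf -l` output.
# Reads readelf's output on stdin:
#   readelf -l ./a.out | interp_of
interp_of() {
    sed -n 's/.*Requesting program interpreter: \([^]]*\)].*/\1/p'
}
```

A binary linked against glibc on x86_64 typically reports /lib64/ld-linux-x86-64.so.2, whereas one linked against musl should report a path ending in ld-musl-x86_64.so.1.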

Two rudimentary patches that work on x86_64 platforms can be found here: https://github.com/zzlei/musl-clang. As their names imply, one patch is for the LLVM source root, and the other for clang. Assume you’ve already checked out LLVM, clang and compiler-rt to the right locations, say $LLVM, $LLVM/tools/clang and $LLVM/projects/compiler-rt respectively. After applying the patches, issue the following commands to build them all together:
$ mkdir $LLVM/build && cd $LLVM/build
$ cmake -DGCC_INSTALL_PREFIX=/usr \
    -DDEFAULT_SYSROOT=/usr/x86_64-pc-linux-musl \
    -DCLANG_DEFAULT_CXX_STDLIB=libc++ \
    -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-pc-linux-musl ..
$ make

Some explanations:
DEFAULT_SYSROOT tells clang where to find musl’s headers and libraries. It points to the location where the musl toolchain is installed.
GCC_INSTALL_PREFIX specifies where GCC is installed. clang needs this to find crtbegin.o and crtend.o. This part is a bit thorny, as neither musl nor clang provides these files. We’ll need to replace them with some other vendor’s later in this project.
CLANG_DEFAULT_CXX_STDLIB tells clang to use libc++ by default. A vanilla clang on Linux always uses libstdc++ by default.
LLVM_DEFAULT_TARGET_TRIPLE informs clang that we’re targeting musl libc; without this clang won’t find the correct dynamic linker.
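
To make the role of the target triple more concrete, here is a minimal sketch that splits a four-part triple into its fields. The function name split_triple is invented for this example; the point is only that the last field (musl versus gnu) is what distinguishes the two C library environments.

```shell
# Split a four-part target triple (arch-vendor-os-environment) into fields.
# The body runs in a subshell so IFS and the variables do not leak out.
split_triple() (
    IFS=- read -r arch vendor os env <<EOF
$1
EOF
    printf 'arch=%s vendor=%s os=%s env=%s\n' "$arch" "$vendor" "$os" "$env"
)

split_triple x86_64-pc-linux-musl
# prints: arch=x86_64 vendor=pc os=linux env=musl
```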

After putting the freestanding C++ runtime libraries we previously built under /usr/x86_64-pc-linux-musl/usr/lib, we should have a native clang that “almost” works out of the box. Why “almost”? Because we still need to feed one option to clang: “-rtlib=compiler-rt”, indicating the use of compiler-rt instead of libgcc. I’m still struggling to set this option permanently at build time; hopefully I don’t have to modify too much of clang’s code to achieve this…
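
Until the option can be baked in at build time, one possible stopgap is a small shell function that adds it on every invocation. This is only a sketch and not part of the patches: the name clangxx_rt and the CLANGXX override variable are hypothetical.

```shell
# Stopgap: run clang++ with -rtlib=compiler-rt added, unless the caller
# already passed an -rtlib=/--rtlib= option of their own.
# CLANGXX is a hypothetical override for the compiler path; it defaults
# to ./bin/clang++ in the build directory.
clangxx_rt() {
    local arg
    for arg in "$@"; do
        case $arg in
            -rtlib=*|--rtlib=*)
                # The caller made an explicit choice; pass it through.
                "${CLANGXX:-./bin/clang++}" "$@"
                return
                ;;
        esac
    done
    "${CLANGXX:-./bin/clang++}" "$@" -rtlib=compiler-rt
}
```

With this in place, `clangxx_rt hello.cc` would behave like `./bin/clang++ hello.cc -rtlib=compiler-rt`.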

Now, let’s take a final look at our product:
$ ./bin/clang++ hello.cc -rtlib=compiler-rt
$ readelf -d a.out | grep NEEDED
 0x0000000000000001 (NEEDED) Shared library: [libc++.so.1]
 0x0000000000000001 (NEEDED) Shared library: [libc++abi.so.1]
 0x0000000000000001 (NEEDED) Shared library: [libc.so]

Great!

Build a GNU-free C++ program on Gentoo

This post is a follow-up to my previous one: Build a freestanding libc++. Here I’ll demonstrate how to link a C++ program with the freestanding libc++ we just built. The resulting executable will have no dependence on glibc or any GCC component, and is thus GNU-free.

As the libc++ in use is linked with musl, any C++ program linked with libc++ must also be linked with musl. In the previous post we already got a musl-based toolchain via crossdev, but its C++ compiler is g++, which unfortunately doesn’t support libc++. We’ll have to use a vanilla clang++ to compile our C++ program instead.

The problem with using a vanilla clang++ is that it’s not musl-aware, i.e. it doesn’t know where to find musl’s headers and libraries. Such information can only be fed to clang++ via some cumbersome command-line arguments. Assume the C++ runtime libraries (including libc++) built in the previous post are installed under the directory $LOCAL; the complete command for compiling a GNU-free C++ program looks like this:

clang++ hello.cc \
    -nostdinc -isystem /usr/x86_64-pc-linux-musl/usr/include \
    -I $LOCAL/include/c++/v1 \
    -L /usr/x86_64-pc-linux-musl/usr/lib -L $LOCAL/lib \
    -nostartfiles /usr/x86_64-pc-linux-musl/usr/lib/crt1.o \
    -Wl,-dynamic-linker,/usr/x86_64-pc-linux-musl/lib/ld-musl-x86_64.so.1 \
    -Wl,-rpath,/usr/x86_64-pc-linux-musl/usr/lib,-rpath,$LOCAL/lib \
    -nodefaultlibs -stdlib=libc++ -lc -lc++

This is indeed a long command. Because clang doesn’t have something like GCC’s specs file, we have to spell out every setting on the command line. This is also what I’m going to improve this summer. Hopefully we’ll make clang friendlier to musl, so we don’t have to issue such obscure commands when clang is eventually deployed as Gentoo’s default compiler.

Now let me explain these command options in some detail:

-nostdinc tells clang not to search the standard header directories. It is followed by two options giving the locations of the C and C++ headers, then two more giving the locations of the shared libraries. This is necessary because clang is not configured for musl and thus has no way to find the correct headers and libs.
-nostartfiles tells clang not to link the default start files (glibc’s on a typical host); musl’s crt1.o is passed explicitly instead.
-dynamic-linker and -rpath are options passed to the linker. They record in the executable which dynamic linker to use and where to find the necessary shared libraries at runtime.
-nodefaultlibs, like -nostdinc, turns off the defaults: clang links only with musl and the C++ runtime libraries we specify (-lc -lc++) instead of the default ones.

For convenience, I put this really long command in a shell script named musl-clang++ and replaced “hello.cc” with “$@” for general use. Let’s see if it works:

$ musl-clang++ hello.cc -o hello
$ readelf -d hello | grep NEEDED
 0x0000000000000001 (NEEDED) Shared library: [libc.so]
 0x0000000000000001 (NEEDED) Shared library: [libc++.so.1]
 0x0000000000000001 (NEEDED) Shared library: [libc++abi.so.1]
$ ./hello
Hello
Yay!
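
For reference, a wrapper along the lines of musl-clang++ might look like the sketch below. It is not the exact script used above: the variable names MUSL_ROOT and CXX_BIN are invented for this example, and the /usr/local default for LOCAL is only a placeholder for wherever the C++ runtime libraries were installed.

```shell
#!/bin/sh
# Sketch of a musl-clang++ style wrapper.
# MUSL_ROOT and CXX_BIN are hypothetical names; the default for LOCAL is
# a placeholder and should point at the C++ runtime install prefix.
MUSL_ROOT=${MUSL_ROOT:-/usr/x86_64-pc-linux-musl}
LOCAL=${LOCAL:-/usr/local}
CXX_BIN=${CXX_BIN:-clang++}

musl_clangxx() {
    # User arguments come first, exactly where hello.cc sat in the
    # original command; the musl-specific options follow.
    "$CXX_BIN" "$@" \
        -nostdinc -isystem "$MUSL_ROOT/usr/include" \
        -I "$LOCAL/include/c++/v1" \
        -L "$MUSL_ROOT/usr/lib" -L "$LOCAL/lib" \
        -nostartfiles "$MUSL_ROOT/usr/lib/crt1.o" \
        "-Wl,-dynamic-linker,$MUSL_ROOT/lib/ld-musl-x86_64.so.1" \
        "-Wl,-rpath,$MUSL_ROOT/usr/lib,-rpath,$LOCAL/lib" \
        -nodefaultlibs -stdlib=libc++ -lc -lc++
}

# In a standalone script, the last line would be:
#   musl_clangxx "$@"
```

Keeping the paths in variables makes it easier to point the same wrapper at a different sysroot or install prefix.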

Build a freestanding libc++

libc++ is the C++ standard library implemented by LLVM and an essential part of the clang-based toolchain we’re going to build. In this post I’ll demonstrate how to build a freestanding libc++ on Gentoo.

A complete C++ runtime stack consists of three components, from top to bottom:

a C++ standard library
a C++ ABI library
a stack unwinding library

As stated in my introductory post, in this GSoC project these roles will be taken by libc++, libc++abi and libunwind, respectively. As higher-level libraries depend on lower-level ones, we need to build them from bottom to top, i.e.:

build libunwind
build libc++abi against libunwind
build libc++ against libc++abi

All of the above, of course, should be linked with musl instead of glibc. So first of all, we need a proper toolchain that can link binaries with musl. Luckily, Gentoo’s developers have already prepared such a toolchain for us; just type the following commands:
$ emerge layman crossdev && layman -a musl && \
    crossdev -t x86_64-pc-linux-musl

Here we use layman to add the musl overlay, and then use crossdev to automagically build the toolchain. Check out all the needed repositories and we are ready to build the libraries:

$ cd $REPOS
$ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
$ svn co http://llvm.org/svn/llvm-project/libunwind/trunk libunwind
$ svn co http://llvm.org/svn/llvm-project/libcxxabi/trunk libcxxabi
$ svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx
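
Since the build steps refer to these four directories under $REPOS, a quick check that every checkout is in place can save a confusing cmake error later. This is just a sketch; the function name missing_repos is made up for illustration.

```shell
# Print the name of every checkout that is missing under the directory
# given as $1 (the $REPOS directory used in the commands above).
missing_repos() {
    local name
    for name in llvm libunwind libcxxabi libcxx; do
        [ -d "$1/$name" ] || printf '%s\n' "$name"
    done
}

# Example use (prints nothing when all four checkouts exist):
#   missing_repos "$REPOS"
```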

Build libunwind:
$ cd $REPOS/libunwind && mkdir build && cd build
$ cmake -DCMAKE_C_COMPILER=x86_64-pc-linux-musl-gcc \
    -DCMAKE_CXX_COMPILER=x86_64-pc-linux-musl-g++ \
    -DLIBUNWIND_ENABLE_SHARED=0 \
    -DLLVM_PATH="$REPOS/llvm" ..
$ make

Note: I want to statically link libunwind into libc++abi, so I disable the building of shared library through LIBUNWIND_ENABLE_SHARED. You may safely omit this option if you want a shared version.

Build libc++abi:
$ cd $REPOS/libcxxabi && mkdir build && cd build
$ cmake -DCMAKE_C_COMPILER=x86_64-pc-linux-musl-gcc \
    -DCMAKE_CXX_COMPILER=x86_64-pc-linux-musl-g++ \
    -DCMAKE_SHARED_LINKER_FLAGS="-L$REPOS/libunwind/build/lib" \
    -DLIBCXXABI_USE_LLVM_UNWINDER=1 \
    -DLIBCXXABI_LIBUNWIND_PATH="$REPOS/libunwind" \
    -DLIBCXXABI_LIBCXX_INCLUDES="$REPOS/libcxx/include" \
    -DLLVM_PATH="$REPOS/llvm" ..
$ make

Build libc++:
$ cd $REPOS/libcxx && mkdir build && cd build
$ cmake -DCMAKE_C_COMPILER=x86_64-pc-linux-musl-gcc \
    -DCMAKE_CXX_COMPILER=x86_64-pc-linux-musl-g++ \
    -DLIBCXX_HAS_MUSL_LIBC=1 \
    -DLIBCXX_HAS_GCC_S_LIB=0 \
    -DLIBCXX_CXX_ABI=libcxxabi \
    -DLIBCXX_CXX_ABI_INCLUDE_PATHS="$REPOS/libcxxabi/include" \
    -DLIBCXX_CXX_ABI_LIBRARY_PATH="$REPOS/libcxxabi/build/lib" \
    -DLLVM_PATH="$REPOS/llvm" ..
$ make

Note: libgcc_s is the shared form of libgcc, which provides GCC’s stack unwinder and should not be used in our C++ runtime stack, so I explicitly disable it through LIBCXX_HAS_GCC_S_LIB; otherwise it’ll sneak into our library.

Now the C++ runtime stack is complete; it’s time to verify our work:
$ readelf -d $REPOS/libcxx/build/lib/libc++.so.1 | grep NEEDED
 0x0000000000000001 (NEEDED) Shared library: [libc++abi.so.1]
 0x0000000000000001 (NEEDED) Shared library: [libc.so]

libunwind is statically linked so is not shown. Dependencies on libc++abi and libc (musl) look correct. So far, so good 🙂
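
The same check can be automated. The sketch below flags any NEEDED entry that names a GNU runtime library; the function name gnu_free is hypothetical, and the list of library names (libgcc_s, libstdc++, and glibc’s libc.so.6 and libm.so.6) is an assumption about what typically appears on a glibc host rather than an exhaustive list.

```shell
# Read `readelf -d` output on stdin and print every NEEDED entry that
# names a GNU runtime library. The exit status is 0 when none is found.
#   readelf -d libc++.so.1 | gnu_free && echo "GNU-free"
gnu_free() {
    # grep succeeds when it finds a GNU library, so invert its status.
    ! grep -E '\(NEEDED\).*\[(libgcc_s|libstdc\+\+|libc\.so\.6|libm\.so\.6)'
}
```

musl’s libc shows up as plain libc.so, so it is not matched by the glibc pattern libc.so.6.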

In the next post, I’ll demonstrate how to link an actual C++ program with this freshly built libc++. See you!

Hello GSoC 2016!

I’m very glad to have been accepted by Gentoo as a participant in Google Summer of Code this year. During this summer, I’ll be working on building a clang-based toolchain for Gentoo.

Clang is a modern C/C++ compiler developed by the LLVM project, famous for its modular design and permissive license. The ideal result of this project is a Gentoo profile in which clang is the default compiler in place of gcc.

As clang is written in C++, it needs a C++ runtime to work, which basically consists of a C standard library, a C++ standard library, a C++ ABI library and a stack unwinder. On a typical Linux host, glibc and libstdc++ are the de facto C and C++ standard libraries respectively. The functionality of the C++ ABI library is also integrated in libstdc++; the stack unwinder is implemented in libgcc.

libstdc++ and libgcc are both parts of GCC, which won’t be available when we deploy clang as the default compiler. Luckily, besides clang, LLVM has also developed a complete implementation of the C++ runtime, consisting of three libraries: libc++, libc++abi and libunwind. Unlike in GCC, the C++ ABI library is implemented separately. To decouple our toolchain further from the GNU toolset, we’ll use musl as the libc.

Sum it up

In this project, we’ll build a toolchain with clang as the compiler, musl as libc and a C++ runtime composed of libc++, libc++abi and libunwind. If everything goes smoothly, this setup will be offered as a Gentoo profile; users who like the neat features of clang thus have the chance to say goodbye to GCC 🙂

I’ll update this blog regularly to report my latest progress and share technical details that might be helpful to others. Stay tuned!