libc++abi v.s. libcxxrt

If you’ve read my previous posts, you may already know that a C++ standard library needs something called an “ABI library” to do certain low-level work for it. In the case of libc++, it supports libsupc++, libc++abi or libcxxrt as its ABI library. Among them, libsupc++ is not very well-known because it’s a part of GCC and usually not shipped standalone. In this post, we’ll be concentrating on libc++abi and libcxxrt.

libc++abi is the following effort by LLVM after it successfully developed a new C++ standard library — libc++. Developed by the same party, libc++abi works seamlessly with libc++. It’s now used on Apple’s platforms.

libcxxrt is developed by PathScale [1], a commercial compiler vendor, together with the BSD communities.  It’s now used in the three BSD derivatives and proved to be of industry quality.

While refactoring the ebuild for libc++, I hit a tough question: which ABI library to use as the default. libc++abi seems a more natural choice to me because it’s from the same bloodline as libc++. But libcxxrt should also be solid enough since it’s used in those rock-steady BSDs without any problem. Out of curiosity, I decided to do a simple benchmark between the two rivals.

I’m using a C++ program suggested by lu_zero for this benchmark [2]. I linked this program with libc++abi or libcxxrt (dynamically or statically), ran it three times and recorded the average runtime under each configuration. OpenMP is disabled for static linking because the OpenMP library used by clang is currently shared only on Gentoo. The test machine is a Gentoo VM with 4 cores and 4GB of memory. The parameter SPP (in source code) is altered from its default to 100 to reduce runtime. Below are the benchmark figures:

libc++abi, shared: 50.178s
libcxxrt, shared: 54.558s

libc++abi, static: 1m38.484s
libcxxrt, static: 1m35.134s

Apparently, libc++abi is slightly faster with dynamic linking, but a bit slower with static linking. Considering system fluctuation, the difference is nearly negligible. Due to its same bloodline as libc++, I’m going to keep libc++abi as the default ABI library in Gentoo 🙂



GSoC 2016: code submission status

This post serves as a tracker of code submitted in the domain of this GSoC project.

This summer, I worked out a bunch of patches that enhance clang/llvm with support for musl-libc, and had those patches contributed upstream. With these patches, clang is now able to correctly link binaries with musl.

llvm musl-libc support:

clang musl-libc support:

There’s still a pending compatibility issue that prevents llvm itself from being built on musl as is. musl and llvm’s developers have different views on this issue [1], and I haven’t yet found a solution that pleases both side. Currently we’re using a downstream patch in Gentoo to make llvm and musl compatible [2].

To have clang not link binaries with libgcc, I contributed another patch to clang that allows compiler-rt to be used as the default runtime library:

With those upstream enhancements, I wrote several ebuilds for Gentoo to construct a GCC-free C++ runtime environment, including:

  • create new packages for LLVM’s libunwind and libc++abi
  • enhance libc++ to support libc++abi and libunwind
  • make llvm compatible with musl-libc
  • enhance clang to support libc++ as the default stdlib and compiler-rt as the default rtlib
  • create a profile for using clang as the default compiler in Gentoo

Code submitted to Gentoo:

Code under reviewing for Gentoo:

Once the pending pull requests are merged, I’ll deliver a proper Gentoo stage3 with clang as the default compiler and all packages (except for kernel) built with clang.



Use clang as a native compiler in Gentoo

So far my GSoC project for Gentoo this year is soon coming to an end. There are still a few missing pieces, but the goal — supporting clang as a native compiler — is almost achieved. There’s nothing preventing you from using clang natively in your system now, and I’ll show you how in this post 🙂

First, you should install a GCC-free C++ runtime stack, composed of libunwind, libcxxrt and libcxx. There’re two versions of libunwind, one from nongnu and the other from LLVM. I encourage you to use the LLVM version, which proved more robust in my previous experiments [1]. To explicitly install LLVM’s libunwind, type:

$ emerge llvm-libunwind

followed by:

$ USE=libunwind emerge libcxx

libcxxrt will be pulled in automatically by emerge since libcxx depends on it. If we don’t explicitly install llvm-libunwind, the nongnu version will be pulled in instead.

Then it’s time to install clang:

$ USE='clang default-libcxx default-compiler-rt' emerge llvm

The USE flags default-libcxx and default-compiler-rt tell clang to use libc++ as the default C++ library and compiler-rt as the default runtime library, in place of libstdc++ and libgcc respectively, thus getting rid of dependence on GCC. If your system is based on musl-libc, you need also add -sanitize to the USE flags to disable LLVM’s sanitizers, which won’t compile on musl at the moment.

Remember to apply keyword ~amd64 on all the packages involved, since the features mentioned above are only available in the latest version of those packages. Additionally, the latest version of LLVM is currently masked for testing; you need to put the following line in package.unmask to unmask it:

# in file /etc/portage/profile/package.unmask

Then we are ready to go! Now you can use clang to compile any C/C++ program and the resulting binary will be GCC-free: no dependence on libgcc or libstdc++. Also put CC=clang and CXX=clang++ to ensure all your future packages are compiled by clang. Actually I encourage you to rebuild @world instantly, after which all packages in your system will be GCC-free:

$ emerge -e @world

NOTE: there’s one package you should pay attention to: libmnl. It’s pulled in by another package iproute2, and unfortunately is mis-compiled by clang [1]. There’re two solutions: 1) apply keyword ~amd64 on libmnl so the latest version is used, which works with clang; 2) apply USE flag minimal on iproute2 so it doesn’t need libmnl at all.

At this moment, you may wonder if we can just uninstall GCC once and for all. Unfortunately, the answer is no. There are two cases where GCC is still irreplaceable. First is when you upgrade your kernel. Currently clang is not capable of compiling the kernel without heavy patching, so you still need the good old GCC.

The second case is when you compile C++ code. When I say the binary built by clang is GCC-free, it’s actually not 100% true 🙁 There’re two pieces from GCC needed by every C++ program: crtbegin and crtend; unfortunately they’re not provided by any library mentioned above. I tried borrowing implementation of these two files from NetBSD, and they seem to work right out of the box. But another problem is that clang on Linux is hardcoded to use GCC’s crtbegin/end; NetBSD’s crtbegin/end won’t be recognized unless clang’s behavior is altered. This issue isn’t unresolvable and I’ll see if I can find a workaround for it. For now, just don’t remove your GCC 🙂

This project is not finished. There’re two pieces to be delivered soon: 1) a new package libcxxabi, which is developed by LLVM, will replace libcxxrt as the default C++ ABI library; 2) a new profile for native clang where USE flags, keywords and masks are all set appropriately so you don’t need to do it yourself.

Stay tuned!



A new Gentoo stage4: musl + clang

I’m glad to announce that I just successfully deployed the GNU-free toolchain, which I’ve been working on so far, into a musl-based Gentoo stage4. Everything in this stage4, except for the kernel, is built by clang with non-GNU runtime libs. I’m now using this as my main Linux system, and luckily nothing breaks so far 🙂

To be more specific, the runtime environments of this stage4 is composed of the following components:

  • C runtime: musl
  • C++ runtime
    • C++ stdlib: libc++
    • C++ ABI lib: libcxxrt [1]
    • stack unwinding lib: libunwind
  • compiler: clang

Let’s take program /usr/bin/ for example, which is written in C++, and see its runtime dependencies:

$ ldd /usr/bin/
	/lib/ (0x559f8b598000) => /lib/ (0x7f5a5b7a9000) => /usr/lib/ (0x7f5a5b6ea000) => /usr/lib/ (0x7f5a5b4cd000) => /usr/lib/ (0x7f5a5b4c4000) => /lib/ (0x559f8b598000)

Apparently none of the above libs is related to gcc. So this program is just free from gcc!

Actually I’ve already figured out how to glue clang and those non-GNU C++ runtime libs together weeks ago; but this is the first time I put it into serious use. The most challenging part is to rebuild @world with this toolchain; I’m really nervous that countless packages get broken after I issue the command emerge -e @world. Surprisingly enough, only two packages are broken.

The first broken package is iproute2, which depends on libmnl and libmnl is mis-compiled by clang. libmnl has an unpleasant past with clang [2], but its new versions are fine. So the fix is simple: just upgrade libmnl to the latest version available (use ~amd64).

The second broken package, ironically, is clang/llvm itself. It turned out the libunwind I’m using lacks a few functions essential to llvm. Actually there’re two versions of libunwind available: one is from nongnu [3] which already exists in Gentoo’s repo for a while; the other one is developed by LLVM and is almost functionally equivalent to the former. Initially I just used the nongnu version, since there’s no package for LLVM’s libunwind yet. Unfortunately, the nongnu one works fine with everything else except for llvm itself. I just had to stop being lazy and wrote an ebuild for LLVM’s libunwind. Luckily again, it has everything we need, including those functions missing in the other libunwind.

After that, while I’m using this new system, I encountered a few other broken packages. But they’re either incompatible with musl, or the gold linker (yes, it’s the /usr/bin/ I just showed you). There’s nothing wrong with clang or the C++ runtime libs.

I also found an interesting fact: the building of gcc involves several steps of bootstrapping and the final executable has no dependence on any external C++ runtime libs, though gcc itself written in C++. This means gcc still works even clang or the C++ runtime gets broken. So I don’t need to worry about breaking packages since I can still rebuild them with gcc 🙂

Now that I have a working stage4, the next step is to make gcc-config support clang, so clang can act like a real native compiler.

Stay tuned!


[1] libcxxrt is functionally equivalent to libc++abi, and is planned to be replaced by the latter later in this project

A few thoughts on libc++ and _GNU_SOURCE

This week I was trying to make libc++ work without _GNU_SOURCE predefined, which causes me some trouble when compiling LLVM against musl. As mentioned in my last post, g++/clang++ unconditionally predefines _GNU_SOURCE for any C++ code, because libstdc++/libc++ simply won’t work without it. This is an old and well-known issue [1], but unfortunately has never been fixed. This week I boldly tried to fix it for libc++, and failed 🙁

Simply put, libc++ depends on some non-standard C functions that are only available when _GNU_SOURCE is predefined. For example, strtoll_l() is a non-POSIX function hidden by _GNU_SOURCE in <stdlib.h>, and used by libc++’s header <locale>. A naive idea might be to define _GNU_SOURCE in <locale>. It doesn’t work because <stdlib.h> is possibly already included and expanded before <locale>, at which point defining _GNU_SOURCE is too late.

To address the above problem, we need to define _GNU_SOURCE before any inclusion of <stdlib.h>. So a straight-forward idea is putting _GNU_SOURCE in <cstdlib>, which is the only place in libc++ where <stdlib.h> is directly included (other C++ headers usually include <cstdlib> instead). Unfortunately this doesn’t work either. If you read glibc’s header, you’ll notice that symbols like strtoll_l are actually not directly protected by _GNU_SOURCE, but by another macro: __USE_GNU. __USE_GNU is defined in <features.h> only when _GNU_SOURCE is defined, so literally they have the same effect. But this leads to an unpleasant consequence: <features.h> might be included prior to <cstdio>, so defining _GNU_SOURCE doesn’t necessarily mean __USE_GNU is defined; without __USE_GNU, the symbols we want in <stdlib.h> are still hidden.

Then here comes the third idea: just define _GNU_SOURCE before any inclusion of <features.h>! Thus we make sure __USE_GNU is properly defined this time. This works in theory; the problem is we don’t know when exactly <features.h> is to be included. Almost every C header implicitly includes <features.h> somewhere, which means we need to define _GNU_SOURCE before the inclusion of any C header in libc++’s headers: <cstdio>, <cstdlib>, <cstring>, etc.

Defining _GNU_SOURCE in <cstdio>, <cstdlib>, etc seems no big deal. Doing that, we don’t need the C++ compiler to predefine _GNU_SOURCE for libc++, and user code won’t be polluted by _GNU_SOURCE anymore. Flawless, isn’t it? In fact, no. Let’s recall what’s the purpose of avoiding _GNU_SOURCE: to prevent user code from being polluted by non-standard symbols. With our “solution”, though _GNU_SOURCE is absent, those symbols hidden by it are still exposed in user code anyway. So this isn’t a “real” solution.

This issue just doesn’t seem as trivial as it appears to be; no wonder it’s never fixed though frequently complained about. A large part of the nastiness is due to the abuse of feature test macros in libc; perhaps when C++ module become a real deal [2], C++ library writers won’t be bothered by macro pollutions anymore.


The invisible _GNU_SOURCE in your C++ code

Recently two of my patches made their way into clang/LLVM; now ‘musl’ is a valid environment type in LLVM, and you can configure clang to build binaries against musl on Linux, without using fancy compiler flags like those shown in my previous blog posts. The “natural” next step of this project is to build LLVM itself against musl, and this task turns out to be tougher than I expected. Let’s now dive into the technical part.

Briefly speaking, there’s a chunk of code like the following in LLVM:

namespace LibFunc {
enum Func {

which defines a set of enumerators with the same names as various libc functions. This is totally valid since these enumerators are protected by a C++ namespace, so *ideally* won’t clash with raw libc function names.

But the story goes a bit differently on musl’s side. Some of the functions, including fopen64 listed in the code snippet, is actually non-POSIX, and somehow musl decides to define them as aliases to the non-64 versions. Here’s a code snippet from musl’s header stdio.h:

#if defined(_LARGEFILE64_SOURCE) || defined(_GNU_SOURCE)
#define tmpfile64 tmpfile
#define fopen64 fopen
#define freopen64 freopen

You can see these 64-suffixed functions are defined as macros; the ugliness of macros is that they don’t respect C++’s scoping rules, thus the inevitable name clashing with LLVM.

Should we blame musl for this? Actually it does protect these symbols with another macro _GNU_SOURCE; the fopen64 and stuff are exposed only when _GNU_SOURCE is defined. OTOH, I checked LLVM’s code and in fact it never explicitly defines _GNU_SOURCE, unless glibc is in use. Then why the clash? Unfortunately the ugliness doesn’t just end here. It turns out g++/clang++ unconditionally predefines _GNU_SOURCE when compiling any C++ code on Linux, and that’s because libc++/libstd++ won’t work on Linux without this macro defined. So the perfect fix for this incompatibility between LLVM and musl should be fixing the C++ compiler itself, but that’s another big story…

Build GNU-free executables with clang

Previously we discussed how to build a LLVM C++ runtime stack with libc++, libc++abi and libunwind. Along with musl, we now have a GNU-free C/C++ runtime environment. But an unfortunate fact is, clang is used to living with GCC and glibc, and it takes some extra effort to make clang work with our new environment. In my last post, I demonstrated how to make a wrapper of clang to build GNU-free executables. As the name of this project implies, we want a native clang, not an ugly wrapper. So in this post, I’m going to show you how to build a native clang that works “out of the box”.

Before getting into it, let’s first analyze what dependencies a program built by clang typically has. For C programs, it’s of course glibc; and for C++ programs there’s also libstdc++. Yet there’s a lesser known library that every program relies on: libgcc. Sometimes it’s statically linked into the executable, other times dynamically linked in the form of “libgcc_s”. libgcc is a low-level runtime library provided by GCC. More information can be found at: LLVM also has a replacement for it: compiler-rt; and we’ll need this later.

The biggest obstacle preventing clang from working with musl is that musl has its own dynamic linker which could not be recognized by clang. A naive workaround is to rename musl’s linker to the same name as glibc’s, but that would obviously mess up the whole system. We’ll have to take a alternative approach (which I’m personally resistant to): modify clang/LLVM’s source code.

Two rudimentary patches that work on x86_64 platforms could be found here: As their names imply, one patch is for the LLVM source root, and the other for clang. Assume you’ve already checked out LLVM, clang and compiler-rt to the right location, say $LLVM, $LLVM/tools/clang and $LLVM/projects/compiler-rt respectively. After applying the patches, issue the following command to build them all together:

$ mkdir $LLVM/build && cd $LLVM/build
$ cmake -DGCC_INSTALL_PREFIX=/usr \
-DDEFAULT_SYSROOT=/usr/x86_64-pc-linux-musl \
-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-pc-linux-musl ..
$ make

Some explanations:
DEFAULT_SYSROOT tells clang where to find musl’s headers and libraries. It’s pointed to the location where the musl toolchain is installed.
GCC_INSTALL_PREFIX specifies where GCC is installed. clang needs this to find crtbegin.o and crtend.o. This part is a bit thorny, as neither musl or clang provides these files. We’ll need to replace them with some other vendor’s later in this project.
CLANG_DEFAULT_CXX_STDLIB tells clang to use libc++ by default. A vanilla clang on Linux always uses libstdc++ by default.
LLVM_DEFAULT_TARGET_TRIPLE informs clang that we’re targeting on musl-libc; without this clang won’t find the correct dynamic linker.

After putting the freestanding C++ runtime libraries we previously built under /usr/x86_64-pc-linux-musl/usr/lib, we should have a native clang that “almost” works out of the box. Why “almost”? Because we still need to feed one option to clang: “-rtlib=compiler-rt”, indicating the use of compiler-rt instead of libgcc. I’m still struggling to set this option permanently at build time; hopefully I don’t have to modify too much of clang’s code to achieve this…

Now, let’s take a final look of our product:

$ ./bin/clang++ -rtlib=compiler-rt
$ readelf -d a.out | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: []
0x0000000000000001 (NEEDED) Shared library: []
0x0000000000000001 (NEEDED) Shared library: []


Build a GNU-free C++ program on Gentoo

This post is a following-up of my previous one: Build a freestanding libc++. Here I’ll demonstrate how to link a C++ program with the freestanding libc++ we just built. The resulting executable will have no dependence on glibc or any GCC component, thus is GNU-free.

As the libc++ in use is linked with musl, the C++ program to be linked with libc++ need also be linked with musl. In the previous post we already got a musl-based toolchain via crossdev, but its C++ compiler is g++, which unfortunately doesn’t support libc++. We’ll have to use a vanilla clang++ instead to compile our C++ program then.

The problem with using a vanilla clang++ is that it’s not musl-aware, i.e. it doesn’t know where to find musl’s headers and libraries. Such information can only be fed to clang++ via some cumbersome command-line arguments. Assume our C++ runtime libraries (including libc++) built in previous post are installed under directory $LOCAL; the complete command for compiling a GNU-free C++ program looks like this:

clang++ \
-nostdinc -isystem /usr/x86_64-pc-linux-musl/usr/include \
-I $LOCAL/include/c++/v1 \
-L /usr/x86_64-pc-linux-musl/usr/lib -L $LOCAL/lib \
-nostartfiles /usr/x86_64-pc-linux-musl/usr/lib/crt1.o \
-Wl,-dynamic-linker,/usr/x86_64-pc-linux-musl/lib/ \
-Wl,-rpath,/usr/x86_64-pc-linux-musl/usr/lib,-rpath,$LOCAL/lib \
-nodefaultlibs -stdlib=libc++ -lc -lc++

This is indeed a long command. Because clang doesn’t have something like GCC’s specs file, we have to elaborately specify every configuration on the command line. This is also what I’m going to improve this summer. Hopefully we’ll make clang more friendly to musl, so we don’t have to issue such obscure commands when clang is eventually deployed as Gentoo’s default compiler.

Now let me explain these command options in some detail:

  • nostdinc tells clang not to include the standard headers, following by two arguments specifying location of C and C++ headers, then another two specifying location of shared libraries. This is necessary because clang is not configured with musl and thus has no way to find the correct headers and libs.
  • nostartfiles tells clang to link the program with musl’s version of start files instead of glibc’s.
  • dynamic-linker and –rpath are options passed to the linker. They tell the program what dynamic linker to use and where to find necessary dynamic libraries at runtime.
  • nodefaultlibs, like -stdinc, tells clang to link with musl and the C++ runtime libraries we specify, instead of the default ones.

For convenience, I put this really long command in a shell script named musl-clang++ and replace “” with “$@” for general use. Let’s see if it works:

$ musl-clang++ -o hello
$ readelf -d hello | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: []
0x0000000000000001 (NEEDED) Shared library: []
0x0000000000000001 (NEEDED) Shared library: []
$ ./hello


Build a freestanding libc++

libc++ is the C++ standard library implemented by LLVM and an essential part of the clang-based toolchain we’re going to build. In this post I’ll demonstrate how to build a freestanding libc++ on Gentoo.

A complete C++ runtime stack consists of three components, from top to bottom:

  • a C++ standard library
  • a C++ ABI library
  • a stack unwinding library

As stated in my introductory post, in this GSoC project these roles will be taken by libc++, libc++abi and libunwind, respectively. As higher-level libraries depend on lower-level ones, we need to build them from bottom to top, i.e. :

  1. build libunwind
  2. build libc++abi against libunwind
  3. build libc++ against libc++abi

All of the above, of course, should be linked with musl instead of glibc. So first of all, we need a proper toolchain that can link binaries with musl. Luckily, Gentoo’s developers already prepared such a toolchain for us; just type the following commands:

$ emerge layman crossdev && layman -a musl && \
crossdev -t x86_64-pc-linux-musl

Here we use layman to create a layout for musl, and then use crossdev to auto-magically build the toolchain. Check out all needed respositories and we are ready to build the libraries:

$ cd $REPOS
$ svn co llvm
$ svn co libunwind
$ svn co libcxxabi
$ svn co libcxx

Build libunwind:

$ cd $REPOS/libunwind && mkdir build && cd build
$ cmake -DCMAKE_C_COMPILER=x86_64-pc-linux-musl-gcc \
-DCMAKE_CXX_COMPILER=x86_64-pc-linux-musl-g++ \
-DLLVM_PATH="$REPOS/llvm" ..
$ make

Note: I want to statically link libunwind into libc++abi, so I disable the building of shared library through LIBUNWIND_ENABLE_SHARED. You may safely omit this option if you want a shared version.

Build libc++abi:

$ cd $REPOS/libc++abi && mkdir build && cd build
$ cmake -DCMAKE_C_COMPILER=x86_64-pc-linux-musl-gcc \
-DCMAKE_CXX_COMPILER=x86_64-pc-linux-musl-g++ \
-DLLVM_PATH="$REPOS/llvm" ..
$ make

Build libc++:

$ cd $REPOS/libc++ && mkdir build && cd build
$ cmake -DCMAKE_C_COMPILER=x86_64-pc-linux-musl-gcc \
-DCMAKE_CXX_COMPILER=x86_64-pc-linux-musl-g++ \
-DLIBCXX_CXX_ABI=libcxxabi \
-DLIBCXX_CXX_ABI_LIBRARY_PATH="$REPOS/libcxxabi/build/lib" \
$ make

Note: libgcc is GCC’s stack unwinding library and should not be used in our C++ runtime stack, so I explicitly disable it through LIBCXX_HAS_GCC_S_LIB ; otherwise it’ll sneak into our library.

Now the C++ runtime stack is complete; it’s time to verify our work:

$ readelf -d $REPOS/libcxx/build/lib/ | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: []
0x0000000000000001 (NEEDED) Shared library: []

libunwind is statically linked so is not shown. Dependencies on libc++abi and libc (musl) look correct. So far, so good 🙂

In the following post, I’ll demonstrate how to link an actual C++ program with this freshly built libc++. See you!

Hello GSoC 2016 !

I’m very glad that I’m accepted by Gentoo as a participant of Google Summer of Code this year. During this summer, I’ll be working on building a clang-based toolchain for Gentoo.

Clang is a modern C/C++ compiler developed by LLVM, famous for its modular design and non-intrusive license. The ideal result of this project is to provide a Gentoo profile, where clang is the default compiler in place of gcc.

As clang is written in C++, it needs a C++ runtime to work, which are basically a C standard library, a C++ standard library, a C++ ABI library and a stack unwinder. On a typical Linux host, glibc and libstdc++ are the de facto C and C++ standard libraries respectively. The functionality of C++ ABI library is also integrated in libstdc++; the stack unwinder is implemented in libgcc.

libstdc++ and libgcc are both parts of GCC, which won’t be available when we deploy clang as the default compiler. Luckily, besides clang, LLVM also developed a complete implementation of the C++ runtime, consisting of three libraries: libc++, libc++abi and libunwind. Unlike GCC, the C++ ABI library is implemented separately. To decouple our toolchain further from the GNU toolset, we’ll use musl as the libc.

Sum it up

In this project, we’ll build a toolchain with clang as the compiler, musl as libc and a C++ runtime composed of libc++, libc++abi and libunwind. If everything goes smoothly, this setup will be offered as a Gentoo profile; users who like the neat features of clang thus have the chance to say goodbye to GCC 🙂

I’ll update this blog regularly to reflect my most recent progress and share technical stuff that might be helpful to others. Stay tuned !