A few thoughts on libc++ and _GNU_SOURCE

This week I tried to make libc++ work without _GNU_SOURCE being predefined, since that predefinition causes me some trouble when compiling LLVM against musl. As mentioned in my last post, g++/clang++ unconditionally predefine _GNU_SOURCE for any C++ code, because libstdc++/libc++ simply won’t work without it. This is an old and well-known issue [1], but unfortunately it has never been fixed. This week I boldly tried to fix it for libc++, and failed 🙁

Simply put, libc++ depends on some non-standard C functions that are only declared when _GNU_SOURCE is defined. For example, strtoll_l() is a non-POSIX function that <stdlib.h> hides unless _GNU_SOURCE is defined, and libc++’s <locale> header uses it. A naive idea might be to define _GNU_SOURCE in <locale>. That doesn’t work because <stdlib.h> may already have been included before <locale>. Because of its include guard, it won’t be expanded a second time, so defining _GNU_SOURCE at that point is too late.
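Here is a minimal single-file sketch that makes the ordering problem concrete. A faithful example would need a separate header file, so instead the “header” text is pasted twice, with the second copy standing in for a later #include. FAKE_STDLIB_H, FAKE_GNU_SOURCE and kStrtollLVisible are made-up stand-ins for <stdlib.h>’s include guard, _GNU_SOURCE and the strtoll_l() declaration. The sketch models only the preprocessor mechanics, not glibc’s actual header.

```cpp
#include <cassert>

// First inclusion of the hypothetical <stdlib.h> stand-in, before anyone
// has defined FAKE_GNU_SOURCE.
#ifndef FAKE_STDLIB_H
#define FAKE_STDLIB_H
#ifdef FAKE_GNU_SOURCE
constexpr bool kStrtollLVisible = true;   // GNU extension declared
#else
constexpr bool kStrtollLVisible = false;  // GNU extension hidden
#endif
#endif

// What a naive <locale> would do: define the macro, then include again.
#define FAKE_GNU_SOURCE

// Second inclusion: the include guard is already set, so the whole body
// is skipped and FAKE_GNU_SOURCE is never consulted.
#ifndef FAKE_STDLIB_H
#define FAKE_STDLIB_H
#ifdef FAKE_GNU_SOURCE
constexpr bool kStrtollLVisible = true;
#else
constexpr bool kStrtollLVisible = false;
#endif
#endif
```

If the guard did not skip the second copy, kStrtollLVisible would be defined twice and the snippet would fail to compile. As written, it stays false.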

To address the above problem, we need to define _GNU_SOURCE before any inclusion of <stdlib.h>. So a straightforward idea is to put _GNU_SOURCE in <cstdlib>, which is the only place in libc++ where <stdlib.h> is directly included (other C++ headers usually include <cstdlib> instead). Unfortunately this doesn’t work either. If you read glibc’s headers, you’ll notice that symbols like strtoll_l are actually not guarded directly by _GNU_SOURCE, but by another macro: __USE_GNU. __USE_GNU is defined in <features.h> only when _GNU_SOURCE is defined, so in practice the two have the same effect. But this leads to an unpleasant consequence: some other C header might already have included <features.h> before <cstdlib>. In that case, defining _GNU_SOURCE in <cstdlib> doesn’t mean __USE_GNU gets defined, and without __USE_GNU the symbols we want in <stdlib.h> are still hidden.
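The same kind of sketch shows why defining the macro in <cstdlib> isn’t enough. Again, every name is a hypothetical stand-in: FAKE_FEATURES_H for <features.h>’s include guard, FAKE_GNU_SOURCE for _GNU_SOURCE, and FAKE_USE_GNU for the internal __USE_GNU.

```cpp
#include <cassert>

// <features.h> stand-in, first reached through some other C header.
#ifndef FAKE_FEATURES_H
#define FAKE_FEATURES_H
#ifdef FAKE_GNU_SOURCE
#define FAKE_USE_GNU 1
#endif
#endif

// <cstdlib> now defines the user-facing macro before including <stdlib.h>...
#define FAKE_GNU_SOURCE

// ...and <stdlib.h> includes <features.h> again, but the guard skips it,
// so FAKE_USE_GNU is never defined.
#ifndef FAKE_FEATURES_H
#define FAKE_FEATURES_H
#ifdef FAKE_GNU_SOURCE
#define FAKE_USE_GNU 1
#endif
#endif

// <stdlib.h> tests the internal macro, not the user-facing one.
#ifdef FAKE_USE_GNU
constexpr bool kStrtollLVisible = true;
#else
constexpr bool kStrtollLVisible = false;
#endif

#ifdef FAKE_GNU_SOURCE
constexpr bool kGnuSourceDefined = true;
#else
constexpr bool kGnuSourceDefined = false;
#endif
```

The user-facing macro ends up defined, but the declaration it is supposed to unlock stays hidden.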

Then here comes the third idea: just define _GNU_SOURCE before any inclusion of <features.h>! That way we make sure __USE_GNU is properly defined. This works in theory; the problem is that we don’t know exactly when <features.h> will be included. Almost every C header includes <features.h> somewhere, directly or indirectly. That means we need to define _GNU_SOURCE before the inclusion of any C header in libc++’s headers: <cstdio>, <cstdlib>, <cstring>, etc.

Defining _GNU_SOURCE in <cstdio>, <cstdlib>, etc. seems like no big deal. If we do that, the C++ compiler no longer needs to predefine _GNU_SOURCE for libc++, and user code won’t be polluted by _GNU_SOURCE anymore. Flawless, isn’t it? In fact, no. Recall the purpose of avoiding _GNU_SOURCE: to prevent user code from being polluted by non-standard symbols. With our “solution”, user code never asks for _GNU_SOURCE, yet the symbols hidden behind it are still exposed to it. So this isn’t a real solution.
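Suppose the wrapper even cleans up after itself and #undefs the macro once the C header is included. Even in that best case, only the macro disappears; the declarations it unlocked stay. This sketch assumes such a hypothetical wrapper, with FAKE_GNU_SOURCE and fake_gnu_only_function() as made-up stand-ins for _GNU_SOURCE and a GNU-only libc function.

```cpp
#include <cassert>

// --- Hypothetical libc++ wrapper header (e.g. <cstdlib>) ---
#define FAKE_GNU_SOURCE   // enable the extension for the C header
#ifdef FAKE_GNU_SOURCE    // stand-in for the C header's gated part
inline int fake_gnu_only_function() { return 42; }
#endif
#undef FAKE_GNU_SOURCE    // "clean up" the macro afterwards
// --- End of wrapper ---

// User code that included the wrapper and never asked for GNU extensions:
#ifdef FAKE_GNU_SOURCE
constexpr bool kMacroVisible = true;
#else
constexpr bool kMacroVisible = false;  // the macro is gone...
#endif

// ...but the declaration it unlocked is still visible and callable.
inline int user_code() { return fake_gnu_only_function(); }
```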

This issue is not as trivial as it looks; no wonder it has never been fixed despite frequent complaints. A large part of the nastiness is due to the abuse of feature test macros in libc. Perhaps when C++ modules become a real deal [2], C++ library writers won’t be bothered by macro pollution anymore.

[1] http://web.mit.edu/darwin/src/modules/gcc3/libstdc++-v3/docs/html/faq/#3_5
[2] http://clang.llvm.org/docs/Modules.html

The invisible _GNU_SOURCE in your C++ code

Recently two of my patches made their way into clang/LLVM; now ‘musl’ is a valid environment type in LLVM, and you can configure clang to build binaries against musl on Linux, without using fancy compiler flags like those shown in my previous blog posts. The “natural” next step of this project is to build LLVM itself against musl, and this task turns out to be tougher than I expected. Let’s now dive into the technical part.

In brief, LLVM contains a chunk of code like the following:

namespace LibFunc {
enum Func {
    ...
    fopen,
    fopen64,
    fprintf,
    fputc,
    ...
};
}

which defines a set of enumerators with the same names as various libc functions. This is totally valid because these enumerators are scoped inside a C++ namespace, so *ideally* they won’t clash with the raw libc function names.

But the story goes a bit differently on musl’s side. Some of these functions, including fopen64 from the snippet above, are actually non-POSIX, and musl defines them as macro aliases of the non-64 versions. Here’s a code snippet from musl’s header stdio.h:

#if defined(_LARGEFILE64_SOURCE) || defined(_GNU_SOURCE)
#define tmpfile64 tmpfile
#define fopen64 fopen
#define freopen64 freopen
...
#endif

You can see these 64-suffixed functions are defined as macros; the ugliness of macros is that they don’t respect C++’s scoping rules, thus the inevitable name clashing with LLVM.
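Here is a minimal sketch of why the namespace doesn’t help. The enumerator list is a cut-down, illustrative version of LLVM’s, and the #define is copied from the musl snippet. In this sketch the enum is declared before the macro, so it still compiles, but every later mention of LibFunc::fopen64 silently turns into LibFunc::fopen. In LLVM the musl header comes first, so the enumerator declaration itself would become a second fopen, which is a hard error.

```cpp
#include <cassert>
#include <cstring>

namespace LibFunc {
// Cut-down, illustrative version of LLVM's enumerator list.
enum Func { fopen, fopen64, fprintf, fputc };
}  // namespace LibFunc

// Copied from musl's <stdio.h> (the _GNU_SOURCE branch).
#define fopen64 fopen

// The preprocessor runs before name lookup, so even the qualified name is
// rewritten: this actually refers to LibFunc::fopen.
constexpr LibFunc::Func kLookedUp = LibFunc::fopen64;

// Stringizing after expansion shows the token the compiler actually sees.
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)
constexpr const char* kSpelling = STRINGIFY(fopen64);  // "fopen"
```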

Should we blame musl for this? Actually, musl does guard these symbols behind other macros: fopen64 and friends are exposed only when _GNU_SOURCE (or _LARGEFILE64_SOURCE) is defined. On the other hand, I checked LLVM’s code, and it never explicitly defines _GNU_SOURCE unless glibc is in use. Then why the clash? Unfortunately the ugliness doesn’t end here. It turns out g++/clang++ unconditionally predefine _GNU_SOURCE when compiling any C++ code on Linux, because libc++/libstdc++ won’t work on Linux without this macro defined. So the proper fix for this incompatibility between LLVM and musl would be to fix the C++ compiler itself, but that’s another big story…
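You can observe the predefinition from inside any C++ file. The sketch below never defines _GNU_SOURCE itself, yet on a Linux target compiled with g++ or clang++, kGnuSourceDefined comes out true. The names kGnuSourceDefined and kTargetIsLinux are illustrative only.

```cpp
#include <cassert>

// This file never defines _GNU_SOURCE itself.
constexpr bool kGnuSourceDefined =
#ifdef _GNU_SOURCE
    true;   // predefined by the compiler driver
#else
    false;
#endif

constexpr bool kTargetIsLinux =
#ifdef __linux__
    true;
#else
    false;
#endif
```

From the shell, g++ -dM -E -x c++ /dev/null | grep _GNU_SOURCE shows the same predefinition.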

Build GNU-free executables with clang

Previously we discussed how to build an LLVM C++ runtime stack with libc++, libc++abi and libunwind. Along with musl, we now have a GNU-free C/C++ runtime environment. Unfortunately, clang is used to living with GCC and glibc, and it takes some extra effort to make clang work with our new environment. In my last post, I demonstrated how to write a wrapper around clang to build GNU-free executables. As the name of this project implies, we want a native clang, not an ugly wrapper. So in this post, I’m going to show you how to build a native clang that works “out of the box”.

Before getting into it, let’s first analyze what dependencies a program built by clang typically has. For C programs it’s of course glibc, and C++ programs also need libstdc++. Yet there’s a lesser-known library that every program relies on: libgcc. Sometimes it’s statically linked into the executable; other times it’s dynamically linked in the form of “libgcc_s”. libgcc is a low-level runtime library provided by GCC; more information can be found at https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html. LLVM also has a replacement for it, compiler-rt, which we’ll need later.
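128-bit integer division is a good example of what libgcc (or compiler-rt) actually provides. On x86_64, GCC and clang typically don’t inline it. Instead, they emit a call to the helper __divti3, which comes from libgcc by default, or from compiler-rt’s builtins library when clang is given -rtlib=compiler-rt. __int128 is a GCC/clang extension, and divide128 is just an illustrative name.

```cpp
#include <cassert>

// With non-constant operands, x86_64 GCC and clang typically compile this
// into a call to __divti3 instead of inline instructions; the linker then
// resolves __divti3 from libgcc, or from compiler-rt's builtins library.
__int128 divide128(__int128 a, __int128 b) { return a / b; }
```

To confirm this on your own toolchain, compile the snippet with -S and look for __divti3 in the assembly.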

The biggest obstacle preventing clang from working with musl is that musl has its own dynamic linker, which clang doesn’t know about. A naive workaround is to rename musl’s dynamic linker to the same name as glibc’s, but that would obviously mess up the whole system. We’ll have to take an alternative approach (one I’m personally reluctant to take): modify clang/LLVM’s source code.
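The dynamic linker path is recorded in every dynamically linked executable as its PT_INTERP program header (readelf -l prints it as “Requesting program interpreter”), and clang picks that path at link time. A program can read its own interpreter at runtime with dl_iterate_phdr from <link.h>, which both glibc and musl provide. The sketch below does that; program_interpreter() and copy_interp() are illustrative names. On x86_64, a glibc-linked binary typically reports /lib64/ld-linux-x86-64.so.2, while a musl-linked one reports musl’s ld-musl-x86_64.so.1.

```cpp
#include <cassert>
#include <cstddef>
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info, ElfW
#include <string>

namespace {

// dl_iterate_phdr reports the main program first; copy its PT_INTERP
// string (the dynamic linker path), then stop iterating.
int copy_interp(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
  auto* out = static_cast<std::string*>(data);
  for (ElfW(Half) i = 0; i < info->dlpi_phnum; ++i) {
    const ElfW(Phdr)& phdr = info->dlpi_phdr[i];
    if (phdr.p_type == PT_INTERP) {
      // p_vaddr is a link-time address; add the load bias for runtime.
      *out = reinterpret_cast<const char*>(info->dlpi_addr + phdr.p_vaddr);
      break;
    }
  }
  return 1;  // non-zero stops the iteration after the main program
}

}  // namespace

// Returns the dynamic linker requested by this executable, or an empty
// string if it has none (e.g. a statically linked binary).
std::string program_interpreter() {
  std::string interp;
  dl_iterate_phdr(copy_interp, &interp);
  return interp;
}
```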

Two rudimentary patches that work on x86_64 platforms can be found here: https://github.com/zzlei/musl-clang. As their names imply, one patch is for the LLVM source root, and the other for clang. Assume you’ve already checked out LLVM, clang and compiler-rt to the right locations, say $LLVM, $LLVM/tools/clang and $LLVM/projects/compiler-rt respectively. After applying the patches, issue the following commands to build them all together:

$ mkdir $LLVM/build && cd $LLVM/build
$ cmake -DGCC_INSTALL_PREFIX=/usr \
-DDEFAULT_SYSROOT=/usr/x86_64-pc-linux-musl \
-DCLANG_DEFAULT_CXX_STDLIB=libc++ \
-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-pc-linux-musl ..
$ make

Some explanations:

  • DEFAULT_SYSROOT tells clang where to find musl’s headers and libraries. It points to the location where the musl toolchain is installed.
  • GCC_INSTALL_PREFIX specifies where GCC is installed. clang needs this to find crtbegin.o and crtend.o. This part is a bit thorny, as neither musl nor clang provides these files; we’ll need to replace them with another vendor’s later in this project.
  • CLANG_DEFAULT_CXX_STDLIB tells clang to use libc++ by default. A vanilla clang on Linux always uses libstdc++ by default.
  • LLVM_DEFAULT_TARGET_TRIPLE tells clang that we’re targeting musl; without it, clang won’t find the correct dynamic linker.
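One quick way to check that CLANG_DEFAULT_CXX_STDLIB took effect is to compile something that reports which library’s configuration macros are visible. _LIBCPP_VERSION (libc++) and __GLIBCXX__ (libstdc++) are the libraries’ own macros, while cxx_stdlib_name() is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>  // any standard header pulls in the library's config macros
#include <cstring>

// Reports which C++ standard library this file was compiled against.
constexpr const char* cxx_stdlib_name() {
#if defined(_LIBCPP_VERSION)
  return "libc++";
#elif defined(__GLIBCXX__)
  return "libstdc++";
#else
  return "unknown";
#endif
}
```

Built with the patched clang and no -stdlib flag, it should report libc++.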

After putting the freestanding C++ runtime libraries we previously built under /usr/x86_64-pc-linux-musl/usr/lib, we should have a native clang that “almost” works out of the box. Why “almost”? Because we still need to pass one option to clang, “-rtlib=compiler-rt”, to make it use compiler-rt instead of libgcc. I’m still struggling to set this option permanently at build time; hopefully I won’t have to modify too much of clang’s code to achieve this…

Now, let’s take a final look at our product:

$ ./bin/clang++ hello.cc -rtlib=compiler-rt
$ readelf -d a.out | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libc++.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc++abi.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so]

Great!

Build a GNU-free C++ program on Gentoo

This post is a follow-up to my previous one: Build a freestanding libc++. Here I’ll demonstrate how to link a C++ program with the freestanding libc++ we just built. The resulting executable will have no dependency on glibc or any GCC component, and is thus GNU-free.

As the libc++ in use is linked with musl, the C++ program linked against it must also be linked with musl. In the previous post we already got a musl-based toolchain via crossdev, but its C++ compiler is g++, which unfortunately doesn’t support libc++. So we’ll have to compile our C++ program with a vanilla clang++ instead.

The problem with using a vanilla clang++ is that it’s not musl-aware, i.e. it doesn’t know where to find musl’s headers and libraries. That information can only be fed to clang++ via some cumbersome command-line arguments. Assume our C++ runtime libraries (including libc++) built in the previous post are installed under directory $LOCAL; the complete command for compiling a GNU-free C++ program looks like this:

clang++ hello.cc \
-nostdinc -isystem /usr/x86_64-pc-linux-musl/usr/include \
-I $LOCAL/include/c++/v1 \
-L /usr/x86_64-pc-linux-musl/usr/lib -L $LOCAL/lib \
-nostartfiles /usr/x86_64-pc-linux-musl/usr/lib/crt1.o \
-Wl,-dynamic-linker,/usr/x86_64-pc-linux-musl/lib/ld-musl-x86_64.so.1 \
-Wl,-rpath,/usr/x86_64-pc-linux-musl/usr/lib,-rpath,$LOCAL/lib \
-nodefaultlibs -stdlib=libc++ -lc -lc++

This is indeed a long command. Because clang doesn’t have anything like GCC’s specs file, we have to spell out every setting on the command line. This is also what I’m going to improve this summer. Hopefully we’ll make clang friendlier to musl, so we won’t have to issue such obscure commands when clang is eventually deployed as Gentoo’s default compiler.

Now let me explain these command options in some detail:

  • -nostdinc tells clang not to search the standard include directories. It’s followed by two arguments specifying the locations of the C and C++ headers, then another two specifying the locations of the libraries. This is necessary because clang isn’t configured for musl and thus has no way to find the correct headers and libraries.
  • -nostartfiles tells clang not to link its default (glibc) start files, so we pass musl’s crt1.o explicitly instead.
  • -dynamic-linker and -rpath are options passed to the linker. They tell the program which dynamic linker to use and where to find the necessary shared libraries at runtime.
  • -nodefaultlibs, like -nostdinc, drops clang’s defaults, so the program is linked only with musl (-lc) and the C++ runtime libraries we specify (-lc++) instead of the default ones.

For convenience, I put this really long command in a shell script named musl-clang++ and replace “hello.cc” with “$@” for general use. Let’s see if it works:

$ musl-clang++ hello.cc -o hello
$ readelf -d hello | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libc.so]
0x0000000000000001 (NEEDED) Shared library: [libc++.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc++abi.so.1]
$ ./hello
Hello

Yay!

Build a freestanding libc++

libc++ is the C++ standard library implemented by LLVM and an essential part of the clang-based toolchain we’re going to build. In this post I’ll demonstrate how to build a freestanding libc++ on Gentoo.

A complete C++ runtime stack consists of three components, from top to bottom:

  • a C++ standard library
  • a C++ ABI library
  • a stack unwinding library

As stated in my introductory post, in this GSoC project these roles will be taken by libc++, libc++abi and libunwind, respectively. As higher-level libraries depend on lower-level ones, we need to build them from the bottom up, i.e.:

  1. build libunwind
  2. build libc++abi against libunwind
  3. build libc++ against libc++abi
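The following sketch of ordinary C++ touches all three layers, which shows why each one is needed. std::string and std::runtime_error come from the standard library. throw and catch call into the ABI library (__cxa_throw and the personality routine), and the ABI library in turn relies on the unwinder to walk the stack through the intermediate frames. The function names are illustrative.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

namespace {

// Deepest frame: builds a std::string (standard library) and throws;
// the throw expression is lowered to __cxa_throw (ABI library).
[[noreturn]] void fail(const std::string& what) {
  throw std::runtime_error(what);
}

// Intermediate frames that the unwinder has to walk through.
int middle(int depth) {
  if (depth == 0) fail("unwound");
  return middle(depth - 1) + 1;
}

}  // namespace

// Returns the message caught after unwinding several frames.
std::string throw_and_catch() {
  try {
    middle(3);
  } catch (const std::runtime_error& e) {  // found via the personality routine
    return e.what();
  }
  return "not thrown";
}
```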

All of the above, of course, should be linked with musl instead of glibc. So first of all, we need a proper toolchain that can link binaries with musl. Luckily, Gentoo’s developers already prepared such a toolchain for us; just type the following commands:

$ emerge layman crossdev && layman -a musl && \
crossdev -t x86_64-pc-linux-musl

Here we use layman to add the musl overlay, and then use crossdev to auto-magically build the toolchain. Check out all the needed repositories and we’re ready to build the libraries:

$ cd $REPOS
$ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
$ svn co http://llvm.org/svn/llvm-project/libunwind/trunk libunwind
$ svn co http://llvm.org/svn/llvm-project/libcxxabi/trunk libcxxabi
$ svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx

Build libunwind:

$ cd $REPOS/libunwind && mkdir build && cd build
$ cmake -DCMAKE_C_COMPILER=x86_64-pc-linux-musl-gcc \
-DCMAKE_CXX_COMPILER=x86_64-pc-linux-musl-g++ \
-DLIBUNWIND_ENABLE_SHARED=0 \
-DLLVM_PATH="$REPOS/llvm" ..
$ make

Note: I want to statically link libunwind into libc++abi, so I disable building the shared library through LIBUNWIND_ENABLE_SHARED. You may safely omit this option if you want a shared version.

Build libc++abi:

$ cd $REPOS/libcxxabi && mkdir build && cd build
$ cmake -DCMAKE_C_COMPILER=x86_64-pc-linux-musl-gcc \
-DCMAKE_CXX_COMPILER=x86_64-pc-linux-musl-g++ \
-DCMAKE_SHARED_LINKER_FLAGS="-L$REPOS/libunwind/build/lib" \
-DLIBCXXABI_USE_LLVM_UNWINDER=1 \
-DLIBCXXABI_LIBUNWIND_PATH="$REPOS/libunwind" \
-DLIBCXXABI_LIBCXX_INCLUDES="$REPOS/libcxx/include" \
-DLLVM_PATH="$REPOS/llvm" ..
$ make

Build libc++:

$ cd $REPOS/libcxx && mkdir build && cd build
$ cmake -DCMAKE_C_COMPILER=x86_64-pc-linux-musl-gcc \
-DCMAKE_CXX_COMPILER=x86_64-pc-linux-musl-g++ \
-DLIBCXX_HAS_MUSL_LIBC=1 \
-DLIBCXX_HAS_GCC_S_LIB=0 \
-DLIBCXX_CXX_ABI=libcxxabi \
-DLIBCXX_CXX_ABI_INCLUDE_PATHS="$REPOS/libcxxabi/include" \
-DLIBCXX_CXX_ABI_LIBRARY_PATH="$REPOS/libcxxabi/build/lib" \
-DLLVM_PATH="$REPOS/llvm" ..
$ make

Note: libgcc_s, the shared form of libgcc, provides GCC’s stack unwinder and should not be used in our C++ runtime stack, so I explicitly disable it through LIBCXX_HAS_GCC_S_LIB; otherwise it’ll sneak into our library.
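GCC’s unwinder and libunwind both implement the same _Unwind_* interface declared in <unwind.h>, which is why either one can sit at the bottom of the stack. _Unwind_Backtrace is a widely supported extension to the Itanium ABI’s base set of those functions. The sketch below uses it to walk the current stack, and whichever unwinder the program is linked against does the work. visible_frames() and count_frame() are illustrative names.

```cpp
#include <cassert>
#include <unwind.h>

namespace {

// Called once per frame by _Unwind_Backtrace; just counts frames.
_Unwind_Reason_Code count_frame(struct _Unwind_Context* /*ctx*/, void* arg) {
  ++*static_cast<int*>(arg);
  return _URC_NO_REASON;  // keep walking
}

}  // namespace

// Number of frames the linked unwinder can see from here.
int visible_frames() {
  int frames = 0;
  _Unwind_Backtrace(count_frame, &frames);
  return frames;
}
```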

Now the C++ runtime stack is complete; it’s time to verify our work:

$ readelf -d $REPOS/libcxx/build/lib/libc++.so.1 | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libc++abi.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so]

libunwind is statically linked into libc++abi, so it doesn’t show up here. The dependencies on libc++abi and libc (musl) look correct. So far, so good 🙂

In the following post, I’ll demonstrate how to link an actual C++ program with this freshly built libc++. See you!