A few thoughts on libc++ and _GNU_SOURCE

This week I tried to make libc++ work without _GNU_SOURCE being predefined, since that predefinition is what causes me trouble when compiling LLVM against musl. As mentioned in my last post, g++/clang++ unconditionally predefine _GNU_SOURCE for any C++ code on Linux, because libstdc++/libc++ simply won’t work without it. This is an old and well-known issue [1], but unfortunately it has never been fixed. So I boldly tried to fix it for libc++, and failed 🙁

Simply put, libc++ depends on some non-standard C functions that are only declared when _GNU_SOURCE is defined. For example, strtoll_l() is a non-POSIX function that <stdlib.h> hides unless _GNU_SOURCE is defined, and libc++’s <locale> uses it. A naive idea might be to define _GNU_SOURCE in <locale>. That doesn’t work, because <stdlib.h> may already have been included and expanded before <locale>; thanks to its include guard, any later inclusion is a no-op, so defining _GNU_SOURCE at that point is too late.
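To make the include-guard problem concrete, here is a self-contained sketch that simulates it in a single file. MOCK_STDLIB_H, MOCK_GNU_SOURCE and mock_strtoll_l_declared are hypothetical stand-ins for <stdlib.h>’s include guard, _GNU_SOURCE and the strtoll_l() declaration; pasting the guarded block twice imitates two #include directives, and no real libc header is consulted.

```cpp
// Hypothetical single-file simulation of <stdlib.h>'s include guard.
// MOCK_* names stand in for the real header and macros.

// First "#include <stdlib.h>", e.g. from user code, before <locale>.
#ifndef MOCK_STDLIB_H
#define MOCK_STDLIB_H
#ifdef MOCK_GNU_SOURCE
constexpr bool mock_strtoll_l_declared = true;   // GNU extension exposed
#else
constexpr bool mock_strtoll_l_declared = false;  // GNU extension hidden
#endif
#endif

// Inside <locale>: the naive fix defines the feature macro here.
#define MOCK_GNU_SOURCE 1

// Second "#include <stdlib.h>" from <locale>: the guard skips the whole body,
// so the #ifdef above is never re-evaluated.
#ifndef MOCK_STDLIB_H
#define MOCK_STDLIB_H
#ifdef MOCK_GNU_SOURCE
constexpr bool mock_strtoll_l_declared = true;
#else
constexpr bool mock_strtoll_l_declared = false;
#endif
#endif

// Still false: what gets declared was decided on the first expansion.
static_assert(!mock_strtoll_l_declared, "defining the macro late has no effect");
```

Because the second expansion is skipped entirely, the late #define never reaches the #ifdef that decides what is declared.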

To address the above problem, we need to define _GNU_SOURCE before any inclusion of <stdlib.h>. So a straightforward idea is to define _GNU_SOURCE in <cstdlib>, which is the only place in libc++ where <stdlib.h> is directly included (other C++ headers usually include <cstdlib> instead). Unfortunately this doesn’t work either. If you read glibc’s headers, you’ll notice that symbols like strtoll_l are actually not guarded directly by _GNU_SOURCE, but by another macro: __USE_GNU. <features.h> defines __USE_GNU only when _GNU_SOURCE is defined, so in practice the two have the same effect. But this leads to an unpleasant consequence: <features.h> might be included before <cstdlib>, and since it is evaluated only once, defining _GNU_SOURCE afterwards doesn’t make __USE_GNU defined; without __USE_GNU, the symbols we want in <stdlib.h> are still hidden.
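The same single-file trick can model glibc’s two-level scheme, under the simplifying assumption that <features.h> does nothing except map _GNU_SOURCE to __USE_GNU. MOCK_FEATURES_H, MOCK_GNU_SOURCE, MOCK_USE_GNU and mock_strtoll_l_declared are hypothetical names, not glibc’s.

```cpp
// Hypothetical simulation of glibc's two-level feature scheme.
// MOCK_GNU_SOURCE plays _GNU_SOURCE, MOCK_USE_GNU plays __USE_GNU.

// Step 1: an early C header (say <stdio.h>) expands the mock <features.h>.
#ifndef MOCK_FEATURES_H
#define MOCK_FEATURES_H
#ifdef MOCK_GNU_SOURCE
#define MOCK_USE_GNU 1  // the internal macro is derived here, exactly once
#endif
#endif

// Step 2: libc++'s <cstdlib> defines the user-facing macro, but too late.
#define MOCK_GNU_SOURCE 1

// Step 3: <stdlib.h> includes <features.h> again; the guard skips it,
// so MOCK_USE_GNU is never recomputed.
#ifndef MOCK_FEATURES_H
#define MOCK_FEATURES_H
#ifdef MOCK_GNU_SOURCE
#define MOCK_USE_GNU 1
#endif
#endif

// Step 4: <stdlib.h> tests the internal macro, not MOCK_GNU_SOURCE.
#ifdef MOCK_USE_GNU
constexpr bool mock_strtoll_l_declared = true;
#else
constexpr bool mock_strtoll_l_declared = false;
#endif

static_assert(!mock_strtoll_l_declared,
              "MOCK_GNU_SOURCE is defined, yet the extension stays hidden");
```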

Then here comes the third idea: just define _GNU_SOURCE before any inclusion of <features.h>! That way we make sure __USE_GNU is properly defined. This works in theory; the problem is that we don’t know exactly when <features.h> will be included. Almost every glibc header includes <features.h> somewhere, which means we need to define _GNU_SOURCE before the inclusion of any C header in libc++’s headers: <cstdio>, <cstdlib>, <cstring>, etc.
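Using the same hypothetical mock names, the third idea corresponds to moving the feature macro above the very first expansion of <features.h>. A minimal sketch:

```cpp
// Hypothetical simulation of the third idea: the user-facing macro is
// defined before the mock <features.h> is expanded for the first time.

// Top of a libc++ C-wrapper header such as <cstdlib>, before any C header.
#define MOCK_GNU_SOURCE 1

// First (and only effective) expansion of the mock <features.h>.
#ifndef MOCK_FEATURES_H
#define MOCK_FEATURES_H
#ifdef MOCK_GNU_SOURCE
#define MOCK_USE_GNU 1
#endif
#endif

// The mock <stdlib.h> now sees the internal macro.
#ifdef MOCK_USE_GNU
constexpr bool mock_strtoll_l_declared = true;
#else
constexpr bool mock_strtoll_l_declared = false;
#endif

static_assert(mock_strtoll_l_declared,
              "defined early enough, the extension becomes visible");
```

The catch is the one described above: this define has to come before every C header, so every C-wrapper header in libc++ would need it.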

Defining _GNU_SOURCE in <cstdio>, <cstdlib>, etc. seems like no big deal. With that, we don’t need the C++ compiler to predefine _GNU_SOURCE for libc++, and user code won’t be polluted by _GNU_SOURCE anymore. Flawless, isn’t it? In fact, no. Let’s recall the purpose of avoiding _GNU_SOURCE: to prevent user code from being polluted by non-standard symbols. With our “solution”, although the compiler no longer predefines _GNU_SOURCE, the symbols it guards are still exposed to user code as soon as it includes any libc++ header. So this isn’t a real solution.
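Here is a sketch of why the symbols still leak, assuming (hypothetically) that libc++’s <cstdlib> defines the macro for its own use and even #undefs it again before returning to user code. MOCK_GNU_SOURCE, MOCK_LIBCXX_DEFINED_GNU_SOURCE and mock_strtoll_l are invented names; mock_strtoll_l is a stand-in for strtoll_l() implemented with std::strtoll only so that the example runs.

```cpp
#include <cstdlib>

// Hypothetical libc++ <cstdlib>: defines the feature macro for itself and,
// by assumption, #undefs it again at the end of the header.
#ifndef MOCK_GNU_SOURCE
#define MOCK_GNU_SOURCE 1
#define MOCK_LIBCXX_DEFINED_GNU_SOURCE 1
#endif

#ifdef MOCK_GNU_SOURCE
// Invented stand-in for the non-standard strtoll_l(); the locale argument is omitted.
inline long long mock_strtoll_l(const char* s) { return std::strtoll(s, nullptr, 10); }
#endif

#ifdef MOCK_LIBCXX_DEFINED_GNU_SOURCE
#undef MOCK_GNU_SOURCE
#undef MOCK_LIBCXX_DEFINED_GNU_SOURCE
#endif

// ---- user code after #include <cstdlib> ----
#ifdef MOCK_GNU_SOURCE
constexpr bool user_sees_gnu_source = true;
#else
constexpr bool user_sees_gnu_source = false;  // the macro itself is gone...
#endif

// ...but the non-standard name is still declared, free to be called or to clash.
inline long long user_code_uses_extension() { return mock_strtoll_l("42"); }
```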

This issue is not as trivial as it appears; no wonder it has never been fixed, though it is frequently complained about. A large part of the nastiness is due to the abuse of feature test macros in libc; perhaps once C++ modules become a real deal [2], C++ library writers won’t be bothered by macro pollution anymore.

[1] http://web.mit.edu/darwin/src/modules/gcc3/libstdc++-v3/docs/html/faq/#3_5
[2] http://clang.llvm.org/docs/Modules.html

The invisible _GNU_SOURCE in your C++ code

Recently two of my patches made their way into clang/LLVM: ‘musl’ is now a valid environment type in LLVM, and you can configure clang to build binaries against musl on Linux without fancy compiler flags like those shown in my previous blog posts. The “natural” next step of this project is to build LLVM itself against musl, and this task turned out to be tougher than I expected. Let’s dive into the technical part.

Briefly, there’s a chunk of code like the following in LLVM:

namespace LibFunc {
enum Func {
    ...
    fopen,
    fopen64,
    fprintf,
    fputc,
    ...
};
}

which defines a set of enumerators with the same names as various libc functions. This is totally valid, since these enumerators are scoped inside a C++ namespace and so *ideally* won’t clash with the raw libc function names.
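As a quick check of that claim, a trimmed-down version of the enum compiles next to <cstdio>. The enumerator list below is a hypothetical subset; fopen64 is deliberately left out because, as the rest of this post explains, it is exactly the name that breaks on musl.

```cpp
#include <cstdio>
#include <type_traits>

// Hypothetical subset of LLVM's LibFunc enum; fopen64 is omitted on purpose.
namespace LibFunc {
enum Func {
  fopen,
  fprintf,
  fputc,
};
}

// Outside the namespace, std::fopen is still the libc function...
static_assert(std::is_function<decltype(std::fopen)>::value,
              "std::fopen is still a function");
// ...while the same spellings inside LibFunc are plain enumerators.
static_assert(LibFunc::fopen == 0 && LibFunc::fputc == 2,
              "enumerators are scoped by LibFunc");
```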

But the story goes a bit differently on musl’s side. Some of these functions, including fopen64 from the snippet above, are actually non-POSIX, and musl defines them as macro aliases of the non-64 versions (musl’s off_t is always 64 bits wide, so separate 64-suffixed variants are unnecessary). Here’s a snippet from musl’s <stdio.h>:

#if defined(_LARGEFILE64_SOURCE) || defined(_GNU_SOURCE)
#define tmpfile64 tmpfile
#define fopen64 fopen
#define freopen64 freopen
...
#endif

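You can see these 64-suffixed functions are defined as macros. The ugliness of macros is that they don’t respect C++’s scoping rules: the preprocessor rewrites every fopen64 token, including the enumerator inside namespace LibFunc, into fopen, which duplicates the existing fopen enumerator. Hence the inevitable name clash with LLVM.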
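The scoping problem can be reproduced without musl. In this sketch, mock_fopen64, mock_fopen, mock_fputc and LibFuncDemo are hypothetical names; the #define mimics musl’s “#define fopen64 fopen”.

```cpp
// Hypothetical stand-in for musl's "#define fopen64 fopen".
#define mock_fopen64 mock_fopen

namespace LibFuncDemo {
enum Func {
  mock_fopen,
  // mock_fopen64,  // would expand to a second "mock_fopen" and fail to
  //                // compile as a redefinition, which is what LLVM hits on musl
  mock_fputc,
};
}

// The macro ignores namespace qualification: the token is rewritten before
// name lookup, so this actually names LibFuncDemo::mock_fopen.
constexpr LibFuncDemo::Func picked = LibFuncDemo::mock_fopen64;
static_assert(picked == LibFuncDemo::mock_fopen,
              "mock_fopen64 was silently replaced");
```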

Should we blame musl for this? Actually, it does guard these symbols with feature test macros (_LARGEFILE64_SOURCE or _GNU_SOURCE); fopen64 and friends are exposed only when one of them is defined. OTOH, I checked LLVM’s code and it never explicitly defines _GNU_SOURCE unless glibc is in use. Then why the clash? Unfortunately the ugliness doesn’t end here. It turns out g++/clang++ unconditionally predefine _GNU_SOURCE when compiling any C++ code on Linux, because libc++/libstdc++ won’t work on Linux without this macro. So the proper fix for this incompatibility between LLVM and musl would be fixing the C++ compiler itself, but that’s another big story…
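To check whether your own toolchain does this, a tiny sketch like the following reports whether _GNU_SOURCE is defined in a translation unit that includes no headers, so any definition must come from the compiler’s predefined macros or the command line. gnu_source_predefined is a hypothetical helper name.

```cpp
// Reports whether _GNU_SOURCE is defined for this translation unit.
// No header is included above this point, so a definition can only come
// from the compiler's predefined macros or the command line (e.g. -D).
constexpr bool gnu_source_predefined() {
#ifdef _GNU_SOURCE
  return true;
#else
  return false;
#endif
}
```

Per the behavior described above, it should return true with g++ or clang++ on Linux. Dumping the predefined macros with g++ -dM -E -x c++ /dev/null and searching for _GNU_SOURCE gives the same answer without compiling anything.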