The “Gentoo Reference System” suite: a new Release Engineering tool.

The other day I installed Ubuntu 15.04 on one of my boxes.  I just needed something where I could throw in a DVD, hit install and be done.  I didn’t care about customization or choice, I just needed a working Linux system from which I could do chroot work.  Thousands of people around the world install Ubuntu this way and when they’re done, they have a stock system like any other Ubuntu installation, all identical like frames in an Andy Warhol lithograph.  Replication as a form of art.

In contrast, when I install a Gentoo system, I enjoy the anxiety of choice.  Should I use syslog-ng, metalog, or skip a system logger altogether?  If I choose syslog-ng, then I have a choice of 14 USE flags for 2^14 possible configurations for just that package.  And that’s just one of some 850+ packages that are going to make up my desktop.  In contrast to Ubuntu where every installation is identical (whatever “idem” means in this context), the sheer space of possibilities makes no two Gentoo systems the same unless there is some concerted effort to make them so.  In fact, Gentoo doesn’t even have a notion of a “stock” system unless you count the stage3s, which are really bare bones.  There is no “stock” Gentoo desktop.

With the work I am doing with uClibc and musl, I needed a release tool that would build identical desktops repeatedly and predictably where all the choices of packages and USE flags were laid out a priori in some specifications.  I considered catalyst stage4, but catalyst didn’t provide the flexibility I wanted.  I initially wrote some bash scripts to build an XFCE4 desktop from uClibc stage3 tarballs (what I dubbed “Lilblue Linux”), but this was very much ad hoc code and I needed something that could be generalized so I could do the same for a musl-based desktop, or indeed any Gentoo system I could dream up.

This led me to formulate the notion of what I call a “Gentoo Reference System” or GRS for short — maybe we could make stock Gentoo systems available.  The idea here is that one should be able to define some specs for a particular Gentoo system that will unambiguously define all the choices that go into building that system.  Then all instances built according to those particular GRS specs would be identical in much the same way that all Ubuntu systems are the same.  In a Warholian turn, the artistic choices in designing the system would be pushed back into the specs and become part of the automation.  You draw one frame of the lithograph and you magically have a million.

The idea of these systems being “references” was also important for my work because, with uClibc or musl, there are a lot of package breakages — remember, you’re pushing up against actual implementations of C functions, and nearly everything in your system is written in C.  So, in the space of all possible Gentoo systems, I needed some reference points that worked.  I needed those magical combinations of flags and packages that would build and yield useful systems.  It was also important that these references be easily kept working over time since Gentoo systems evolve as the main tree, or overlays, are modified.  Since on some successive build something might break, I needed to quickly identify the delta and address it.  The metaphor that came up in my head from my physics background is that of phase space.  In the swirling mass of evolving dynamical systems, I pictured these “Gentoo Reference Systems” as markers etching out a well defined path over time.

Enough with the metaphors, how does GRS work?  There are two main utilities, grsrun and grsup.  The first is run on a build machine and generates the GRS release as well as any extra packages and updates.  These are delivered as binpkgs.  In contrast, grsup is run on an installed GRS instance and it’s used for package management.  Since we’re working in a world of identical systems, grsup prefers working with binpkgs that are downloaded from some build machine, but it can revert to building locally as well.

The GRS specs for some system are found on a branch of a git repository.  Currently the repo at https://gitweb.gentoo.org/proj/grs.git/ has four branches, each for one of the four GRS specs housed there.  grsrun is then directed to sync the remote repo locally, check out the branch of the GRS system we want to build and begin reading a script file called build which directs grsrun on what steps to take.  The scripting language is very simple and contains only a handful of different directives.  After a stage tarball is unpacked, build can direct grsrun to do any of the following:

mount and umount – Do a bind mount of /dev/, /dev/pts/ and other directories that are required to get a chroot ready.

populate – Selectively copy files from the local repo to the chroot.  Any files can be copied in, so, for example, you can prepare a pristine home directory for some user with a pre-configured desktop.  Or you can add customized configuration files to /etc for services you plan to run.

runscript – This will run some bash or python script in the chroot.  The scripts are copied from the local repo to /tmp of the chroot and executed there.  These scripts can be like the ones that catalyst runs during stage1/2/3 but can also be scripts to add users and groups, to add services to runlevels, etc.  Think of anything you would do when growing a stage3 into the system you want, script it up and GRS will automate it for you.

kernel – This looks for a kernel config file in the local repo, parses it for the version, builds the kernel, and both bundles it as a package called linux-image-<version>.tar.xz for later distribution and installs it into the chroot.  grsup knows how to work with these linux-image-<version>.tar.xz files and can treat them like binpkgs.

tarit and hashit – These directives create a release tarball of the entire chroot and generate the digests.

pivot – If you built a chroot within a chroot, like catalyst does during stage1, then this pivots the inner chroot out so that further building can make use of it.

From an implementation point of view, the GRS suite is written in python and each of the above directives is backed by a simple python class.  It’s easy, for instance, to implement more directives this way.  E.g. if you want to build a bootable CD image, you can include a directive called isoit, write a python class for what’s required to construct the iso image and glue this new class into the grs module.
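
To make that concrete, here’s a minimal sketch of what such a directive class might look like.  The class name, the constructor arguments and the execute() hook, as well as the use of grub-mkrescue, are my assumptions for illustration, not the actual grs module API:

# Hypothetical sketch of an isoit directive class.  The names here are
# assumptions, not the actual grs module interface.
import subprocess

class IsoIt:
    def __init__(self, chroot_dir, iso_path):
        self.chroot_dir = chroot_dir  # the chroot we have been building up
        self.iso_path = iso_path      # where to write the final .iso

    def execute(self):
        # A real implementation would first install a kernel and a
        # bootloader config under the chroot; here we just wrap the
        # directory tree into a bootable image.
        subprocess.check_call(['grub-mkrescue', '-o', self.iso_path, self.chroot_dir])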

If you’re familiar with catalyst, at this point you might be wondering what’s the difference?  Can’t you do all of this with catalyst?  There is a lot of overlap, but the emphasis is different.  For example, I wanted to be able to drop in a pre-configured desktop for a user.  How would I do that with catalyst?  I guess I could create an overlay with packages for some pre-built home directory but that’s a perversion of what ebuilds are for — we should never be installing into /home.  Rather, with grsrun I can just populate the chroot with whatever files I like anywhere in the filesystem.  More importantly, I want to be able to control what USE flags are set and, in general, manage all of /etc/portage/.  catalyst does provide portage_confdir which populates /etc/portage/ when building stages, but it’s pretty static.  Instead, grsup and two other utilities, install-worldconf and clean-worldconf, can dynamically manage files under /etc/portage/ according to a configuration file called world.conf.

Lapsing back into metaphor, I see catalyst as rigid and frozen whereas grsrun is loose and fluid.  You can use grsrun to build stage1/2/3 tarballs which are identical to those built with catalyst, and in fact I’ve done so for hardened amd64 multilib stages so I could compare.  But with grsrun you have too much freedom in writing the scripts and files that go into the GRS specs and chances are you’ll get something wrong, whereas with catalyst the build is pretty regimented and you’re guaranteed to get uniformity across arches and profiles.  So while you can do the same things with each tool, it’s not recommended that you use grsrun to do catalyst stage builds — there’s too much freedom.  Whereas when building desktops or servers you might welcome that freedom.

Finally, let me close with how grsup works.  As mentioned above, the GRS specs for some system include a file called world.conf.  It’s in configparser format and it specifies files and their contents in the /etc/portage/ directory.  An example section in the file looks like:

[app-crypt/gpgme:1]
package.use : app-crypt/gpgme:1 -common-lisp static-libs
package.env : app-crypt/gpgme:1 app-crypt_gpgme_1
env : LDFLAGS=-largp

This says, for package app-crypt/gpgme:1, drop a file called app-crypt_gpgme_1 in /etc/portage/package.use/ that contains the line “app-crypt/gpgme:1 -common-lisp static-libs”, drop another file by the same name in /etc/portage/package.env/ with line “app-crypt/gpgme:1 app-crypt_gpgme_1”, and finally drop a third file by the same name in /etc/portage/env/ with line “LDFLAGS=-largp”.  grsup is basically a wrapper to emerge which first populates /etc/portage/ according to the world.conf file, then emerges the requested pkg(s), preferring the use of binpkgs over building locally as stated above, and finally does a clean up on /etc/portage/.  install-worldconf and clean-worldconf isolate the populate and clean up steps so they can be used in scripts run by grsrun when building the release.  To be clear, you don’t have to use grsup to maintain a GRS system.  You can maintain it just like any other Gentoo system, but if you manage your own /etc/portage/, then you are no longer tracking the GRS specs.  grsup is meant to make sure you update, install or remove packages in a manner that keeps the local installation in compliance with the GRS specs for that system.
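
Since world.conf is in configparser format, the mapping from a section to files under /etc/portage/ is mechanical.  Here’s a minimal sketch of that logic, assuming the filename is derived from the section name as in the example above; this is an illustration, not the actual install-worldconf code:

# A sketch of the world.conf -> /etc/portage/ mapping described above.
import configparser
import os

config = configparser.ConfigParser()
config.read('world.conf')

for section in config.sections():                         # e.g. app-crypt/gpgme:1
    fname = section.replace('/', '_').replace(':', '_')   # app-crypt_gpgme_1
    for subdir in config[section]:                        # package.use, package.env, env
        dirpath = os.path.join('/etc/portage', subdir)
        os.makedirs(dirpath, exist_ok=True)
        with open(os.path.join(dirpath, fname), 'w') as f:
            f.write(config[section][subdir] + '\n')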

All this is pretty alpha stuff, so I’d appreciate comments on design and implementation before things begin to solidify.  I am using GRS to build three desktop systems which I’ll blog about next.  I’ve dubbed these systems Lilblue, which is a hardened amd64 XFCE4 desktop with uClibc as its standard libc, Bluedragon, which uses musl, and finally Bluemoon, which uses good old glibc.  (Lilblue is actually a few years old, but the latest release is the first built using GRS.)  All three desktops are identical with respect to the choice of packages and USE flags, and differ only in their libcs so one can compare the three.  Lilblue and Bluedragon are on the mirrors, or you can get all three from my dev space at http://dev.gentoo.org/~blueness/theblues/.  I didn’t push out Bluemoon on the mirrors because a glibc based desktop is nothing special.  But since building with GRS is as simple as cloning a git branch and tweaking, and since the comparison is useful, why not?

The GRS home page is at https://wiki.gentoo.org/wiki/Project:RelEng_GRS.

The C++11 ABI incompatibility problem in Gentoo

Gentoo allows users to have multiple versions of gcc installed and we (mostly?) support systems where userland is partially built with different versions.  There are both advantages and disadvantages to this and in this post, I’m going to talk about one of the disadvantages, the C++11 ABI incompatibility problem.  I don’t exactly have a solution, but at least we can define what the problem is and track it [1].

First, what is C++11?  It’s a new standard of C++ which is just now making its way through GCC and clang as experimental.  The current default standard is C++98, which you can verify by just reading the defined value of __cplusplus using the preprocessor.

$  g++ -x c++ -E -P - <<< __cplusplus
199711L
$  g++ -x c++ --std=c++98 -E -P - <<< __cplusplus
199711L
$  g++ -x c++ --std=c++11 -E -P - <<< __cplusplus
201103L

This shouldn’t be surprising, even good old C has standards:

$ gcc -x c -std=c90 -E -P - <<< __STDC_VERSION__
__STDC_VERSION__
$ gcc -x c -std=c99 -E -P - <<< __STDC_VERSION__
199901L
$ gcc -x c -std=c11 -E -P - <<< __STDC_VERSION__
201112L

We’ll leave the interpretation of these values as an exercise to the reader.  [2]

The specs for these different standards at least allow for different syntax and semantics in the language.  So here’s an example of how C++98 and C++11 differ in this respect:

// I build with both --std=c++98 and --std=c++11
#include <iostream>
using namespace std;
int main() {
    int i, a[] = { 5, -3, 2, 7, 0 };
    for (i = 0; i < sizeof(a)/sizeof(int); i++)
        cout << a[i] << endl ;
    return 0;
}
// I build with only --std=c++11
#include <iostream>
using namespace std;
int main() {
    int a[] = { 5, -3, 2, 7, 0 };
    for (auto& x : a)
        cout << x << endl ;
    return 0;
}

I think most people would agree that the C++11 way of iterating over arrays (or other objects like vectors) is sexy.  In fact C++11 is filled with sexy syntax, especially when it comes to its threading and atomics, and so coders are seduced.  This is an upstream choice and it should be reflected in their build system with --std= sprinkled where needed.  I hope you see why you should never add --std= to your CFLAGS or CXXFLAGS.

The syntactic/semantic differences are the first “incompatibility” and they are really not our problem downstream.  Our problem in Gentoo comes because of ABI incompatibilities between the two standards arising from two sources: 1) Linking between objects compiled with --std=c++98 and --std=c++11 is not guaranteed to work.  2) Neither is linking between objects both compiled with --std=c++11 but with different versions of GCC differing in their minor release number.  (The minor release number is x in gcc-4.x.y.)

To see this problem in action, let’s consider the following little snippet of code which uses a C++11 only function [3]

#include <chrono>
using namespace std;
int main() {
    auto x = chrono::steady_clock::now;
}

Now if we compile that with gcc-4.8.3 and check its symbols we get the following:

$ g++ --version
g++ (Gentoo Hardened 4.8.3 p1.1, pie-0.5.9) 4.8.3
$ g++ --std=c++11 -c test.cpp
$ readelf -s test.o
Symbol table '.symtab' contains 12 entries:
Num:    Value          Size Type    Bind   Vis      Ndx Name
  0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
  1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS test.cpp
  2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
  3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
  4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
  5: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
  6: 0000000000000000     0 SECTION LOCAL  DEFAULT    7
  7: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
  8: 0000000000000000    78 FUNC    GLOBAL DEFAULT    1 main
  9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_
 10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZNSt6chrono3_V212steady_
 11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND __stack_chk_fail

We can now confirm that that symbol is in fact in libstdc++.so for 4.8.3 but NOT for 4.7.3 as follows:

$ readelf -s /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libstdc++.so.6 | grep _ZNSt6chrono3_V212steady_
  1904: 00000000000e5698     1 OBJECT  GLOBAL DEFAULT   13 _ZNSt6chrono3_V212steady_@@GLIBCXX_3.4.19
  3524: 00000000000c8b00    89 FUNC    GLOBAL DEFAULT   11 _ZNSt6chrono3_V212steady_@@GLIBCXX_3.4.19
$ readelf -s /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/libstdc++.so.6 | grep _ZNSt6chrono3_V212steady_
$

Okay, so we’re just seeing an example of things in flux.  Big deal?  If you finish linking test.cpp and check what it links against you get what you expect:

$ g++ --std=c++11 -o test.gcc48 test.o
$ ./test.gcc48
$ ldd test.gcc48
        linux-vdso.so.1 (0x000002ce333d0000)
        libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libstdc++.so.6 (0x000002ce32e88000)
        libm.so.6 => /lib64/libm.so.6 (0x000002ce32b84000)
        libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1 (0x000002ce3296d000)
        libc.so.6 => /lib64/libc.so.6 (0x000002ce325b1000)
        /lib64/ld-linux-x86-64.so.2 (0x000002ce331af000)

Here’s where the weirdness comes in.  Suppose we now switch to gcc-4.7.3 and repeat.  Things don’t quite work as expected:

$ g++ --version
g++ (Gentoo Hardened 4.7.3-r1 p1.4, pie-0.5.5) 4.7.3
$ g++ --std=c++11 -o test.gcc47 test.cpp
$ ldd test.gcc47
        linux-vdso.so.1 (0x000003bec8a9c000)
        libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libstdc++.so.6 (0x000003bec8554000)
        libm.so.6 => /lib64/libm.so.6 (0x000003bec8250000)
        libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1 (0x000003bec8039000)
        libc.so.6 => /lib64/libc.so.6 (0x000003bec7c7d000)
        /lib64/ld-linux-x86-64.so.2 (0x000003bec887b000)

Note that it says it’s linking against 4.8.3/libstdc++.so.6 and not 4.7.3’s.  That’s because the order in which the library paths are searched is defined in /etc/ld.so.conf.d/05gcc-x86_64-pc-linux-gnu.conf, and this file is sorted the way it is on purpose.  So maybe it’ll run!  Let’s try:

$ ./test.gcc47
./test.gcc47: relocation error: ./test.gcc47: symbol _ZNSt6chrono12steady_clock3nowEv, version GLIBCXX_3.4.17 not defined in file libstdc++.so.6 with link time reference

Nope, no joy.  So what’s going on?  Let’s look at the symbols in both test.gcc47 and test.gcc48:

$ readelf -s test.gcc47  | grep chrono
  9: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSt6chrono12steady_cloc@GLIBCXX_3.4.17 (4)
 50: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSt6chrono12steady_cloc
$ readelf -s test.gcc48  | grep chrono
  9: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSt6chrono3_V212steady_@GLIBCXX_3.4.19 (4)
 49: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSt6chrono3_V212steady_

Whoah!  The symbol wasn’t mangled the same way!  Looking more carefully at *all* the chrono symbols in 4.8.3/libstdc++.so.6 and 4.7.3/libstdc++.so.6 we see the problem.

$ readelf -s /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libstdc++.so.6 | grep chrono
  353: 00000000000e5699     1 OBJECT  GLOBAL DEFAULT   13 _ZNSt6chrono3_V212system_@@GLIBCXX_3.4.19
 1489: 000000000005e0e0    86 FUNC    GLOBAL DEFAULT   11 _ZNSt6chrono12system_cloc@@GLIBCXX_3.4.11
 1605: 00000000000e1a3f     1 OBJECT  GLOBAL DEFAULT   13 _ZNSt6chrono12system_cloc@@GLIBCXX_3.4.11
 1904: 00000000000e5698     1 OBJECT  GLOBAL DEFAULT   13 _ZNSt6chrono3_V212steady_@@GLIBCXX_3.4.19
 2102: 00000000000c8aa0    86 FUNC    GLOBAL DEFAULT   11 _ZNSt6chrono3_V212system_@@GLIBCXX_3.4.19
 3524: 00000000000c8b00    89 FUNC    GLOBAL DEFAULT   11 _ZNSt6chrono3_V212steady_@@GLIBCXX_3.4.19
$ readelf -s /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/libstdc++.so.6 | grep chrono
 1478: 00000000000c6260    72 FUNC    GLOBAL DEFAULT   12 _ZNSt6chrono12system_cloc@@GLIBCXX_3.4.11
 1593: 00000000000dd9df     1 OBJECT  GLOBAL DEFAULT   14 _ZNSt6chrono12system_cloc@@GLIBCXX_3.4.11
 2402: 00000000000c62b0    75 FUNC    GLOBAL DEFAULT   12 _ZNSt6chrono12steady_cloc@@GLIBCXX_3.4.17

Only 4.7.3/libstdc++.so.6 has _ZNSt6chrono12steady_cloc@@GLIBCXX_3.4.17.  Normally when libraries change their exported symbols, they change their SONAME, but this is not the case here, as running `readelf -d` on both shows.  GCC doesn’t bump the SONAME that way for reasons explained in [4].  Great, so just switch around the order of path search in /etc/ld.so.conf.d/05gcc-x86_64-pc-linux-gnu.conf.  Then we get the problem the other way around:

$ ./test.gcc47
$ ./test.gcc48
./test.gcc48: /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/libstdc++.so.6: version `GLIBCXX_3.4.19' not found (required by ./test.gcc48)

So no problem if your system has only gcc-4.7.  No problem if it has only 4.8.  But if it has both, then compiling C++11 code with 4.7 and linking against libstdc++ for 4.8 (or vice versa) gets you breakage at the binary level.  This is the C++11 ABI incompatibility problem in Gentoo.  As an exercise for the reader, fix!

Ref.

[1] Bug 542482 – (c++11-abi) [TRACKER] c++11 abi incompatibility

[2] This is an old professor’s trick for saying, hey go find out why c90 doesn’t define a value for __STDC_VERSION__ and let me know, ‘cuz I sure as hell don’t!

[3] This example was inspired by bug #513386.  You can verify that it requires --std=c++11 by dropping the flag and getting yelled at by the compiler.

[4] Upstream explains why in comment #5 of GCC bug #61758.  The entire bug is dedicated to this issue.

Lilblue Linux: release 20141212. dlclose() is a problem.

I pushed out another version of Lilblue Linux a few days ago but I don’t feel as good about this release as previous ones.  If you haven’t been following my posts, Lilblue is a fully featured amd64, hardened, XFCE4 desktop that uses uClibc instead of glibc as its standard C library.  The name is a bit misleading because Lilblue is Gentoo but departs from the mainstream in this one respect only.  In fact, I strive to make it as close to mainstream Gentoo as possible so that everything will “just work”.  I’ve been maintaining Lilblue for years as a way of pushing the limits of uClibc, which is mainly intended for embedded systems, to see where it breaks and fix or improve it.

As with all releases, there are always a few minor problems, little annoyances that are not exactly show stoppers.  One minor oversight that I found after releasing was that I hadn’t configured smplayer correctly.  That’s the gui front end to mplayer that you’ll find on the toolbar on the bottom of the desktop.  It works, just not out-of-the-box.  In the preferences, you need to switch from mplayer2 to mplayer and set the video out to x11.  I’ll add that to the build scripts to make sure it’s in the next release [1].  I’ve also been migrating away from gnome-centered applications which have been pulling in more and more bloat.  A couple of releases ago I switched from gnome-terminal to xfce4-terminal, and for this release, I finally made the leap from epiphany to midori as the main browser.  I like midori better although it isn’t as popular as epiphany.  I hope others approve of the choice.

But there is one issue I hit which is serious.  It seems with every release I hit at least one of those.  This time it was in uClibc’s implementation of dlclose().  Along with dlopen() and dlsym(), this is how shared objects can be loaded into a running program during execution rather than at load time.  This is probably more familiar to people as “plugins” which are just shared objects loaded while the program is running.  When building the latest Lilblue image, gnome-base/librsvg segfaulted while running gdk-pixbuf-query-loaders [2].  The latter links against glib and calls g_module_open() and g_module_close() on many shared objects as it constructs a cache of loadable objects.  g_module_{open,close} are just glib’s wrappers to dlopen() and dlclose() on systems that provide them, like Linux.  A preliminary backtrace obtained by running gdb on `/usr/bin/gdk-pixbuf-query-loaders ./libpixbufloader-svg.la` pointed to the segfault happening in gcc’s __deregister_frame_info() in unwind-dw2-fde.c, which didn’t sound right.  I rebuilt the entire system with CFLAGS+="-fno-omit-frame-pointer -O1 -ggdb" and turned on uClibc’s SUPPORT_LD_DEBUG=y, which emits debugging info to stderr when running with LD_DEBUG=1, and DODEBUG=y which prevents symbol stripping in uClibc’s libraries.  A more complete backtrace gave:

Program received signal SIGSEGV, Segmentation fault.
__deregister_frame_info (begin=0x7ffff22d96e0) at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2-fde.c:222
222 /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2-fde.c: No such file or directory.
(gdb) bt
#0 __deregister_frame_info (begin=0x7ffff22d96e0) at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2-fde.c:222
#1 0x00007ffff22c281e in __do_global_dtors_aux () from /lib/libbz2.so.1
#2 0x0000555555770da0 in ?? ()
#3 0x0000555555770da0 in ?? ()
#4 0x00007fffffffdde0 in ?? ()
#5 0x00007ffff22d8a2f in _fini () from /lib/libbz2.so.1
#6 0x00007fffffffdde0 in ?? ()
#7 0x00007ffff6f8018d in do_dlclose (vhandle=0x7ffff764a420 <__malloc_lock>, need_fini=32767) at ldso/libdl/libdl.c:860
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

The problem occurred when running the global destructors in dlclose()-ing libbz2.so.1.  Line 860 of libdl.c has DL_CALL_FUNC_AT_ADDR (dl_elf_fini, tpnt->loadaddr, (int (*)(void))); which is a macro that calls a function at address dl_elf_fini with signature int(*)(void).  If you’re not familiar with ctors and dtors, these are the global constructors/destructors whose code lives in the .ctors and .dtors sections of an ELF object, which you see when doing readelf -S <obj>.  The ctors are run when a library is first linked or opened via dlopen() and similarly the dtors are run when dlclose()-ing.  Here’s some code to demonstrate this:

# Makefile
all: tmp.so test
tmp.o: tmp.c
        gcc -fPIC -c $^
tmp.so: tmp.o
        gcc -shared -Wl,-soname,$@ -o $@ $^
test: test-dlopen.c
        gcc -o $@ $^ -ldl
clean:
        rm -f *.so *.o test
// tmp.c
#include <stdio.h>

void my_init() __attribute__ ((constructor));
void my_fini() __attribute__ ((destructor));

void my_init() { printf("Global initialization!\n"); }
void my_fini() { printf("Global cleanup!\n"); }
void doit() { printf("Doing it!\n"); }
// test-dlopen.c
// This has very bad error handling, sacrificed for readability.
#include <stdio.h>
#include <dlfcn.h>

int main() {
        int (*mydoit)();
        void *handle = NULL;

        handle = dlopen("./tmp.so", RTLD_LAZY);
        mydoit = dlsym(handle, "doit");
        mydoit();
        dlclose(handle);

        return 0;
}

When run, this code gives:

# ./test 
Global initialization!
Doing it!
Global cleanup!

So, my_init() is run on dlopen() and my_fini() is run on dlclose().  Basically, upon dlopen()-ing a shared object as you would a plugin, the library is first mmap()-ed into the process’s address space using the PT_LOAD addresses which you can see with readelf -l <obj>.  Then, one walks through all the global constructors and runs them.  Upon dlclose()-ing the opposite process is done.  One first walks through the global destructors and runs them, and then one munmap()-s the same mappings.

Figuring I wasn’t the only person to see a problem here, I googled and found that Nathan Copa of Alpine Linux hit a similar problem [3] back when Alpine used to use uClibc — it now uses musl.  He identified a problematic commit and I wrote a patch which would retain the new behavior introduced by that commit upon setting an environment variable NEW_START, but would otherwise revert to the old behavior if NEW_START is unset.  I also added some extra diagnostics to LD_DEBUG to better see what was going on.  I’ll add my patch to a comment below, but the gist of it is that it toggles between the old and new way of calculating the size of the munmap()-ings by subtracting an end and start address.  The old behavior used a mapaddr for the start address that is totally wrong and basically causes every munmap()-ing to fail with EINVAL.  This is corrected by the commit as a simple strace -e trace=munmap shows.

My results when running with LD_DEBUG=1 were interesting to say the least.  With the old behavior, the segfault was gone:

# LD_DEBUG=1 /usr/bin//gdk-pixbuf-query-loaders libpixbufloader-svg.la
...
do_dlclose():859: running dtors for library /lib/libbz2.so.1 at 0x7f26bcf39a26
do_dlclose():864: unmapping: /lib/libbz2.so.1
do_dlclose():869: before new start = 0xffffffffffffffff
do_dlclose():877: during new start = (nil), vaddr = (nil), type = 1
do_dlclose():877: during new start = (nil), vaddr = 0x219c90, type = 1
do_dlclose():881: after new start = (nil)
do_dlclose():987: new start = (nil)
do_dlclose():991: old start = 0x7f26bcf22000
do_dlclose():994: dlclose using old start
do_dlclose():998: end = 0x21b000
do_dlclose():1013: removing loaded_modules: /lib/libbz2.so.1
do_dlclose():1031: removing symbol_tables: /lib/libbz2.so.1
...

Of course, all of the munmap()-ings failed.  The dtors were run, but no shared object got unmapped.  When running the code with the correct value of start, I got:

# NEW_START=1 LD_DEBUG=1 /usr/bin//gdk-pixbuf-query-loaders libpixbufloader-svg.la
...
do_dlclose():859: running dtors for library /lib/libbz2.so.1 at 0x7f5df192ba26
Segmentation fault

What’s interesting here is that the segfault occurs at DL_CALL_FUNC_AT_ADDR which is before the munmap()-ing and so before any effect that the new value of start should have!  This seems utterly mysterious until you realize that there is a whole set of dlopens/dlcloses as gdk-pixbuf-query-loaders does its job — I counted 40 in all!  This is as far as I’ve gotten narrowing down this mystery, but I suspect some previous munmap()-ing is breaking the dtors for libbz2.so.1 and when the call is made to that address, it’s no longer valid, leading to the segfault.

Rich Felker, aka dalias, the developer of musl, made an interesting comment to me in IRC when I told him about this issue.  He said that the unmappings are dangerous and that musl actually doesn’t do them.  For now, I’ve intentionally left the unmappings in uClibc’s dlclose() “broken” in the latest release of Lilblue, so you can’t hit this bug, but for the next release I’m going to look carefully at what glibc and musl do and try to get this fix upstream.  As I said when I started this post, I’m not totally happy with this release because I didn’t nail the issue, I just implemented a workaround.  Any hints would be much appreciated!

[1] The build scripts can be found in the releng repository at git://git.overlays.gentoo.org/proj/releng.git under tools-uclibc/desktop.  The scripts begin with a hardened amd64 uclibc stage3 tarball from http://distfiles.gentoo.org/releases/amd64/autobuilds/current-stage3-amd64-uclibc-hardened/ and build up the desktop.

[2] The purpose of librsvg and gdk-pixbuf is not essential for the problem with dlclose(), but for completeness we state them here: librsvg is a library for rendering scalable vector graphics and gdk-pixbuf is an image loading library for gtk+.  gdk-pixbuf-query-loaders reads a libtool .la file and generates a cache of loadable shared objects to be consumed by gdk-pixbuf.

[3] See  http://lists.uclibc.org/pipermail/uclibc/2012-October/047059.html. He suggested that the following commit was doing evil things: http://git.uclibc.org/uClibc/commit/ldso?h=0.9.33&id=9b42da7d0558884e2a3cc9a8674ccfc752369610

Tor-ramdisk 20141022 released

Following the latest and greatest exploit in openssl, CVE-2014-3566, aka the POODLE issue, the tor team released version 0.2.4.25.  For those of you not familiar, tor is a system of online anonymity which encrypts and bounces your traffic through relays so as to obfuscate the origin.  Back in 2008, I started a uClibc-based micro Linux distribution, called tor-ramdisk, whose only purpose is to host a tor relay in a hardened Gentoo environment purely in RAM.

While the POODLE bug is an openssl issue and is resolved by the latest release 1.0.1j, the tor team decided to turn off the affected protocol, SSLv3, so that only TLS 1.0 or later is used.  They also fixed tor to avoid a crash when built using openssl 0.9.8zc, 1.0.0o, or 1.0.1j, with the ‘no-ssl3’ configuration option.  These important fixes to two major components of tor-ramdisk warranted a new release.  Take a look at the upstream ChangeLog for more information.

Since I was upgrading stuff, I also upgraded the kernel to vanilla 3.17.1 + Gentoo’s hardened-patches-3.17.1-1.extras.  All the other components remain the same as the previous release.

i686:
Homepage: http://opensource.dyc.edu/tor-ramdisk
Download:  http://opensource.dyc.edu/tor-ramdisk-downloads

x86_64:
Homepage: http://opensource.dyc.edu/tor-x86_64-ramdisk
Download:  http://opensource.dyc.edu/tor-x86_64-ramdisk-downloads

Lilblue Linux: release 20140925. Adventures beyond the land of POSIX.

It has been four months since my last major build and release of Lilblue Linux, a pet project of mine [1].  The name is a bit pretentious, I admit, since Lilblue is not some other Linux distro.  It is Gentoo, but Gentoo with a twist.  It’s a fully featured amd64, hardened, XFCE4 desktop that uses uClibc instead of glibc as its standard C library.  I use it on some of my workstations at the College and at home, like any other desktop, and I know other people that use it too, but the main reason for its existence is that I wanted to push uClibc to its limits and see where things break.  Back in 2011, I got bored of working with the usual set of embedded packages.  So, while my students were writing their exams in Modern OS, I entertained myself just adding more and more packages to a stage3-amd64-hardened system [2] until I had a decent desktop.  After playing with it on and off, I finally polished it to the point where I thought others might enjoy it too and started pushing out releases.  Recently, I found out that the folks behind uselessd [3] used Lilblue as their testing ground.  uselessd is another response to systemd [4], something like eudev [5], which I maintain, so the irony here is too much not to mention!  But that’s another story …

There was only one interesting issue about this release.  Generally I try to keep all releases about the same.  I’m not constantly updating the list of packages in @world.  I did remove pulseaudio this time around because it never did work right and I don’t use it.  I’ll fix it in the future, but not yet!  Instead, I concentrated on a much more interesting problem with a new release of e2fsprogs [6].   The problem started when upstream’s commit 58229aaf removed a broken fallback syscall for fallocate64() on systems where the latter is unavailable [7].  There was nothing wrong with this commit, in fact, it was the correct thing to do.  e4defrag.c used to have the following code:

#ifndef HAVE_FALLOCATE64
#warning Using locally defined fallocate syscall interface.

#ifndef __NR_fallocate
#error Your kernel headers dont define __NR_fallocate
#endif

/*
 * fallocate64() - Manipulate file space.
 *
 * @fd: defrag target file's descriptor.
 * @mode: process flag.
 * @offset: file offset.
 * @len: file size.
 */
static int fallocate64(int fd, int mode, loff_t offset, loff_t len)
{
    return syscall(__NR_fallocate, fd, mode, offset, len);
}
#endif /* ! HAVE_FALLOCATE */

The idea was that, if a configure test for fallocate64() failed because it isn’t available in your libc, but there is a system call for it in the kernel, then e4defrag would just make the syscall via your libc’s indirect syscall() function.  Seems simple enough, except that how system calls are dispatched is architecture and ABI dependent, and the above is broken on 32-bit systems [8] — there, the 64-bit loff_t arguments have to be split across pairs of registers, sometimes with padding for alignment, which a naive variadic syscall() invocation doesn’t do for you.  Of course, uClibc didn’t have fallocate() so e4defrag failed to build after that commit.  To my surprise, musl does have fallocate() so this wasn’t a problem there, even though it is a Linux specific function and not in any standard.

My first approach was to patch e2fsprogs to use posix_fallocate() which is supposed to be equivalent to fallocate() when invoked with mode = 0.  e4defrag calls fallocate() in mode = 0, so this seemed like a simple fix.  However, this was not acceptable to Ts’o since he was worried that some libc might implement posix_fallocate() by brute force writing 0’s.  That could be horribly slow for large allocations!  This wasn’t the case for uClibc’s implementation but that didn’t seem to make much difference upstream.  Meh.

Rather than fight e2fsprogs, I sat down and hacked fallocate() into uClibc.  Since both fallocate() and posix_fallocate(), and their LFS counterparts fallocate64() and posix_fallocate64(), make the same syscall, it was sufficient to isolate that in an internal function which both could make use of.  That, plus a test suite, and Bernhard was kind enough to commit it to master [10].  Then a couple of backports, and uClibc’s 0.9.33 branch now has the fix as well.  Because there hasn’t been a release of uClibc in about two years, I’m using the 0.9.33 branch HEAD for Lilblue, so the problem there was solved — I know it’s a little problematic, but it was either that or try to juggle dozens of patches.

The only thing that remains is to backport those fixes to vapier’s patchset that he maintains for the uClibc ebuilds.  Since my uClibc stage3s don’t use the 0.9.33 branch HEAD, but the stable tree ebuilds which use the vanilla 0.9.33.2 release plus Mike’s patchset, upgrading e2fsprogs is blocked for those stages.

This whole process may seem like a real pita, but this is exactly the sort of issue I like uncovering and cleaning up.  So far, the feedback on the latest release is good.  If you want to play with Lilblue and you don’t have a free box, fire up VirtualBox or your emulator of choice and give it a try.  You can download it from experimental/amd64/uclibc on any mirror [11].

sthttpd: a very tiny and very fast http server with a mature codebase!

Two years ago, I took on the maintenance of thttpd, a web server written by Jef Poskanzer at ACME Labs [1].  The code hadn’t been updated in about 10 years and there were dozens of accumulated patches on the Gentoo tree, many of which addressed serious security issues.  I emailed upstream and was told the project was “done”, whatever that meant, so I was going to tree clean it.  I expressed my intentions on the upstream mailing list when I got a bunch of “please don’t!” from users.  So rather than maintain a ton of patches, I forked the code, rewrote the build system to use autotools, and applied all the patches.  I dubbed the fork sthttpd.  There was no particular meaning to the “s”.  Maybe “still kicking”?

I put a git repo up on my server [2], got a mailing list going [3], and set up bugzilla [4].  There hasn’t been much activity, but there was enough: it got noticed by someone who pushed it out in OpenBSD ports [5].

Today, I finally pushed out 2.27.0 after two years.  This release takes care of a couple of new security issues: I fixed the world readable log problem, CVE-2013-0348 [6], and Vitezslav Cizek <vcizek@suse.com> from OpenSUSE fixed a possible DOS triggered by a specially crafted .htpasswd file.  Bob Tennent added some code to correct headers for .svgz content, and Jean-Philippe Ouellet did some code cleanup.  So it was time.

Web servers are not my style, but its tiny size and speed make it perfect for embedded systems, which are near and dear to my heart.  I also make sure it compiles on *BSD and Linux with glibc, uClibc or musl.  Not bad for a codebase which is over 10 years old!  Kudos to Jef.

Tor-ramdisk 20140925 released

I’ve been blogging about my non-Gentoo work using my drupal site at http://opensource.dyc.edu/ but since I may be losing that server sometime in the future, I’m going to start duplicating those posts here.  This work should be of interest to readers of Planet Gentoo because it draws a lot from Gentoo, but it doesn’t exactly fall under the category of a “Gentoo Project.”

Anyhow, today I’m releasing tor-ramdisk 20140925.  As you may recall from a previous post, tor-ramdisk is a uClibc-based micro Linux distribution I maintain whose only purpose is to host a Tor server in an environment that maximizes security and privacy.  Security is enhanced using Gentoo’s hardened toolchain and kernel, while privacy is enhanced by forcing logging to be off at all levels.  Also, tor-ramdisk runs in RAM, so no information survives a reboot, except for the configuration file and the private RSA key, which may be exported/imported by FTP or SCP.

A few days ago, the Tor team released 0.2.4.24 with one major bug fix according to their ChangeLog. Clients were apparently sending the wrong address for their chosen rendezvous points for hidden services, which sounds like it shouldn’t work, but it did because they also sent the identity digest. This fix should improve surfing of hidden services. The other minor changes involved updating geoip information and the address of a v3 directory authority, gabelmoo.

I took this opportunity to also update busybox to version 1.22.1, openssl to 1.0.1i, and the kernel to 3.16.3 + Gentoo’s hardened-patches-3.16.3-1.extras. Both the x86 and x86_64 images were tested using node “simba” and showed no issues.

You can get tor-ramdisk from the following urls (at least for now!)

i686:
Homepage: http://opensource.dyc.edu/tor-ramdisk
Download: http://opensource.dyc.edu/tor-ramdisk-downloads

x86_64:
Homepage: http://opensource.dyc.edu/tor-x86_64-ramdisk
Download: http://opensource.dyc.edu/tor-x86_64-ramdisk-downloads


Constructing a “Directed Linkage Graph” for an entire system: The usefulness of exporting /var/db/pkg (VDB) information for utilities other than the Package Management System (PMS).

When portage installs a package onto your system, it caches information about that package in a directory at /var/db/pkg/<cat>/<pkg>/, where <cat> is the category (ie. ${CATEGORY}) and <pkg> is the package name, version number and revision number (ie. ${PF}). This information can then be used at a later time to tell portage information about what’s installed on a system: what packages were installed, what USE flags are set on each package, what CFLAGS were used, etc. Even the ebuild itself is cached so that if it is removed from the tree, and consequently from your system upon `emerge --sync`, you have a local copy in VDB to uninstall or otherwise continue working with the package.

If you take a look under /var/db/pkg, you’ll find some interesting and some not so interesting files for each <cat>/<pkg>. Among the less interesting are files like DEPEND, RDEPEND, FEATURES, IUSE, USE, which just contain the same values as the ebuild variables by the same name. This is redundant because that information is in the ebuild itself, which is also cached, but it is more readily available since one doesn’t have to re-parse the ebuild to obtain them. More interesting is information gathered about the package as it is installed, like CONTENTS, which contains a list of all the regular files, directories, and symlinks which belong to the package, along with their MD5SUMs. This list is used to remove files from the system when uninstalling the package. Environment information is also cached, like CBUILD, CHOST, CFLAGS, CXXFLAGS and LDFLAGS which affect the build of compiled packages, and environment.bz2 which contains the entire shell environment that portage ran in, including all shell variables and functions from inherited eclasses.

But perhaps the most interesting information, and the most expensive to recalculate, is cached in NEEDED and NEEDED.ELF.2. The latter supersedes the former, which is only kept for backward compatibility, so let’s just concentrate on NEEDED.ELF.2. It’s a list of every ELF object that is installed for a package, along with its ARCH/ABI information, its SONAME if it is a shared object (readelf -d <obj> | grep SONAME, or scanelf -S), any RPATH used to search for its needed shared objects (readelf -d <obj> | grep RPATH, or scanelf -r), and any NEEDED shared objects (the SONAMEs of libraries) that it links against (readelf -d <obj> | grep NEEDED, or scanelf -n). [1] Unless you’re working with some exotic systems, like an embedded image where everything is statically linked, your userland utilities and applications depend on dynamic linking, meaning that when a process is loaded from the executable on your hard drive, the linker has to make sure that its needed libraries are also loaded and then do some relocation magic to make sure that unresolved symbols in your executable get mapped to appropriate memory locations in the libraries.

The subtleties of linking are beyond the scope of this blog posting [2], but I think it’s clear from the previous paragraph that one can construct a “directed linkage graph” [3] of dependencies between all the ELF objects on a system. An executable can link to a library which in turn links to another, and so on, usually back to your libc [4]. `readelf -d <obj> | grep NEEDED` only gives you the immediate dependencies, but if you follow these through recursively, you’ll get all the needed libraries that an executable needs to run (see the sketch after the next example). `ldd <obj>` is a shell script which provides this information, as does ldd.py from the pax-utils package, which also does some pretty indentation to show the depth of the dependency. If this is sounding vaguely familiar, it’s because portage’s dependency rules “mimic” the underlying linking which is needed at both compile time and at run time. Let’s take an example, curl compiled with polarssl as its SSL backend:

# ldd /usr/bin/curl | grep ssl
        libpolarssl.so.6 => /usr/lib64/libpolarssl.so.6 (0x000003a3d06cd000)
# ldd /usr/lib64/libpolarssl.so.6
        linux-vdso.so.1 (0x0000029c1ae12000)
        libz.so.1 => /lib64/libz.so.1 (0x0000029c1a929000)
        libc.so.6 => /lib64/libc.so.6 (0x0000029c1a56a000)
        /lib64/ld-linux-x86-64.so.2 (0x0000029c1ae13000)
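
As an aside, the recursion that ldd and ldd.py perform is easy to picture in python. Here’s a sketch, where the needed dict holds each ELF object’s immediate NEEDED sonames and resolve() stands in for the linker’s library search logic (both are assumptions for illustration):

# Collect the full closure of needed libraries, the way ldd does,
# starting from the immediate NEEDED entries of an ELF object.
def needed_closure(obj, needed, resolve, seen=None):
    seen = set() if seen is None else seen
    for soname in needed.get(obj, []):
        lib = resolve(soname)  # map a soname to a path on disk
        if lib not in seen:
            seen.add(lib)
            needed_closure(lib, needed, resolve, seen)
    return seen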

Now let’s see this dependency reflected in the ebuild:

# cat net-misc/curl/curl-7.36.0.ebuild
RDEPEND="
        ...
        ssl? (
                ...
                curl_ssl_polarssl? ( net-libs/polarssl:= app-misc/ca-certificates )
                ...
        )
        ...

Nothing surprising. However, there is one subtlety. What happens if you update polarssl to a version which is not exactly backwards compatible? Then curl, which properly linked against the old version of polarssl, doesn’t quite work with the new version. This can happen when the library changes its public interface by either adding new functions, removing older ones and/or changing the behavior of existing functions. Usually upstream indicates this change in the library itself by bumping the SONAME:

# readelf -d /usr/lib64/libpolarssl.so.1.3.7 | grep SONAME
0x000000000000000e (SONAME) Library soname: [libpolarssl.so.6]

But how does curl know about the change when emerging an updated version of polarssl? That’s where subslotting comes in. To communicate the reverse dependency, the RDEPEND string in curl’s ebuild has := as the slot indicator for polarssl. This means that upgrading polarssl to a new subslot will trigger a recompile of curl:

# emerge =net-libs/polarssl-1.3.8 -vp

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild r U ] net-libs/polarssl-1.3.8:0/7 [1.3.7:0/6] USE="doc sse2 static-libs threads%* zlib -havege -programs {-test}" ABI_X86="(64) (-32) (-x32)" 1,686 kB
[ebuild rR ] net-misc/curl-7.36.0 USE="ipv6 ldap rtmp ssl static-libs threads -adns -idn -kerberos -metalink -ssh {-test}" CURL_SSL="polarssl -axtls -gnutls -nss -openssl" 0 kB

Here the onus is on the downstream maintainer to know when the API breaks backwards compatibility and subslot accordingly. Going through with this build and then checking the new SONAME we find:

# readelf -d /usr/lib/libpolarssl.so.1.3.8 | grep SONAME
0x000000000000000e (SONAME) Library soname: [libpolarssl.so.7]

Aha! Notice the SONAME jumped from .6 for polarssl-1.3.7 to .7 for 1.3.8. Also notice the SONAME version number also follows the subslotting value. I’m sure this was a conscious effort by hasufell and tommyd, the ebuild maintainers, to make life easy.

So I hope my example has shown the importance of tracing forward and reverse linkage between the ELF objects on a system [5]. Subslotting is relatively new but the need to trace linking has always been there. There was, and still is, revdep-rebuild (from gentoolkit) which uses output from ldd to construct a “directed linkage graph” [6] but it is relatively slow. Unfortunately, it recalculates all the NEEDED.ELF.2 information on the system in order to reconstruct and invert the directed linkage graph. Subslotting has partially obsoleted revdep-rebuild because portage can now track the reverse dependencies, but it has not completely obsoleted it. revdep-rebuild falls back on the SONAMEs in the shared objects themselves — an error here is an upstream error in which the maintainers of the library overlooked updating the value of CURRENT in the build system, usually in a line of some Makefile.am that looks like

LDFLAGS += -version-info $(CURRENT):$(REVISION):$(AGE)

But an error in subslotting is a downstream error where the maintainers didn’t properly subslot their package and any dependencies to reflect upstream’s changing API. So in some ways, these tools complement each other.

Now we come to the real point of the blog: there is no reason for revdep-rebuild to run ldd on every ELF object on the system when it can obtain that information from VDB. This doesn’t save time on inverting the directed graph, but it does save time on running ldd (effectively /lib64/ld-linux-x86-64.so.2 --list) on every ELF object in the system. So guess what the python version, revdep-rebuild.py, does? You guessed it, it uses VDB information which is exported by portage via something like

import portage
vardb = portage.db[portage.root]["vartree"].dbapi
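
Building on that, each package’s cached NEEDED.ELF.2 entries can be pulled straight out of VDB with the same dbapi. The snippet below is a sketch: that aux_get() hands back NEEDED.ELF.2 verbatim, and that each line carries five semicolon-separated fields, is my reading of the format described above.

import portage

vardb = portage.db[portage.root]["vartree"].dbapi

for pkg in vardb.cpv_all():
    needed = vardb.aux_get(pkg, ["NEEDED.ELF.2"])[0]
    for line in needed.splitlines():
        # each line: abi;path_to_elf;soname;rpath;needed_sonames
        abi, elf, soname, rpath, sonames = line.split(';')[:5]
        print(abi, elf, '->', sonames)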

So what’s the difference in time? On my system right now, we’re looking at a difference between approximately 5 minutes for revdep-rebuild versus about 20 seconds for revdep-rebuild.py. [7] Since this information is gathered at build time, there is no reason for any Package Management System (PMS) to not export it via some standardized API. portage does so in an awkward fashion but it does export it. paludis does not export NEEDED.ELF.2 although it does export other VDB stuff. I can’t speak to future PMS’s but I don’t see why they should not be held to a standard.

Above I argued that exporting VDB is useful for utilities that maintain consistency between executables and the shared objects that they consume. I suspect one could counter-argue that it doesn’t need to be exported because “revdep-rebuild” can be made part of portage or whatever your PMS, but I hope my next point will show that exporting NEEDED.ELF.2 information has other uses besides “consistent linking”. So a stronger point is that, not only should PMS export this information, but that it should provide some well documented API for use by other tools. It would be nice for every PMS to have the same API, preferably via python bindings, but as long as it is well documented, it will be useful. (Eg. webapp-config supports both portage and paludis. WebappConfig/wrapper.py has a simple little switch between “import portage; ... portage.settings['CONFIG_PROTECT'] ...” and “cave print-id-environment-variable -b --format '%%v\n' --variable-name CONFIG_PROTECT %s/%s ...”.)

So besides consistent linking, what else could make use of NEEDED.ELF.2? In the world of Hardened Gentoo, to increase security, a PaX-patched kernel holds processes to much higher standards with respect to their use of memory. [8] Unfortunately, this breaks some packages which want to implement insecure methods, like RWX mmap-ings. Code is compiled “on-the-fly” by JIT compilers which typically create such mappings as an area to which they first write and then execute. However, this is dangerous because it can open up pathways by which arbitrary code can be injected into a running process. So, PaX does not allow RWX mmap-ings — it doesn’t allow them unless the kernel is told otherwise. This is where the PaX flags come in. In the JIT example, marking the executables with `paxctl-ng -m` will turn off PaX’s MPROTECT and allow the RWX mmap-ing. The issue of consistent PaX markings between executables and their libraries arises when it is the library that needs the markings. But when loaded, it is the markings of the executable, not the library, which set the PaX restrictions on the running process. [9] So if it’s the library that needs the markings, you have to migrate the markings from the library to the executable. Aha! Here we go again: we need to answer the question “what are all the consumers of a particular library so we can migrate its flags to them?” We can, as revdep-rebuild does, re-read all the ELF objects on the system, reconstruct the directed linkage graph, then invert it; or we can just start from the already gathered VDB information and save some time.

Like revdep-rebuild and revdep-rebuild.py, I wrote two utilities. The original, revdep-pax, did forward and reverse migration of PaX flags by gathering information with ldd. It was horribly slow, 5 to 10 minutes depending on the number of objects in $PATH and shared objects reported by `ldconfig -p`. I then rewrote it to use VDB information and it accomplished the same task in a fraction of the time [10]. Since constructing and inverting the directed linkage graph is such a useful operation, I figured I’d abstract the bare essential code into a python class which you can get at [11]. The data structure containing the entire graph is a compound python dictionary of the form

{
        abi1 : { path_to_elf1 : [ soname1, soname2, ... ], ... },
        abi2 : { path_to_elf2 : [ soname3, soname4, ... ], ... },
        ...
}

whereas the inverted graph has form

{
        abi1 : { soname1 : [ path_to_elf1, path_to_elf2, ... ], ... },
        abi2 : { soname2 : [ path_to_elf3, path_to_elf4, ... ], ... },
        ...
}

Simple!
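
Inverting that structure takes just a few lines of python. Here’s a sketch of the idea (not the actual code from link_graph.py [11]):

# Invert the directed linkage graph: map each SONAME back to the
# list of ELF objects that consume it, per ABI.
def invert(graph):
    inverted = {}
    for abi, elfs in graph.items():
        rev = inverted.setdefault(abi, {})
        for path_to_elf, sonames in elfs.items():
            for soname in sonames:
                rev.setdefault(soname, []).append(path_to_elf)
    return inverted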

Okay, up to now I concentrated on exporting NEEDED.ELF.2 information. So what about the rest of the VDB information? Is it useful? A lot of questions regarding Gentoo packages can be answered by “grepping the tree.” If you use portage as your PMS, then the same sort of grep-sed-awk foo magic can be performed on /var/db/pkg to answer similar questions. However, this assumes that the PMS’s cached information is in plain ASCII format. If a PMS decides to use something like Berkeley DB or sqlite, then we’re going to need a tool to read the db format which the PMS itself should provide. Because I do a lot of release engineering of uclibc and musl stages, one need that often comes up is the need to compare what’s installed in the stage3 tarballs for the various arches and alternative libc’s. So, I run some variation of the following script

#!/usr/bin/env python

import portage, re

portdb = portage.db[portage.root]["vartree"].dbapi

arm_stable = open('arm-stable.txt', 'w')
arm_testing = open('arm-testing.txt', 'w')

for pkg in portdb.cpv_all():
        keywords = portdb.aux_get(pkg, ["KEYWORDS"])[0]
        arches = re.split('\s+', keywords)
        for a in arches:
                if re.match('^arm$', a):
                        arm_stable.write("%s\n" % pkg)
                if re.match('^~arm$', a):
                        arm_testing.write("%s\n" % pkg)

arm_stable.close()
arm_testing.close()

in a stage3-amd64-uclibc-hardened chroot to see what stable packages in the amd64 tarball are ~arm. [12] I run similar scripts in other chroots to do pairwise comparisons. This gives me some clue as to what may be falling behind in which arches — to keep some consistency between my various stage3 tarballs. Of course there are other utilities to do the same, like eix, gentoolkit etc, but then one still has to resort to parsing the output of those utilities to get the answers you want. An API for VDB information allows you to write your own custom utility to answer the precise questions you need answered. I’m sure you can multiply these examples.

Let me close with a confession. The above is propaganda for the upcoming GLEP 64 which I just wrote [13]. The purpose of the GLEP is to delineate what information should be exported by all PMS’s with particular emphasis on NEEDED.ELF.2 for the reasons stated above.  Currently portage does provide NEEDED.ELF.2 but paludis does not.  I’m not sure what future PMS’s might or might not provide, so let’s set a standard now for an important feature.


Notes:

[1] You can see where NEEDED.ELF.2 is generated for details. Take a look at line ~520 of /usr/lib/portage/bin/misc-functions.sh, or search for the comment “Create NEEDED.ELF.2 regardless of RESTRICT=binchecks”.

[2] A simple hands on tutorial can be found at http://www.yolinux.com/TUTORIALS/LibraryArchives-StaticAndDynamic.html. It also includes dynamic linking via dlopen() which complicates the nice neat graph that can be constructed from NEEDED.ELF.2.

[3] I’m using the term “directed graph” as defined in graph theory. See http://en.wikipedia.org/wiki/Directed_graph. The nodes of the graph are each ELF object and the directed edges are from the consumer of the shared object to the shared object.

[4] Well, not quite. If you run readelf -d on /lib/libc.so.6 you’ll see that it links back to /lib/ld-linux-x86-64.so.2 which doesn’t NEED anything else. The former is strictly your standard C library (man 7 libc) while the latter is the dynamic linker/loader (man 8 ld.so).

[5] I should mention parenthetically that there are other executable/library file formats such as Mach-O used on MacOS X. The above arguments translate over to any executable formats which permit shared libraries and dynamic linking. My prejudice for ELF is because it is the primary executable format used on Linux and BSD systems.

[6] I’m coining this term here. If you read the revdep-rebuild code, you won’t see reference to any graph there. Bash doesn’t readily lend itself to the neat data structures that Python does.

[7] Just a word of caution: revdep-rebuild.py is still in development and warns when you run it, “This is a development version, so it may not work correctly. The original revdep-rebuild script is installed as revdep-rebuild.sh”.

[8] See https://wiki.gentoo.org/wiki/Hardened/PaX_Quickstart for an explanation of what PaX does as well as how it works.

[9] grep the contents of fs/binfmt_elf.c for PT_PAX_FLAGS and CONFIG_PAX_XATTR_PAX_FLAGS to see how these markings are used when the process is loaded from the ELF object. You can see the PaX protection on a running process by using `cat /proc/<pid>/maps | grep ^PaX` or `pspax` from the pax-utils package.

[10] The latest version from the git repo is at http://git.overlays.gentoo.org/gitweb/?p=proj/elfix.git;a=blob;f=scripts/revdep-pax.

[11] http://git.overlays.gentoo.org/gitweb/?p=proj/elfix.git;a=blob;f=pocs/link-graph/link_graph.py.

[12] These stages are distributed at http://distfiles.gentoo.org/releases/amd64/autobuilds/current-stage3-amd64-uclibc-hardened/ and http://distfiles.gentoo.org/experimental/arm/uclibc/.

[13] https://bugs.gentoo.org/show_bug.cgi?id=518630

Continued support for the Lemote Yeeloong: Gentoo Mips is alive and well!

A few years back the Lemote Yeeloong made a splash in the open source community as the world’s first completely “open” system requiring no proprietary software.  Even its BIOS is open source.  It wasn’t long before pictures of Richard Stallman hugging his Yeeloong started popping up throughout the Internet, further boosting its popularity.  I became interested because the Yeeloong involves everything that’s near and dear to my heart: 1) Its loongson2f processor is a mips64el system and I love the slick nature of RISC architectures.  I can actually make sense of its ISA and the assembly.  2) As a 64-bit mips, it supports multiple ABIs, and I love playing with different ABIs.  The images I push come with o32, n32 and n64.  3) While other distros, like Debian, have ported their wares to the Yeeloong, these don’t have the hardening goodness that Gentoo does, and so this was an added challenge.  Thanks to Magnus Granberg (zorry) for getting his hardened gcc patches working on mips.  4) Finally, it is “free” as in “libre”.  It is manufactured by Lemote in China, and I like to fantasize that hackers at the NSA curse every time they encounter one in the wild, although the reality is more likely that I’m owned by the Chinese government :/

So here was the possibility of creating a free and secure system on my favorite architecture!  A couple of summers back, I took on the challenge.  I updated some older stage3s that Matt Turner (mattst88) had prepared and went through the process of seeing which desktop packages would build, which needed patching, and which were hopelessly broken on mips, usually because of dependence on x86/amd64 assembly.  The end result was a minimal XFCE4 desktop with full userland hardening.  Unfortunately, I still don’t have a PaX kernel working, but the issues do not appear to be insurmountable.

Building the initial images was more fun than maintaining them, but I’ve been good about it and I recently prepared release 20140630.  I even started to feel out the community more, so I announced this work as a project on freecode.com, just before the site closed down :(  If you get a new Lemote Yeeloong, give these images a try.  It’ll save you about 4 days of compiling if you want to bootstrap from a stage3 to a full desktop, not counting all the broken packages you’d probably hit along the way.  If you’re already running one of my images, then you can try to update on your own, but expect a lot of conflicts and blockers since mips is not a stable arch.  Perhaps the next step to making this more user-friendly is for me to provide the binpkgs on some host.


Lilblue Linux: release 20140520

A couple of days ago, I pushed out a new build of Lilblue Linux [1], which is my attempt to turn embedded Linux on its head and use uClibc [2] instead of glibc as the standard C library for a fully featured XFCE4 desktop on amd64. Its userland is built with Gentoo’s hardened toolchain, and the image ships with a kernel built using hardened-sources, which includes the Grsec/PaX patches for added security, but its main distinguishing feature from mainstream Gentoo is uClibc. Even though Lilblue is something of an experimental project which grew out of my attempt to get more and more packages to build against uClibc, the system works better than I’d originally expected and there are very few uClibc-specific glitches. You get pretty much everything you’d expect in a desktop, including all your multimedia goodies, office software, games and browsers. mplayer2 works flawlessly!

But all is not well in the land of uClibc these days. It has been over two years since the last release, 0.9.33.2 on May 15, 2012, and there are about 80 commits sitting in the 0.9.33 branch, many of which address critical issues found since 0.9.33.2. This causes problems for people building around uClibc, such as buildroot, and there has even been talk on the mailing lists of dropping uClibc as its main libc in favor of either glibc or musl [3]. Buildroot is maintaining about 50 backported patches, while Mike’s (aka vapier’s) latest patchset has 20. I always seem to have to insert a backported patch of my own here or there, or ask Mike to include it in his patchset.

For this release, I did something that I have mixed feelings about. Instead of 0.9.33.2 + backported patches, I used the latest HEAD of the 0.9.33 git branch. This saved me the trouble of getting more patches backported into a new revision of our 0.9.33.2 ebuild, or of “cheating” and putting the patches into /etc/portage/patches/sys-libs/uclibc, but it did expose a well-known problem in uClibc, namely the way its header files stack. A libc’s header files typically include one another to form a stack [4]. For example, on glibc, sched.h stacks as follows:

    sched.h
        features.h
            sys/cdefs.h
                features.h
                bits/wordsize.h
            gnu/stubs.h
        bits/types.h
            features.h
            bits/wordsize.h
            bits/typesizes.h
        stddef.h
        time.h
            features.h
            stddef.h
            bits/time.h
                bits/types.h
                bits/timex.h
                    bits/types.h
            bits/types.h
            xlocale.h
        bits/sched.h

Here sched.h includes features.h, bits/types.h, stddef.h, time.h and bits/sched.h. In turn, features.h includes sys/cdefs.h and gnu/stubs.h, and so on. Each indentation indicates another level of inclusion. Circular inclusions are avoided by using #ifdef shields.

At least one reason for this structure is to abstract away differences between architectures and ABIs in an effort to present a hopefully POSIX-compliant interface to the rest of userland. So, for example, glibc’s sys/syscall.h looks the same on amd64 as on mipsel, but it includes asm/unistd.h, which is different on the two architectures. Each architecture’s asm/unistd.h has its own internal #ifdefs for the different ABIs proper to that architecture, and each #ifdef section in turn defines the values of the various syscalls appropriately for its ABI [5]. Another reason for this stacked inclusion is to make sure that certain definitions, macros or prototypes defined in one header are made available in another header in the same way as they are made available in a C file. This is the reason given, for instance, in the uClibc commit 2e2dc998 which I examine below.

Let’s see where uClibc’s header problems begin. Take a look at Gentoo’s bug #486782, where cdrtools-3.01_alpha17 fails to build against uClibc because its readcd/readcd.c defines “BOOL clone;” which collides with the definition of clone() in bits/sched.h [6]. Nowhere is sched.h included in readcd.c; instead, bits/sched.h gets pulled in indirectly because stdio.h is included! Comment 7 reveals the stacking problem. stdio.h’s stacking is complex, but following just the bad chain, we see that stdio.h includes bits/uClibc_stdio.h, which includes bits/uClibc_mutex.h, which includes pthread.h, which includes sched.h, which includes bits/sched.h — whew! If you’re wondering what stdio.h should have to do with sched.h, then you see the problem: too much information is being exposed here. Joerg’s comment on the bug pretty much sums it up: “The related include files (starting from what stdio.h includes) most likely expose the problem because they seem to expose implementation details that do not belong to the scope of visibility of the using code.”

Back to my bump from 0.9.33.2 to the HEAD of the 0.9.33 branch. This bump unexpectedly exposed bugs #510766 and #510770. Here we find that =media-libs/nas-1.9.4 and =app-text/texlive-core-2012-r1, both of which build just fine against 0.9.33.2, fail against HEAD 0.9.33 because of a name collision with abs(). Unlike the case with cdrtools, where the blame is squarely on uClibc, I think this is a case of enough blame to go around. Both of those packages define abs() as a macro even though it is supposed to be a function prototyped in stdlib.h, as per POSIX.1-2001 [7]. At least nas tries to check whether abs() has already been defined as a macro, but it’s still not enough of a check to avoid the name collision. Unfortunately, given its archaic imake build system, it’s not as easy as just adding AC_CHECK_FUNCS([abs]) to configure.ac. texlive-core at least uses GNU autotools, but its collection of utilities defines abs() in several different places, making a fix messy. On the other hand, why do we suddenly have stdlib.h being pulled in after those macros with HEAD 0.9.33 whereas we didn’t with release 0.9.33.2? It turns out to be uClibc’s tiny commit 2e2dc998, which I quote here:

	sched.h: include stdlib.h for malloc/free
	Signed-off-by: Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>

	diff --git a/libc/sysdeps/linux/common/bits/sched.h b/libc/sysdeps/linux/common/bits/sched.h
	index 7d6273f..878550d 100644
	--- a/libc/sysdeps/linux/common/bits/sched.h
	+++ b/libc/sysdeps/linux/common/bits/sched.h
	@@ -109,6 +109,7 @@ struct __sched_param
	 /* Size definition for CPU sets.  */
	 # define __CPU_SETSIZE	1024
	 # define __NCPUBITS	(8 * sizeof (__cpu_mask))
	+# include <stdlib.h>
	 
	 /* Type for array elements in 'cpu_set_t'.  */
	 typedef unsigned long int __cpu_mask;

Both packages pull in stdio.h after their macro definition of abs(). But now stdio.h, which pulls in bits/sched.h, further pulls in stdlib.h with the function prototype of abs() and … BOOM! … we get

/usr/include/stdlib.h:713:12: error: expected identifier or '(' before 'int'
/usr/include/stdlib.h:713:12: error: expected ')' before '>' token

Untangling the implementation details is going to be a thorny problem. And, given uClibc’s faltering release schedule, things are probably not going to get better soon. I have looked at the issue a bit, but not enough to start unraveling it. It’s easier just to apply hacky patches to the odd package here and there than to rethink uClibc’s internal implementation. If we are going to start rethinking implementations, then musl [8] is much more exciting. uClibc is used in lots of embedded systems, and the header issue is not going to be a show stopper for it or for Lilblue, but it does make alternatives like musl look more attractive.

References:

[1] https://wiki.gentoo.org/wiki/Project:Hardened_uClibc/Lilblue

[2] http://www.uclibc.org

[3] See Petazzoni’s email to the uClibc community.

[4] I wrote a little Python script to generate these stacks, since creating them manually is tedious. You can download it from my dev space: header-stack.py. Note that the stacking is influenced by #ifdefs throughout, e.g. #ifdef __USE_GNU, which the script ignores, but it does give a good starting place for how the stacking goes.
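
For what it’s worth, the following is a rough sketch of how such a script might work. It is not the actual header-stack.py, just an illustration under the same simplifying assumption that #ifdef logic is ignored; the single search path is also an illustrative assumption, since the real preprocessor consults more directories.

#!/usr/bin/env python

# Rough sketch of tracing header stacks: recursively follow #include
# lines, printing each header indented by its inclusion depth.

import os, re, sys

SEARCH = ['/usr/include']
INCLUDE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]')

def resolve(name):
    for d in SEARCH:
        path = os.path.join(d, name)
        if os.path.isfile(path):
            return path
    return None

def stack(name, depth=0, seen=None):
    if seen is None:
        seen = set()
    print('%s%s' % ('    ' * depth, name))
    path = resolve(name)
    # Don't re-expand a header already walked; this crudely mimics
    # the effect of the #ifdef shields.
    if path is None or path in seen:
        return
    seen.add(path)
    for line in open(path):
        m = INCLUDE.match(line)
        if m:
            stack(m.group(1), depth + 1, seen)

if __name__ == '__main__':
    stack(sys.argv[1])

Running something like ./header-stack.py sched.h in a chroot then prints an indented stack along the lines of the one shown above.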

[5] As of glibc 2.17, on mips, asm/unistd.h defines the various __NR_* values in a flat file with three #ifdef sections for _MIPS_SIM_ABI32, _MIPS_SIM_ABI64 and _MIPS_SIM_NABI32, respectively ABI=o32, n64 and n32. Using my script from [4], the stacking looks as follows:

    sys/syscall.h
        asm/unistd.h
            asm/sgidefs.h
        bits/syscall.h
            sgidefs.h

In contrast, on amd64, each ABI is broken out further into its own file, with asm/unistd_32.h, asm/unistd_x32.h or asm/unistd_64.h included into asm/unistd.h for __i386__, __ILP32__, or __ILP64__ respectively. Here the stacking is:

    sys/syscall.h
        asm/unistd.h
            asm/unistd_32.h
            asm/unistd_x32.h
            asm/unistd_64.h
        bits/syscall.h

Remember, on both architectures sys/syscall.h is identical, and that is the file you should include in your C programs, not any of the asm/* headers, which often carry warnings not to include them directly.

[6] man 2 clone

[7] man 3 abs

[8] http://www.musl-libc.org/