Lilblue Linux: release 20140520

A couple of days ago, I pushed out a new build of Lilblue Linux [1], which is my attempt to turn embedded Linux on its head and use uClibc [2] instead of glibc as the standard C library for a fully featured XFCE4 desktop for amd64. Its userland is built with Gentoo’s hardened toolchain, and the image ships with a kernel built using hardened-sources, which include the Grsec/PaX patches for added security, but its main distinguishing feature from mainstream Gentoo is uClibc. Even though Lilblue is something of an experimental project which grew out of my attempt to get more and more packages to build against uClibc, the system works better than I’d originally expected and there are very few glitches which are uClibc-specific. You get pretty much everything you’d expect in a desktop, including all your multimedia goodies, office software, games and browsers. mplayer2 works flawlessly!

But all is not well in the land of uClibc these days. It has been over two years since the last release, 0.9.33.2 on May 15, 2012, and there are about 80 commits sitting in the 0.9.33 branch, many of which address critical issues since 0.9.33.2. This causes problems for people building around uClibc, such as buildroot, and there has even been talk on the email lists of dropping uClibc as its main libc in favor of either glibc or musl [3]. Buildroot is maintaining about 50 backported patches, while Mike’s (aka vapier’s) latest patchset has 20. I seem to always have to insert a backported patch of my own here or there, or ask Mike to include it in his patchset.

For this release, I did something that I have mixed feelings about. Instead of 0.9.33.2 + backported patches, I used the latest HEAD of the 0.9.33 git branch. This saved me the trouble of getting more patches backported into a new revision of our 0.9.33.2 ebuild, or of “cheating” by putting the patches into /etc/portage/patches/sys-libs/uclibc, but it did expose a well-known problem in uClibc, namely how its header files stack. A libc’s header files typically include one another to form a stack [4]. For example, on glibc, sched.h stacks as follows:

    sched.h
        features.h
            sys/cdefs.h
                features.h
                bits/wordsize.h
            gnu/stubs.h
        bits/types.h
            features.h
            bits/wordsize.h
            bits/typesizes.h
        stddef.h
        time.h
            features.h
            stddef.h
            bits/time.h
                bits/types.h
                bits/timex.h
                    bits/types.h
            bits/types.h
            xlocale.h
        bits/sched.h

Here sched.h includes features.h, bits/types.h, stddef.h, time.h and bits/sched.h. In turn, features.h includes sys/cdefs.h and gnu/stubs.h, and so on. Each level of indentation indicates another level of inclusion. Circular inclusions are avoided by include guards, i.e. #ifndef shields wrapped around each header’s contents.
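A stack like the one above can be generated mechanically. What follows is not the header-stack.py script from [4], only a minimal Python sketch of the same idea. It assumes that every header can be found under a single include directory, and it follows every #include line regardless of any surrounding #ifdef. The names direct_includes and header_stack are made up for this illustration. A header is expanded only the first time it is met, which roughly mimics what the shields do.

```python
import os
import re

# Matches lines such as:  #include <bits/types.h>   or   # include "foo.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]')


def direct_includes(header, include_dir):
    """Return the headers named on the #include lines of one file, in order."""
    path = os.path.join(include_dir, header)
    if not os.path.isfile(path):
        return []  # e.g. a compiler-provided header that lives elsewhere
    found = []
    with open(path, errors="replace") as f:
        for line in f:
            match = INCLUDE_RE.match(line)
            if match:
                found.append(match.group(1))
    return found


def header_stack(header, include_dir, depth=0, seen=None):
    """Return the inclusion stack of 'header' as a list of indented lines.

    Assumption: #ifdef's are ignored, so every #include is followed.
    A header already in 'seen' is listed again but not expanded again.
    """
    if seen is None:
        seen = set()
    lines = ["    " * depth + header]
    if header in seen:
        return lines
    seen.add(header)
    for child in direct_includes(header, include_dir):
        lines += header_stack(child, include_dir, depth + 1, seen)
    return lines
```

Calling print("\n".join(header_stack("sched.h", "/usr/include"))) prints one header per line, indented by depth. The exact output depends on the libc and version installed.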

At least one reason for this structure is to abstract away differences in architectures and ABIs in an effort to present a hopefully POSIX-compliant interface to the rest of userland. So, for example, glibc’s sys/syscall.h looks the same on amd64 as on mipsel, but it includes asm/unistd.h, which is different on the two architectures. Each architecture’s asm/unistd.h has its own internal #ifdefs for the different ABIs proper to the architecture, and each #ifdef section in turn defines the values of the various syscalls appropriately for its ABI [5]. Another reason for this stacked inclusion is to make sure that certain definitions, macros or prototypes defined in one header are made available in another header in the same way as they are made available in a C file. This is the reason given, for instance, in the uClibc commit 2e2dc998 which I examine below.

Let’s see where uClibc’s header problems begin. Take a look at Gentoo’s bug #486782, where cdrtools-3.01_alpha17 fails to build against uClibc because its readcd/readcd.c defines “BOOL clone;” which collides with the declaration of clone() in bits/sched.h [6]. Nowhere is sched.h included in readcd.c. Instead, bits/sched.h gets pulled in indirectly because stdio.h is included! Comment 7 reveals the stacking problem. stdio.h’s stacking is complex, but following just the bad chain, we see that stdio.h includes bits/uClibc_stdio.h which includes bits/uClibc_mutex.h which includes pthread.h which includes sched.h which includes bits/sched.h. Whew! If you’re wondering what stdio.h should have to do with sched.h, then you see the problem: too much information is being exposed here. Joerg’s comment on the bug pretty much sums it up: “The related include files (starting from what stdio.h includes) most likely expose the problem because they seem to expose implementation details that do not belong to the scope of visibility of the using code.”
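Hunting down a chain like this by hand is tedious, so here is a rough Python sketch of how one might search for it. It is only an illustration under the same assumptions as any naive scan: all headers live under one include directory and #ifdef’s are ignored, so a chain it reports may not be active in a real compile. The function names direct_includes and include_chain are hypothetical.

```python
import os
import re
from collections import deque

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]')


def direct_includes(header, include_dir):
    """Return the headers named on the #include lines of one file."""
    path = os.path.join(include_dir, header)
    if not os.path.isfile(path):
        return []
    with open(path, errors="replace") as f:
        return [m.group(1) for m in map(INCLUDE_RE.match, f) if m]


def include_chain(start, target, include_dir):
    """Breadth-first search for a shortest chain of #include's leading
    from 'start' to 'target'. Returns the chain as a list, or None."""
    parent = {start: None}  # header -> the header that first included it
    queue = deque([start])
    while queue:
        header = queue.popleft()
        if header == target:
            chain = []
            while header is not None:  # walk back up to 'start'
                chain.append(header)
                header = parent[header]
            return chain[::-1]
        for child in direct_includes(header, include_dir):
            if child not in parent:
                parent[child] = header
                queue.append(child)
    return None
```

On a uClibc system, include_chain("stdio.h", "bits/sched.h", "/usr/include") would be the call to try. Since the search is breadth-first, it returns a shortest chain, which need not be the only one.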

Back to my bump from 0.9.33.2 to the HEAD of the 0.9.33 branch. This bump unexpectedly exposed bugs #510766 and #510770. Here we find that =media-libs/nas-1.9.4 and =app-text/texlive-core-2012-r1, both of which build just fine against 0.9.33.2, fail against HEAD 0.9.33 because of a name collision with abs(). Unlike the case with cdrtools, where the blame is squarely on uClibc, I think this is a case of enough blame to go around. Both of those packages define abs() as a macro even though it is supposed to be a function prototyped in stdlib.h, as per POSIX.1-2001 [7]. At least nas tries to check if abs() has already been defined as a macro, but it’s still not enough of a check to avoid the name collision. Unfortunately, given its archaic imake system, it’s not as easy as just adding AC_CHECK_FUNCS([abs]) to configure.ac. texlive-core at least uses GNU autotools, but its collection of utilities defines abs() in several different places, making a fix messy. On the other hand, why do we suddenly have stdlib.h being pulled in after those macros with HEAD 0.9.33 whereas we didn’t with release 0.9.33.2? It turns out to be uClibc’s tiny commit 2e2dc998 which I quote here:

	sched.h: include stdlib.h for malloc/free
	Signed-off-by: Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>

	diff --git a/libc/sysdeps/linux/common/bits/sched.h b/libc/sysdeps/linux/common/bits/sched.h
	index 7d6273f..878550d 100644
	--- a/libc/sysdeps/linux/common/bits/sched.h
	+++ b/libc/sysdeps/linux/common/bits/sched.h
	@@ -109,6 +109,7 @@ struct __sched_param
	 /* Size definition for CPU sets.  */
	 # define __CPU_SETSIZE	1024
	 # define __NCPUBITS	(8 * sizeof (__cpu_mask))
	+# include <stdlib.h>
	 
	 /* Type for array elements in 'cpu_set_t'.  */
	 typedef unsigned long int __cpu_mask;

Both packages pull in stdio.h after their macro definition of abs(). But now stdio.h, which pulls in bits/sched.h, further pulls in stdlib.h with the function prototype of abs() and … BOOM! … we get

    /usr/include/stdlib.h:713:12: error: expected identifier or '(' before 'int'
    /usr/include/stdlib.h:713:12: error: expected ')' before '>' token
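To see why the compiler trips on the prototype, it helps to do the preprocessor’s job by hand. The Python sketch below is a deliberately naive stand-in for cpp, not a real preprocessor. The macro body is a hypothetical example of the kind these packages define, and the prototype is a simplified form of what stdlib.h declares, not a quote from uClibc. The helper name expand_macro is made up for this illustration.

```python
import re


def expand_macro(line, name, param, body):
    """Naively expand a one-parameter function-like macro in 'line':
    every name(arg) becomes 'body' with 'param' replaced by arg.
    Nested parentheses inside the argument are not handled."""
    call = re.compile(r'\b%s\s*\(([^()]*)\)' % re.escape(name))

    def substitute(match):
        arg = match.group(1).strip()
        # A callable replacement avoids backslash surprises in 'arg'.
        return re.sub(r'\b%s\b' % re.escape(param), lambda _m: arg, body)

    return call.sub(substitute, line)


# Hypothetical macro, as a package might define it:
#     #define abs(a) ((a) > 0 ? (a) : -(a))
MACRO_BODY = "((a) > 0 ? (a) : -(a))"

# An ordinary use of the macro expands to a valid expression.
print(expand_macro("y = abs(x);", "abs", "a", MACRO_BODY))
# prints: y = ((x) > 0 ? (x) : -(x));

# The (simplified) prototype from stdlib.h gets expanded too.
print(expand_macro("extern int abs (int __x);", "abs", "a", MACRO_BODY))
# prints: extern int ((int __x) > 0 ? (int __x) : -(int __x));
```

The second line of output is no longer a declaration: after “extern int” the compiler finds a parenthesized “int __x” followed by “>”, which is consistent with the two errors quoted above.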

Untangling the implementation details is going to be a thorny problem. And, given uClibc’s faltering release schedule, things are probably not going to get better soon. I have looked at the issue a bit, but not enough to start unraveling it. It’s easier just to apply hacky patches to the odd package here and there than to rethink uClibc’s internal implementations. If we are going to start rethinking implementation, then musl [8] is much more exciting. uClibc is used in lots of embedded systems and the header issue is not going to be a show stopper for it or for Lilblue, but it does make alternatives like musl look more attractive.

References:

[1] https://wiki.gentoo.org/wiki/Project:Hardened_uClibc/Lilblue

[2] http://www.uclibc.org

[3] See Petazzoni’s email to the uClibc community.

[4] I wrote a little Python script to generate these stacks rather than create them manually. You can download it from my dev space: header-stack.py. Note that the stacking is influenced by #ifdef’s throughout, e.g. #ifdef __USE_GNU, which the script ignores, but it does give a good starting place for how the stacking goes.

[5] As of glibc 2.17, on mips, asm/unistd.h defines the various __NR_* values in a flat file with three #ifdef sections for _MIPS_SIM_ABI32, _MIPS_SIM_ABI64 and _MIPS_SIM_NABI32, respectively ABI=o32, n64 and n32. Using my script from [4], the stacking looks as follows:

    sys/syscall.h
        asm/unistd.h
            asm/sgidefs.h
        bits/syscall.h
            sgidefs.h

In contrast, on amd64, each ABI is broken out further into its own file, with asm/unistd_32.h, asm/unistd_x32.h or asm/unistd_64.h included into asm/unistd.h when __i386__ is defined, when __ILP32__ is defined, or when neither is (the plain 64-bit case), respectively. Here the stacking is

    sys/syscall.h
        asm/unistd.h
            asm/unistd_32.h
            asm/unistd_x32.h
            asm/unistd_64.h
        bits/syscall.h

Remember, on both architectures, sys/syscall.h is identical, and that is the file you should include in your C programs, not any of the asm/* headers, which often carry warnings not to include them directly.

[6] man 2 clone

[7] man 3 abs

[8] http://www.musl-libc.org/

Tor-ramdisk: a tiny embedded image to host a tor relay

I hate being watched as much as the next person. Even the NSA loves its privacy, otherwise it would be a transparent organization. What’s frightening and exciting about the technology we’re building today is that we are poised on a pivot point between extremes: deep invasion of our privacy and wide scale efforts to protect it. If you don’t know the Tor Project [1], you really should look into it. Encrypted communication hides what you are saying from third party eavesdropping, but it does not hide who’s doing the talking, i.e. it cannot hide the identity of one of the parties and so does not preserve your anonymity. If you decide to aim your browser at https://www.google.com/ then you can remain fairly certain [2] that no one else is watching what you are googling for: you know, and google knows. But unfortunately, so does anyone google decides to tell! Given some of the exceptionally coercive methods governments use to make their demands [3], you might as well just announce your browsing habits publicly and be done with it.

Here’s where tor steps in. It doesn’t just encrypt your traffic, but also bounces it around the world via tor relays in such a way that even the nodes themselves can’t expose the origin of the traffic. Thus, tor provides its users with pretty good anonymity [4]. Now when google looks at its logs, it won’t see your ip address, but the ip address of one of the tor exit nodes. These are themselves publicly known [5], but the original ip from where the traffic is coming remains hidden.

I’ve been using tor since about 2005. In July 2007, a tor operator in Germany [6] was arrested. Luckily his computers were not confiscated, but they could have been. The police wouldn’t have gotten much off of them, but there would have been the private keys and some other “evidence.” Running tor or any system of anonymity is not illegal, and it should never be illegal, as it is in some countries, but today the line between what is legal and what powers governments will abuse has been blurred if not erased entirely. 2007 was also about the time that cloud computing was catching on, so I got the idea of creating a micro Linux distribution whose only purpose was to house a tor relay in an environment that maximizes security and privacy. The image boots from an ISO into ram, any keys or configs are scp-ed in, and upon power down … poof! … nothing to see here, move along. This was also about the time that I was getting involved with hardened Gentoo development and I met up with Magnus Granberg (zorry) who was working on migrating toolchain hardening from gcc-3 to gcc-4. I was teaching a course on embedded Linux, primarily building systems with uClibc and buildroot, and so tor-ramdisk was born [7]. I originally targeted only i686, but later added amd64 and mips32r2 for router boards like the Mikrotik RB450G.

So what goes into tor-ramdisk? You can read the build scripts [8] for details, but basically the kernel is Gentoo’s hardened-sources kernel with PaX and Grsec turned on full force. A minimal userland is provided by a crippled busybox with most of its applets turned off. You need openssl for tor itself as well as openssh, which provides for scp-ing keys and config files in and out of the image. Tor critically depends on the time being right, so I used openntpd for synchronization. You also need a good source of entropy for key generation and encryption, which is always a problem on embedded systems [9], so haveged is used to shore up the kernel’s /dev/random. Finally we need uClibc and libevent. I cheat a little and build on uClibc virtual machines, so I can just copy over the needed libraries rather than cross compiling them. Everything is built using Gentoo’s hardened toolchains and so all the ELF binaries have SSP, PIE + ASLR, relro, bind_now and other security goodies [10]. For i686 and amd64, kernel and userland are bundled up in a bootable ISO image, while for mips I embed the initramfs in the bootable Linux image which can be delivered via tftp. When the system boots, the user is presented with a menu-driven system on tty1 to configure and start tor. The menu is a shell script spawned by init as “tty1::respawn:/bin/setup”. On tty2, tty3 and tty4 we have, respectively, the output of nmeter (an ASCII-based system usage meter provided by busybox), ntpd and haveged.

I don’t know why I haven’t blogged about tor-ramdisk before on Planet Gentoo, but it is a Gentoo “derivative.” It is also a popular project, at least according to freecode.com. The i686 image is the most popular, followed by the amd64, with several hundred downloads per release. I’ve stopped producing the mips32r2 image because no one was using it, even though it was the most fun to build! There have been suggestions for new features but I’ve tended to resist because I like the ~6 MB image. If you can think of something I can add without growing that image much, send patches my way!

References:

[1] https://www.torproject.org/. The Gentoo package is net-misc/tor.

[2] “fairly certain” but not 100% certain as we recently learned from CVE-2014-0160, aka the “heartbleed” bug. See https://en.wikipedia.org/wiki/Heartbleed

[3] You can read the story of lavabit’s owner as told by him at http://www.theguardian.com/commentisfree/2014/may/20/why-did-lavabit-shut-down-snowden-email

[4] There are attacks against tor so it isn’t perfect, but it is by far the best anonymity software out there. See the wiki page on tor for its weaknesses: http://en.wikipedia.org/wiki/Tor_(anonymity_network)

[5] There are various lists of exit and relay nodes. For a live list, check out http://torstatus.blutmagie.de/

[6] http://www.cnet.com/news/tor-anonymity-server-admin-arrested/

[7] The main development site is http://opensource.dyc.edu/tor-ramdisk. I announce releases at https://freecode.com/projects/tor-ramdisk.

[8] https://gitweb.torproject.org/tor-ramdisk.git

[9] See Josh Ayers’ email to the tor-ramdisk list http://opensource.dyc.edu/pipermail/tor-ramdisk/2014-February/000119.html.

[10] You can read a little bit about these hardening techniques from the “Project Description” of a related project, Lilblue Linux: https://wiki.gentoo.org/wiki/Project:Hardened_uClibc/Lilblue