Why macros like __GLIBC__ and __UCLIBC__ are bad.

I’ll be honest, this is a short post because the aggregation on planet.gentoo.org is failing for my account!  So, Jorge (jmbsvicetto) is debugging it and I need to push out another blog entry to trigger venus, the aggregation program.  Since I don’t like writing trivial stuff, I’m going to write something short, but hopefully important.

C standard libraries, like glibc, uClibc, musl and the like, were born out of a world in which every UNIX vendor had their own set of useful C functions.  Code portability put pressure on the various libc to incorporate these functions from one another, leading first to a mess and then to standards like POSIX, XOPEN, SUSv4 and so on.  Chapter 1 of Kerrisk’s The Linux Programming Interface has a nice write-up of this history.

We still live in the shadows of that world today.  If you look through the code base of uClibc you’ll see lots of macros like __GLIBC__, __UCLIBC__, __USE_BSD, and __USE_GNU.  These are used in #ifdef … #endif blocks meant to shield code unless you explicitly want a glibc-only or uClibc-only feature.

musl has stubbornly and correctly refused to include a __MUSL__ macro.  Consider the approach to portability taken by GNU autotools.  Macros such as AC_CHECK_LIB(), AC_CHECK_FUNC() or AC_CHECK_HEADERS() unambiguously target the feature in question without making use of __GLIBC__ or __UCLIBC__.  Whereas the former approach lumps functions together into sets, the latter simply asks: do you have this function or not?

Now consider how uClibc makes use of both __GLIBC__ and __UCLIBC__.  If a function is provided by the former but not by the latter, then it expects a program to use

#if defined(__GLIBC__) && !defined(__UCLIBC__)

This is getting a bit ugly and easy to misread.  Someone not familiar with the convention could easily misinterpret it, or reject it.
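
To make the contrast concrete, here is a minimal sketch.  The function some_gnu_func() and the HAVE_SOME_GNU_FUNC macro are hypothetical stand-ins: the latter is the kind of macro an autoconf check like AC_CHECK_FUNCS([some_gnu_func]) would define in config.h.  The first test guesses the feature from the libc’s identity; the second asks for the feature itself.

/* Hypothetical example: some_gnu_func() stands in for any glibc-only
 * extension, and HAVE_SOME_GNU_FUNC would normally come from config.h,
 * generated by AC_CHECK_FUNCS([some_gnu_func]) at configure time. */
#include <stdio.h>

int main(void)
{
    /* Libc-identity test: assumes the feature tracks the implementation. */
#if defined(__GLIBC__) && !defined(__UCLIBC__)
    puts("guessing: this looks like glibc, so the extension is probably there");
#else
    puts("guessing: not glibc, fall back to portable code");
#endif

    /* Feature test: asks the only question that matters. */
#ifdef HAVE_SOME_GNU_FUNC
    puts("known: some_gnu_func() exists, use it");
#else
    puts("known: some_gnu_func() is missing, use the fallback");
#endif
    return 0;
}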

So I’ve hit bugs like these.  I hit one in gdk-pixbuf, where I was not able to convince upstream to use __GLIBC__ and __UCLIBC__ consistently.  I hit the same issue in geocode-glib and geoclue, and they did accept it.  I went with the wrong-headed approach because that’s what was already there, and I didn’t feel like sifting through their code bases and revamping their build systems.  This isn’t just laziness, it’s historical weight.

So kudos to musl.  And for all the faults of GNU autotools, at least its approach to portability is correct.


hardened-sources Role-based Access Control (RBAC): how to write mostly permissive policies.

RBAC is a security feature of the hardened-sources kernels.  As its name suggests, it’s a role-based access control system which allows you to define policies for restricting access to files, sockets and other system resources.  Even root is restricted, so attacks that escalate privilege are not going to get far even if they do obtain root.  In fact, you should be able to give out remote root access to anyone on a well-configured system running RBAC and still remain confident that you are not going to be owned!  I wouldn’t recommend it, just in case, but it should be possible.

It is important to understand what RBAC will give you and what it will not.  RBAC has to be part of a more comprehensive security plan and is not a single security solution.  In particular, if one can compromise the kernel, then one can proceed to compromise the RBAC system itself and undermine whatever security it offers.  Put another way, protecting root is pretty much a moot point if an attacker is able to get ring 0 privileges.  So, you need to start with an already hardened kernel, that is, a kernel which is able to protect itself.  In practice, this means configuring most of the GRKERNSEC_* and PAX_* features of a hardened-sources kernel.  Of course, if you’re planning on running RBAC, you need to have that option enabled too.

Once you have a system up and running with a properly configured kernel, the next step is to set up the policy file which lives at /etc/grsec/policy.  This is where the fun begins because you need to ask yourself what kind of a system you’re going to be running and decide on the policies you’re going to implement.  Most of the existing literature is about setting up a minimum privilege system for a server which runs only a few simple processes, something like a LAMP stack.  I did this for years when I ran a moodle server for D’Youville College.  For a minimum privilege system, you want to deny-by-default and only allow certain processes to have access to certain resources as explicitly stated in the policy file.  RBAC is ideally suited for this.  Recently, however, I was asked to set up a system where the opposite was the case, so this article is going to explore the situation where you want to allow-by-default; however, for completeness let me briefly cover deny-by-default first.

The easiest way to proceed is to get all your services running as they should and then turn on learning mode for about a week, or at least until you have one cycle of, say, log rotations and other cron-based jobs.  The idea is that your services should have attempted to access each resource at least once so the event gets logged.  You then distill those logs into a policy file describing only what should be permitted and tweak as needed.  You proceed roughly as follows:

1. gradm -P  # Create a password to enable/disable the entire RBAC system
2. gradm -P admin  # Create a password to authenticate to the admin role
3. gradm -F -L /etc/grsec/learning.log # Turn on system wide learning
4. # Wait a week.  Don't do anything you don't want to learn.
5. gradm -F -L /etc/grsec/learning.log -O /etc/grsec/policy  # Generate the policy
6. gradm -E # Enable RBAC system wide
7. # Look for denials.
8. gradm -a admin  # Authenticate to admin to do extraordinary things, like tweak the policy file
9. gradm -R # reload the policy file
10. gradm -u # Drop those privileges to do ordinary things
11. gradm -D # Disable RBAC system wide if you have to

Easy, right?  This will get you pretty far, but you’ll probably discover that some things you want to work are still being denied because those particular events never occurred during the learning.  A typical example here is that you might have ssh’ed in from one IP, but now you’re ssh-ing in from a different IP and you’re getting denied.  To tweak your policy, you first have to escape the restrictions placed on root by transitioning to the admin role.  Then, using dmesg, you can see what was denied, for example:

[14898.986295] grsec: From 192.168.5.2: (root:U:/) denied access to hidden file / by /bin/ls[ls:4751] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:4327] uid/euid:0/0 gid/egid:0/0

This tells you that root, logged in via ssh from 192.168.5.2, tried to ls / but was denied.  As we’ll see below, this is a one-line fix, but if there is a cluster of denials for /bin/ls, you may want to turn on learning for just that one subject in the root role.  To do this, you edit the policy file and look for subject /bin/ls under role root.  You then add an ‘l’ to the subject line to enable learning for just that subject.

role root uG
…
# Role: root
subject /bin/ls ol {  # Note the ‘l’

You restart RBAC using gradm -E -L /etc/grsec/partial-learning.log and obtain the new policy for just that subject by running gradm -L /etc/grsec/partial-learning.log -O /etc/grsec/partial-learning.policy.  That single subject block can then be spliced into the full policy file to change the restrictions on /bin/ls when run by root.

It’s pretty obvious that RBAC is designed for deny-by-default.  If a subject (an executable) is not explicitly granted access to some object (some system resource) while it is running in some role (as some user), then access is denied.  But what if you want to create a policy which is mostly allow-by-default and then just add a few denials here and there?  While RBAC is more suited to the opposite case, we can do something like this on a per-account basis.

Let’s start with a fairly permissive policy file for root:

role admin sA
subject / rvka {
	/			rwcdmlxi
}

role default
subject / {
	/			h
	-CAP_ALL
	connect disabled
	bind    disabled
}

role root uG
role_transitions admin
role_allow_ip 0.0.0.0/0
subject /  {
	/			r
	/boot			h
#
	/bin			rx
	/sbin			rx
	/usr/bin		rx
	/usr/libexec		rx
	/usr/sbin		rx
	/usr/local/bin		rx
	/usr/local/sbin		rx
	/lib32			rx
	/lib64			rx
	/lib64/modules		h
	/usr/lib32		rx
	/usr/lib64		rx
	/usr/local/lib32	rx
	/usr/local/lib64	rx
#
	/dev			hx
	/dev/log		r
	/dev/urandom		r
	/dev/null		rw
	/dev/tty		rw
	/dev/ptmx		rw
	/dev/pts		rw
	/dev/initctl		rw
#
	/etc/grsec		h
#
	/home			rwcdl
	/root			rcdl
#
	/proc/slabinfo		h
	/proc/modules		h
	/proc/kallsyms		h
#
	/run/lock		rwcdl
	/sys			h
	/tmp			rwcdl
	/var			rwcdl
#
	+CAP_ALL
	-CAP_MKNOD
	-CAP_NET_ADMIN
	-CAP_NET_BIND_SERVICE
	-CAP_SETFCAP
	-CAP_SYS_ADMIN
	-CAP_SYS_BOOT
	-CAP_SYS_MODULE
	-CAP_SYS_RAWIO
	-CAP_SYS_TTY_CONFIG
	-CAP_SYSLOG
#
	bind 0.0.0.0/0:0-32767 stream dgram tcp udp igmp
	connect 0.0.0.0/0:0-65535 stream dgram tcp udp icmp igmp raw_sock raw_proto
	sock_allow_family all
}

The syntax is pretty intuitive.  The only thing not illustrated here is that a role can, and usually does, have multiple subject blocks which follow it.  Those subject blocks belong only to the role that they are under, and not to any other.

The notion of a role is critical to understanding RBAC. Roles are like UNIX users and groups but within the RBAC system. The first role above is the admin role. It is ‘special’ meaning that it doesn’t correspond to any UNIX user or group, but is only defined within the RBAC system. A user will operate under some role but may transition to another role if the policy allows it. Transitioning to the admin role is reserved only for root above; but in general, any user can transition to any special role provided it is explicitly specified in the policy. No matter what role the user is in, he only has the UNIX privileges for his account. Those are not elevated by transitioning, but the restrictions applied to his account might change. Thus transitioning to a special role can allow a user to relax some restrictions for some special reason. This transitioning is done via gradm -a somerole and can be password protected using gradm -P somerole.

The second role above is the default role. When a user logs in, RBAC determines the role he will be in by first trying to match the user name to a role name. Failing that, it will try to match the group name to a role name and failing that it will assign the user the default role.

The third role above is the root role and it will be the main focus of our attention below.

The flags following the role name specify the role’s behavior.  The ‘s’ and ‘A’ in the admin role line say, respectively, that it is a special role (ie, one not to be matched by a user or group name) and that it has extra powers that a normal role doesn’t have (eg, it is not subject to ptrace restrictions).  It’s good to have the ‘A’ flag in there, but it’s not essential for most uses of this role.  It’s really its subject block which makes it useful for administration.  Of course, you can change the name if you want to practice a little bit of security by obfuscation.  As long as you leave the rest alone, it’ll still function the same way.

The root role has the ‘u’ and the ‘G’ flags.  The ‘u’ flag says that this role is to match a user by the same name, obviously root in this case.  Alternatively, you can have the ‘g’ flag instead, which says to match a group by the same name.  The ‘G’ flag gives this role permission to authenticate to the kernel, ie, to use gradm.  Policy information is automatically added that allows gradm to access /dev/grsec, so you don’t need to add those permissions yourself.  Finally, the default role doesn’t and shouldn’t have any flags.  If it’s not a ‘u’, ‘g’ or ‘s’ role, then it’s a default role.

Before we jump into the subject blocks, you’ll notice a couple of lines after the root role.  The first says ‘role_transitions admin’ and permits the root role to transition to the admin role.  Any special roles you want this role to transition to can be listed on this line, space delimited.  The second line says ‘role_allow_ip 0.0.0.0/0’.  So when root logs in remotely, it will be assigned the root role provided the login is from an IP address matching 0.0.0.0/0.  In this example, this means any IP is allowed.  But if you had something like 192.168.3.0/24, then only root logins from the 192.168.3.0 network would get user root assigned role root.  Otherwise RBAC would fall back on the default role.  If you don’t have the line in there, get used to logging in on the console because you’ll cut yourself off!

Now we can look at the subject blocks. These define the access controls restricting processes running in the role to which those subjects belong. The name following the ‘subject’ keyword is either a path to a directory containing executables or to an executable itself. When a process is started from an executable in that directory, or from the named executable itself, then the access controls defined in that subject block are enforced. Since all roles must have the ‘/’ subject, all processes started in a given role will at least match this subject. You can think of this as the default if no other subject matches. However, additional subject blocks can be defined which further modify restrictions for particular processes. We’ll see this towards the end of the article.

Let’s start by looking at the ‘/’ subject for the default role since this is the most restrictive set of access controls possible. The block following the subject line lists the objects that the subject can act on and what kind of access is allowed. Here we have ‘/ h’ which says that every file in the file system starting from ‘/’ downwards is hidden from the subject. This includes read/write/execute/create/delete/hard link access to regular files, directories, devices, sockets, pipes, etc. Since pretty much everything is forbidden, no process running in the default role can look at or touch the file system in any way. Don’t forget that, since the only role that has a corresponding UNIX user or group is the root role, this means that every other account is simply locked out. However the file system isn’t the only thing that needs protecting since it is possible to run, say, a malicious proxy which simply bounces evil network traffic without ever touching the filesystem. To control network access, there are the ‘connect’ and ‘bind’ lines that define what remote addresses/ports the subject can connect to as a client, or what local addresses/ports it can listen on as a server. Here ‘disabled’ means no connections or bindings are allowed. Finally, we can control what Linux capabilities the subject can assume, and -CAP_ALL means they are all forbidden.

Next, let’s look at the ‘/’ subject for the admin role. This, in contrast to the default role, is about as permissive as you can get. First thing we notice is the subject line has some additional flags ‘rvka’. Here ‘r’ means that we relax ptrace restrictions for this subject, ‘a’ means we do not hide access to /dev/grsec, ‘k’ means we allow this subject to kill protected processes and ‘v’ means we allow this subject to view hidden processes. So ‘k’ and ‘v’ are interesting and have counterparts ‘p’ and ‘h’ respectively. If a subject is flagged as ‘p’ it means its processes are protected by RBAC and can only be killed by processes belonging to a subject flagged with ‘k’. Similarly processes belonging to a subject marked ‘h’ can only be viewed by processes belonging to a subject marked ‘v’. Nifty, eh? The only object line in this subject block is ‘/ rwcdmlxi’. This says that this subject can ‘r’ead, ‘w’rite, ‘c’reate, ‘d’elete, ‘m’ark as setuid/setgid, hard ‘l’ink to, e’x’ecute, and ‘i’nherit the ACLs of the subject which contains the object. In other words, this subject can do pretty much anything to the file system.

Finally, let’s look at the ‘/’ subject for the root role.  It is fairly permissive, but not quite as permissive as the previous subject.  It is also more complicated, and many of the object lines are there because gradm does a sanity check on policy files to help make sure you don’t open any security holes.  Notice that here we have ‘+CAP_ALL’ followed by a series of ‘-CAP_*’.  Each of these was included because otherwise gradm would complain.  For example, if ‘CAP_SYS_ADMIN’ is not removed, an attacker can mount filesystems to bypass your policies.

So I won’t go through this entire subject block in detail, but let me highlight a few points. First consider these lines

	/			r
	/boot			h
	/etc/grsec		h
	/proc/slabinfo		h
	/proc/modules		h
	/proc/kallsyms		h
	/sys			h

The first line gives ‘r’ead access to the entire file system but this is too permissive and opens up security holes, so we negate that for particular files and directories by ‘h’iding them. With these access controls, if the root user in the root role does ls /sys you get

# ls /sys
ls: cannot access /sys: No such file or directory

but if the root user transitions to the admin role using gradm -a admin, then you get

# ls /sys/
block  bus  class  dev  devices  firmware  fs  kernel  module

Next consider these lines:

	/bin			rx
	/sbin			rx
	...
	/lib32			rx
	/lib64			rx
	/lib64/modules		h

Since the ‘x’ flag is inherited by all the files under those directories, this allows processes like your shell to execute, for example, /bin/ls or /lib64/ld-2.21.so.  The ‘r’ flag further allows processes to read the contents of those files, so one could do hexdump /bin/ls or hexdump /lib64/ld-2.21.so.  Dropping the ‘r’ flag on /bin would stop you from hexdumping the contents, but it would not prevent execution, nor would it stop you from listing the contents of /bin.  If we wanted to make this subject a bit more secure, we could drop ‘r’ on /bin and not break our system.  This, however, is not the case with the library directories.  Dropping ‘r’ on them would break the system, since library files need to have readable contents to be loaded, as well as be executable.

Now consider these lines:

        /dev                    hx
        /dev/log                r
        /dev/urandom            r
        /dev/null               rw
        /dev/tty                rw
        /dev/ptmx               rw
        /dev/pts                rw
        /dev/initctl            rw

The ‘h’ flag will hide /dev and its contents, but the ‘x’ flag will still allow processes to enter into that directory and access /dev/log for reading, /dev/null for reading and writing, etc. The ‘h’ is required to hide the directory and its contents because, as we saw above, ‘x’ is sufficient to allow processes to list the contents of the directory. As written, the above policy yields the following result in the root role

# ls /dev
ls: cannot access /dev: No such file or directory
# ls /dev/tty0
ls: cannot access /dev/tty0: No such file or directory
# ls /dev/log
/dev/log

In the admin role, all those files are visible.

Let’s end our study of this subject by looking at the ‘bind’, ‘connect’ and ‘sock_allow_family’ lines.  Note that the addresses/ports include a list of allowed transport protocols from /etc/protocols.  One gotcha here is to make sure you include port 0 for icmp!  The ‘sock_allow_family’ line allows all socket families, including unix, inet, inet6 and netlink.

Now that we understand this policy, we can proceed to add isolated restrictions to our mostly permissive root role.  Remember that the system is totally restricted for all UNIX users except root, so if you want to allow some ordinary user access, you can simply copy the entire role, including the subject blocks, and just rename ‘role root’ to ‘role myusername’.  You’ll probably want to remove the ‘role_transitions’ line, since an ordinary user should not be able to transition to the admin role.  Now, suppose for whatever reason you don’t want this user to be able to list any files or directories.  You can simply add a line to his ‘/’ subject block which reads ‘/bin/ls h’ and ls becomes completely unavailable to him!  This particular example might not be that useful in practice, but you can use this technique, for example, if you want to restrict access to your compiler suite.  Just ‘h’ all the directories and files that make up your suite and it becomes unavailable.
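
As a sketch, and assuming a hypothetical user named myusername, the copied role would look something like the excerpt below, with the single added object line ‘/bin/ls h’ doing all the work (and role_transitions dropped as suggested):

role myusername uG
role_allow_ip 0.0.0.0/0
subject /  {
	/			r
	…
	/bin/ls			h
	…
}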

A more complicated and useful example might be to restrict a user’s listing of a directory to just his home.  To do this, we’ll have to add a new subject block for /bin/ls.  If you’re not sure where to start, you can always begin with an extremely restrictive subject block, tack it on at the end of the subjects for the role you want to modify, and then progressively relax it until it works.  Alternatively, you can do partial learning on this subject as described above.  Let’s proceed manually and add the following:

subject /bin/ls o {
        /  h
        -CAP_ALL
        connect disabled
        bind    disabled
}

Note that this is identical to the extremely restrictive ‘/’ subject for the default role, except that the subject is ‘/bin/ls’, not ‘/’.  There is also a subject flag ‘o’ which tells RBAC to override the previous policy for /bin/ls.  We have to override it because that policy was too permissive.  Now, in one terminal, execute gradm -R in the admin role, while in another terminal trigger a denial by running ls /home/myusername.  Checking dmesg we see:

[33878.550658] grsec: From 192.168.5.2: (root:U:/bin/ls) denied access to hidden file /lib64/ld-2.21.so by /bin/ls[bash:7861] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:7164] uid/euid:0/0 gid/egid:0/0

Well, that makes sense.  We’ve started afresh denying everything, but /bin/ls requires access to the dynamic linker/loader, so we’ll restore read access to it by adding a line ‘/lib64/ld-2.21.so r’.  Repeating our test, we get a seg fault!  Obviously, we don’t just need read access to ld.so, we also need execute privileges.  We add ‘x’ and try again.  This time the denial is:

[34229.335873] grsec: From 192.168.5.2: (root:U:/bin/ls) denied access to hidden file /etc/ld.so.cache by /bin/ls[ls:7917] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:7909] uid/euid:0/0 gid/egid:0/0
[34229.335923] grsec: From 192.168.5.2: (root:U:/bin/ls) denied access to hidden file /lib64/libacl.so.1.1.0 by /bin/ls[ls:7917] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:7909] uid/euid:0/0 gid/egid:0/0

Of course!  We need ‘rx’ for all the libraries that /bin/ls links against, as well as the linker cache file.  So we add lines for libc, libattr, libacl and ld.so.cache.  Our final denial is:

[34481.933845] grsec: From 192.168.5.2: (root:U:/bin/ls) denied access to hidden file /home/myusername by /bin/ls[ls:7982] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:7909] uid/euid:0/0 gid/egid:0/0

All we need now is ‘/home/myusername r’ and we’re done! Our final subject block looks like this:

subject /bin/ls o {
        /                         h
        /home/myusername          r
        /etc/ld.so.cache          r
        /lib64/ld-2.21.so         rx
        /lib64/libc-2.21.so       rx
        /lib64/libacl.so.1.1.0    rx
        /lib64/libattr.so.1.1.0   rx
        -CAP_ALL
        connect disabled
        bind    disabled
}

Proceeding in this fashion, we can add isolated restrictions to our mostly permissive policy.

References:

The official documentation is The_RBAC_System.  A good reference for the role, subject and object flags can be found in these  Tables.

Tor-ramdisk 20151215 released: libressl to the rescue!

If you’ve read some of my previous posts, you know that I’ve been maintaining this hardened-Gentoo-derived, uClibc-based, micro Linux distribution called “tor-ramdisk”.  It’s a small image, less than 7 MB in size, whose only purpose is to host a Tor relay or exit node in a ramdisk environment which vanishes when the system is shut down, leaving no traces behind.  My student Melissa Carlson and I started the project in 2008, and I’ve been pushing out an update with every major release of Tor.  Over the years, I automated the build system and put the scripts in a git repository at gitweb.torproject.org that Roger Dingledine gave me.  If you want to build your own image, all you need to do is run the scripts in the chroot of a hardened amd64 or i686 uClibc stage3 image, which you can get off the mirrors.

Recently, upstream switched the tor-0.2.7.x branch to depend on openssl’s elliptic curve code.  Unfortunately, this code is patented and can’t be distributed in tor-ramdisk.  If you read the tor ebuilds, you’ll see that the 0.2.7.x version has to DEPEND on dev-libs/openssl[-bindist] while 0.2.6 only depends on dev-libs/openssl.  Luckily, there’s libressl, which aims to be a free drop-in replacement for openssl.  It’s in the tree now and almost ready to go — I say almost because we’re still in the transition phase.  Libressl in Gentoo has been primarily hasufel’s project, but I’ve been helping out here and there.  I got a few upstream commits in to make sure it works with both uClibc and musl as well as with our hardened compiler.

So, this is the first tor-ramdisk release with libressl.  Of course I tested both the amd64 and i686 images and they work as expected, but libressl is new-ish, so I’m still not sure what to expect.  Hopefully we’ll get better security than we did from openssl, but I’ve got my eye out for any CVEs.  At least we’re getting patent-free code.

Tor-ramdisk has been one of those mature projects where all I have to do is tweak a few version numbers in the scripts, press go, and out pops a new image.  But I do have some plans for improvements: 1) I need to clean up the text user interface a bit.  It’s menu-driven and I need to consolidate some of the menu items, like the ‘resource’ and ‘entropy’ menus.  2) I want to enable IPv6 support.  For the longest time Tor was IPv4-only, but for a couple of years now it has supported IPv6.  3) Tor can run in one of several modes.  It’s ideal as a relay or exit node, but it can be configured as a client or hidden service for processes running on a different box.  However, tor-ramdisk can’t be used as a bridge.  I want to see if I can add bridge support.  4) Finally, I’m thinking of switching from uClibc to musl and building static PIE executables.  This would make a tighter image than the current one.

If you’re interested in running tor-ramdisk, here are some links:

i686:
Homepage: http://opensource.dyc.edu/tor-ramdisk
Download: http://opensource.dyc.edu/tor-ramdisk-downloads
ChangeLog: http://opensource.dyc.edu/tor-ramdisk-changelog

x86_64:
Homepage: http://opensource.dyc.edu/tor-x86_64-ramdisk
Download: http://opensource.dyc.edu/tor-x86_64-ramdisk-downloads
ChangeLog: same as i686.

alt-libc: The state of uClibc and musl in Gentoo (part 1)

About five years ago, I became interested in alternative C standard libraries like uClibc and musl and their application to more than just embedded systems.  Diving into the implementation details of those old familiar C functions can be tedious, but also challenging, especially under constraints of size, performance, resource consumption, features and correctness.  Yet these details can make a big difference, as you can see from this comparison of glibc, uClibc, dietlibc and musl.  I first encountered libc performance issues when I was doing number-crunching work for my Ph.D. in physics at Cornell in the late ’80s.  I was using IBM RS6000s (yes!) and had to jump into the assembly.  It was lots of fun and I’ve loved this sort of low-level stuff ever since.

Over the past four years, I’ve been working towards producing stage3 tarballs for both uClibc and musl systems on various arches, and with help from generous contributors (thanks guys!), we now have a pretty good selection on the mirrors.  These stages are not strictly speaking embedded in that they do not make use of busybox to provide their base system.  Rather, they employ the same packages as our glibc stages and use coreutils, util-linux, net-tools and friends.  Except for small details here and there, they only differ from our regular stages in the libc they use.

If you read my last blog post on the new release engineering tool I’m developing called GRS, you’ll know that I recently hit a milestone in this work.  I just released three hardened, fully featured XFCE4 desktop systems for amd64.  These systems are identical to each other except for their libc, again modulo a few details here and there.  I affectionately dubbed these Bluemoon for glibc, Lilblue for uClibc, and Bluedragon for musl.  (If you’re curious about the names, read their homepages.)  You can grab all three off my dev space, or off the mirrors under experimental/amd64/{musl,uclibc} if you’re looking for just Lilblue or Bluedragon — the glibc system is too boring to merit mirror space.  I’ve been maintaining Lilblue for a couple of years now, but with GRS I can easily maintain all three, and it’s nice to have them for comparison.

If you play with these systems, don’t expect to be blown away by some amazing differences.  They are there and they are noticeable, but they are also subtle.  For example, you’re not going to notice code correctness in, say, pthread_cancel() unless you’re coding some application and expect certain behavior but don’t get it because of some bad code in your libc.  Rather, the idea here is to push the limits of uClibc and musl to see what breaks and then fix it, at least on amd64 for now.  Each system includes about 875 packages in the release tarballs, and an extra 5000 or so binpkgs built using GRS.  This leads to lots of breakage which I can isolate and address.  Often the problem is in the package itself, but occasionally it’s the libc, and that’s where the fun begins!  I’ve asked Patrick Lauer for some space where I can set up my GRS builds and serve out the binpkgs.  Hopefully he’ll be able to set me up with something.  I’ll also be able to make the build.logs of failing packages available via the web, so that GRS will double as a poor man’s tinderbox.

In a future article I’ll discuss musl, but in the remainder of this post I want to highlight some big-ticket items we’ve hit in uClibc.  I’ve spent a lot of time building up machinery to maintain the stages and desktops, so now I want to focus my attention on fixing the libc problems.  The following laundry list is as much a TODO for me as it is for your entertainment.  I won’t blame you if you want to skip it.  The selection comes from Gentoo’s bugzilla, and I have yet to compare it to upstream’s bugs since I’m sure there’s some overlap.

Currently there are thirteen uClibc stage3s being supported:

  • stage3-{amd64,i686}-uclibc-{hardened,vanilla}
  • stage3-armv7a_{softfp,hardfp}-uclibc-{hardened,vanilla}
  • stage3-mipsel3-uclibc-{hardened,vanilla}
  • stage3-mips32r2-uclibc-{hardened,vanilla}
  • stage3-ppc-uclibc-vanilla

Here hardened and vanilla refer to the toolchain hardening as we do for regular glibc systems.  Some bugs are arch-specific, some are common.  Let’s look at each in turn.

* amd64 and i686 are the only stages considered stable and are distributed alongside our regular stage3 releases on the mirrors.  However, back in May I hit a rather serious bug (#548950) in amd64, which is our only 64-bit arch.  The problem was in the implementation of pread64() and pwrite64() and was triggered by a change in the fsck code with e2fsprogs-1.42.12.  The bug led to data corruption of ext3/ext4 filesystems, which is a very serious issue for a release advertised as stable.  The problem was that the wrong _syscall wrapper was being used for amd64.  If we don’t require 64-bit alignment, and you don’t on a 64-bit arch (see uClibc/libc/sysdeps/linux/x86_64/bits/uClibc_arch_features.h), then you need to use _syscall4, not _syscall6.

The issue was actually fixed by Mike Frysinger (vapier) in uClibc’s git HEAD, but not in the 0.9.33 branch which is the basis of the stages.  Unfortunately, there hasn’t been a new release of uClibc in over three years, so backporting meant disentangling the fix from some new thread-cancel code, which was annoying.

* The armv7a stages are close to being stable, but they are still being distributed on the mirrors under experimental.  The problem here is not uClibc-specific but due to hardened gcc-4.8, and it affects all our hardened arm stages.  With gcc-4.8, we’re turning on -fstack-check=yes by default in the specs, and this breaks alloca().  The workaround for now is to use gcc-4.7, but we should turn off -fstack-check for arm until bug #518598 (PR65958) is fixed.

* I hit one weird bug when building the mips stages, bug #544756.  An illegal instruction is encountered when building any version of python using gcc-4.9 with -O1 optimization or above, yet it succeeds with -O0.  What I suspect happened here is that some change in the mips optimization code between gcc-4.8 and 4.9 introduced the problem.  I need to distill out a reduced test case before I submit it to gcc upstream.  For now I’ve p.masked >=gcc-4.9 in default/linux/uclibc/mips.  Since mips lacks stable keywords, this actually brings the mips stages better in line with the other stages that use gcc-4.8 (or 4.7 in the case of hardened arm).

* Unfortunately, ppc is plagued with bug #517160: PIE code is broken on ppc and causes a seg fault in plt_pic32.__uClibc_main().  Since PIE is an integral part of how we do hardening in Gentoo, there’s no hardened ppc-uclibc stage.  Luckily, there are no known issues with the vanilla ppc stage3.

Finally, there are five interesting bugs which are common to all arches.  These problems lie in uClibc’s implementation of functions which deviate from the expected behavior.  I’m particularly grateful to those who found them by running the test suites of various packages.

* Bug 527954 – One of wget’s tests makes use of fnmatch(3), which matches file names or paths against glob-style patterns.  On uClibc, there is an unexpected failure in a test where fnmatch() should return a non-match when its options contain FNM_PATHNAME and a matching slash is not present in both strings.  Apparently this is a duplicate of a really old bug (#181275).
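
The following is a minimal sketch of the kind of case that trips uClibc up; it is not the actual wget test.  With FNM_PATHNAME, a wildcard must not match a ‘/’, so a correct libc reports no match here:

#include <fnmatch.h>
#include <stdio.h>

int main(void)
{
	/* With FNM_PATHNAME, '*' must not match the '/' in "foo/bar",
	 * so a correct fnmatch() returns FNM_NOMATCH here. */
	int rc = fnmatch("*bar", "foo/bar", FNM_PATHNAME);
	printf("fnmatch(\"*bar\", \"foo/bar\", FNM_PATHNAME) = %s\n",
	       rc == 0 ? "match" : "no match");
	return 0;
}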

* Bug 544118 – René noticed this problem in an e2fsprogs test for libss.so.  The failure here is due to the fact that setbuf(3) is ineffective at changing the buffering of stdout if it is run after some printf(3).  Output to stdout is buffered while output to stderr is not.  This particular test tries to preserve the order of output from a sequence of writes to stdout and stderr by setting the buffer size of stdout to zero.  But setbuf() only works on uClibc if it is invoked before any output to stdout.  As soon as there is even one printf(), all invocations of setbuf(stdout, …) are ineffective!
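
Here is a minimal sketch of the behavior, not the libss test itself.  Redirect both streams to the same file, e.g. ./a.out > out.log 2>&1, and compare the order of the lines: with a working setbuf() they come out in numeric order, but if the call is silently ignored, the unbuffered stderr line jumps ahead of the buffered stdout lines.  Strictly speaking, ISO C only guarantees setvbuf()/setbuf() before any I/O on the stream, so glibc is being lenient in honoring the late call:

#include <stdio.h>

int main(void)
{
	printf("1: stdout before setbuf\n");
	setbuf(stdout, NULL);	/* ask for unbuffered stdout; per the bug, ignored by uClibc at this point */
	printf("2: stdout after setbuf\n");
	fprintf(stderr, "3: stderr\n");
	printf("4: stdout again\n");
	return 0;
}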

* Bug 543972 – This one came up in the gzip test suite.  One of the tests there checks to make sure that gzip properly fails if it runs out of disk space.  It uses /dev/full, which is a pseudo-device provided by the kernel that pretends to always be full.  The expectation is that fclose() should set errno = ENOSPC when closing /dev/full.  It does on glibc, but it doesn’t on uClibc.  The failure actually shows up when redirecting stdout to /dev/full, so the problem may even be in dup(2).
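
A minimal sketch of what the gzip test expects (again, not the test itself): the buffered data only hits /dev/full when fclose() flushes it, so on a correct libc fclose() fails and errno ends up as ENOSPC:

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/dev/full", "w");
	if (!f) {
		perror("fopen");
		return 1;
	}
	fputs("buffered until the flush at close time\n", f);
	errno = 0;
	if (fclose(f) == EOF)
		printf("fclose failed: %s\n", strerror(errno));	/* expect ENOSPC */
	else
		printf("fclose unexpectedly succeeded\n");
	return 0;
}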

* Bug 543668 – There is some problem in uClibc’s dlopen()/dlclose() code.  I wrote about this in a previous blog post and also hit it with syslog-ng’s plugin system.  A seg fault occurs when unmapping a plugin from the memory space of a process during do_close() in uClibc/ldso/libdl/libdl.c.  My guess is that the problem lies in uClibc’s accounting of the mappings: when it tries to unmap an area of memory which is not mapped or was previously unmapped, it seg faults.

The “Gentoo Reference System” suite: a new Release Engineering tool.

The other day I installed Ubuntu 15.04 on one of my boxes.  I just needed something where I could throw in a DVD, hit install and be done.  I didn’t care about customization or choice, I just needed a working Linux system from which I could do chroot work.  Thousands of people around the world install Ubuntu this way, and when they’re done, they have a stock system like any other Ubuntu installation, all identical like frames in an Andy Warhol lithograph.  Replication as a form of art.

In contrast, when I install a Gentoo system, I enjoy the anxiety of choice.  Should I use syslog-ng, metalog, or skip a system logger altogether?  If I choose syslog-ng, then I have a choice of 14 USE flags, for 2^14 possible configurations of just that package.  And that’s just one of some 850+ packages that are going to make up my desktop.  In contrast to Ubuntu, where every installation is identical (whatever “idem” means in this context), the sheer space of possibilities means no two Gentoo systems are the same unless there is some concerted effort to make them so.  In fact, Gentoo doesn’t even have a notion of a “stock” system unless you count the stage3s, which are really bare bones.  There is no “stock” Gentoo desktop.

With the work I am doing with uClibc and musl, I needed a release tool that would build identical desktops repeatedly and predictably, where all the choices of packages and USE flags were laid out a priori in some specifications.  I considered catalyst stage4, but catalyst didn’t provide the flexibility I wanted.  I initially wrote some bash scripts to build an XFCE4 desktop from uClibc stage3 tarballs (what I dubbed “Lilblue Linux“), but this was very much ad hoc code and I needed something that could be generalized, so I could do the same for a musl-based desktop, or indeed any Gentoo system I could dream up.

This led me to formulate the notion of what I call a “Gentoo Reference System” or GRS for short — maybe we could make stock Gentoo systems available.  The idea here is that one should be able to define some specs for a particular Gentoo system that will unambiguously define all the choices that go into building that system.  Then all instances built according to those particular GRS specs would be identical in much the same way that all Ubuntu systems are the same.  In a Warholian turn, the artistic choices in designing the system would be pushed back into the specs and become part of the automation.  You draw one frame of the lithograph and you magically have a million.

The idea of these systems being “references” was also important for my work because, with uClibc or musl, there’s a lot of package breakage — remember, you’re pushing up against actual implementations of C functions, and nearly everything in your system is written in C.  So, in the space of all possible Gentoo systems, I needed some reference points that worked.  I needed those magical combinations of flags and packages that would build and yield useful systems.  It was also important that these references be easily kept working over time, since Gentoo systems evolve as the main tree, or overlays, are modified.  Since on some successive build something might break, I needed to quickly identify the delta and address it.  The metaphor that came up in my head from my physics background is that of phase space.  In the swirling mass of evolving dynamical systems, I pictured these “Gentoo Reference Systems” as markers etching out a well-defined path over time.

Enough with the metaphors; how does GRS work?  There are two main utilities, grsrun and grsup.  The first is run on a build machine and generates the GRS release as well as any extra packages and updates.  These are delivered as binpkgs.  In contrast, grsup is run on an installed GRS instance and it’s used for package management.  Since we’re working in a world of identical systems, grsup prefers working with binpkgs that are downloaded from some build machine, but it can revert to building locally as well.

The GRS specs for some system are found on a branch of a git repository.  Currently the repo at https://gitweb.gentoo.org/proj/grs.git/ has four branches, each for one of the four GRS specs housed there.  grsrun is then directed to sync the remote repo locally, check out the branch of the GRS system we want to build and begin reading a script file called build which directs grsrun on what steps to take.  The scripting language is very simple and contains only a handful of different directives.  After a stage tarball is unpacked, build can direct grsrun to do any of the following:

mount and umount – Do a bind mount of /dev/, /dev/pts/ and other directories that are required to get a chroot ready.

populate – Selectively copy files from the local repo to the chroot.  Any files can be copied in, so, for example, you can prepare a pristine home directory for some user with a pre-configured desktop.  Or you can add customized configuration files to /etc for services you plan to run.

runscript – This will run some bash or python script in the chroot.  The scripts are copied from the local repo to /tmp of the chroot and executed there.  These scripts can be like the ones that catalyst runs during stage1/2/3, but they can also be scripts to add users and groups, to add services to runlevels, etc.  Think of anything you would do when growing a stage3 into the system you want, script it up, and GRS will automate it for you.

kernel – This looks for a kernel config file in the local repo, parses it for the version, builds the kernel, bundles it as a package called linux-image-<version>.tar.xz for later distribution, and installs it into the chroot.  grsup knows how to work with these linux-image-<version>.tar.xz files and can treat them like binpkgs.

tarit and hashit – These directives create a release tarball of the entire chroot and generate the digests.

pivot – If you built a chroot within a chroot, like catalyst does during stage1, then this pivots the inner chroot out so that further building can make use of it.

From an implementation point of view, the GRS suite is written in python and each of the above directives is backed by a simple python class.  It’s easy, for instance, to implement more directives this way.  E.g. if you want to build a bootable CD image, you can include a directive called isoit, write a python class for what’s required to construct the iso image, and glue this new class into the grs module.

If you’re familiar with catalyst, at this point you might be wondering what the difference is.  Can’t you do all of this with catalyst?  There is a lot of overlap, but the emphasis is different.  For example, I wanted to be able to drop in a pre-configured desktop for a user.  How would I do that with catalyst?  I guess I could create an overlay with packages for some pre-built home directory, but that’s a perversion of what ebuilds are for — we should never be installing into /home.  Rather, with grsrun I can just populate the chroot with whatever files I like anywhere in the filesystem.  More importantly, I want to be able to control what USE flags are set and, in general, manage all of /etc/portage/.  catalyst does provide portage_confdir, which populates /etc/portage when building stages, but it’s pretty static.  Instead, grsup and two other utilities, install-worldconf and clean-worldconf, can dynamically manage files under /etc/portage/ according to a configuration file called world.conf.

Lapsing back into metaphor, I see catalyst as rigid and frozen whereas grsrun is loose and fluid.  You can use grsrun to build stage1/2/3 tarballs which are identical to those built with catalyst, and in fact I’ve done so for hardened amd64 multilib stages so I could compare.  But with grsrun you have too much freedom in writing the scripts and files that go into the GRS specs, and chances are you’ll get something wrong, whereas with catalyst the build is pretty regimented and you’re guaranteed to get uniformity across arches and profiles.  So while you can do the same things with each tool, it’s not recommended that you use grsrun to do catalyst stage builds — there’s too much freedom.  Whereas when building desktops or servers, you might welcome that freedom.

Finally, let me close with how grsup works.  As mentioned above, the GRS specs for some system include a file called world.conf.  It’s in configparser format and it specifies files and their contents in the /etc/portage/ directory.  An example section of the file looks like:

[app-crypt/gpgme:1]
package.use : app-crypt/gpgme:1 -common-lisp static-libs
package.env : app-crypt/gpgme:1 app-crypt_gpgme_1
env : LDFLAGS=-largp

This says: for package app-crypt/gpgme:1, drop a file called app-crypt_gpgme_1 in /etc/portage/package.use/ that contains the line “app-crypt/gpgme:1 -common-lisp static-libs”, drop another file by the same name in /etc/portage/package.env/ with the line “app-crypt/gpgme:1 app-crypt_gpgme_1”, and finally drop a third file by the same name in /etc/portage/env/ with the line “LDFLAGS=-largp”.  grsup is basically a wrapper to emerge which first populates /etc/portage/ according to the world.conf file, then emerges the requested pkg(s), preferring the use of binpkgs over building locally as stated above, and finally does a clean up on /etc/portage/.  install-worldconf and clean-worldconf isolate the populate and clean-up steps so they can be used in scripts run by grsrun when building the release.  To be clear, you don’t have to use grsup to maintain a GRS system.  You can maintain it just like any other Gentoo system, but if you manage your own /etc/portage/, then you are no longer tracking the GRS specs.  grsup is meant to make sure you update, install or remove packages in a manner that keeps the local installation in compliance with the GRS specs for that system.
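
To make that concrete, here is roughly what grsup (or install-worldconf) leaves behind for the [app-crypt/gpgme:1] section above; the file names and contents follow directly from the section, and the listing is just a sketch:

$ cat /etc/portage/package.use/app-crypt_gpgme_1
app-crypt/gpgme:1 -common-lisp static-libs
$ cat /etc/portage/package.env/app-crypt_gpgme_1
app-crypt/gpgme:1 app-crypt_gpgme_1
$ cat /etc/portage/env/app-crypt_gpgme_1
LDFLAGS=-largp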

All this is pretty alpha stuff, so I’d appreciate comments on design and implementation before things begin to solidify.  I am using GRS to build three desktop systems which I’ll blog about next.  I’ve dubbed these systems Lilblue, which is a hardened amd64 XFCE4 desktop with uClibc as its standard libc, Bluedragon, which uses musl, and finally Bluemoon, which uses good old glibc.  (Lilblue is actually a few years old, but the latest release is the first built using GRS.)  All three desktops are identical with respect to the choice of packages and USE flags, and differ only in their libcs, so one can compare the three.  Lilblue and Bluedragon are on the mirrors, or you can get all three from my dev space at http://dev.gentoo.org/~blueness/theblues/.  I didn’t push Bluemoon out to the mirrors because a glibc-based desktop is nothing special.  But since building with GRS is as simple as cloning a git branch and tweaking, and since the comparison is useful, why not?

The GRS home page is at https://wiki.gentoo.org/wiki/Project:RelEng_GRS.

The C++11 ABI incompatibility problem in Gentoo

Gentoo allows users to have multiple versions of gcc installed, and we (mostly?) support systems where userland is partially built with different versions.  There are both advantages and disadvantages to this, and in this post I’m going to talk about one of the disadvantages: the C++11 ABI incompatibility problem.  I don’t exactly have a solution, but at least we can define what the problem is and track it [1].

First, what is C++11?  It’s a new standard of C++ which is just now making its way through GCC and clang as experimental.  The current default standard is C++98, which you can verify by just reading the defined value of __cplusplus using the preprocessor.

$  g++ -x c++ -E -P - <<< __cplusplus
199711L
$  g++ -x c++ --std=c++98 -E -P - <<< __cplusplus
199711L
$  g++ -x c++ --std=c++11 -E -P - <<< __cplusplus
201103L

This shouldn’t be surprising; even good old C has standards:

$ gcc -x c -std=c90 -E -P - <<< __STDC_VERSION__
__STDC_VERSION__
$ gcc -x c -std=c99 -E -P - <<< __STDC_VERSION__
199901L
$ gcc -x c -std=c11 -E -P - <<< __STDC_VERSION__
201112L

We’ll leave the interpretation of these values as an exercise to the reader.  [2]

The specs for these different standards at least allow for different syntax and semantics in the language.  So here’s an example of how C++98 and C++11 differ in this respect:

// I build with both --std=c++98 and --std=c++11
#include <iostream>
using namespace std;
int main() {
    int i, a[] = { 5, -3, 2, 7, 0 };
    for (i = 0; i < sizeof(a)/sizeof(int); i++)
        cout << a[i] << endl ;
    return 0;
}
// I build with only --std=c++11
#include <iostream>
using namespace std;
int main() {
    int a[] = { 5, -3, 2, 7, 0 };
    for (auto& x : a)
        cout << x << endl ;
    return 0;
}

I think most people would agree that the C++11 way of iterating over arrays (or other objects like vectors) is sexy.  In fact C++11 is filled with sexy syntax, especially when it comes to its threading and atomics, and so coders are seduced.  This is an upstream choice and it should be reflected in their build system, with --std= sprinkled where needed.  I hope you see why you should never add --std= to your CFLAGS or CXXFLAGS.

The syntactic/semantic differences are the first “incompatibility”, and they are really not our problem downstream.  Our problem in Gentoo comes from ABI incompatibilities between the two standards arising from two sources: 1) Linking between objects compiled with --std=c++98 and --std=c++11 is not guaranteed to work.  2) Neither is linking between objects both compiled with --std=c++11 but with different versions of GCC differing in their minor release number.  (The minor release number is x in gcc-4.x.y.)

To see this problem in action, let’s consider the following little snippet of code which uses a C++11-only function [3]:

#include <chrono>
using namespace std;
int main() {
    auto x = chrono::steady_clock::now;
}

Now if we compile that with gcc-4.8.3 and check its symbols we get the following:

$ g++ --version
g++ (Gentoo Hardened 4.8.3 p1.1, pie-0.5.9) 4.8.3
$ g++ --std=c++11 -c test.cpp
$ readelf -s test.o
Symbol table '.symtab' contains 12 entries:
Num:    Value          Size Type    Bind   Vis      Ndx Name
  0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
  1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS test.cpp
  2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
  3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
  4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
  5: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
  6: 0000000000000000     0 SECTION LOCAL  DEFAULT    7
  7: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
  8: 0000000000000000    78 FUNC    GLOBAL DEFAULT    1 main
  9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_
 10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZNSt6chrono3_V212steady_
 11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND __stack_chk_fail

We can now confirm that that symbol is in fact in libstdc++.so for 4.8.3 but NOT for 4.7.3 as follows:

$ readelf -s /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libstdc++.so.6 | grep _ZNSt6chrono3_V212steady_
  1904: 00000000000e5698     1 OBJECT  GLOBAL DEFAULT   13 _ZNSt6chrono3_V212steady_@@GLIBCXX_3.4.19
  3524: 00000000000c8b00    89 FUNC    GLOBAL DEFAULT   11 _ZNSt6chrono3_V212steady_@@GLIBCXX_3.4.19
$ readelf -s /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/libstdc++.so.6 | grep _ZNSt6chrono3_V212steady_
$

Okay, so we’re just seeing an example of things in flux.  Big deal?  If you finish linking test.cpp and check what it links against you get what you expect:

$ g++ --std=c++11 -o test.gcc48 test.o
$ ./test.gcc48
$ ldd test.gcc48
        linux-vdso.so.1 (0x000002ce333d0000)
        libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libstdc++.so.6 (0x000002ce32e88000)
        libm.so.6 => /lib64/libm.so.6 (0x000002ce32b84000)
        libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1 (0x000002ce3296d000)
        libc.so.6 => /lib64/libc.so.6 (0x000002ce325b1000)
        /lib64/ld-linux-x86-64.so.2 (0x000002ce331af000)

Here’s where the weirdness comes in.  Suppose we now switch to gcc-4.7.3 and repeat.  Things don’t quite work as expected:

$ g++ --version
g++ (Gentoo Hardened 4.7.3-r1 p1.4, pie-0.5.5) 4.7.3
$ g++ --std=c++11 -o test.gcc47 test.cpp
$ ldd test.gcc47
        linux-vdso.so.1 (0x000003bec8a9c000)
        libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libstdc++.so.6 (0x000003bec8554000)
        libm.so.6 => /lib64/libm.so.6 (0x000003bec8250000)
        libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libgcc_s.so.1 (0x000003bec8039000)
        libc.so.6 => /lib64/libc.so.6 (0x000003bec7c7d000)
        /lib64/ld-linux-x86-64.so.2 (0x000003bec887b000)

Note that it says it’s linking against 4.8.3/libstdc++.so.6 and not 4.7.3.  That’s because the order in which the library paths are searched is defined in /etc/ld.so.conf.d/05gcc-x86_64-pc-linux-gnu.conf, and this file is sorted the way it is on purpose.  So maybe it’ll run!  Let’s try:

$ ./test.gcc47
./test.gcc47: relocation error: ./test.gcc47: symbol _ZNSt6chrono12steady_clock3nowEv, version GLIBCXX_3.4.17 not defined in file libstdc++.so.6 with link time reference

Nope, no joy.  So what’s going on?  Let’s look at the symbols in both test.gcc47 and test.gcc48:

$ readelf -s test.gcc47  | grep chrono
  9: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSt6chrono12steady_cloc@GLIBCXX_3.4.17 (4)
 50: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSt6chrono12steady_cloc
$ readelf -s test.gcc48  | grep chrono
  9: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSt6chrono3_V212steady_@GLIBCXX_3.4.19 (4)
 49: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSt6chrono3_V212steady_

Whoah!  The symbol wasn’t mangled the same way!  Looking more carefully at *all* the chrono symbols in 4.8.3/libstdc++.so.6 and 4.7.3/libstdc++.so.6 we see the problem.

$ readelf -s /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libstdc++.so.6 | grep chrono
  353: 00000000000e5699     1 OBJECT  GLOBAL DEFAULT   13 _ZNSt6chrono3_V212system_@@GLIBCXX_3.4.19
 1489: 000000000005e0e0    86 FUNC    GLOBAL DEFAULT   11 _ZNSt6chrono12system_cloc@@GLIBCXX_3.4.11
 1605: 00000000000e1a3f     1 OBJECT  GLOBAL DEFAULT   13 _ZNSt6chrono12system_cloc@@GLIBCXX_3.4.11
 1904: 00000000000e5698     1 OBJECT  GLOBAL DEFAULT   13 _ZNSt6chrono3_V212steady_@@GLIBCXX_3.4.19
 2102: 00000000000c8aa0    86 FUNC    GLOBAL DEFAULT   11 _ZNSt6chrono3_V212system_@@GLIBCXX_3.4.19
 3524: 00000000000c8b00    89 FUNC    GLOBAL DEFAULT   11 _ZNSt6chrono3_V212steady_@@GLIBCXX_3.4.19
$ readelf -s /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/libstdc++.so.6 | grep chrono
 1478: 00000000000c6260    72 FUNC    GLOBAL DEFAULT   12 _ZNSt6chrono12system_cloc@@GLIBCXX_3.4.11
 1593: 00000000000dd9df     1 OBJECT  GLOBAL DEFAULT   14 _ZNSt6chrono12system_cloc@@GLIBCXX_3.4.11
 2402: 00000000000c62b0    75 FUNC    GLOBAL DEFAULT   12 _ZNSt6chrono12steady_cloc@@GLIBCXX_3.4.17

Only 4.7.3/libstdc++.so.6 has _ZNSt6chrono12steady_cloc@@GLIBCXX_3.4.17.  Normally when libraries change their exported symbols, they change their SONAME, but this is not the case here, as running `readelf -d` on both shows.  GCC doesn’t bump the SONAME that way for reasons explained in [4].  Great, so just switch around the order of path search in /etc/ld.so.conf.d/05gcc-x86_64-pc-linux-gnu.conf.  Then we get the problem the other way around:

$ ./test.gcc47
$ ./test.gcc48
./test.gcc48: /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/libstdc++.so.6: version `GLIBCXX_3.4.19' not found (required by ./test.gcc48)

So, no problem if your system has only gcc-4.7.  No problem if it has only 4.8.  But if it has both, then compile C++11 with 4.7 and link against libstdc++ for 4.8 (or vice versa) and you get breakage at the binary level.  This is the C++11 ABI incompatibility problem in Gentoo.  As an exercise for the reader, fix!

Ref.

[1] Bug 542482 – (c++11-abi) [TRACKER] c++11 abi incompatibility

[2] This is an old professor’s trick for saying, hey go find out why c90 doesn’t define a value for __STDC_VERSION__ and let me know, ‘cuz I sure as hell don’t!

[3] This example was inspired by bug #513386.  You can verify that it requires --std=c++11 by dropping the flag and getting yelled at by the compiler.

[4] Upstream explains why in comment #5 of GCC bug #61758.  The entire bug is dedicated to this issue.

Lilblue Linux: release 20141212. dlclose() is a problem.

I pushed out another version of Lilblue Linux a few days ago but I don’t feel as good about this release as previous ones.  If you haven’t been following my posts, Lilblue is a fully featured amd64, hardened, XFCE4 desktop that uses uClibc instead of glibc as its standard C library.  The name is a bit misleading because Lilblue is Gentoo but departs from the mainstream in this one respect only.  In fact, I strive to make it as close to mainstream Gentoo as possible so that everything will “just work”.  I’ve been maintaining Lilblue for years as a way of pushing the limits of uClibc, which is mainly intended for embedded systems, to see where it breaks and fix or improve it.

As with all releases, there are always a few minor problems, little annoyances that are not exactly show stoppers.  One minor oversight that I found after releasing was that I hadn’t configured smplayer correctly.  That’s the gui front end to mplayer that you’ll find on the toolbar at the bottom of the desktop.  It works, just not out-of-the-box.  In the preferences, you need to switch from mplayer2 to mplayer and set the video out to x11.  I’ll add that to the build scripts to make sure it’s fixed in the next release [1].  I’ve also been migrating away from gnome-centered applications, which have been pulling in more and more bloat.  A couple of releases ago I switched from gnome-terminal to xfce4-terminal, and for this release, I finally made the leap from epiphany to midori as the main browser.  I like midori better although it isn’t as popular as epiphany.  I hope others approve of the choice.

But there is one issue I hit which is serious.  It seems with every release I hit at least one of those.  This time it was in uClibc’s implementation of dlclose().  Along with dlopen() and dlsym(), this is how shared objects can be loaded into a running program during execution rather than at load time.  This is probably more familiar to people as “plugins”, which are just shared objects loaded while the program is running.  When building the latest Lilblue image, gnome-base/librsvg segfaulted while running gdk-pixbuf-query-loaders [2].  The latter links against glib and calls g_module_open() and g_module_close() on many shared objects as it constructs a cache of loadable objects.  g_module_{open,close} are just glib’s wrappers to dlopen() and dlclose() on systems that provide them, like Linux.  A preliminary backtrace obtained by running gdb on `/usr/bin/gdk-pixbuf-query-loaders ./libpixbufloader-svg.la` pointed to the segfault happening in gcc’s __deregister_frame_info() in unwind-dw2-fde.c, which didn’t sound right.  I rebuilt the entire system with CFLAGS+="-fno-omit-frame-pointer -O1 -ggdb" and turned on uClibc’s SUPPORT_LD_DEBUG=y, which emits debugging info to stderr when running with LD_DEBUG=1, and DODEBUG=y, which prevents symbol stripping in uClibc’s libraries.  A more complete backtrace gave:

Program received signal SIGSEGV, Segmentation fault.
__deregister_frame_info (begin=0x7ffff22d96e0) at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2-fde.c:222
222 /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2-fde.c: No such file or directory.
(gdb) bt
#0 __deregister_frame_info (begin=0x7ffff22d96e0) at /var/tmp/portage/sys-devel/gcc-4.8.3/work/gcc-4.8.3/libgcc/unwind-dw2-fde.c:222
#1 0x00007ffff22c281e in __do_global_dtors_aux () from /lib/libbz2.so.1
#2 0x0000555555770da0 in ?? ()
#3 0x0000555555770da0 in ?? ()
#4 0x00007fffffffdde0 in ?? ()
#5 0x00007ffff22d8a2f in _fini () from /lib/libbz2.so.1
#6 0x00007fffffffdde0 in ?? ()
#7 0x00007ffff6f8018d in do_dlclose (vhandle=0x7ffff764a420 <__malloc_lock>, need_fini=32767) at ldso/libdl/libdl.c:860
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

The problem occurred when running the global destructors in dlclose()-ing libbz2.so.1.  Line 860 of libdl.c has DL_CALL_FUNC_AT_ADDR (dl_elf_fini, tpnt->loadaddr, (int (*)(void))); which is a macro that calls the function at address dl_elf_fini with signature int(*)(void).  If you’re not familiar with ctors and dtors, these are the global constructors/destructors whose code lives in the .ctors and .dtors sections of an ELF object, which you can see when doing readelf -S <obj>.  The ctors are run when a library is first loaded (at program start or via dlopen()) and similarly the dtors are run when dlclose()-ing.  Here’s some code to demonstrate this:

# Makefile
all: tmp.so test
tmp.o: tmp.c
        gcc -fPIC -c $^
tmp.so: tmp.o
        gcc -shared -Wl,-soname,$@ -o $@ $^
test: test-dlopen.c
        gcc -o $@ $^ -ldl
clean:
        rm -f *.so *.o test
// tmp.c
#include <stdio.h>

void my_init() __attribute__ ((constructor));
void my_fini() __attribute__ ((destructor));

void my_init() { printf("Global initialization!\n"); }
void my_fini() { printf("Global cleanup!\n"); }
void doit() { printf("Doing it!\n"); }
// test-dlopen.c
// This has very bad error handling, sacrificed for readability.
#include <stdio.h>
#include <dlfcn.h>

int main() {
        int (*mydoit)();
        void *handle = NULL;

        handle = dlopen("./tmp.so", RTLD_LAZY);
        mydoit = dlsym(handle, "doit");
        mydoit();
        dlclose(handle);

        return 0;
}

When run, this code gives:

# ./test 
Global initialization!
Doing it!
Global cleanup!

So, my_init() is run on dlopen() and my_fini() is run on dlclose().  Basically, upon dlopen()-ing a shared object as you would a plugin, the library is first mmap()-ed into the process’s address space using the PT_LOAD addresses which you can see with readelf -l <obj>.  Then, one walks through all the global constructors and runs them.  Upon dlclose()-ing the opposite process is done.  One first walks through the global destructors and runs them, and then one munmap()-s the same mappings.
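
If you want to see those PT_LOAD mappings from inside a program, dl_iterate_phdr() will walk them for every loaded object, including ones pulled in with dlopen().  Here’s a minimal sketch of mine (not part of the original test above) that reuses the tmp.so built by the Makefile; it prints the load addresses that dlclose() will later have to munmap().  dl_iterate_phdr() is a GNU extension, but uClibc ships it too.  Build it like test-dlopen.c, i.e. gcc -o show-maps show-maps.c -ldl.

// show-maps.c
// A minimal sketch (mine, not from the original post): dlopen() the tmp.so
// built by the Makefile above, then use dl_iterate_phdr() to print the
// PT_LOAD segments of every loaded object, i.e. the mappings that
// dlclose() later has to munmap().
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
#include <link.h>

static int show(struct dl_phdr_info *info, size_t size, void *data)
{
        int i;
        (void)size; (void)data;
        for (i = 0; i < info->dlpi_phnum; i++)
                if (info->dlpi_phdr[i].p_type == PT_LOAD)
                        printf("%s: PT_LOAD mapped at %p\n",
                               info->dlpi_name[0] ? info->dlpi_name : "[main]",
                               (void *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr));
        return 0;
}

int main()
{
        void *handle = dlopen("./tmp.so", RTLD_LAZY);
        if (!handle) { fprintf(stderr, "%s\n", dlerror()); return 1; }
        dl_iterate_phdr(show, NULL);
        dlclose(handle);   /* the dtors run here, then the mappings are munmap()-ed */
        return 0;
}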

Figuring I wasn’t the only person to see a problem here, I googled and found that Nathan Copa of Alpine Linux hit a similar problem [3] back when Alpine used to use uClibc — it now uses musl.  He identified a problematic commit and I wrote a patch which would retain the new behavior introduced by that commit upon setting an environment variable NEW_START, but would otherwise revert to the old behavior if NEW_START is unset.  I also added some extra diagnostics to LD_DEBUG to better see what was going on.  I’ll add my patch to a comment below, but the gist of it is that it toggles between the old and new way of calculating the size of the munmap()-ings by subtracting an end and start address.  The old behavior used a mapaddr for the start address that is totally wrong and basically causes every munmap()-ing to fail with EINVAL.  This is corrected by the commit as a simple strace -e trace=munmap shows.
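
The EINVAL failure mode is easy to reproduce outside the loader.  The following is a standalone sketch of mine, not part of the uClibc patch: munmap() rejects a start address that doesn’t correspond to what mmap() handed back (here it is simply not page-aligned), so the mapping silently stays in place, which is essentially what the old mapaddr-based calculation was causing on every dlclose().

// munmap-einval.c
// A standalone sketch (mine, not uClibc code): munmap() with a bogus start
// address fails with EINVAL and the mapping stays around, while the correct
// start address unmaps cleanly.
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/mman.h>

int main()
{
        size_t len = 4096;
        void *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        if (munmap((char *)map + 1, len) < 0)        /* wrong start address */
                printf("bogus start:   munmap failed: %s\n", strerror(errno));
        if (munmap(map, len) == 0)                   /* the address mmap() returned */
                printf("correct start: munmap succeeded\n");
        return 0;
}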

My results when running with LD_DEBUG=1 were interesting to say the least.  With the old behavior, the segfault was gone:

# LD_DEBUG=1 /usr/bin//gdk-pixbuf-query-loaders libpixbufloader-svg.la
...
do_dlclose():859: running dtors for library /lib/libbz2.so.1 at 0x7f26bcf39a26
do_dlclose():864: unmapping: /lib/libbz2.so.1
do_dlclose():869: before new start = 0xffffffffffffffff
do_dlclose():877: during new start = (nil), vaddr = (nil), type = 1
do_dlclose():877: during new start = (nil), vaddr = 0x219c90, type = 1
do_dlclose():881: after new start = (nil)
do_dlclose():987: new start = (nil)
do_dlclose():991: old start = 0x7f26bcf22000
do_dlclose():994: dlclose using old start
do_dlclose():998: end = 0x21b000
do_dlclose():1013: removing loaded_modules: /lib/libbz2.so.1
do_dlclose():1031: removing symbol_tables: /lib/libbz2.so.1
...

Of course, all of the munmap()-ings failed.  The dtors were run, but no shared object got unmapped.  When running the code with the correct value of start, I got:

# NEW_START=1 LD_DEBUG=1 /usr/bin//gdk-pixbuf-query-loaders libpixbufloader-svg.la
...
do_dlclose():859: running dtors for library /lib/libbz2.so.1 at 0x7f5df192ba26
Segmentation fault

What’s interesting here is that the segfault occurs at DL_CALL_FUNC_AT_ADDR, which is before the munmap()-ing and so before any effect that the new value of start should have! This seems utterly mysterious until you realize that there is a whole set of dlopens/dlcloses as gdk-pixbuf-query-loaders does its job — I counted 40 in all!  This is as far as I’ve gotten narrowing down this mystery, but I suspect some previous munmap()-ing is breaking the dtors for libbz2.so.1, so that when the call is made to that address, it’s no longer valid, leading to the segfault.
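
If you want to count those dlopen()/dlclose() pairs yourself, one quick way is an LD_PRELOAD interposer.  Here’s a rough sketch of mine (not a tool from this post, and how cleanly it interposes depends on the libc): build it as a shared object and point LD_PRELOAD at it while running gdk-pixbuf-query-loaders.

// count-dl.c
// A rough sketch (mine, not from this post): wrap dlopen() and dlclose()
// via LD_PRELOAD and tally the calls at exit.
// Build: gcc -shared -fPIC -o count-dl.so count-dl.c -ldl
// Run:   LD_PRELOAD=./count-dl.so gdk-pixbuf-query-loaders libpixbufloader-svg.la
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

static int opens, closes;

static void report(void)
{
        fprintf(stderr, "dlopen() calls: %d, dlclose() calls: %d\n", opens, closes);
}

void *dlopen(const char *file, int flag)
{
        static void *(*real_dlopen)(const char *, int);
        if (!real_dlopen) {
                real_dlopen = dlsym(RTLD_NEXT, "dlopen");
                atexit(report);
        }
        opens++;
        return real_dlopen(file, flag);
}

int dlclose(void *handle)
{
        static int (*real_dlclose)(void *);
        if (!real_dlclose)
                real_dlclose = dlsym(RTLD_NEXT, "dlclose");
        closes++;
        return real_dlclose(handle);
}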

Rich Felker, aka dalias, the developer of musl, made an interesting comment to me in IRC when I told him about this issue.  He said that the unmappings are dangerous and that musl actually doesn’t do them.  For now, I’ve intentionally left the unmappings in uClibc’s dlclose() “broken” in the latest release of Lilblue, so you can’t hit this bug, but for the next release I’m going to look carefully at what glibc and musl do and try to get this fix upstream.  As I said when I started this post, I’m not totally happy with this release because I didn’t nail the issue, I just implemented a workaround.  Any hints would be much appreciated!

[1] The build scripts can be found in the releng repository at git://git.overlays.gentoo.org/proj/releng.git under tools-uclibc/desktop.  The scripts begin with a hardened amd64 uclibc stage3 tarball (http://distfiles.gentoo.org/releases/amd64/autobuilds/current-stage3-amd64-uclibc-hardened/) and build up the desktop.

[2] The purpose of librsvg and gdk-pixbuf is not essential for the problem with dlclose(), but for completeness we state them here: librsvg is a library for rendering scalable vector graphics and gdk-pixbuf is an image loading library for gtk+.  gdk-pixbuf-query-loaders reads a libtool .la file and generates a cache of loadable shared objects to be consumed by gdk-pixbuf.

[3] See  http://lists.uclibc.org/pipermail/uclibc/2012-October/047059.html. He suggested that the following commit was doing evil things: http://git.uclibc.org/uClibc/commit/ldso?h=0.9.33&id=9b42da7d0558884e2a3cc9a8674ccfc752369610

Tor-ramdisk 20141022 released

Following the latest and greatest exploit against openssl, CVE-2014-3566, aka the POODLE issue, the tor team released version 0.2.4.25.  For those of you not familiar, tor is a system of online anonymity which encrypts and bounces your traffic through relays so as to obfuscate its origin.  Back in 2008, I started a uClibc-based micro Linux distribution, called tor-ramdisk, whose only purpose is to host a tor relay in a hardened Gentoo environment purely in RAM.

While the POODLE bug is addressed on the openssl side in the latest release, 1.0.1j, the tor team also decided to turn off the affected protocol, SSLv3, in favor of TLS 1.0 or later.  They also fixed tor to avoid a crash when built using openssl 0.9.8zc, 1.0.0o, or 1.0.1j with the ‘no-ssl3’ configuration option.  These important fixes to two major components of tor-ramdisk warranted a new release.  Take a look at the upstream ChangeLog for more information.

Since I was upgrading stuff, I also upgraded the kernel to vanilla 3.17.1 + Gentoo’s hardened-patches-3.17.1-1.extras.  All the other components remain the same as in the previous release.

i686:
Homepage: http://opensource.dyc.edu/tor-ramdisk
Download: http://opensource.dyc.edu/tor-ramdisk-downloads

x86_64:
Homepage: http://opensource.dyc.edu/tor-x86_64-ramdisk
Download:  http://opensource.dyc.edu/tor-x86_64-ramdisk-downloads

 

Lilblue Linux: release 20140925. Adventures beyond the land of POSIX.

It has been four months since my last major build and release of Lilblue Linux, a pet project of mine [1].  The name is a bit pretentious, I admit, since Lilblue is not really some other Linux distro.  It is Gentoo, but Gentoo with a twist.  It’s a fully featured amd64, hardened, XFCE4 desktop that uses uClibc instead of glibc as its standard C library.  I use it on some of my workstations at the College and at home, like any other desktop, and I know other people that use it too, but the main reason for its existence is that I wanted to push uClibc to its limits and see where things break.  Back in 2011, I got bored of working with the usual set of embedded packages.  So, while my students were writing their exams in Modern OS, I entertained myself just adding more and more packages to a stage3-amd64-hardened system [2] until I had a decent desktop.  After playing with it on and off, I finally polished it up to the point where I thought others might enjoy it too and started pushing out releases.  Recently, I found out that the folks behind uselessd [3] used Lilblue as their testing ground.  uselessd is another response to systemd [4], something like eudev [5], which I maintain, so the irony here is too much not to mention!  But that’s another story …

There was only one interesting issue about this release.  Generally I try to keep all releases about the same.  I’m not constantly updating the list of packages in @world.  I did remove pulseaudio this time around because it never did work right and I don’t use it.  I’ll fix it in the future, but not yet!  Instead, I concentrated on a much more interesting problem with a new release of e2fsprogs [6].   The problem started when upstream’s commit 58229aaf removed a broken fallback syscall for fallocate64() on systems where the latter is unavailable [7].  There was nothing wrong with this commit, in fact, it was the correct thing to do.  e4defrag.c used to have the following code:

#ifndef HAVE_FALLOCATE64
#warning Using locally defined fallocate syscall interface.

#ifndef __NR_fallocate
#error Your kernel headers dont define __NR_fallocate
#endif

/*
 * fallocate64() - Manipulate file space.
 *
 * @fd: defrag target file's descriptor.
 * @mode: process flag.
 * @offset: file offset.
 * @len: file size.
 */
static int fallocate64(int fd, int mode, loff_t offset, loff_t len)
{
    return syscall(__NR_fallocate, fd, mode, offset, len);
}
#endif /* ! HAVE_FALLOCATE64 */

The idea was that, if a configure test for fallocate64() failed because it isn’t available in your libc, but there is a system call for it in the kernel, then e4defrag would just make the syscall via your libc’s indirect syscall() function.  Seems simple enough, except that how system calls are dispatched is architecture and ABI dependent, and the above is broken on 32-bit systems [8].  Of course, uClibc didn’t have fallocate() so e4defrag failed to build after that commit.  To my surprise, musl does have fallocate() so this wasn’t a problem there, even though it is a Linux-specific function and not in any standard.
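
To make the 32-bit problem concrete: syscall() takes its arguments as longs, so a 64-bit loff_t has to be split into two 32-bit words by hand, and the order of those words depends on the ABI.  Here’s a rough sketch of the kind of splitting a real libc wrapper has to do.  This is my illustration, not the uClibc code, and it glosses over per-arch quirks like register pairing and padding arguments.

// my_fallocate.c -- an illustrative sketch, not the uClibc implementation.
// On 64-bit, passing loff_t straight through syscall() happens to work;
// on 32-bit, each 64-bit argument must be split into two longs in the order
// the kernel ABI expects.  Per-arch quirks are glossed over.
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <endian.h>
#include <sys/syscall.h>

static int my_fallocate(int fd, int mode, long long offset, long long len)
{
#if __SIZEOF_LONG__ == 8
        return syscall(__NR_fallocate, fd, mode, offset, len);
#elif __BYTE_ORDER == __LITTLE_ENDIAN
        return syscall(__NR_fallocate, fd, mode,
                       (long)(offset & 0xffffffff), (long)(offset >> 32),
                       (long)(len & 0xffffffff), (long)(len >> 32));
#else
        return syscall(__NR_fallocate, fd, mode,
                       (long)(offset >> 32), (long)(offset & 0xffffffff),
                       (long)(len >> 32), (long)(len & 0xffffffff));
#endif
}

int main()
{
        int fd = open("testfile", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (my_fallocate(fd, 0, 0, 1 << 20) < 0)   /* reserve 1 MiB at offset 0 */
                perror("my_fallocate");
        close(fd);
        unlink("testfile");
        return 0;
}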

My first approach was to patch e2fsprogs to use posix_fallocate() which is supposed to be equivalent to fallocate() when invoked with mode = 0.  e4defrag calls fallocate() in mode = 0, so this seemed like a simple fix.  However, this was not acceptable to Ts’o since he was worried that some libc might implement posix_fallocate() by brute force writing 0’s.  That could be horribly slow for large allocations!  This wasn’t the case for uClibc’s implementation but that didn’t seem to make much difference upstream.  Meh.
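
For reference, here is what that equivalence looks like, in a minimal sketch of mine rather than the actual e2fsprogs patch: with mode = 0 both calls just reserve the byte range, but posix_fallocate() is allowed to fall back to writing zeros when the filesystem or libc can’t do better, which is exactly what worried upstream.

// prealloc.c -- a minimal sketch (mine, not the e2fsprogs patch) of the
// equivalence discussed above: with mode = 0, fallocate() and
// posix_fallocate() both reserve the byte range, but posix_fallocate()
// may fall back to writing zeros.
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
        int err, fd = open("testfile", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (fallocate(fd, 0, 0, 1 << 20) < 0)      /* Linux-specific, mode = 0 */
                perror("fallocate");

        err = posix_fallocate(fd, 0, 1 << 20);     /* POSIX; returns an errno value */
        if (err != 0)
                fprintf(stderr, "posix_fallocate: error %d\n", err);

        close(fd);
        unlink("testfile");
        return 0;
}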

Rather than fight e2fsprogs, I sat down and hacked fallocate() into uClibc.  Since both fallocate() and posix_fallocate(), and their LFS counterparts fallocate64() and posix_fallocate64(), make the same syscall, it was sufficient to isolate that in an internal function which both could make use of.  That, plus a test suite, and Bernhard was kind enough to commit it to master [10].  Then a couple of backports, and uClibc’s 0.9.33 branch now has the fix as well.  Because there hasn’t been a release of uClibc in about two years, I’m using the 0.9.33 branch HEAD for Lilblue, so the problem there was solved.  I know it’s a little problematic, but it was either that or try to juggle dozens of patches.

The only thing that remains is to backport those fixes to the patchset vapier maintains for the uClibc ebuilds.  Since my uClibc stage3s are built not from the 0.9.33 branch head but from the stable tree ebuilds, which use the vanilla 0.9.33.2 release plus Mike’s patchset, upgrading e2fsprogs is blocked for those stages.

This whole process may seem like a real pita, but this is exactly the sort of issue I like uncovering and cleaning up.  So far, the feedback on the latest release is good.  If you want to play with Lilblue and you don’t have a free box, fire up VirtualBox or your emulator of choice and give it a try.  You can download it from experimental/amd64/uclibc on any mirror [11].

sthttpd: a very tiny and very fast http server with a mature codebase!

Two years ago, I took on the maintenance of thttpd, a web server written by Jef Poskanzer at ACME Labs [1].  The code hadn’t been updated in about 10 years and there were dozens of accumulated patches on the Gentoo tree, many of which addressed serious security issues.  I emailed upstream and was told the project was “done”, whatever that meant, so I was going to tree-clean it.  When I expressed my intentions on the upstream mailing list, I got a bunch of “please don’t!” from users.  So rather than maintain a ton of patches, I forked the code, rewrote the build system to use autotools, and applied all the patches.  I dubbed the fork sthttpd.  There was no particular meaning to the “s”.  Maybe “still kicking”?

I put a git repo up on my server [2], got a mailing list going [3], and set up bugzilla [4].  There hasn’t been much activity, but there was enough: the project got noticed by someone who pushed it out in OpenBSD ports [5].

Today, I finally pushed out 2.27.0 after two years.  This release takes care of a couple of new security issues: I fixed the world-readable log problem, CVE-2013-0348 [6], and Vitezslav Cizek <vcizek@suse.com> from OpenSUSE fixed a possible DoS triggered by a specially crafted .htpasswd file.  Bob Tennent added some code to correct headers for .svgz content, and Jean-Philippe Ouellet did some code cleanup.  So it was time.

Web servers are not my style, but its tiny size and speed make it perfect for embedded systems, which are near and dear to my heart.  I also make sure it compiles on *BSD and on Linux with glibc, uClibc or musl.  Not bad for a codebase which is over 10 years old!  Kudos to Jef.