Joe Peterson

More on data integrity: Enter Btrfs!

Those who read my previous post know that my number one concern these days is disk data integrity. As disks become bigger and cheaper, their uncorrectable error rates are not improving. That and the other failure points in the chain tempt Murphy to come out from behind the curtain of denial. It’s getting scary (read my last post for my experiences with this).

The current mainstream Linux filesystems do not address the problem: ext3 does no checksumming, and ext4 adds only journal checksums (but not data block checksums). Your data is still vulnerable to silent corruption.
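To show what data block checksumming buys you, here is a minimal Python sketch. It is not how any real filesystem does it: the 4096-byte block size, the function names, and the use of `zlib.crc32` are all assumptions chosen for illustration. The idea is simply to record a checksum per block at write time and compare again at read time.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def checksum_blocks(data):
    """Return one CRC-32 per fixed-size block of data."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


def find_corrupt_blocks(data, expected):
    """Return indices of blocks whose CRC-32 no longer matches the stored one."""
    current = checksum_blocks(data)
    return [i for i, (now, then) in enumerate(zip(current, expected))
            if now != then]


original = bytes(range(256)) * 64      # 16 KiB, i.e. four blocks
stored = checksum_blocks(original)     # checksums recorded at "write" time

damaged = bytearray(original)
damaged[5000] ^= 0x01                  # silently flip one bit inside block 1

corrupt = find_corrupt_blocks(bytes(damaged), stored)
print(corrupt)                         # prints [1]
```

Without the stored checksums, nothing in the damaged copy would tell you that one bit had changed.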

Sun’s interesting ZFS filesystem had seemed like the sole bright spot on the horizon, but the licensing issues that prevent its inclusion in the Linux kernel are disheartening – we cannot just sit and wait for that to change; and we don’t have to!

There is a new game in town that has a lot of promise. It comes from Chris Mason at Oracle, and it is called “Btrfs”. I had noticed this project a few months ago, but I did not entertain the idea of trying it because it seemed to be so far off in the future – a great idea, but not anywhere near usable. Well, either I got the wrong impression, or things have changed. Quite a lot of Btrfs works, and in my limited testing, it works damn well. It seems quite fast too, compared to what I experienced when trying out ZFS-FUSE. Btrfs is in the Linux kernel, you see, which gives it a big performance advantage and allows it more access to the hardware as well.

Recent Googling revealed that not only are people giving this new fs a spin, but there are already assorted ebuilds out there in various overlays (not surprising, since building Btrfs and its utilities is a snap). I quickly cooked up my own ebuilds (for the kernel module and the utilities), and they are now in Gentoo:

sys-fs/btrfs
sys-fs/btrfs-progs

There are some actual kernel source patches available out there (the ebuild provides a separate module), but at this early stage it is probably more handy to be able to insert Btrfs in any kernel and be able to upgrade it separately. Now, please note that there is a big warning on the Btrfs site (which I also put in the ebuild):

WARNING: Btrfs is under heavy development, and is not suitable for any uses other than benchmarking and review. The Btrfs disk format is not yet finalized.

So yes, it’s experimental, and upgrades can change the disk format (meaning you’ll have to re-format and re-populate). So call me crazy: I created a big partition and copied my 104GB home directory to a Btrfs filesystem, and I’m playing with it now as I type this. I will be keeping very recent backups, of course, but this is really very cool. I figure I need to put myself out there and help by giving it some serious testing.

I won’t go into details about what Btrfs offers in terms of features (just visit the site), but it appears to be aimed directly at ZFS’s feature set while better fitting into the “Linux way” of doing disk management. The big thing for me is the data checksumming – not having this feels a little like flying blind with no safety net. I am really excited that something is being done to address my number one issue. Go Btrfs!

Linux needs ZFS – and badly!

[before you flame me, I know that Linux (Gentoo, in fact) has zfs-fuse, but it is still pretty experimental, and it runs in user space, which makes it noticeably slower]

ZFS is Sun’s very cool filesystem. I won’t go into detail here – just google it – but it has some eye-opening features, the most critical of which is end-to-end data integrity. Unfortunately, ZFS’s license is incompatible with the GPL.

I say “critical” because I have a strong feeling that silent data corruption is far more prevalent than most people believe. Also, I just don’t buy the argument that bit-for-bit reliability is only important for servers. Yes, in certain circumstances, a bit flip here or there may not be noticed, but I think that is scary as hell. Personally, I’d rather know; I count on computers to copy the bits exactly, don’t you? We simply cannot tolerate random bit errors, no matter how “unnoticeable”. And you will notice if that bit flip hits a critical part of your file.

Disk drives keep getting larger, and the marketing departments of drive manufacturers, knowing that the general public doesn’t understand these issues, tend to boast about speed and size rather than reliability. We will soon be in real trouble. For an upcoming space mission I’ll be working on at my job, we may have to buy petabytes of storage. With this much, the current hard drive uncorrectable error rates will cause multiple errors per day, leaving the data to bit rot on current filesystems. And just as bad, swap space is also susceptible. So even if you have ECC memory (and I recommend it highly), if your data ends up in swap, you are vulnerable.
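To put a rough number on that, here is a back-of-the-envelope sketch in Python. The error rate is an assumption, not a figure from this post: one uncorrectable error per 10^14 bits read is a value often quoted on consumer drive spec sheets. The function name is made up for illustration.

```python
PETABYTE = 10**15  # bytes, decimal petabyte


def expected_read_errors(bytes_read, errors_per_bit=1e-14):
    """Expected number of uncorrectable read errors for a given volume of reads.

    errors_per_bit is an assumed spec-sheet style rate (1 error per 1e14 bits).
    """
    return bytes_read * 8 * errors_per_bit


# Reading one petabyte in full, once, at the assumed rate:
errors = expected_read_errors(PETABYTE)
print(round(errors))   # roughly 80 expected errors
```

How many of those land on any given day depends on how much of the archive is actually read per day, so treat this as an order-of-magnitude estimate only.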

In my experience with computers, I have caught two examples of silent data corruption. These are ones I actually discovered. It freaks me out to think there may be many more that went unnoticed. And both were due to bad IDE cables (so even the hard disk error rates don’t count here) on two different computer systems. The first was on old, slow PATA: some data-pattern-dependent copy glitch, which a diff uncovered. The other was this past year on a modern UDMA/80-conductor cable, and it was found by ZFS – it appears that during some reported DMA errors (probably the cable’s fault), a 64K file block got written to the wrong spot on the disk (PATA does not protect the data address part of the communication).
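A misdirected write like that second case is nasty because the data that lands in the wrong place is perfectly valid, and so is the stale data left behind at the right place. ZFS can catch it because the checksum is kept with the pointer to a block rather than with the block itself. The toy Python model below sketches that idea; the `ToyDisk` class, the `misdirect_to` parameter, and the tuple used as a "pointer" are all invented for illustration and have nothing to do with the real on-disk format.

```python
import zlib


class ToyDisk:
    """A dict standing in for a disk: block address -> block contents."""

    def __init__(self):
        self.blocks = {}

    def write(self, addr, data, misdirect_to=None):
        # misdirect_to simulates a glitch that lands the data at the wrong address
        target = addr if misdirect_to is None else misdirect_to
        self.blocks[target] = data

    def read(self, addr):
        return self.blocks.get(addr, b"")


def read_verified(disk, pointer):
    """Read the block a pointer refers to and check it against the pointer's checksum."""
    addr, expected_crc = pointer
    data = disk.read(addr)
    if zlib.crc32(data) != expected_crc:
        raise IOError(f"checksum mismatch at block {addr}")
    return data


disk = ToyDisk()
disk.write(7, b"old contents")              # stale, but internally valid, data

new_data = b"new contents"
pointer = (7, zlib.crc32(new_data))         # parent records address and checksum
disk.write(7, new_data, misdirect_to=9)     # the write lands in the wrong spot

try:
    read_verified(disk, pointer)
    detected = False
except IOError:
    detected = True

print(detected)                             # prints True
```

A checksum stored inside block 7 itself would have matched the stale contents and reported nothing wrong.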

ZFS is the only filesystem that actually will catch silent corruption in the whole chain: ATA interface -> cable -> disk (HW and firmware). For those who say, “Why not RAID?”, well, RAID will save you if a whole drive fails, but not these more insidious issues. I bet Linus and others are seriously thinking about what to do, since what once was considered rare could become commonplace. There are rumors Apple will adopt ZFS, and FreeBSD already has it in its kernel (and, of course, Solaris has it). For now, zfs-fuse is very interesting, but I think we need such protection of our data in the kernel, and soon.

Soon there’ll be no more pining for ^C!

For those of you who read my previous blog entry on this subject, you’ll be happy to know that you won’t have to wait much longer for a better Linux terminal experience. That’s right, my patches (sorry, more than a one-line change) to the Linux kernel were merged today by Linus, bringing “Real ctrl-C echo” ™ to a console (or xterm!) near you.

Now, when you interrupt a program, you’ll be able to revel in that confirmation that YOU interrupted the process:

Yay! And it doesn’t stop there, you’ll see ^Z too (or whatever ctrl character you may have assigned to “susp” in stty).

Beyond that, “stty ixany” was basically broken, eating the character that resumed the output as well as not honoring ^C and the unusual ^\ (“quit”). You’ll soon enjoy a more compliant (i.e. in-line with other Unixes) behavior here too.
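If you are curious how your own terminal is set up, the relevant settings are ordinary termios flags: ECHO and ECHOCTL (caret-style echo of control characters) in the local flags, and IXANY in the input flags. Here is a small Python sketch that reads them with the standard `termios` module. The `describe_flags` helper is my own name, and this only reports the settings; it says nothing about what a particular kernel does with them.

```python
import sys
import termios


def describe_flags(iflag, lflag):
    """Decode the termios flags relevant to control-character echo and ixany."""
    return {
        "echo": bool(lflag & termios.ECHO),        # stty echo
        "echoctl": bool(lflag & termios.ECHOCTL),  # stty echoctl (show ^C, ^Z, ...)
        "ixany": bool(iflag & termios.IXANY),      # stty ixany
    }


if __name__ == "__main__" and sys.stdin.isatty():
    # tcgetattr returns [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs = termios.tcgetattr(sys.stdin.fileno())
    print(describe_flags(attrs[0], attrs[3]))
```

Run it from an interactive shell, then again after `stty -echoctl` or `stty ixany`, to watch the values change.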

I’m not sure which kernel version will bring these new and exciting capabilities, but rest assured, you will see these revolutionary enhancements soon.

Getting stuck in the mud

I bet you can relate. You find a nasty bug, but every avenue you try leaves your tires spinning in mud. Google searches are mostly fruitless (or there just isn’t a way to formulate a good query), and no one else seems to be able to reproduce the problem. You are alone. Tracebacks leave you thinking, “Oh…NO!” And you finally realize that the best plan is to get some sleep, start fresh in the morning, and use that time in the shower to come up with the next great plan of attack.

Just when you wonder if you will ever make headway, that little breakthrough happens, and it changes the nature of the problem. The new puzzle might be even harder than the first, but at least it’s different now. Your tires can finally grab some of that mud, returning hope and bringing new energy to the problem.

This happened to me recently on my Gentoo/FreeBSD system. It started when I was running a script I wrote to create slide shows on my web site. I needed to use the “mogrify” command from ImageMagick (a wonderful package for those of us who do work with images). What has always been a trusty utility threw me a curve ball: SEGFAULT – crap! It was the worst kind of bug – down deep in an OS threading function. Ugh, I had a bad feeling about this one.

For a little background, FreeBSD is transitioning to a more efficient threading library called “libthr”. A couple of the old legacy libs, libpthread and libc_r, are now mapped to this new one via a mechanism controlled by “/etc/libmap.conf”. This file lets you “drop in” replacement shared libs and tell all programs to start using them instead.

On the gentoo-bsd IRC channel, UberLord suggested that I try turning off the libmap.conf mapping to libthr. Brilliant! The problem went away, which was a big clue. My first inkling was that either there was a bug in libthr (yikes!) or a problem in mogrify’s use of threading. So I started down the dreaded debugging path, instrumenting the libthr and mogrify code with printf statements, using gdb, etc. I finally determined that after many calls to pthread_mutex_lock and pthread_mutex_unlock, the CPU register containing the “current thread” (%gs on i386) suddenly changed. I assumed that mogrify was creating a new thread but perhaps not initializing it, but no such luck; mogrify does not normally use multiple threads (I verified with the authors). Something was stomping on that register, and it was not random: the new thread address was 0x100 larger than the original one, typically, hinting at interference from other code that also used %gs.

So here I was. Stuck. Trying to imagine new ways to instrument libthr, wondering if I would ever get to the bottom of this. No one else seemed to see the problem (UberLord later did reproduce the bug), and even the FreeBSD thread developer I emailed a couple of times did not provide that “magic idea” (like, “Oh yeah, I know what’s going on!” – don’t we all hope to hear this?). I even traded executables and libs with others who didn’t see the problem, but this uncovered nothing new (yeah, I figured). It’s no fun trying things that you almost know won’t help but have to be tried anyway – it really is like spinning your tires.

Then it happened. I did another grep through the FreeBSD code to see what else could possibly modify %gs. Well, libpthread does, of course (the older lib usually used for threading), but that’s mapped to libthr by libmap.conf, right? All this time I had assumed libpthread was “locked out” and not used. But oh hell, I’ve seen many other “but that can’t happen” bugs in my life, so I moved aside libpthread.so, completely removing its involvement. With great anticipation, I hit enter to run mogrify again – no segfault! YES! The nature of the problem had now changed, and my tires were getting traction again. With a new level of energy, I started my search for why this would be happening. According to a reply to my inquiry on the freebsd-threads mailing list, there have been cases in which some symbols from an old library are picked up even when libmap.conf is set up to prevent this.

So part of this is still under investigation (i.e. why doesn’t libmap.conf provide a water-tight mapping?), but one thing is clear: libthr itself is not the problem. The big troublemaker is the mixing of threading libs. In fact, symlinking the legacy threading libs (libpthread and libc_r, .a and .so) to libthr has proven to be a stable solution during my testing.

One thing experience has taught me is that you should never give up. Even when those tires are really spinning and getting nowhere, all you need is a new angle on the problem, and the little ideas that spark change often come at unexpected times.

Pining for ^C

For those of us with a long Unix/Linux history, one of the most cherished, useful, and powerful key combinations is Ctrl-C. Doesn’t it just give you a satisfying feeling to kill that process that is running amok, spitting loads of misguided output to your xterm? Feels good, doesn’t it? Sometimes holding the control key and hitting “C” is preceded by an utterance of “Crap!” or some other expression of high emotion, and when that prompt comes back, as if saying, “You rang?”, I know I am totally in control, “C” that is.

OK, this article is not all about sending SIGINTs… That would be pretty boring. It’s about something I miss terribly: actually seeing ^C just before the process is killed. There’s something beautiful and informative about the letter C preceded by our beloved caret. You might say, “Why? Don’t you realize you hit Ctrl-C? Why do you care to see it?” I’ll tell you…

One example is when I try running something several times, interrupting it some of those times, and I forget whether the process naturally (or unnaturally) just quit or if I did hit Ctrl-C (it’s such a reflex that it’s often pretty subconscious). In Linux, I often find myself looking for an extra blank line (which is the only feeble trace of the Ctrl-C), but that’s just not a very positive indication! I don’t know about you, but I want to know for sure. Some programs tell you in no uncertain terms when you interrupt them (like Python scripts, portage, etc.), but then there are some that silently “go away” with no sign that anything happened (e.g. genkernel). Not only that, but why not echo the Ctrl-C? Guess what? Other variants of Unix have done it for years (e.g. FreeBSD). Why did Linux choose to forgo the ^C? I don’t know, and Googling ™ for it tells me little if not nothing.
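Python scripts are a good example of the "no uncertain terms" camp, because Python turns SIGINT into a `KeyboardInterrupt` exception that a script can catch and report. Here is a minimal sketch of that pattern; the `run_and_report` helper and the simulated interrupt are my own inventions for illustration, the latter so the example runs without anyone actually pressing Ctrl-C.

```python
import sys


def run_and_report(task):
    """Run task(); return True if it finished, False if Ctrl-C interrupted it."""
    try:
        task()
    except KeyboardInterrupt:
        # Python raises KeyboardInterrupt in the main thread when SIGINT arrives
        print("interrupted by Ctrl-C", file=sys.stderr)
        return False
    return True


def simulated_interrupt():
    # Stands in for the user pressing Ctrl-C partway through the task
    raise KeyboardInterrupt


finished_normally = run_and_report(lambda: None)
was_interrupted = not run_and_report(simulated_interrupt)
print(finished_normally, was_interrupted)
```

A program that does nothing like this leaves the terminal as your only witness, which is exactly why the echoed ^C matters.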

What I do know is that I am on a quest to get back ^C, ^Z, etc. After chasing a red herring or two, I think I’m on it. I am recompiling my Linux kernel right now, and after a one-line change to “drivers/char/n_tty.c”, I have high hopes that I will have ^C back.

First post, Gentoo development, FreeBSD, etc.

Here it is! The first post to my Gentoo blog. Yay!

I’m a very new Gentoo Developer (almost 1.5 months!), and I guess the reason I became a dev is that my career has slowly taken me away from development over time. Also, I love open source, and I’ve been a Unix geek for many years. Gentoo is something that grew on me. When I first tried it, the selling point I remember most was that you could maximize the performance of the OS on your machine by tweaking and compiling everything, and I really didn’t know if all of that compile time was worth it. I left a bit puzzled, like I was missing something, but I came back to it later (I forget why), and that was it – I was hooked. It doesn’t matter that much to me that it’s all compiled optimally, and as for the “learning factor” in the manual install process, I’ve been with Linux too long for that to be of personal benefit (although I can see its value for newcomers). What I really like is the way Gentoo is put together and the thought that has gone into it. Portage is very cool – it’s a great package managing system. Emerging packages is somehow very satisfying. I love the text colors – really – very aesthetic. Little details like this are important to me.

OK, so what do I work on in Gentoo? Two main things:

Packages of interest (currently only one: app-emulation/xtrs)
Gentoo/FreeBSD

I joined Gentoo as part of the Gentoo/FreeBSD project (informally called “g/fbsd”). Portage was inspired by FreeBSD’s “ports” system, but it’s pretty cool to be able to actually emerge Gentoo packages in FreeBSD. Why do I like BSD? Well, I’ve been a Linux guy for a long time, and there are a lot of reasons BSD is enticing, one of which is its Unix heritage. Don’t get me wrong: I still love and use Linux, but why not explore new (or is it old?) things?

I got my start with ebuilds helping to bump the version on “xtrs”, which is very near and dear to me (you see, I was a TRS-80 geek when I was 13 or so), and I now co-maintain it with the dev who helped me initially. In fact, that is what initially piqued my interest in becoming a dev. I expect I’ll pick up other packages of interest in the future, too.

OK, time to admit something: I can be quite the perfectionist, and my latest challenge (some might say obsession) is tracking down some strange behavior in the bash shell that happens in FreeBSD but not Linux (I say “FreeBSD”, because I have confirmed that it is not g/fbsd-specific). I’ve gotten the interest of a couple of FreeBSD devs, and I hope this will get fixed soon. What is the issue, you might ask? It’s pretty cosmetic, really: cursor movement is kind of wacked at the very end of a command line if you have the UTF-8 locale set. Unless you move back and forth in the command line to edit it (bash’s readline functionality), you won’t see it. But to me, it just feels “flaky”, and that’s where my perfectionism comes in. I’ll solve this, damn it! Operating systems should feel solid, reliable, consistent, etc., right? 🙂

Before I sign off, I want to thank all of the cool Gentoo developers I’ve worked with. Thanks for putting up with my “newdev” questions so far, and I hope I get to make some useful contributions, which would be a lot easier if I didn’t have a paying job too!