Linux needs ZFS – and badly!

[before you flame me, I know that Linux (Gentoo, in fact) has zfs-fuze, but it is still pretty experimental, and it runs in user space, which makes it noticeably slower]

ZFS is Sun’s very cool filesystem. I won’t go into detail here – just google it – but it has some eye-opening features, the most critical of which is end-to-end data integrity. Unfortunately, ZFS’s license is incompatible with the GPL.

I say “critical” because I have a strong feeling that silent data corruption is far more prevalent than most people believe. Also, I just don’t buy the argument that bit-for-bit reliability is only important for servers. Yes, in certain circumstances, a bit flip here or there may not be noticed, but I think that is scary as hell. Personally, I’d rather know; I count on computers to copy the bits exactly, don’t you? We simply cannot tolerate random bit errors, no matter how “unnoticeable”. And you will notice if that bit flip hits a critical part of your file.

With disk drives becoming larger and larger and the marketing departments of drive manufacturers knowing that the general public doesn’t understand these issues, they tend to boast speed and size over reliability. We will soon be in real trouble. For an upcoming space mission I’ll be working on at my job, we may have to buy petabytes of storage. With this much, the current hard drive uncorrectable error rates will cause multiple errors per day, letting the data potentially bit rot with current modern filesystems. And just as bad, swap space is also susceptible. So even if you have ECC memory (and I recommend it highly), if your data ends up in swap, you are vulnerable.

In my experience with computers, I have caught two examples of silent data corruption. These are ones I actually discovered. It freaks me out to think there may be many more that went unnoticed. And both were due to bad IDE cables (so even the hard disk error rates don’t count here) on two different computer systems. The first on the old and slow PATA and was some data pattern dependent copy glitch, where a diff found the problem. The other was this past year on a modern UDMA/80-conductor cable, and it was found by ZFS – it appears that during some reported DMA errors (probably the cable’s fault), a 64K file block got written to the wrong spot on the disk (PATA does not protect the data address part of the communication).

ZFS is the only filesystem that actually will catch silent corruption in the whole chain: ATA interface -> cable -> disk (HW and firmware). For those who say, “Why not RAID?”, well, RAID will save you if a whole drive fails, but not these more insidious issues. I bet Linus and others are seriously thinking about what to do, since what once was considered rare could become commonplace. There are rumors Apple will adopt ZFS, and FreeBSD already has it in its kernel (and, of course, Solaris has it). For now, zfs-fuse is very interesting, but I think we need such protection of our data in the kernel, and soon.

Soon there’ll be no more pining for ^C!

For those of you who read my previous blog entry on this subject, you’ll be happy to know that you won’t have to wait much longer for a better Linux terminal experience. That’s right, my patches (sorry, more than a one-line change) to the Linux kernel were merged today by Linus, bringing “Real ctrl-C echo” ™ to a console (or xterm!) near you.

Now, when you interrupt a program, you’ll be able to revel in that confirmation that YOU interrupted the process:

^C

Yay! And it doesn’t stop there, you’ll see ^Z too (or whatever ctrl character you may have assigned to “susp” in stty).

Beyond that, “stty ixany” was basically broken, eating the character that resumed the output as well as not honoring ^C and the unusual ^\ (“quit”). You’ll soon enjoy a more compliant (i.e. in-line with other Unixes) behavior here too.

I’m not sure which kernel version will bring these new and exciting capabilities, but rest assured, you will see these revolutionary enhancements soon.