Linux needs ZFS – and badly!

[before you flame me, I know that Linux (Gentoo, in fact) has zfs-fuse, but it is still pretty experimental, and it runs in user space, which makes it noticeably slower]

ZFS is Sun’s very cool filesystem. I won’t go into detail here – just google it – but it has some eye-opening features, the most critical of which is end-to-end data integrity. Unfortunately, ZFS’s license (the CDDL) is incompatible with the GPL.

I say “critical” because I have a strong feeling that silent data corruption is far more prevalent than most people believe. Also, I just don’t buy the argument that bit-for-bit reliability is only important for servers. Yes, in certain circumstances, a bit flip here or there may not be noticed, but I think that is scary as hell. Personally, I’d rather know; I count on computers to copy the bits exactly, don’t you? We simply cannot tolerate random bit errors, no matter how “unnoticeable”. And you will notice if that bit flip hits a critical part of your file.

Disk drives keep getting larger and larger, and the marketing departments of drive manufacturers know that the general public doesn’t understand these issues, so they boast about speed and size rather than reliability. We will soon be in real trouble. For an upcoming space mission I’ll be working on at my job, we may have to buy petabytes of storage. At that scale, current hard drive uncorrectable error rates will cause multiple errors per day, silently bit-rotting the data under current filesystems. And just as bad, swap space is also susceptible: even if you have ECC memory (and I recommend it highly), if your data ends up in swap, you are vulnerable.
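To put a rough number on “multiple errors per day”, here is a back-of-the-envelope calculation. The 10^-14 uncorrectable bit error rate is a typical consumer-drive spec-sheet figure, and the assumption that the full petabyte gets read each day is mine:

    # Back-of-the-envelope: expected uncorrectable read errors per day
    # at petabyte scale. The UBER of 1e-14 per bit read is a typical
    # consumer drive spec; the daily read volume is an assumption.
    uber = 1e-14                 # uncorrectable bit error rate, per bit read
    bytes_read_per_day = 1e15    # assume we sweep roughly 1 PB per day
    expected_errors = bytes_read_per_day * 8 * uber
    print(f"expected uncorrectable errors per day: {expected_errors:.0f}")  # ~80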

In my experience with computers, I have caught two examples of silent data corruption – and those are just the ones I actually discovered. It freaks me out to think there may be many more that went unnoticed. Both were due to bad IDE cables (so even the hard disk error rates don’t count here) on two different computer systems. The first was on old, slow PATA and was a data-pattern-dependent copy glitch, which a diff of the copy against the original caught. The other was this past year on a modern UDMA/80-conductor cable, and it was found by ZFS – it appears that during some reported DMA errors (probably the cable’s fault), a 64K file block got written to the wrong spot on the disk (PATA does not protect the addressing part of the communication).
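That first glitch is exactly the kind of thing a checksum comparison catches. A minimal sketch of checking a copy against its source with hashes rather than a byte-for-byte diff (the file paths are placeholders):

    # Compare a copy against its source via SHA-256; any flipped bit
    # anywhere in the file changes the digest. Paths are placeholders.
    import hashlib

    def sha256_of(path, chunk_size=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk_size), b""):
                h.update(block)
        return h.hexdigest()

    if sha256_of("/data/original.bin") != sha256_of("/backup/original.bin"):
        print("copy does not match source -- silent corruption caught")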

ZFS is the only filesystem that will actually catch silent corruption across the whole chain: ATA interface -> cable -> disk (hardware and firmware). For those who say, “Why not RAID?” – well, RAID will save you if a whole drive fails, but not from these more insidious issues. I bet Linus and others are seriously thinking about what to do, since what was once considered rare could become commonplace. There are rumors Apple will adopt ZFS, and FreeBSD already has it in its kernel (and, of course, Solaris has it). For now, zfs-fuse is very interesting, but I think we need such protection of our data in the kernel, and soon.
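The trick that lets ZFS catch even a block written to the wrong address is that the checksum of each block is stored in the pointer that references it, not alongside the block itself, so a misplaced block cannot vouch for itself. A toy sketch of the idea – not ZFS’s actual on-disk format, and all names here are invented for illustration:

    # Toy model of end-to-end checksums: the checksum lives in the
    # *pointer* to a block, so a block written to the wrong address
    # fails validation on read. Not ZFS's real on-disk format.
    import hashlib

    disk = {}  # block address -> bytes

    def write_block(addr, data):
        disk[addr] = data
        # the parent keeps (addr, checksum): the "block pointer"
        return addr, hashlib.sha256(data).digest()

    def read_block(ptr):
        addr, checksum = ptr
        data = disk[addr]
        if hashlib.sha256(data).digest() != checksum:
            raise IOError(f"checksum mismatch at block {addr}: corruption caught")
        return data

    ptr = write_block(100, b"important data")
    disk[100] = b"imposter data"  # simulate a misdirected write
    read_block(ptr)               # raises IOError: corruption caught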

14 thoughts on “Linux needs ZFS – and badly!”

  1. Due to FreeBSD including ZFS, you can have it on Gentoo today! Just install Gentoo/FreeBSD.

  2. Yep, silent data corruption scares me a LOT too. I have a fairly large (2TB) setup, running software raid5.

    Linux md has some checking similar to ZFS’s scrub built in. You do echo “check” > /sys/block/md0/md/sync_action to trigger it for md0. It goes through and verifies the parity against the actual bits on disk; unreadable sectors it hits are rewritten from the redundant data, and echo “repair” will additionally rewrite blocks whose parity doesn’t match (see the sketch at the end of this comment).

    So, there is a somewhat similar thing on Linux, but I’m waiting for some really reliable filesystems.

    Unfortunately, one of the disks in my array has a problem and silently corrupts a few megabytes of data per week. The scrub corrects it (for now), but it has left me uneasy for the past few weeks.
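    A minimal sketch of kicking off such a check from Python – the array name md0 and running as root are the obvious assumptions; this just writes to the same sysfs knob mentioned above:

        # Trigger an md consistency check and report the mismatch count.
        # Assumes the array is md0 and that we are running as root.
        from pathlib import Path

        md = Path("/sys/block/md0/md")
        (md / "sync_action").write_text("check\n")  # same as the echo above
        # mismatch_cnt reflects the current/most recent check
        print("mismatches so far:", (md / "mismatch_cnt").read_text().strip())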

  3. The problem with btrfs is that it will take at least 2-3 years before it gets anywhere near production … that’s a bit too late for most people who depend on computers to store their data safely. Actually, to me as a physicist, the failure probabilities given by the manufacturers seem to almost ‘defy’ the laws of nature :-) While I’m not an expert in this field, from what I read a few years ago it seems that some of the technology just works somehow(TM), which doesn’t give me a great sense of security …

    As mentioned, the discs aren’t the only issue. Actually, in the past few years we have had more problems with silent data corruption caused by faulty cables and SCSI cards (not only hardware-wise but also due to bad firmware).

    Currently we solve the issue with userspace scripts and utilities (which naturally applies only to situations where performance is not needed). We don’t even rely on hardware/software RAID, due to bad experiences, but distribute copies of files over different discs from different vendors via rsync and perform regular and on-write hashing against a database – roughly in the spirit of the sketch below.
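    Something in that spirit, as a toy sketch – the directory, database file, and hash choice are all invented for illustration, not the actual scripts described above:

        # Toy integrity checker: hash every file under a directory and
        # compare against a stored baseline, flagging anything changed.
        # ROOT and DB are invented paths for illustration.
        import hashlib, json
        from pathlib import Path

        ROOT, DB = Path("/data"), Path("hashes.json")

        def scan(root):
            return {str(p): hashlib.sha256(p.read_bytes()).hexdigest()
                    for p in sorted(root.rglob("*")) if p.is_file()}

        current = scan(ROOT)
        if DB.exists():
            baseline = json.loads(DB.read_text())
            for path, digest in current.items():
                if baseline.get(path, digest) != digest:
                    print("HASH MISMATCH (possible bit rot):", path)
        DB.write_text(json.dumps(current, indent=2))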

  4. Why should data integrity be handled at the file system level? Does the operating system’s memory manager handle data integrity of RAM? I thought that was handled entirely in hardware, and I see no reason why permanent storage should be different.

  5. Apart from filesystem bit rot, how do you actually verify that your common off-the-shelf platform actually DOES something with that expensive ECC memory, if it supports it at all? I’m not talking about server hardware here, but the stuff you get at the consumer level – desktops, laptops, and the like.

  6. @Erik: It’s important because error detection at lower levels just doesn’t do the job. Check out page 9 of Eric Kustarz’s ZFS presentation (“ZFS – The Last Word in File Systems”) for what else can cause data corruption in addition to the simple media errors that hardware protection can cover.

    Disks are a lot less reliable than RAM. However, don’t assume RAM is never protected by software means either: just search for “Software detection mechanisms providing full coverage against single bit-flip faults” for an example.

  7. FWIW: it seems there are at least two ZFS’s. The other one (strictly, zFS, “z/OS Distributed File Service zSeries File System”) is a file system for IBM System z9 and z10 mainframes’ z/OS UNIX System Services, and is an alternative for the earlier HFS.

  8. One of the reasons why PATA was replaced by SATA!

    What about encrypted filesystems? Would the decryption process provide data integrity checks as a side bonus?

  9. Agreed, ZFS now more than ever. Especially after reading the above “why” comments.

    It doesn’t take long in front of OpenSolaris to see how sexy ZFS is. FUSE just doesn’t work as an alternative at the moment, but it’s an amazing effort.
