SSDs and filesystems, part 2
So, a couple days in, and I'm still trying to (re)install Gentoo. More on that in a bit. First, let's talk about speed.
It's hard to tell whether or not my new SSDs are really a speedy improvement over the old software RAID1 array of magnetic HDDs. Normally, a bare-bones commandline system feels much faster than an aging graphical desktop, even on the same hardware.
I notice that compile times are slightly faster, though I've also been using tmpfs for Portage and the usual tmp file locations, so putting it all in RAM would lead to a significant speedup anyway, SSD or not.
Boot times are indeed quite zippy; the longest wait is for my media HDD to finish mounting -- it's on ReiserFS, which is known to have very slow mounts.
Now, let's talk filesystems.
The critical showstopper that's made me reinstall two times (and counting) is ext4. So far, ext4 has completely corrupted a whole drive (/var and /usr/portage) and made the other drive (/ and /boot) almost unbootable.
ext4 has eaten my data, hosed my system, and ruined my life.
No amount of fscking has fixed /var and /usr/portage, both on the second SSD. Did you know that you shouldn't let fsck try to resize broken inodes? Apparently the resize behavior is known to be broken in the latest versions. It's known to corrupt filesystems. I didn't know that, either. I'm sorry, but what part of "production-ready" applies to ext4? Yeah, it's a new kid on the block, but it's moved out of the "experimental" status into the kernel.
That does not make it ready for your system. The first and second Gentoo installs largely didn't work because (I think) there might have been an invalid mount option. Or something could not be found. Or a superblock was missing. Or the moon was wrong. #$#^#&@ shitty unintelligible error messages. (Here's a tip, developers: don't put every possible thing that could have gone wrong into an error message, then repeat that message for every different error.)
My mount options seemed to be good after double-checking the manpage and around the internet, including kernel.org. Here was my original fstab, from when I had only one partition for / (no separate /boot):
/dev/sda1 / ext4 noatime,data=writeback,commit=60,nobarrier 0 1
/dev/sdb1 /var ext4 noatime,data=writeback,commit=60,nobarrier 0 1
/dev/sdb2 /usr/portage ext4 noatime,data=writeback,commit=60,nobarrier 0 1
Livin' on the edge here. I figured I wouldn't need a separate /boot partition on my first drive, so I lumped it all into one. I did that back in 2005 and 2006 with no problems, right? Right. The rest of the options were designed to maximize SSD performance.
Unfortunately, I couldn't get the system to boot. Made it past grub, the kernel loaded, but when it came time to mount /, it couldn't mount the filesystem rw. No amount of changing options worked -- adding rw to grub.conf, to the fstab options, nothing.
So I figured it must be my one-partition setup, and wiped my disks. Reinstalled again, this time adding a /boot partition on sda. Same ext4 options for /boot as for the other partitions. Rebooted and . . . nope, same errors. Now I'm also seeing a message about a possible bad option or other variable, which I can only assume was in fstab, thanks to the aforementioned shitty nonspecific error messages.
Hit up Google. Not much help. I again backed off on some of the ext4 options, tried playing with Grub parameters, but got the same results. The filesystems mostly weren't mounting, and when a few of them did, it was all readonly.
Sigh. Time to reinstall again. Set up a similar fstab, but this time I changed an ext4 option for /boot to data=ordered, based on this blog post. Reboot and . . . hey, it works. /boot gets mounted. Nothing else does, but it's a start.
I quickly booted back into the LiveCD, changed the other fstab entries to data=ordered, and rebooted again. This time, the system seems to boot just fine . . . until it tries to mount /var and /usr/portage from the second SSD. *bzzt*, these cannot be mounted! Something's gone wrong. One more reboot, just for luck, then . . . *bzzt*, now there are filesystem errors! Fsck wants to fix them, so I let it run. Except it completely hoses both partitions. They seem to be so badly scrambled that even running mkfs.ext4 on them from the LiveCD results in errors, some of which seem to be emitted from the libata system, which makes me wonder if now the SSD itself has also been corrupted.
I'll have to completely reinitialize and repartition that disk, now. Thanks, ext4. Thanks for hosing my data. Up yours, ext4.
I'm done trying to figure out why ext4 doesn't work. I don't care that it's supposed to be a fast file system for SSDs. I don't care that it's 40 times faster than ReiserFS to mount at bootup. I don't care. ext4 has lost my data three times now. I think my fingers are sufficiently burned to know that "the oven is hot; don't touch."
Up yours, ext4. I'm going back to ReiserFS. At least it works. It's never failed me in more than four years.
Update: On top of the initial ext4 errors, fsck problems, and mount issues, the Mobi drive was also going bad. Now the motherboard BIOS can't see it, regardless of which SATA port or cable I plug in. The device was initially working for the first install (though the filesystem was throwing ext4 errors, at least /var and /usr/portage worked okay), but just a day or so after I started trying it out, it finally finished failing. F***. I contacted the seller to request an RMA; I have a feeling that I'll end up having to go through the manufacturer, which will take a long time. Meanwhile, I'm without a workstation for an indefinite time, so I've set my devaway on dev.gentoo.org. I did find a couple other reports on the internets that say that their Mobis also died shortly after they arrived, so maybe there was a batch of bad drives.
But don't get me wrong, the Mobi drive dying doesn't absolve ext4 of any guilt. The ext4 filesystem still completely f**ked itself repeatedly on the system drive, the UltraDrive ME. It still refuses to do what it's told to do. But rather than continue to investigate related LaunchPad bugs on mounting ext4 rw and fsck errors, I'm going to move back to ReiserFS for the UltraDrive, and just live with longer boots. The RMA process will take a while, so I may have to reinstall everything on a single drive and just avoid syncing Portage for a while.
On a good note, OCZ (the company that makes the Vertex, an identical drive with an Indilinx controller) has been experimenting with a homegrown beta firmware that lets the drive do online garbage collection in the background. This is important for keeping the performance of the drive as fresh as when it was first used, even after it gets filled up with files and repeated (re)writes. The firmware is still in testing, but I'm hopeful that it'll make it out the door soon. Hopefully the same firmware features will find their way to my Super Talent drive -- and hopefully the TRIM command will also be implemented in the firmware.
Of course, the only Linux filesystem I know of that supports TRIM is ext4 . . .
16 comments
/dev/stewie/portage /usr/portage ext4 noatime 0 2
Using LVM, hence the weird device path, but it's been working successfully for a few months now.
/me waiting for somebody to scream
"hell no, dont use it, you'll need one no wait two ups or you'll lose your data"
Mr. Ts'o had quite a nice writeup on his blog (http://thunk.org/tytso/blog/category/computers/ssd/) on how to get the drive aligned correctly on Linux.
'Cause supposedly, those are the best options for SSDs to reduce writes and take full advantage of the SSD's capabilities.
@Marens:
Heh, I knew an XFS fan would come out of the woodwork at some point. No thanks; I've posted elsewhere on why it's a bad idea, and there's a good amount of anecdotal evidence and benchmarks that show why XFS is overall not as good as ext3/ext4, both for performance and writes on SSD drives. I don't work with GB-sized files, just small text documents and audio files. Nothing over a few MB in size.
@everyone else:
Thanks for the suggestions on libata troubleshooting -- the thing is, like I said, I'm not even sure it's libata that was generating some of the weirder error messages. It wasn't something really obvious like "I/O error on /dev/sdb", but more of a dmesg-style output. I can't even recall what it was, but it wasn't the usual "bad mount" error.
I'll be reinstalling yet again, and this time I'll just use ReiserFS for everything. If the errors return after repartitioning and mkfsing, then I'll know it's a hardware issue. I've done my best to ensure that the SATA cables are good, the drive's not overheating, and it's getting steady voltage. ReiserFS may have slow mounts and journal replays, thus bogging down the bootup, but I can live with that if it means I can have a working Gentoo box.
I remember reading about problems mounting ext filesystems with non-default journal settings. I think the problem was the journal couldn't be changed on the fly (during the remount to rw). These posts explain solutions; unfortunately, I can't find the post that explained the problem more intelligently than I am:
Talks about ext3, but I think the principle is the same:
http://forums.gentoo.org/viewtopic-t-786932-highlight-writeback.html
Look to the bottom of this thread (#6):
http://ubuntuforums.org/showthread.php?t=1118681
Look at entry #22; notice the rootflags entry in grub telling the system early what journal to use.
http://bbs.archlinux.org/viewtopic.php?id=62524
It certainly doesn't explain the non-root drives failing, or, of course, the big problem of data loss. (That may have been the bad drive though.) FWIW, I still use ext3...
Thanks for the info; it's good reading anyway.
I had already edited mke2fs.conf before applying the filesystem to the SSDs, but I didn't know about the rootflags option for Grub. Maybe it would have made a difference, maybe not. :)
I ended up getting a working install with reiserfs and the "tail,noatime" options for / -- I would normally use "notail", but since space is at a premium on this disk, packing stuff in is worth the extra CPU time. The media HDD, which is also on reiserfs, uses "notail" since it's much larger.
Subjectively, so far, there's not a huge difference between the old HDDs and the new SSD. Startup times are slightly decreased, but mostly I'm waiting for the long "waiting for uevents to be processed by udev" message, or whatever it is. And yes, checking the ReiserFS journal takes a while when the media HDD is mounted. Mount times aren't noticeable on the SSD.
Application launching is faster, though Firefox and Thunderbird are still almost as slow as ever. Other apps load instantly, even when launching several all at once. It's also faster to start X, Slim, and log in to Xfce.
I'll post more thoughts and hopefully some benchmarks soon-ish.
man mount says:
"To use modes other than ordered on the root file system, pass the mode to the kernel as boot parameter, e.g. rootflags=data=journal."
So, yeah, that explains why you get the "unknown option" error.
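To make the man page rule concrete, here is a minimal Python sketch that reads fstab-style text and works out which rootflags= parameter the root entry would need. The helper name rootflags_for is made up for illustration, and it only looks at the data= option, since that is the one the man page excerpt talks about.

```python
def rootflags_for(fstab_text):
    """Return the rootflags= boot parameter implied by the / entry, or None.

    Hypothetical helper: only the data= journal mode is considered, and
    data=ordered is treated as the default that needs no boot parameter.
    """
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) < 4 or fields[1] != "/":
            continue  # not a complete entry, or not the root filesystem
        for option in fields[3].split(","):
            if option.startswith("data=") and option != "data=ordered":
                return "rootflags=" + option
        return None  # root entry found, but it uses the default mode
    return None  # no root entry at all


fstab = """\
/dev/sda1 / ext4 noatime,data=writeback,commit=60,nobarrier 0 1
/dev/sdb1 /var ext4 noatime,data=writeback,commit=60,nobarrier 0 1
"""
print(rootflags_for(fstab))  # rootflags=data=writeback
```

For the fstab from the post, the result would be appended to the kernel line in grub.conf, next to the root= parameter. Non-root entries such as /var are skipped because they are mounted later, from fstab, rather than by the kernel at boot.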
I've been a happy owner of a 30GB SuperTalent MX drive for over a year.
The frustration I went through during the first month made me install Vista on it, just to make sure it was working and that it was possible to use it as a system drive.
Every typical installation failed with mount/FS errors. I tried several filesystems with probably every sane option combination.
The drive was split into 3 parts: /boot, /, and C: (gaming).
I ended up installing Sabayon and then removing everything but /boot and the modules.
Then I was able to boot into my old Gentoo system, which I just copied over from the old drive.
I used to use ReiserFS, but tried to find something better.
From this and the previous post, I'll definitely use the scheduler idea and tmpfs for tmp/portage. I can't find a use for 8GB anyway (even with no swap, that still leaves too much available).
Why Firefox/Thunderbird do not do this themselves, I have no idea.
Hey, long time no see! :)
I didn't know it was even possible to clean up Mozilla performance just by using sqlite to do the dirty work.
On a related note, I did find a couple of different methods on the Gentoo Forums for improving FF/TB performance regardless of hard disk type . . . by running 'em from RAM. There are a couple of startup scripts that put the necessary config directories into a RAMdisk, periodically back them up to the HDD, and flush everything to disk when the app is closed. Supposedly it dramatically improves the sqlite backend performance, reducing hitches when loading multiple tabs, for example.
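The general shape of those scripts can be sketched in a few lines of Python. This is not one of the forum scripts, just an illustration of the copy-in, copy-out idea; the function names are made up, and the throwaway temp directories stand in for a real profile directory and a real tmpfs mount.

```python
import shutil
import tempfile
from pathlib import Path


def load_into_ram(disk_profile: Path, ram_profile: Path) -> None:
    """Copy the on-disk profile into the RAM-backed directory."""
    shutil.copytree(disk_profile, ram_profile, dirs_exist_ok=True)


def flush_to_disk(ram_profile: Path, disk_profile: Path) -> None:
    """Copy the RAM copy back over the on-disk profile."""
    shutil.copytree(ram_profile, disk_profile, dirs_exist_ok=True)


# Demo with throwaway directories; on a real system the second one would
# live on a tmpfs mount, which is an assumption not shown here.
with tempfile.TemporaryDirectory() as disk, tempfile.TemporaryDirectory() as ram:
    disk_profile = Path(disk) / "profile"
    ram_profile = Path(ram) / "profile"
    disk_profile.mkdir()
    (disk_profile / "places.sqlite").write_text("old")

    load_into_ram(disk_profile, ram_profile)           # before the app starts
    (ram_profile / "places.sqlite").write_text("new")  # app writes land in RAM
    flush_to_disk(ram_profile, disk_profile)           # on a timer and on exit

    restored = (disk_profile / "places.sqlite").read_text()
    print(restored)  # new
```

A plain copy like this never deletes files that were removed from the RAM copy, and anything written since the last flush is lost on a crash or power cut, so the periodic backup interval matters. It needs Python 3.8 or newer for dirs_exist_ok.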
That means in Linux we've got few options: ext2, ext4 _without_ journal and ... I don't know anything else :)
I had ue 2.3 installed on a corsair x128.
It started fine about 4 times; the 5th time there were errors when booting. Restarted the computer... Corsair completely dead.