Migrating disk layout from mess to raid1

Imagine you are a dumb guy like me: the first thing I did was to set up three 1TB disks as one huge LVM volume, copy the data onto it, and only then find out that grub2 needs more free space before the first partition to be able to load the LVM module and boot. For a while I worked around this with an external USB token plugged into the motherboard. But I said no more!

I bought two 3TB disks to deal with the situation, and this time I decided to do everything right and set up UEFI boot instead of the good old normal booting.

Disk layout

Model: ATA ST3000VX000-9YW1 (scsi)
Disk /dev/sda: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  512MB   512MB   fat32        primary
 2      512MB   20.0GB  19.5GB               primary
 3      20.0GB  30.0GB  9999MB  xfs          primary
 4      30.0GB  3001GB  2971GB  xfs          primary

So as you can see I created 4 partitions. The first one is a special case and must always be created for EFI boot. Make it larger than 200 MB, up to 500 MB, which should be enough for everyone.

The disk layout must be set up in parted as we want a GPT layout (just google how to do it, it is damn easy to use). It accepts both absolute values like 1M or 1T and percentages like 4% to specify the resulting partition size.
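
For reference, the whole layout can be scripted with something along these lines (the device name and sizes are just this example, adjust them to your disks, and repeat the same for /dev/sdb so both disks match):

parted -s -a optimal /dev/sda mklabel gpt
parted -s -a optimal /dev/sda mkpart primary fat32 1MiB 512MiB
parted -s /dev/sda set 1 boot on # mark partition 1 as the EFI system partition
parted -s -a optimal /dev/sda mkpart primary 512MiB 20GiB
parted -s -a optimal /dev/sda mkpart primary 20GiB 30GiB
parted -s -a optimal /dev/sda mkpart primary 30GiB 100%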

Setting up the RAID

We just create the device nodes and plug /dev/sda2-4 and /dev/sdb2-4 into them. Prior to creating the RAID, make sure you have RAID support in your kernel.
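
A quick sanity check, assuming your kernel exposes its config via /proc/config.gz:

zcat /proc/config.gz | grep -E 'CONFIG_BLK_DEV_MD|CONFIG_MD_RAID1' # both should be =y (or =m with the modules loaded)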

for i in {2..4}; do mknod /dev/md${i} b 9 ${i}; mdadm --create /dev/md${i} --level=1 --raid-devices=2 /dev/sda${i} /dev/sdb${i}; done

After these commands are executed we have to watch mdstat until the arrays are synced (note that you can work with the md disks in the meantime, the initial sync will just take longer as you will be writing to the disks).
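
To keep an eye on the initial sync, either of these will do (the second simply blocks until the listed arrays have finished syncing):

watch -n1 cat /proc/mdstat
mdadm --wait /dev/md2 /dev/md3 /dev/md4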

Once we check mdstat and see that all the arrays are ready for play:

root@htpc: ~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty] 
md4 : active raid1 sda4[0] sdb4[1]
      2900968312 blocks super 1.2 [2/2] [UU]
      
md3 : active raid1 sda3[0] sdb3[1]
      9763768 blocks super 1.2 [2/2] [UU]
      
md2 : active raid1 sda2[0] sdb2[1]
      19030679 blocks super 1.2 [2/2] [UU]

we can proceed with data copying.

Transferring the data and setting up the system

mkfs.ext4 /dev/md2 ; mkfs.xfs /dev/md3 ; mkfs.xfs /dev/md4 # create filesystems
mkdir -p /mnt/newroot
mount /dev/md2 /mnt/newroot
mkdir -p /mnt/newroot/{home,var} # var and home live on md3 and md4, so create their mountpoints on the new root first
mount /dev/md3 /mnt/newroot/var
mount /dev/md4 /mnt/newroot/home

Now that we are ready we will use rsync to transfer the live system and data (WARNING: shut down everything that tampers with data, like ftp/svn/git services). The only thing we are going to lose is a few lines of syslog and other log output.
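
On OpenRC that just means stopping the relevant services before the copy; the service names below are only examples, stop whatever writes to your data:

rc-service vsftpd stop
rc-service svnserve stop
rc-service mysql stop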

rsync -av /home/ /mnt/newroot/home # no -z as we don't need to compress; note the trailing slash so we don't end up with home/home
rsync -av /var/ /mnt/newroot/var
rsync -av / --exclude '/home' --exclude '/dev' --exclude '/lost+found' --exclude '/proc' --exclude '/sys' --exclude '/var' --exclude '/mnt' --exclude '/media' --exclude '/tmp' /mnt/newroot/ # copy all relevant stuff to newroot
mkdir -p /mnt/newroot/{dev,proc,sys,mnt,media,tmp}

After the transfer you need to edit /etc/fstab on the new root to reflect the new disk layout. Rebuild the kernel if needed (to support the new RAID layout) and, if you boot from RAID like me, update /etc/default/grub so that the default kernel command line contains domdadm.
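
As a rough sketch (the devices, filesystems and mount options below are just this example, use whatever matches your setup), the relevant bits could look like this; it also does not hurt to record the arrays in mdadm.conf while you are at it:

# /mnt/newroot/etc/fstab
/dev/sda1   /boot/efi   vfat   noauto,noatime   0 2
/dev/md2    /           ext4   noatime          0 1
/dev/md3    /var        xfs    noatime          0 2
/dev/md4    /home       xfs    noatime          0 2

# /mnt/newroot/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="domdadm"

# record the arrays so they can be assembled by UUID later on
mdadm --detail --scan >> /mnt/newroot/etc/mdadm.conf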

Preparing new boot over UEFI

On your machine you need to create a USB dongle which supports UEFI boot (you need to be UEFI-booted to set up UEFI [fcking hilarious]).

We need to download the latest 64bit archboot ISO (the Gentoo minimal CD didn’t contain this lovely feature).
Grab some USB stick and plug it into the machine. We will format it as FAT32: mkfs.vfat -F32 /dev/[myusb], mount it somewhere and copy the ISO image content onto the stick (you can enter the ISO in mc and just F5 it if you are lazy like me, but it also works with tar, p7zip or whatever else). Shut down the computer, unplug the old disks and, with manic laughter, turn the machine on again.
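
Spelled out, the USB preparation might look like this (the device node, mountpoints and ISO filename are placeholders):

mkfs.vfat -F32 /dev/[myusb]
mkdir -p /mnt/usb /mnt/iso
mount /dev/[myusb] /mnt/usb
mount -o loop archboot-latest.iso /mnt/iso
cp -a /mnt/iso/. /mnt/usb/
umount /mnt/iso /mnt/usb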

To boot into UEFI just open the boot list menu and select the disk which has UEFI somewhere in its name. It will open a grub2 menu where you just select the first option. We should then be welcomed by the lovely Arch installer. Not caring about it, switch to another console and open a terminal. Set up the arrays again using mdadm --assemble.

for i in {2..4}; do mknod /dev/md${i} b 9 ${i}; mdadm --assemble /dev/md$i /dev/sda${i} /dev/sdb${i}; done

Then just proceed with mounting them under /mnt/newroot and chroot like you would for a new Gentoo install. Exact steps:

modprobe efivars # load the efi tool variables
mkdir -p /mnt/newroot # the home and var mountpoints already exist on md2 from the copy we made earlier
mount /dev/md2 /mnt/newroot
mount /dev/md3 /mnt/newroot/var
mount /dev/md4 /mnt/newroot/home
mount -o rbind /dev /mnt/newroot/dev
mount -o rbind /sys /mnt/newroot/sys
mount -t proc none /mnt/newroot/proc
chroot /mnt/newroot /bin/bash
. /etc/profile
env-update

Now that we are in the chroot we just install grub2 with GRUB_PLATFORMS="efi-64". After that we proceed easily by following the wiki article.
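
Roughly it goes like this (depending on the grub2 version the binaries may be called grub-install/grub-mkconfig instead of grub2-*, and the ESP mountpoint is just this example):

echo 'GRUB_PLATFORMS="efi-64"' >> /etc/portage/make.conf
emerge -av sys-boot/grub:2
mkdir -p /boot/efi && mount /dev/sda1 /boot/efi # the fat32 partition from the beginning
grub2-install --target=x86_64-efi --efi-directory=/boot/efi
grub2-mkconfig -o /boot/grub/grub.cfg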

Unmount the disks, reboot the system, unplug the flash drive, …, profit?


20 Responses to Migrating disk layout from mess to raid1

  1. dev-zero says:

    Instead of booting with an EFI-enabled CD, you could also have tried to build a monolithic kernel (probably had to include the initramfs if root is on a softraid) with the EFI-payload option enabled. Rename that kernel image properly and put it in the right subfolder of the fat32 partition and the EFI should have been able to boot the kernel directly. That’s what I was doing in a similar setup and without a CD-drive.
    In principle you need to boot in EFI mode only if you want to be able to change the EFI config. If you can place and name your stuff as EFI expects it, you might not even have to boot into EFI the first time.

    • scarabeus says:

      Yeah, that might work. I just thought the livecd-to-usb approach would be safer and reduce the need for reboots.

      • Francesco R. says:

        no need for an initrd, use:
        CONFIG_CMDLINE="md=0,1,,,dev0,dev1,…,devn ro root=/dev/md0 rootflags=relatime"
        in kernel .config, see Documentation/{kernel-parameters,md}.txt

  2. Francesco R. says:

    good to read, just a note:
    to rsync / (root), instead of using that long list of "--exclude" it is safer to bind mount (as opposed to rbind mount) the root filesystem elsewhere; this way all the submounts are left out, example:
    mkdir /mnt/gentoo
    mount -obind / /mnt/gentoo
    rsync -av /mnt/gentoo/ /mnt/newroot # copy all relevant stuff to newroot
    man mount has the details, you could also use -obind,ro to be really on the safe side.

    • scarabeus says:

      That is a good point, I was just going for the first solution that hit my mind. For sure yours is more elegant tho.

    • Orome says:

      another possibility is to use the -x rsync option:
      -x, --one-file-system    don't cross filesystem boundaries

  3. Richard Yao says:

    You could replace the entire stack with ZFS. sys-boot/grub2 supports booting off a mirrored ZFS pool, so this should work nicely should you ever decide to try it.

    • scarabeus says:

      Hehe, I agree you do a great job there, but I am still not exactly comfortable with the licensing. But as you can see I ain’t insane enough to use btrfs.

    • Amanda Shu says:

      Hey Richard,

      I have always thought you were just joking. But by now, your *constant* talking about ZFS is really annoying. Whenever I see your name, the next sentence contains “ZFS” — be it on IRC, on mailing lists, or even on other people’s blogs.

      Sorry to say that, but I figure you just wouldn’t notice otherwise.

  4. Duncan says:

    > The disk layout must be set up in parted as we want GPT layout

    Why “must”? gptfdisk (aka gdisk, tho there’s now a number of other binaries including the ncurses-based cgdisk in the package as well) works great for setting up gpt, as indeed, that’s its primary purpose! =:^)

    FWIW, I use gptfdisk and gpt partitions exclusively, here, including for partitioning USB drives (both thumb and external) as well as internal, main machine and netbook, and for booting with both grub-legacy (gentoo long since included that patch) and grub2 (all BIOS-based as I don’t have an EFI system yet, but they handle gpt as long as the bootloader and kernel support it as both grubs now do along with the Linux kernel given the appropriate config options, but I do reserve an EFI-system partition for that day…).

    Also, what’s with the “primary”? Along with a number of other improvements GPT partitioning does away with the primary/extended/logical distinction entirely — all partitions (up to 128 with standard size gpt partition tables) are in the same table, no primary/extended/logical at all, and saying gpt partitions are primary is rather like saying a car is pulled by a single horse, since there’s a single engine replacing the horse from the horse-drawn-carriage era! Or maybe arguing whether it should be one horse since there’s one engine, or one per cylinder… (Of course some electronic vehicles have one rather smaller motor per wheel, complicating things further…)

    FWIW, the gptfdisk (aka gdisk) home page has a reasonably clear sysadmin level (aka gentoo user level) discussion of the various technical aspects, for anyone interested. There’s even an option for a hybrid mbr/gpt dual-table setup, tho of course the mbr side is restricted to 2TiB, for those who would like to be able to boot with either mbr or gpt based systems. (Tho technically, the last logical partition in an mbr partition scheme must begin below 2 TB in order to be addressable, but could extend beyond that and be up to 2TiB in size bringing the total to nearly 4TiB, handling it much like 16-bit MS-DOS handled the high memory area above the 1 MB barrier back in the day. But while technically allowed by the spec and no doubt what they’d do if they had to, it’s unlikely to be compatible with existing proprietary software, and much like 64-bit compared to 32-bit but without the new hardware requirement, gpt is /much/ more flexible and long-run simpler, so that has been the preferred solution.)

    http://www.rodsbooks.com/gdisk/

    (Sorry if that came off as rude. The “must use parted” thing just got my goat, since there’s no “must” about it, that I can see anyway.)

    • scarabeus says:

      I just know parted is part of the livecd, dunno if the others are, and since it is a dependency of udisks any desktop user will have it on his system. That's why I picked parted.

      But you are right, it can be set in anything supporting gpt.

      • Duncan says:

        I’d consider myself a desktop user, but FWIW I finally got disgusted enough with udisks pulling in stupid stuff like lvm2 without USE flag recourse on a system that doesn’t even have dm activated in the kernel config, that I got rid of it, even tho I had to kill k3b (replacing it with graveman, FWIW) to do it. (Similar story with USE=semantic-desktop, akonadi, and kmail, FWIW. Got fed up with it and now I’m using claws-mail.)

        But being on the liveCD is a valid point and very likely does justify “must” in context. I was just missing that context as when I did my conversion, I simply backed up the system to an external USB-connected disk and booted it, never touching a liveCD, which I haven’t used on my own systems in years (even when I did my original 2004 gentoo install in fact, I used the alternate install method with an existing mandrake install, to bootstrap gentoo far enough that I could boot it and complete the job, no liveCD there either, so it really HAS been awhile). The liveCD context never even crossed my mind.

        But boot the liveCD to make changes to the partition type and layout of the internal disks is probably how most folks would handle it, indeed, and in the liveCD context, parted probably /is/ the only on-disk tool that handles gpt, so “must” probably /is/ justified.

        It’s all in the context, I guess. =:^) Scoping rules and the disagreements they can trigger!

        Thanks. Much clearer now. =:^)

  5. Rich0 says:

    If your original partitions were all on lvm spread across the three old disks, why not just create one big raid and add it to the volume group? Then you can do a pvmove to migrate the data (or pvremove the old drives). You could do the whole thing online that way.
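
    In rough terms the migration would be something like this (the volume group and device names are only placeholders):

    pvcreate /dev/md4 # the new big raid becomes a physical volume
    vgextend vg0 /dev/md4 # add it to the existing volume group
    pvmove /dev/sdc1 # migrate extents off one of the old 1TB PVs, repeat per disk
    vgreduce vg0 /dev/sdc1 # drop the old PV from the group
    pvremove /dev/sdc1 # and wipe its PV label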

    The other thing you can do so as to not waste a bunch of 1TB drives is to do a trick I’ve used. Create 1TB partitions on the new drives, and then an additional partition with the balance of the space (~2TB – after allowing space for EFI, boot, etc). Then you create a raid5 across the 1TB partitions, and another one across the 2TB partitions. You put both in the same volume group so the space acts like a single pool. Then after you’ve removed the old physical volumes on the old 1TB drives you have 3 unused 1TB drives. You can then create partitions on those the same size as the 1TB partitions on the new drives (plan ahead obviously – don’t make the new 1TB partitions too big). Then reshape the 1TB raid5 across the 3 old drives so that there are 5 drives in total in it.

    The result of this is 6TB of usable disk space, all with n+1 redundancy. Your volume group has two raid physical volumes in it – a 2TB mirrored volume, and a 4TB raid5 ((5-1)x1TB). That’s twice as much space as you were planning on, and 1TB more than you’d get if you put the old drives in a raid5.

    From what I’ve read I don’t think it matters if you use a raid1 or raid5 for the 2TB partition. I think you can reshape a raid1 into a raid5 later, but I’m not certain of this. I’d probably stick with a 2-disk raid5 which is effectively a raid1 anyway (just with the mirrored drive being stored as parity data instead of identical data, and maybe read performance is worse if the kernel can’t figure out that the array could work like a mirror).

    I always prefer raid5 to raid1 as it is much less wasteful of space. Once you have a raid5 you’ve paid the redundancy cost, and any time you add a drive you get the full amount of space out of it. That is part of the reason I use the multiple partition approach – I want to not have to pay the parity price twice.

    • scarabeus says:

      Well I could migrate it by expanding the lvm, but my goal was to get rid of lvm and stick with simple md.

      Also I removed the 1tb disks from the setup so it is now just 2 3tb disks.

      For the raid5 you are right, it is less wasteful than raid1, but it is also less reliable. I would rather go for 2 more 3TB disks and set up raid10.
      With raid5 I always fear that under some unlucky situation I would lose too many disks so the metadata would not be accessible.

      • Rich0 says:

        Interesting. If anything now that I have to use an initramfs anyway I’m looking to migrate one of my raid1s into the lvm VG (my root filesystem). LVM just offers a lot more flexibility.

        Keep in mind that with a 4 disk raid10 you get 6TB of space and the loss of two drives has a 50% chance of causing you to lose everything. With a 4 disk raid 5 you get 9TB of space and a loss of two drives has a 100% chance of causing you to lose everything. A 4 disk raid6 gives you 6TB of space also, but a loss of two drives is guaranteed to not lose any data. I’m not sure I’d really consider a raid10 a great solution – if you think double-failures are unlikely they waste space, and if you are concerned about them then they are risky.

        In any case, you should be backing up anything you really care about offsite anyway. I don’t do that with all my data (at multi TB that gets very expensive), but I do back up the stuff I consider important.

        • Duncan says:

          At least from my perspective (Scarabeus can speak for himself), there’s a number of problems with that, rich0. Speaking from experience here:

          1) Any time soft-raid and lvm are stacked, system complexity increases dramatically. Here, I found I was simply no longer confident in my ability to understand the layers and their interaction well enough to be comfortable in my ability to manage a recovery situation without “fat fingering”, even aside from the fact that additional layers mean a greater risk of operational failure due to bugs, etc. To me, the additional flexibility simply wasn’t worth the cost of the additional complexity, especially since md/raid handles partitioned-raid just fine these days.

          Dumping the lvm and building just on md/raid allowed me to sleep better, being quite confident in my ability to properly administer the system even under the additional stress of a recovery situation.

          2) For any installation where write-speed is a factor at all, parity raid (5/6/50/60) is SLOW, because unless the kernel (or hardware in the case of hardware raid) has already cached an entire data stripe, any write actually forces a read-modify-write cycle, since the data from the entire stripe must be available in order to calculate and write the new parity.

          In a read-only scenario that is of course not a factor, and where speed isn’t a big factor it’s not a problem, but on a normal desktop/workstation system there’s often enough write activity that it IS a factor (tho it wouldn’t be in certain server systems, where the technology was after all developed).

          Non-parity raid, raid-1/10, avoids this problem, since the parity calculation is avoided. Additionally, the kernel md/io scheduler is /surprisingly/ efficient at scheduling parallel reads, such that on a non-bus-bottlenecked raid (watch out for legacy IDE raid configured on both master/slave, or for multiple high-speed devices behind the same bus, such that it saturates), raid-1 reads tend in practice to be rather faster than the single-disk read-speeds that would be predicted for a “dumb” raid-1 algorithm.

          Thus, I found a 4-disk raid-1 (sata, spinning rust so bus saturation wasn’t an issue) rather faster in practice than the same 4-disks in raid-6. (I wasn’t comfortable with single-redundancy so no raid-5.) Writes were faster since I avoided the read-modify-write cycle of parity-raid, and bus saturation wasn’t an issue so writes were full single-device speed even tho I was writing the data four times. Reads were faster since much of the time, the kernel was able to schedule reads in 4-way parallel, minimizing seek-time costs and often allowing 2X-4X single-disk speed.

          3) Done properly, md/raid doesn’t require an initr*. With grub-legacy, /boot had to be raid1, but with grub2, with gpt and either a bios or efi system partition as appropriate, grub can read both lvm and md/raid, so it’s possible to get to /boot. With md/raid 0.90 metadata, the kernel can auto-scan and mount md/raid so can read rootfs-on-md/raid just fine. With 1.x metadata, it can’t autoscan any longer, but I _think_ it can still take kernel command-line raid assembly, an alternative for 0.90 metadata as well. I’ve tested command-line raid-assembly, but only with 0.90 metadata, so I can’t say for SURE that it works with 1.x, however. (All the documentation I can find says that the kernel needs 0.90 to auto-assemble, but where there’s an explanation, it consistently gives as the reason that only 0.90 can be auto-scanned, nothing at all about no-auto-scan kernel commandline based assembly with 1.x metadata, either way. But I KNOW it works with 0.90 metadata as I ran that way for years.) That was another reason I dumped lvm, since (unlike md/raid) it requires userspace management and thus either an initr* or an out-of-lvm rootfs.

          4) It’s worth noting that for an md device re-add (in the event of a kernel crash or loss of power, bad shutdown in the middle of a write), a good base device partitioning scheme, with multiple mds on top of those partitions, saves a LOT of re-sync time. As such, several smaller md/raids in a well considered data layout split across those mds is preferred above BOTH one-big-raid mode AND any scheme (such as lvm not strictly managed to avoid it) that splits up data across those mds in a less strictly managed fashion.

          I learned from experience to keep for example, the portage and kernel trees along with ccache on one raid, not even activated (let alone the partitions mounted) when I’m not actually doing system updates. Similarly, my media raid/partition isn’t active/mounted unless I’m using it. /home is normally active/mounted, as is /usr/local, but being separate, if a bad shutdown does occur, it’s unlikely all three of my normally active/mounted raids (rootfs, home, local) are being written to at the time of the crash, so re-sync is often only necessary for one of them. As a result, resync normally takes minutes instead of hours. Combine that with write-intent-bitmaps, and the resync is often done by the time I’m fully booted, so I’m often not even actually sure whether it was just the filesystem journal replay or whether a resync actually occurred as well, to delay the boot a bit.

          While such strictly managed data layout wouldn’t entirely eliminate the additional flexibility of lvm, it does limit it, and the combined lvm limitations of (1) increased complexity and recovery doubt as a result, (2) either rootfs excluded from lvm or even /more/ complexity due to the required initr*, (3) either loss of re-add efficiency or restricted lvm flexibility due to layout efficiency considerations, increased the cost to benefit ratio so much that lvm was no longer a reasonable choice, for me.

          5) I’ve wanted to test 3-way redundant (dual redundancy) raid-10, but I’ve not been able to yet. My old hardware was more or less limited to four SATA devices, while the most straightforward way to setup 3-way-redundant raid-10 requires six devices. My new hardware doesn’t have that limitation, but budget is currently a constraining factor. Of course two-way redundant (single redundancy) raid-10 only takes 4 drives, but like you, when dealing with that many devices, I want 100% tolerance of a second device loss. With md/raid, it IS possible to setup raid-10 with less than four devices for single redundancy (2-way), less than six for dual redundancy (3-way), but doing so reduces both effective space and speed, since some devices are doubling up on data stripes, thus throwing in additional write-seek delays as they move from the one to the other. As such, that’s tilting toward the write-expense of parity-raid, tho for a different reason, so three-way-redundant (dual redundancy) raid-10 still requires six-disks for full efficiency, even if it’s possible to configure it with less.

          But raid-10 /can/ be a great solution even for those wanting/needing multiple redundancy; it just requires 6 devices and three-way redundancy to get there. Now that I have hardware with the available SATA slots, I hope to try it and be able to tell from experience what it’s like. But until the budget allows it, that does remain theory, from my personal experience perspective, anyway.

          Duncan

          • Rich0 says:

            Duncan, I will agree that raid with parity does have a write impact due to the read/write cycle on writes smaller than a stripe. Hopefully one day we’ll all be on btrfs and that problem will be behind us, but at least for my loads I don’t find much of an impact.

            Sure LVM+RAID is more complex, but I find it very unlikely that, if things get hosed, I’ll be manually hex-editing filesystem metadata. Even raid1+0 is going to make that pretty painful since it is striped.

            The main benefit of LVM is greater flexibility with filesystem resizing/etc. You lose all of that once you are running directly on raid. As far as partitioned raid goes – linux has supported that for as long as I can remember – and my raids are running on partitions anyway.

            As far as not needing an initramfs goes – that’s only true if you have /usr on your root or you don’t use udev (though the reality is that the degree of udev breakage with a separate /usr is not horrible yet afaik). My root is on a raid1 that is only 1GB in size, so clearly I’m not going to be putting /usr on that. I wanted only a minimal root both to keep it from filling up and to maximize my benefit from lvm. Now that I’m running an initramfs anyway one of these days I’ll probably move my root over to lvm, and maybe consider merging everything back onto one filesystem again. Then again, I’m just as likely to wait before making any major changes until btrfs is production-ready and just move to that…

            * – and yes, I know zfs can do the same things… I’d prefer something GPL, and I don’t think that zfs supports raid5 re-shaping either – just adding new arrays to a zpool which isn’t as nice as mdadm’s ability to online reshape a raid5 just adding one more drive.

  6. Duncan says:

    @ rich0: Hmm, looks like we hit wordpress’ nesting limit and I don’t see a reply link under your post, but this is a reply to it, wherever it appears…

    As in other discussions, we seem to agree on more than we disagree. =:^)

    BTRFS: I believe we’ve both already experimented with that (and scarabeus mentioned it too) and have come to the conclusion that it’s not mature enough for our needs at present. Maybe sometime next year…

    ZFS: I’m with both you and scarabeus: the zfs powers that be had the choice to unambiguously dual-license as gplv2 or even gplv3, and chose not to. That unambiguously takes it out of consideration for many, including me.

    Disaster recovery: Here’s where we disagree a bit. I doubt I’ll be doing any hex editing either, but I’ll admit I did have trouble keeping straight the various commands I’d need to invoke to recover from a failed or simply out-of-sync disk with both lvm2 and md/raid in the stack, especially if I was working from a relatively limited initr* or bare rootfs. lvm keeps a configurable number of backup metadata tables, yes, but I wasn’t sure I could tell it where to look for them if necessary, and remember the commands necessary to do so, possibly without access to manpages or the like as they’re part of the data I’m trying to recover. Ultimately, I simply made “an executive decision” that ranked a much higher recovery confidence along with other factors, above the additional flexibility lvm would provide.

    Maybe I have a bit longer memory in regard to md/raid than you. Back when I first set it up, it did already support partitioned raid, but the feature was new enough that it was extremely hard to find much documentation on how to actually DO it (and back then it was more difficult than now, as partitioned-raid used a separate device, etc). And before that, as much of the documentation I found suggested, the going solution was to stack lvm (then lvm1 or whatever it was that was competing at the time, evms maybe?) on mdraid, using it to provide the missing partitioned-raid functionality. I guess that was kernel 2.2 and perhaps early 2.4 era, while I was setting up my first mdraid with either middle 2.4 before 2.6, or early 2.6.

    I did use lvm on mdraid when I first setup, tho I used partitioned raid as well, but not to the degree I did after I killed lvm. But as I said, I was never really comfortable that my disaster recovery skills were up to the task, and it was in fact after a bit of inconvenience trying to recover from a bad openrc update (likely baselayout-1.12 or 1.13 at that point), where the binpkg backup for the previous version was found on the lvm I couldn’t run due to the bad openrc update, that I decided I REALLY had to do something different to ensure that didn’t happen again. So I did, and it didn’t/couldn’t happen again. =:^)

    It’s worth noting that with mdraid directly configurable on the kernel commandline, if IT fails at that level, I can simply boot the old kernel and/or give it a slightly different commandline and am back in business. No “requires a userspace revert that I can’t get to the binpkg or build tree in order to do, because they’re on the volume I can’t access with the flubbed update”, about it! =:^)

    And at least from my perspective, the whole limited rootfs more or less directly equates to the whole limited initr* issue. What good are tools you’ve ensured are present along with their necessary dependencies, when you can’t get to the documentation you need to use them, because that documentation is on either a separate /usr or a real-root that’s not mounted and not possible to mount without access to the documentation needed to figure out what commands are necessary to mount them again? Again this is based on hard-gotten personal experience.

    You do have a point about a 1GB rootfs, but I can counter that with my real configuration that isn’t all /that/ much bigger. A rootfs doesn’t have to be TOO much bigger than a gig to contain ALL packages/files portage installs, along with its package database in /var/db/pkg.

    $ df /
    Filesystem Size Used Avail Use% Mounted on
    /dev/root 8.0G 2.8G 5.3G 35% /

    2.8 gigs used, including all installed packages and the package database at /var/db/pkg. I’m currently running an 8 gig root (setup when I was going to try btrfs and didn’t know how much room to reserve for its double metadata, turns out it actually used LESS space, with compression turned on), but for years I ran a 4 gig root, including during the period when I had both kde3 and kde4 installed, and 4 gig was more than sufficient, tho 3 gig would have been trouble. (IIRC the most I ever saw used was 3.5 gig or so.) It IS worth mentioning that I do use reiserfs with tail packing enabled. For the more common ext2/3/4 case without tail-packing, usage would be a bit higher, maybe 3.2 gigs. But 4-8 gigs is definitely a reasonable range even for that and with more packages installed than I have.

    So it doesn’t take /too/ much more than your gig of rootfs, to hold EVERYTHING portage tracks, including the package database itself. 4-8 gigs is a very reasonable estimate.

    I came by the everything-portage-installs-along-with-its-database rootfs policy by hard experience. A failed AC and resulting disk head-crash left me running a backup (bare) rootfs of one age, a backup /usr of a different age, and a backup /var with its /var/db/pkg database tracking installed packages that matched NEITHER of the above! THAT was a mess to resolve! After that experience, I resolved to keep everything portage installed along with its database together on the same rootfs, so they COULDN’T be out of sync. My backup rootfs might be out of date, but its portage database would at least match what was actually installed!

    Another major benefit of such a policy is that all the tools I use along with all their manpages, complete with working X, browser, the entire installed system (even video players and games!), are on the backup in exactly the same configuration as I was using them at the time of the backup. No need to worry about access to manpages and tools I can’t reach, because they’re not on the limited rootfs or initr* that’s all I can boot, because the entire installed /system/ is available from rootfs. If the primary/working rootfs fails I simply boot the backup and have the same fully operational system I had at the time I made the backup. And if the first backup fails along with the working system, maybe because the failure occurred just as I cleared the backup in order to make a new one, I simply boot the second backup, same size as the first, same size as the working copy, with a complete copy of the installed and configured system just as it was when I took THAT backup. And all it takes is a bootloader kernel commandline option to switch between working and backup rootfs, everything else needed is in the (monolithic) kernel itself, and if a new kernel fails, I simply boot the old one that was known to work.

    So when I speak of disaster recovery, I know of what I speak based on experience! Both my switch to raid and my policy of keeping everything that portage installs together with its installation database on the same partition, were in response to that disaster and what I learned from my subsequent recovery from it.

    And when I said I couldn’t be confident in my ability to recover from a disaster with stacked lvm and md, that’s really the disaster I had in mind. I do keep an external USB copy (or currently with my new system, a disk from the old system and a card I can plug in to run it on the new system, as I’ve not gotten around to updating the external backup beyond that) around as well, to deal with full internal storage failure (say one of the pre-release kernels I test starts scribbling over all accessible storage irrespective of partition barriers, etc), but really, it’s pretty hard to get easier to recover than that.

    Once setup, the utter simplicity of such a setup means that both backup and recovery are simple as well. Recovery has already been described, and rootfs backups are a matter of doing a mount-bind of the rootfs, a mount of the freshly mkfs-ed backup partition, and a simple cp -ax of everything on the resulting bind-mount to the freshly empty backup partition of exactly the same size. With less than 3 gigs of data to backup, it doesn’t take forever, but the backup is as boot-to-it functional as the original. I keep multiple versions of fstab, with the backup version flipping rootfs and the backup, and fstab itself as a symlink that points to one or the other. So after the cp to backup completes, I flip that symlink on the backup, so booting it and saying to mount the backup mounts what would normally be the working rootfs as the backup, and I do reboot and actually test the new backup as a backup isn’t complete until it’s tested (I don’t want the experience of finding THAT out the hard way!), but that’s all there is to a system backup, making it far simpler and yet far more functional than the complicated backup systems so often used.

    See, no hex editing involved. Just appropriate kernel parameters to point at a backup rootfs if the normal rootfs copy won’t boot, and then use the fully functional, as installed and configured, normal system to recover anything else as necessary. No wonder I can be confident of my ability to recover! =:^)

  7. Alex Buell says:

    If you need to dispose of the old 1TB disks, can I buy them?!?! :-)

    • scarabeus says:

      Too late, coworker already confiscated all of them :P

      Also they are not much of a win, they ran 24/7 for more than a year…