November 2013 – Luca Barbato

The Pink Pony

This is what happens LOADS of times while discussing requirements.

I want a Pink Pony!

Ok, what is that pony supposed to do? Does it walk? Does it run? What kind of user would ride it?

Oh well, not sure, a pony is cool and cute, that’s what is important! Ask Marketing for the details.

According to the information from marketing and design, the average user would be a big adult carrying swords, pikes and a shield, called a Knight. You need a Steed, not a pony.

But but, PINK, my daughter loves pink!

According to the current poll among the Knights the top colour is pitch black. Your target user wants a Black Steed.

(A longer version will appear sooner or later)

Some ideas regarding / and /usr

What is the / ?

On Unix-like systems we have a wonderful abstraction that lets us keep all the file systems in a single hierarchy. It is called mount: we take the logical tree contained in a file system and graft it onto another. The initial file system holding all those grafts is the root file system and it lives on the mount point /. Let’s call it rootfs; it usually contains everything needed to start the system.
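To make the grafting idea a bit more concrete, here is a minimal Python sketch, not tied to any real tool: it takes text laid out like /proc/mounts and tells which mounted file system a path lives on by picking the deepest mount point above it. The sample table, the device names and the function names parse_mounts and owning_mount are made up for illustration.

```python
from pathlib import PurePosixPath

# Made-up sample in the /proc/mounts layout:
# device, mount point, fs type, options, dump, pass.
SAMPLE_MOUNTS = """\
/dev/sda2 / ext4 rw 0 0
/dev/mapper/vg-usr /usr ext4 rw 0 0
/dev/mapper/vg-home /home xfs rw 0 0
"""


def parse_mounts(text):
    """Return a list of (mountpoint, device, fstype) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype = fields[:3]
        entries.append((PurePosixPath(mountpoint), device, fstype))
    return entries


def owning_mount(path, entries):
    """Pick the deepest mount point that equals path or is one of its ancestors."""
    path = PurePosixPath(path)
    best = None
    for entry in entries:
        mountpoint = entry[0]
        if path == mountpoint or mountpoint in path.parents:
            if best is None or len(mountpoint.parts) > len(best[0].parts):
                best = entry
    return best


if __name__ == "__main__":
    table = parse_mounts(SAMPLE_MOUNTS)
    for candidate in ("/usr/bin/python", "/etc/fstab", "/home/lu/notes"):
        print(candidate, "->", owning_mount(candidate, table))
```

Anything that no other graft covers falls back to /, which is exactly why the rootfs has to be there before everything else.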

What (the hell) is /usr ?

Back in the old times, when space was constrained, the rootfs (/) started to get too big, so as a quick workaround some stuff deemed not fundamental was moved to a separate fs mounted as /usr. The quick workaround proved useful, so it became permanent, with few exceptions (Hi Hurd, have you been rewritten from scratch again this month already?).

Split-/usr mount, what is the deal?

Linux has long had a quite useful volume manager: it lets you do a number of increasingly complex transformations over storage while remaining nearly agnostic about the file system. Being able to extend your storage by adding disks and merging them into a single logical storage seems useful, and so does having software-mediated RAID setups.

Gentoo has long had a tutorial on how to split all the rootfs mount points into different volumes. Whether the idea makes sense depends on your tastes. Many people liked it and used it.

The rootfs contained a set of statically linked binaries needed to mount /usr and that was all of it. Nowadays we can call this set of tools the early-boot tools and the partition holding them the early-bootfs. The two are usually the same.

Once /usr is mounted the boot process can start anything else.

Your problem now is to figure out what is needed to mount /usr:

A volume can be a logical one created by a volume-manager
The file system could be a module and the module could be compressed by something the kernel doesn’t understand by itself
The file system can live in userspace (thanks to FUSE) and it could be implemented in an interpreted language such as Python
Volume and filesystem can be networked, thus you need to bring your network up and the network could require additional components.
If you are into crypto, again it could be something mediated by the userspace at volume and file system level.
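As a rough illustration of the list above, the following Python sketch guesses what the early-boot tools would need from a single fstab-like entry. It is only a heuristic under stated assumptions: the function name early_boot_needs and the labels it returns are invented here, a /dev/mapper/ device is taken as a hint that device-mapper userspace (lvm or crypto) is involved, and nothing in an fstab entry reveals compressed modules, so that case is not covered.

```python
# File system types that obviously need the network (an illustrative, incomplete set).
NETWORK_FS = {"nfs", "nfs4", "cifs"}


def early_boot_needs(device, fstype, options=""):
    """Guess which extra pieces are needed before this entry can be mounted."""
    needs = set()
    # Logical or encrypted volumes usually show up under /dev/mapper/.
    if device.startswith("/dev/mapper/"):
        needs.add("device-mapper userspace")
    # FUSE file systems appear as "fuse" or "fuse.<name>" in the type field.
    if fstype == "fuse" or fstype.startswith("fuse."):
        needs.add("fuse helper")
    # "_netdev" is the mount option marking a device that requires the network.
    if fstype in NETWORK_FS or "_netdev" in options.split(","):
        needs.add("network")
    return needs


if __name__ == "__main__":
    print(early_boot_needs("/dev/sda3", "ext4"))
    print(early_boot_needs("/dev/mapper/vg-usr", "ext4"))
    print(early_boot_needs("user@host:/usr", "fuse.sshfs", "ro,_netdev"))
```

A plain partition with a built-in file system yields an empty set, which is the boring case most people actually have.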

The possibilities are many and, if you want to support them all no matter how unlikely, it looks like a complex problem.

Everything is broken let’s break it some more!

Some loud-mouthed people decided to go against one of the key design principles in writing a component such as an init system: keep the single point of failure as simple as possible to reduce the chance that it fails.

They kept adding compulsory dependencies to it that used to live on /usr. I think that was a way to create a problem, so that their arguably half-baked solutions can be sold as the only future, and to make the most annoying situations in the previous section the default.

The rational reaction would have been to just tell them to keep playing with their broken toys in a corner.

Initramfs

So we have this problem: an arbitrarily long list of components that would make the rootfs large, and some genius actively trying to have the first program started require most of it and to shove the concept down everybody else’s throat.

One solution would be to just merge rootfs, early-bootfs and the whole /usr together somehow: welcome back to the early 1900! (Incidentally you will also need /var mounted, but that’s a digression.)

Obviously the problems causing the initial split-/usr are still there.

Linux has a neat feature called initramfs, the successor of initrd. It is great for keeping modules and all the stuff you might need to mount your rootfs in a place the kernel can always reach, no matter what.

So a solution would be keeping all that’s needed to mount the rootfs-now-merged-with-/usr in the initramfs, since by definition it is always reachable.

It is not exactly the most elegant solution, but it arguably works as long as you get the list of required components right.
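Getting that list right boils down to following dependencies until nothing new shows up. Here is a small Python sketch of the idea; the dependency map is entirely hypothetical (the tool and library names are invented, and a real tool would have to inspect the binaries instead of reading a hand-written table), and required_files is a name chosen just for this example.

```python
# Hypothetical map: binary or library -> what it links against.
DEPS = {
    "volume-tool": ["libvolume.so", "libc.so"],
    "fs-helper": ["libcrypto-ish.so", "libc.so"],
    "libvolume.so": ["libudev-ish.so", "libc.so"],
    "libcrypto-ish.so": ["libc.so"],
    "libudev-ish.so": ["libc.so"],
    "libc.so": [],
}


def required_files(roots, deps):
    """Return the roots plus everything they depend on, directly or indirectly."""
    needed = set()
    pending = list(roots)
    while pending:
        item = pending.pop()
        if item in needed:
            continue  # already visited, avoids loops and repeated work
        needed.add(item)
        pending.extend(deps.get(item, ()))
    return needed


if __name__ == "__main__":
    for name in sorted(required_files(["volume-tool", "fs-helper"], DEPS)):
        print(name)
```

Two tools already pull in six files in this toy table; miss one of them and the image boots into nothing useful.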

The elephant in the /boot

Some rhetorical questions:

“The initramfs is somehow related to its kernel, what happens if you keep more than 1 kernel around?”
“What is a sane size limit for it?”
“The initramfs can get stale easily: how much time does it take to create it and keep it up to date?”

The answers might vary. The short version is that you need good tools and lots of space.
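One crude way to notice a stale image is to compare modification times: if anything that was copied into the initramfs is newer than the image itself, the image is suspect. The Python sketch below shows only that comparison; is_stale and demo are names invented for this example, the file names are placeholders, and a real check would also have to know which files went into the image in the first place.

```python
import os
import tempfile


def is_stale(image_path, source_paths):
    """True if any source file was modified after the image was built."""
    image_mtime = os.stat(image_path).st_mtime
    return any(os.stat(path).st_mtime > image_mtime for path in source_paths)


def demo():
    """Create throwaway files with fixed timestamps and check both cases."""
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "initramfs.img")  # placeholder name
        tool = os.path.join(tmp, "lvm")             # stands in for a bundled binary
        for path in (image, tool):
            with open(path, "w") as handle:
                handle.write("placeholder")
        os.utime(image, (1000, 1000))  # image built at t=1000
        os.utime(tool, (900, 900))     # tool older than the image
        before_update = is_stale(image, [tool])
        os.utime(tool, (2000, 2000))   # tool updated after the image was built
        after_update = is_stale(image, [tool])
        return before_update, after_update


if __name__ == "__main__":
    print(demo())
```

Multiply the check (and the rebuild) by the number of kernels you keep around and the need for good tools becomes clear.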

Alternatives

You need good tools and good knowledge about what you need for your early boot; you have to put it somewhere and keep it up to date easily. Ideally it shouldn’t depend on your kernel, yet it should be easy to access.

/boot as early-boot partition

That is one of the simpler ideas: we just keep a separate copy of what is needed in /boot. Historically the most concerned people kept a recovery system there, so it makes sense for them to use it as early-boot.

Static and restricted rootfs

If you know what you are doing, as long as you can keep your tools in your rootfs by linking them statically (so the whole deal about compressed modules is taken care of) and you aren’t using strange stuff (so just lvm and normal file systems), you do not care about this whole deal. AS LONG AS YOUR DISTRIBUTION DOESN’T PLAY GAMES. And as long as you don’t drink the kool-aid and use stuff that breaks static linking by design or makes keeping a minimal amount of stuff in the rootfs as hard as possible.

Summary

We always need your help and feedback to make sure Gentoo keeps giving you good options and that currently working systems keep working in the near future. Thanks for reading.

Early boot fun

Just a few notes sparked by a discussion with a friend about why he feels we suck badly.

Early boot

Let’s give a quite rough description of how booting could work:

Imagine you are the kernel, you just found your rootfs, managed to run your init and you are happy. That’s probably the earliest point we care about.
Init gets called and starts running some scripts: maybe checking the rootfs consistency before remounting it r/w, maybe checking the other essential mount points before mounting them, or maybe starting the device manager first and then checking what it is going to mount, assuming that what is essential still requires some modules to be loaded and that the device manager will do that.
Move further and set up the network
Maybe now mount the mount points that require the network (nfs?)
Now get the other daemons up and running, maybe in parallel, maybe bring up some graphical login.
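The rough sequence above is really a dependency graph: each stage can only start once what it relies on is available. A minimal Python sketch of that view follows, using the standard graphlib module (Python 3.9 or later). The stage names and the edges between them are my own simplification of the steps listed above, not something any real init system prescribes.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it has to wait for (assumed, simplified).
BOOT_DEPS = {
    "init": {"rootfs"},
    "device-manager": {"init"},
    "local-mounts": {"device-manager"},
    "network": {"local-mounts"},
    "network-mounts": {"network"},
    "daemons": {"local-mounts", "network-mounts"},
    "graphical-login": {"daemons"},
}


def boot_order(deps):
    """Return one valid start-up order; raises graphlib.CycleError on a loop."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, stage in enumerate(boot_order(BOOT_DEPS), start=1):
        print(step, stage)
```

Every dependency added to the early stages lengthens the chain that has to work before anything else can start, which is the point of keeping those stages small.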

Now let’s see who the actors are: rootfs, init, device-manager and maybe incidentally volume-manager and networking.

Ideally your rootfs should contain

Anything required by init to run. Easy: init should be as small as possible to make sure this single point of failure is really unlikely to fail.
Anything required by the device-manager to load modules. It should be a no-brainer; well, maybe, if you want your modules compressed with some new or exotic compressor because it is “faster” that way, you have to fit it in the rootfs.
If your essential mount points require a volume-manager the same applies: lvm can require something weird depending on the setup, so either you link it statically or you have to put it, again, where it is reachable. The same could be said for any kind of advanced crypto at volume level.
We discussed mount, and again we could have a FUSE file system using a scripting language or other stuff that makes issuing mount a little more complex than we would expect (and again fs-level crypto happening in userspace).
The network would just need some modules loaded, right? Wrong, it might need some special daemons doing some kind of byzantine authentication, and if you are really looking for pain you could be willing to netmount those mentioned file systems or even do volume management over a byzantine network (ok, there is a limit to this kind of perversion and we are just halfway).
Once everything is mounted the rest of the system can be brought up w/out many qualms.

So in the end your rootfs can be quite fat: it may contain a full copy of Python so you can mount that funny file system, and have lots of lovely brittle deps because you thought NetworkManager is the only way to get the network up and, meanwhile, that having some important stuff (e.g. /var) netmounted is all the rage.

Fun (aka pain)

So in short your rootfs could be as big as a compact live distribution and have as many moving parts as one (or more). Well, it could just be your whole distribution if you do not keep everything in separate mount points.

Some years ago that was one of the suggested ways: you keep essential stuff in / and every other root mount point gets its own partition, maybe using some advanced stuff just because.

Then you get told that the right place to fit everything discussed above has to be something called initramfs, and obviously you have to tell the kernel about it.

Probably nobody would be crazy enough to end up in the far corner case, so in the normal case the initramfs would have to carry just a few (20+??) libraries and some (30+?!) binaries, and you have to keep it properly in sync (joy).

Most people could live happily with just a statically linked lvm and udev living in a small, easy-to-mount partition, and that would be the start and the end of it for them. But certain wise guys will tell you that static linking is harmful and that the whole concept is broken anyway, since our bluetooth subsystem requires lots of userspace and you’d be w/out a keyboard in case something goes wrong (so should you shove bluez and its happy deps in your initramfs/rootfs?).

Summing up

There are easy, simple and working solutions for just some realistic scenarios, but they do not cover everything that’s possible.

There are more complex, brittle and error prone ones that might cover everything and more (and maybe still fail in some basic situations).

The fact that lots of lemmings flock to the complex/brittle ones because the guy with the largest mouth is the best speaker is sad, overly sad.

That said, if you have been happily using for the last 10 years an lvm setup as described by our guides of the time, and now you are afraid that with your next userspace update your system will break horribly unless you jump through the hoops of making an initramfs, which won’t work for you out of the box and will force you to modify your bootloader or do some other time-consuming work:

I’m sorry.

(Luckily somebody prepared a portage hook to prevent some breakages: https://gist.github.com/mansr/7289969. Not all of them, but the most glaring ones are covered.)

Incidentally, you can still pester us, help us get better programs (e.g. contribute to eudev, kmod, lvm and everything else you use) and take an active part in the community and hopefully protect your simple and working solutions.