A five commandments for XML format designers

If you’re designing an XML-based data format, then I beg you, please read the few following rules and obey them. XML may look easy, and even is easy but that doesn’t mean that writing a good one is. And if you’re going to invent second HTML, then please, just use JSON or any other random container. That will be easier for you, and easier for us.

1. Thou shalt always write a schema

Every XML format should be well described. And no, your ten-stanza poem is not enough. Complete, dedicated Wiki neither. These usually describe nicely (or less nicely) how to write your XML. That could be great if that’s all you’re interested in. But if that’s supposed to be some public format, there is one more important thing…

It’s called reading. Or parsing. Or just transforming. If you need to handle random XML files, coming from various sources, written by random people, you have to know what you can expect and what can you assume. It’s not enough to say what <x/> does — I need to know where it can appear and what I can find inside.

There are already well-deployed XML description formats such as DTD, Relax-NG or XML Schema. Please use one of them, I will be grateful. Not only they describe the format strictly and accurately but they also provide a very simple means to validate XML files. It’s helpful both to us, who parse it, and to people who actually write such XML.

An XML without spec is an XML where every element can appear anywhere in the document. In other words, it’s not even XML but an ugly tag soup.

2. Thy XML shalt be structured, not flat

XML provides means to create neat, hierarchical structures. Use them. If your documents consists of logical parts like sections or chapters, put their complete content in a single <section/> or <chapter>, or any other thing that may come into your head. That’s the correct way of doing that in XML.

Random headings and separators are not enough. Even if your spec says they always and definitely start a new section, that’s not enough. If you don’t believe us, try splitting that thing into parts yourself. Especially when you have sub-headings, sub-sub-headings and so on.

A flat-structured XML is no real XML. It’s just a text file with a few unnecessary elements.

3. Thou shalt split text into blocks using XML, not text delimeters

Even if you think that’ll make writing much easier, do not ever try to use simple character delimiters to split text into blocks. If you need a list, create a list of XML elements. Like the following:

<l>elem1</l>
<l>elem2</l>
<l>elem3</l>

And yes, I know elem1,elem2,elem3 is shorter and easier to type. But guess what — it’s hell to parse. It isn’t even XML — you either have to handle it externally or create a complex recursive template which will split it and handle each token separately. That’s very bad.

An XML which uses random delimeters to create lists is no XML. It’s called CSV.

4. Thou shalt not allow insane structures

Even if you think noone will create an insane structure in your document, it’s not enough. Saying it’s disallowed on your awesome Wiki is not enough either. Forbid it if it’s supposed to be forbidden.

Otherwise, someone finally will use it. He or she will deliberately ignore your warning because it works. And even if they don’t, we will have to support it anyway in a compliant parser.

If you expect your data to be interchangeable with widely used formats, take a look at them. Don’t allow insane things which none of these formats do — or we’ll have to either refuse to convert some files, convert them incorrectly or waste our time writing complex blocks converting them to sane ones.

Simply, don’t do it. Even HTML doesn’t do that… well, that much.

5. Thou shalt write readable XML, not bytecode

The major point of using XML is that the data is both readable to machines and humans. Leave it that way. You have the whole human language at your disposal, so don’t write zeros, ones and other random numbers which are explained on your great Wiki.

Say, an attribute called type should actually name some type. Say, article can be some type. 1 usually ain’t. And if that type only describes width of indent, then name it so! Calling it a type is as useful as calling it a thing. Or some-other-thing and a-third-thing.

XML without human-readable text is no XML. Hell, even byte-compiled XML should have readable element names! That’s the whole point with it. Otherwise, you just end up developing another custom, useless format.

Posted in Uncategorized | Leave a comment

The suggested dependencies problem

Optional runtime dependencies (or «suggested dependencies») are one of the late problems we’re facing in Gentoo. There’s definitely a need for some standard solution, and it’d be great to put it in the next EAPI. Sadly, there’s no consensus how to solve it.

The optional dependencies problem

Gentoo has a very neat solution for handling optional dependencies and optional features, probably ever since the beginning. It is called «USE flags» and they work very well with the «traditional» optional dependencies. By that, I mean optional dependencies which are both build- and run-time.

Such a dependencies have to be pulled in before the build process starts, and usually require passing specific options in the configure phase. What’s important, both enabling and disabling features requires rebuilding the program in question because of code branches being switched. Thus, it’s perfectly fine if changing USE flags implies rebuilding the package.

Sadly, when it comes to optional runtime dependencies, USE flags are not a perfect solution. «Switching» such a dependency doesn’t require rebuilding the program anymore. It’s usually not even switching — the program can determine in runtime whether a particular dependency is available, and either enable or disable respective features. Simple like that.

If one decides to use USE flags for that, they become partially meaningless. Unless flags start stripping off the code (which is a bad idea), feature availability is dependency- rather than flag-based. So, USE=-ssl is irrelevant if, say, pyopenssl is installed. What’s even worse, flag imply needless rebuilding of such packages just to pull in an additional dependency.

The simple hack — pkg_postinst() messages

The simplest solution right now is just listing the suggested dependencies in pkg_postinst() messages. Combined with has_version helper, those messages can give a pretty nice output, pointing out already installed packages — just take a look at sys-apps/systemd ebuild.

Of course, it’s not a real solution, rather relying on user doing the hard work. The biggest disadvantage is that the dependencies are often going to end up in @world. And then, if user decides to unmerge our package, portage is unable to find and unmerge them as well.

The SDEPEND solution

A pretty common idea is to establish a new variable called SDEPEND (for «suggested»). Such a variable would simply list relevant dependencies, and let portage handle the UI part somehow. It is a minimalistic solution, quite consistent with other parts of PMS. Sadly, it has a few big shortcomings.

First, using our current dependency syntax, you can’t specify that a particular feature requires more than one package; in other words, that two or more suggested dependencies are supposed to be pulled in together. Of course, solving this one would be pretty easy — e.g. by allowing grouping them with parantheses.

A much more important issue is describing what particular dependencies do. Although sometimes this could be guessed by package descriptions pretty well, usually a more friendly text would be great. So, we end up having to implement that somehow.

And that’s usually when Ciaran comes in with ugly exherbism DEPENDENCIES. Sure, it solves most of the issues pointed out here but, hell, do we really want such a thing? Isn’t dependency syntax obscure enough already?

And it’s all rather dependency-oriented. In other words, package comes first, then goes the feature description. «Pass dev-python/pyopenssl or dev-python/python-gnutls to enable secure connections support». I don’t think that’s the most user friendly solution.

The USE flag solution

Another solution is brining a new category of USE flags. It’s not important whether they would be specified using a special variable, common USE_EXPAND or another magical features. In fact, that could be a thing totally separate from USE flags. The point is that some of the package flags would be runtime-switchable.

Unlike traditional USE flags, such flags wouldn’t be stored in vdb. They would be evaluated in place instead, using package.use or similar files, and the dependency tree would use current state of such flags. Of course, they would be allowed for RDEPEND (PDEPEND) use only.

Why reuse USE flags for that? Because it’s the most user-friendly solution. User doesn’t have to learn anything new. He/she enables a flag, does emerge -vDtN @world and notices that new dependencies are pulled in but the package doesn’t have to be rebuilt for that.

If we just add some additional magic for regular USE flags, enabling run-time dependant SSL support could be exactly the same as enabling build-time one — even using the same USE=ssl.

And we could basically even give «backwards» support for older EAPIs. Package managers not supporting the new feature would simply treat runtime-switchable USE flags as regular USE flags, requiring rebuilds of the package.

Posted in Gentoo | 11 Comments

Moving systemd into /usr — the technical side

Now that I think of it, I really regret I didn’t make systemd ebuild install it to /usr from the very beginning. But the harm has been done already, and I’d like to move it ASAP and that’s why I’d like to sum up problems with that and possible ways of proceeding with it.

The idea

The idea is simple as it is: move systemd install to /usr prefix completely. Right now, there are no technical benefits from keeping it in rootfs. It already depends on libdbus, which is installed in /usr, and I expect more dependencies over time. There’s no reason to move all those packages into rootfs.

Most importantly, the above information allows me to assume that such move won’t hurt our split-/usr users — because they already had to have /usr mounted for systemd to run.

The trouble

The main problem with the move is that unit files were installed into /lib location by a number of packages. The files can be moved into the new location cleanly only through rebuilding the packages which provide it. They need to stay in search path for systemd to work.

These unit files are symlinked from within /etc/systemd as well. Whenever we move a single unit, we need to update the symlink as well. I’d really like to avoid forcing users to manually fix that, and the eclass doesn’t export pkg_postinst() which could help doing it automatically.

The last problem is that people have the systemd location hardcoded into the kernel command-line. This one should be relatively easy to avoid as we can keep compatibility symlink for some time.

Solution 1: one big move

The simplest solution for the migration is to move all the relevant units in the systemd ebuild. This way, the unit integrity can be preserved, and symlinks could be updated at once as well. The risk of system breakage should be reduced to minimum.

However, there is an important disadvantage of that method. All those files would be moved out-of-scope of the Package Manager. Thus, after rebuild of every single package providing systemd units, all of our users will have to fight file collisions.

The same will likely apply to our new users, because they will have at least some units installed by random packages already. Users not ever intending to use systemd won’t be hurt because the move of unit files will be transparent to them as any other package file move.

Solution 2: temporary support for two locations

Right now, systemd already supports multiple locations for unit files. As a temporary hack, we could just add /lib/systemd/system to that list. This way, all unit files still installed in /lib will still work as expected when systemd is installed into /usr.

Sadly, this won’t handle updating /etc symlinks. I could, however, fix that easily by adding a simple .path unit or another solution updating symlinks as soon as files are removed from /lib.

Other solutions?

Well, I think the second one is the best we can do. Do you have any other ideas? I guess that udev could face a similar problem if we decide to actually move it into /usr. And there the thing is even worse because the rule install location is usually hardcoded into the ebuild or package build system itself; we will probably need to have even more degree of compatibility.

Posted in systemd | 5 Comments

Why there’s no IUSE=systemd, logrotate, bash-completion…

Next tides of users slowly notice that a number of unneeded files is installed on their systems. They enumerate systemd unit files, logrotate files, take their pitchforks and start their cruciates against Gentoo developers wasting their precious disk space.

Let me tell you a story. The story starts when Uncle Scarabeus wants to add bash-completion support into libreoffice ebuild. He considers this a minor addon, not worth the half a day necessary to rebuild libreoffice, so he doesn’t revbump it. He simply assumes the change will be propagated nicely when users upgrade to the next version.

Of course, some users will already come shouting here: that’s against the policy! Yes, indeed it is. But is it worth the hassle? Should all libreoffice users be forced to rebuild that huge package just to get a single tiny file installed? He could wait and add that along with the next version. Well, if he wouldn’t forget about it.

But that’s not really important part here. Because, to his surprise, many users have actually noticed the change. That’s because the use of bash-completion.eclass has caused the ebuild to have IUSE=bash-completion; and many of the --newuse Portage option users have rebuilt the package. A few others, like me, just stopped using that option.

That’s when the discussion started. We — the few devs actually caring about discussing — decided that it is quite pointless to control installing tiny files through USEflags. Of course, the libreoffice is an elephant-case here but so-called regular packages aren’t much better here. Is there really a reason to rebuild even 10 C files when the only thing going to change is a single, tiny text file being installed or not?

Another solution is to split those files into separate ebuilds. But that’s usually inconvenient both for users and devs. Users have to notice that they need to emerge an additional package to get the particular file installed, and devs need to maintain that additional package. That starts to become really ridiculous with files like systemd units which are often generated during build-time and store installation paths.

So what to use? INSTALL_MASK, obviously. It’s an ancient Portage magic which allows you to control which files will be punted from installed files. You can use app-portage/install-mask to quickly set it for the files you don’t want. It’s as simple as:

# install-mask -a systemd logrotate
Posted in Gentoo, PM utils | 10 Comments

Cleaning up /boot with eclean-kernel

# make install
sh /usr/src/linux-drm-next/arch/x86/boot/install.sh 3.2.0-rc2-pomiocik+ arch/x86/boot/bzImage \
		System.map "/boot"
cat: write error: No space left on device
make[1]: *** [install] Error 1
make: *** [install] Error 2

Ever hit an issue like that when trying to install a new kernel? Ever thought how much you hate manually removing old kernels? Ever forgot to remove modules as well? If you do, then you may be interested in a tiny new tool called eclean-kernel.

In simplest words, eclean-kernel could be called an old kernel harvester. Given a few tiny settings, it finds the kernels installed in the system, chooses ones to remove and removes them along with the modules and sources (if not used by any other kernel).

The usual way of using it is to set it to keep a few newest kernels, and remove all older than them. To do this, you just need to pass the number as --num. For example:

eclean-kernel -n 3 --ask

will remove all kernels but the newest three, asking before removing each one. For non-interactive use, omit --ask; you can also use --pretend for eclean-kernel to list the kernel versions which would be removed.

By default, it preserves all kernels referenced by the bootloader. The --destructive option can be used to disable that. If it fails to detect the correct bootloader (or multiple bootloaders are installed), --bootloader option can be used to specify the one to use.

It can also preserve some kinds of kernel files to fit more specific setups. For example, if you’d like to keep old kernel configs, just pass --exclude config.

And finally, you can avoid having to repeatedly provide the same options by putting them in your ~/.config/eclean-kernel.rc. An example file could look like:

-n 3 --ask --exclude config
Posted in Gentoo | 22 Comments