
GitHub — or how to re-centralize a DVCS

What are the most important advantages of git? I think one that should come up pretty early is that it is «distributed», or «decentralized». This simply means that the actual, complete repository moves along with the project. At some point, you could think: where it is hosted shouldn’t matter that much.

Although git doesn’t make it easy to work completely without a centralized server (because you need to find updates somewhere), it should be pretty clear that the repository content should be independent of the hosting service. In other words, a hosting service should serve the repository, not dictate its contents.

I think all the madness started on Google Code project hosting. There, the project wikis were hosted as a subdirectory of the svnroot (e.g. in gecko-mediaplayer sources). I’m not sure if I can say «it is wrong». On one hand, it’s bad to keep completely separate codebases in the same repository. On the other, the design of subversion is simply pure madness, and so everything in the repository follows it…

On the other hand, Google got git right. When a project decides to use git there, it gets three separate repositories (look at pkgcore sources for an example).

GitHub got wikis right as well. But GitHub Pages… They actually misuse branches in a horrible, messy way. Just look at how to create project pages manually: they tell you to create an «orphan» branch!

In other words, they tell you to create two repositories in a single repository. Two independent histories. Complete madness! And whether you want it or not, you pull them with every single clone you do. Yes, that could be some kind of advantage but nevertheless it has nothing to do with the source code.

There’s also this old, ignored issue that they encourage you to rename your README file to use their invented suffix just to have it rendered correctly. Once again, the hosting service enforces the layout of your repository. And I get really angry about all those README.md.bz2 files in my docdir. This all goes against the purpose of markup…

In short, the sole purpose of markup formats like Markdown, reStructuredText or asciidoc is to provide complete markup on top of plain text, text which should still be completely usable in any regular text viewer. And this means that their naming should also follow the common text file naming rules, which means either uppercase names in *nix or a .txt suffix in Windows. No custom .md, and certainly not .asciidoc!

It’s really sad that the most common git hosting sites, instead of encouraging people to use git correctly, force them to hack around it to achieve some minor madness.

Five commandments for XML format designers

If you’re designing an XML-based data format, then I beg you, please read the following few rules and obey them. XML may look easy, and it even is easy, but that doesn’t mean that designing a good format is. And if you’re going to invent a second HTML, then please, just use JSON or any other random container. That will be easier for you, and easier for us.

1. Thou shalt always write a schema

Every XML format should be well described. And no, your ten-stanza poem is not enough. Neither is a complete, dedicated wiki. These usually describe nicely (or less nicely) how to write your XML. That could be great if that’s all you’re interested in. But if that’s supposed to be some public format, there is one more important thing…

It’s called reading. Or parsing. Or just transforming. If you need to handle random XML files, coming from various sources, written by random people, you have to know what you can expect and what you can assume. It’s not enough to say what <x/> does; I need to know where it can appear and what I can find inside.

There are already well-deployed XML description formats such as DTD, RELAX NG or XML Schema. Please use one of them, I will be grateful. Not only do they describe the format strictly and accurately, but they also provide a very simple means to validate XML files. That helps both us, who parse it, and the people who actually write such XML.

An XML without spec is an XML where every element can appear anywhere in the document. In other words, it’s not even XML but an ugly tag soup.
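
To make the «where can it appear» question concrete, here is a toy content-model check in Python. It is only a stand-in for real DTD, RELAX NG or XML Schema validation (which needs a validating parser), and the element names doc, section, title and p are made up for the illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each element may contain.
# A real schema would also cover attributes, ordering and text content.
ALLOWED_CHILDREN = {
    "doc": {"section"},
    "section": {"title", "p"},
    "title": set(),
    "p": set(),
}


def violations(xml_text, root_name="doc"):
    """Return a list of human-readable content-model violations."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != root_name:
        problems.append(f"root is <{root.tag}>, expected <{root_name}>")

    def walk(elem):
        # Unknown parent elements allow nothing at all.
        allowed = ALLOWED_CHILDREN.get(elem.tag)
        for child in elem:
            if allowed is None or child.tag not in allowed:
                problems.append(f"<{child.tag}> not allowed inside <{elem.tag}>")
            walk(child)

    walk(root)
    return problems


if __name__ == "__main__":
    print(violations("<doc><section><title>t</title><p>x</p></section></doc>"))
    print(violations("<doc><p>stray paragraph</p></doc>"))
```

Without some table like this (written down in a proper schema language), a parser has no grounds to reject the second document, so it has to cope with it.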

2. Thy XML shalt be structured, not flat

XML provides the means to create neat, hierarchical structures. Use them. If your documents consist of logical parts like sections or chapters, put their complete content in a single <section> or <chapter> element, or any other thing that may come into your head. That’s the correct way of doing that in XML.

Random headings and separators are not enough. Even if your spec says they always and definitely start a new section, that’s not enough. If you don’t believe us, try splitting that thing into parts yourself. Especially when you have sub-headings, sub-sub-headings and so on.

A flat-structured XML is no real XML. It’s just a text file with a few unnecessary elements.
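
A small Python sketch of the difference, assuming two made-up layouts of the same document (the element names section, title, h and p are hypothetical). With nesting, the parser hands you the sections; with flat headings, you have to rebuild them yourself by keeping state:

```python
import xml.etree.ElementTree as ET

# The same content, once nested and once flat (both layouts are invented here).
NESTED = """<doc>
  <section><title>Intro</title><p>one</p><p>two</p></section>
  <section><title>Usage</title><p>three</p></section>
</doc>"""

FLAT = """<doc>
  <h>Intro</h><p>one</p><p>two</p>
  <h>Usage</h><p>three</p>
</doc>"""


def sections_nested(xml_text):
    """The hierarchy is in the document; we only read it off."""
    return {
        section.findtext("title"): [p.text for p in section.findall("p")]
        for section in ET.fromstring(xml_text).findall("section")
    }


def sections_flat(xml_text):
    """The hierarchy is implied; we have to reconstruct it while scanning."""
    sections = {}
    current = None
    for child in ET.fromstring(xml_text):
        if child.tag == "h":
            current = sections.setdefault(child.text, [])
        elif current is not None:
            current.append(child.text)
        # Anything before the first heading is silently dropped here.
    return sections


if __name__ == "__main__":
    print(sections_nested(NESTED))
    print(sections_flat(FLAT))
```

And this is just one heading level. Add sub-headings and the flat version needs a stack and a rule for every level, while the nested one stays a plain recursive walk.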

3. Thou shalt split text into blocks using XML, not text delimiters

Even if you think that’ll make writing much easier, do not ever try to use simple character delimiters to split text into blocks. If you need a list, create a list of XML elements. Like the following:

<l>elem1</l>
<l>elem2</l>
<l>elem3</l>

And yes, I know elem1,elem2,elem3 is shorter and easier to type. But guess what — it’s hell to parse. It isn’t even XML — you either have to handle it externally or create a complex recursive template which will split it and handle each token separately. That’s very bad.

An XML which uses random delimiters to create lists is no XML. It’s called CSV.
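
The same point as a minimal Python sketch. The wrapping <list> element is an assumption added only to get a well-formed document; the <l> items are the ones from the example above:

```python
import xml.etree.ElementTree as ET

# <list> is a hypothetical wrapper element, used only for this illustration.
STRUCTURED = "<list><l>elem1</l><l>elem2</l><l>elem3</l></list>"
DELIMITED = "<list>elem1,elem2,elem3</list>"


def items_structured(xml_text):
    """Each item is its own element, so the XML parser does all the work."""
    return [item.text for item in ET.fromstring(xml_text).findall("l")]


def items_delimited(xml_text):
    """The parser returns one opaque string; splitting it is our problem."""
    text = ET.fromstring(xml_text).text or ""
    # Naive split: breaks as soon as an item legitimately contains a comma.
    return [part.strip() for part in text.split(",")]


if __name__ == "__main__":
    print(items_structured(STRUCTURED))
    print(items_delimited(DELIMITED))
```

In a general-purpose language the second function is merely annoying. In a pure XML toolchain there is no handy split(), and that is where the recursive template comes in.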

4. Thou shalt not allow insane structures

Even if you think no one will create an insane structure in your document, that’s not enough. Saying it’s disallowed on your awesome Wiki is not enough either. Forbid it if it’s supposed to be forbidden.

Otherwise, someone will eventually use it. He or she will deliberately ignore your warning because it works. And even if they don’t, we will have to support it anyway in a compliant parser.

If you expect your data to be interchangeable with widely used formats, take a look at them. Don’t allow insane things which none of these formats do — or we’ll have to either refuse to convert some files, convert them incorrectly or waste our time writing complex blocks converting them to sane ones.

Simply, don’t do it. Even HTML doesn’t do that… well, that much.

5. Thou shalt write readable XML, not bytecode

The major point of using XML is that the data is readable both to machines and to humans. Leave it that way. You have the whole human language at your disposal, so don’t write zeros, ones and other random numbers which are explained on your great Wiki.

Say, an attribute called type should actually name some type. Say, article can be some type. 1 usually ain’t. And if that type only describes width of indent, then name it so! Calling it a type is as useful as calling it a thing. Or some-other-thing and a-third-thing.

XML without human-readable text is no XML. Hell, even byte-compiled XML should have readable element names! That’s the whole point of it. Otherwise, you just end up developing another custom, useless format.

Building Mozilla plugins without Mozilla

As of Firefox 6.0, Mozilla no longer supports building it against a shared xulrunner. You may or may not have already noticed that. For some users, this simply means that your next --depclean will unmerge the last installed copy of xulrunner. For some others, this means you will be building two copies of the same thing: one for Firefox, and the other for a few packages depending on xulrunner.

One especially painful case here is browser plugins. Right now, their ebuilds mostly depend on the whole xulrunner being built, while that’s not exactly true for the packages themselves. In fact, building Netscape plugins only requires the headers to exist; no linkage is necessary, as all necessary symbols are provided by the browser itself (and if they aren’t, the plugin probably won’t work anyway).

Given all that, I really don’t see a reason to waste about an hour compiling an awfully large package just to use its headers for a few minutes and then have no real use for it. That’s why I decided to try establishing a tiny package containing the headers and pkg-config files necessary to build plugins.

How do packages build Mozilla plugins?

Most browser plugins are actually plain NPAPI plugins. Put simply, that means that they need four standard, well-established np* headers to be built. These can be found, for example, in the npapi-sdk package.

And in fact, some projects actually bundle those four headers. That’s the best case because it means that a particular plugin has no xulrunner/mozilla dependency. It just builds the plugin against the bundled headers and we don’t have to worry about supplying them to it.

When packages rely on external NPAPI headers, the pain begins. In all these years, Mozilla upstream didn’t really establish a clear way of finding them. Each package has its own autoconf code for it, more or less complex, and more or less wrong.

A good example here is VLC. It provides a fairly painless method, looking either for libxul or for one of the *-plugin pkg-config packages. I’m not sure, however, whether most of the names used there ever really existed, but mozilla-plugin is the one most commonly used.

Either way, that test always nicely succeeds with xulrunner and lets VLC find its headers. If we establish a tiny NPAPI header package and just ship a mozilla-plugin pkg-config file in it, VLC will build fine against it. Sadly, if libxul is in the system, VLC will use it and link the plugin with it, for no reason.

gnash is another good example here. Although the code may look a little scary, it uses mozilla-plugin pkg-config and doesn’t link against anything.

On the other hand, gecko-mediaplayer is an awful example here. It uses a lot of random pkg-config checks, and enables features based on the xulrunner pkg-config version. That is mostly impossible to handle cleanly; we need to inject an additional, hacky pkg-config file to satisfy their checks, probably for no good reason.

IcedTea-web is a totally different case here. Unlike the packages mentioned before, this one uses a larger set of xulrunner headers, though it still requires no linkage. After building it against a number of xulrunner headers, the plugin works fine in Opera.

Creating the header package

Considering the above, a simple npapi-sdk package is not enough. We at least need to install mozilla-plugin.pc; installing libxul.pc satisfies configure checks on more packages but breaks VLC (as it tries to link with non-existent xulrunner libraries). If we hack the latter and remove Libs: from it, VLC builds fine.
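
For illustration, here is a hedged Python sketch that renders such a headers-only pkg-config file. The include directory, version and description below are placeholders I made up, not the values any real package installs; the only point is that the file carries Cflags: and no Libs: line:

```python
def render_pc(name, version, includedir, description):
    """Return pkg-config file text that exports only compiler flags."""
    lines = [
        f"includedir={includedir}",
        "",
        f"Name: {name}",
        f"Description: {description}",
        f"Version: {version}",
        # Headers only: the link-flags field is left out on purpose, so
        # nothing gets added to the link step of the plugin.
        "Cflags: -I${includedir}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values here are hypothetical placeholders.
    print(render_pc(
        name="mozilla-plugin",
        version="0.0",
        includedir="/usr/include/npapi-sdk",
        description="Headers for building NPAPI plugins",
    ))
```

Packages that compare the pkg-config version would still need a sensible Version: value, so the placeholder above would have to be replaced with something their configure checks accept.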

Right now, I’m testing a simple mozilla-plugin-sdk package. It installs the complete set of xulrunner headers along with the two aforementioned pkg-config files, satisfying all Mozilla plugins I’ve tried. Sadly, due to the number of headers, the package is awfully large.

The next step would probably be stripping unnecessary headers out of the package. I have already started using makedepend to check which headers are actually used by Netscape plugins. Any further tips would be appreciated.
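
As a rough idea of what that check amounts to, here is a Python sketch that follows #include lines transitively from a plugin’s sources into one header directory. It is not what makedepend does internally: it ignores #ifdef logic and only resolves names against a single include directory, so treat its output as an upper-bound guess. The function and its parameters are made up for this example:

```python
import re
from pathlib import Path

# Matches both #include <name> and #include "name".
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def used_headers(sources, include_dir):
    """Return names of headers under include_dir reachable from sources."""
    include_dir = Path(include_dir)
    seen = set()
    queue = [Path(source) for source in sources]
    while queue:
        path = queue.pop()
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for name in INCLUDE_RE.findall(text):
            candidate = include_dir / name
            # Headers outside include_dir (e.g. system ones) are ignored.
            if name not in seen and candidate.is_file():
                seen.add(name)
                queue.append(candidate)
    return seen
```

Everything in the header directory that never shows up in the result for any plugin is a candidate for removal, to be confirmed by an actual build.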