Michał Górny

GitHub — or how to re-centralize a DVCS

What are the most important advantages of git? I think one which should come out pretty early is that it is «distributed», or «decentralized». This simply means that the actual, complete repository moves along with the project. At some point, you could think: where it is hosted shouldn’t matter that much.

Although git doesn’t facilitate working completely without centralized server (because you need to find updates somewhere), it should be pretty clear that the repository content should be independent of the hosting service. In other words, hosting service should serve the repository, not enforce its contents.

I think all the madness started on Google Code project hosting. There, the project wikis were hosted as a subdirectory to svnroot (e.g. in gecko-mediaplayer sources). I’m not sure if I can say «it is wrong». On one hand, it’s bad to keep completely separate codebases in the same repository. On the other, the design of subversion is simply pure madness, and so everything in the repository follows it…

On the other hand, Google got git correctly. When a particular project decides to use git there, it gets three separate repositories (look at pkgcore sources for an example).

GitHub got wikis right as well. But GitHub Pages… They actually misuse branches in a horrible, messy way. Just look how to create project pages manually — they tell you to create an «orphan» branch!

In other words, they tell you to create two repositories in a single repository. Two independent histories. Complete madness! And whether you want it or not, you pull them with every single clone you do. Yes, that could be some kind of advantage but nevertheless it has nothing to do with the source code.

There’s also this old, ignored issue that they encourage you to rename your README file to their invented suffix just to have it rendered correctly. Once again, hosting services enforces the layout of your repository. And I get really angry getting all those README.md.bz2 in my docdir. This is all against the purpose of markup…

Shortly saying, the sole purpose of markup formats like Markdown, reStructuredText, asciidoc is to provide a complete markup on top of plain text. The text which sould be still completely usable for any regular text viewer. And this means that their naming should also follow the common text file naming rules, which means either uppercase names in *nix or .txt suffix in Windows. No custom .md, and certainly not .asciidoc!

It’s really sad that the very common git hosting sites, instead of encouraging people to use git correctly, force them to hack it around to achieve some minor madness.

vim: smart C/C++ boilerplate templates

A simple vim scriptie for those who are interested. It is triggered when new C/C++ files are created (e.g. via vim new-file.cxx), and fills it in with boilerplate unit code. What’s special about it is that it tries to find tips about that code in other files in that or parent directory.

Continue reading “vim: smart C/C++ boilerplate templates”

A C API for C++ and Python ones — or making of libh2o

Lately I spent a lot of time working on a small project of mine called libh2o. Its goal is to provide a library of routines implementing IAPWS IF97 equations for water and steam properties. With the core written in C, and providing a nice-to-use API for C++ and Python.

At first, I thought about not providing a «high level» C API at all. It was like: if you want to use plain C, you’ve gotta glue all the low-level equations yourself. However, after some thinking I decided to provide one, and built the two remaining APIs (C++ and Python) on top of it.

The main reason for doing this was that Python (well, CPython) is written in C. Although I’ve seen people writing Python extensions in C++, and even using some of C++ features to make them a little nicer, that’s still a bunch of ugly C hacks and pointer casts. I don’t see a really good reason to write a Python extension in C++, nor to make it depend on a C++ compiler when it’s all limited to C-based CPython API anyway.

And that means that I have either to duplicate all the high-level logic in the Python extension, or just create a C API first and reuse that. Since the whole logic was simple enough to be covered completely and clearly in C, I have chosen this way.

As it happens when people choose C, I had to implement some kind of poor man’s objectivity. Not something as wide (and ugly) as GObject (someone, please kill it!) but a few bits necessary to keep the state. In other words, a structure keeping the «object» and a bunch of nicely named functions taking it as their first argument.

Before I learnt C++, I would assume that the object structure should be a private (and obscured) blob, and the object type should be an incomplete pointer to it. User should just grab that pointer from a «constructor», pass it around and finally free it through a «destructor». Advantage: the exact struct contents are not the part of ABI.

But now I’ve decided to go the other way; way similar to how C++ classes work. I’ve created a structure with explicitly listed private fields (and a very simple /*private:*/ comment), and used that as the public type. It doesn’t need to keep any memory allocated, and is simple enough to be allocated on stack. Advantages: no need for a destructor, and an ability to pack that struct in the C++ class which will wrap it.

Then the usual stuff: a bunch of functions with common prefixes. One prefix for the «namespace», another one for the function (new, get…). All in nice and clear fashion, either to be used directly or wrapped in the C++ or Python APIs.

Five commandments for XML format designers

If you’re designing an XML-based data format, then I beg you, please read the few following rules and obey them. XML may look easy, and even is easy but that doesn’t mean that writing a good one is. And if you’re going to invent second HTML, then please, just use JSON or any other random container. That will be easier for you, and easier for us.

Continue reading “Five commandments for XML format designers”

The suggested dependencies problem

Optional runtime dependencies (or «suggested dependencies») are one of the late problems we’re facing in Gentoo. There’s definitely a need for some standard solution, and it’d be great to put it in the next EAPI. Sadly, there’s no consensus how to solve it.

Continue reading “The suggested dependencies problem”