Tags in git

mini-post about using tags in git commits.

Usage

When you are looking at the history using git log having tags helps a lot digging out old commits, for example: you remember some commit added some timeout system in something related to the component foo.

git log --oneline | grep foo:

Would help figuring out the commit.

This usage is the best when working with not well structured codebase, since alternatively you can do

git log --oneline module/component

If you use separate directories for each module and component within the module.

PS: This is one of the reasons plaid focuses a lot on tags and I complain a lot when tags are not used.

Patches and Plaid

This is part of the better tools series.

Sometimes you should question the tools you are using and try to see if there is something better out there. Or build it yourself.

Juggling patches

It is quite common when interacting with people to send back and forth the changes to the shared codebase you are working on.

This post tries to analyze two commonly used models and explain why they can be improved and which are the good tools for it (existing or not).

The two models

The focus is on git, github-like web-mediated pull-requests and mailinglist-oriented workflows.

The tools in use are always:

a web browser
an editor
a shell
an email client

Some people might have all in one in a way or another making one of the two model already incredibly more effective. Below I assume you do not have such tightly integrated environments.

Pull requests

Github made quite easy to propose patches in the form of ephemeral branches that can be reviewed and merged with a single click on your browser.

The patchset can be part of your master tree or a brand new branch pushed on your repository for this purpose: first you push your changes on github and then you go to your browser to send the PullRequest (also known as merge request or proposed changeset).

You can get email notification that a pull request is available and then move to your browser to review it.

You might have a continuous integration report out of it and if you trust it you may skip fetching the changes and test them locally.

If something does not work exactly as it should you can notify the proponents and they might get an email that they have comments and they have to go to the browser to see them in detail.

Then the changes have to be pushed to the right branch and github helpfully updates it.

Then the reviewer has to get back to the browser and check again.

Once that is done you have your main tree with lots of merge artifacts and possibly some fun time if you want to bisect the history.

Mailing-list mediated

The mailing-list mediated is sort of popular because Linux does use it and git does provide tools for it out of box.

Once you have a set of patches (say 5) you are happy with you can simply issue

git send-email --compose -5 --to the_mailing@list.org

And if you have a local mailer working that’s it.

If you do not you end up having to configure it (e.g. configuring gmail with a specific access token not to have to type the password all the time is sort of easy)

The people in the mailing-list then receive your set in their mailbox as is and they can use git-am to test it (first saving the thread using their email client then using git am over it) locally and push to something like oracle if they like the set but they aren’t completely sure it won’t break everything.

If they have comments can just reply to the specific patch email (using the email Message-Id).

The proponent can then rework the set (maybe using git rebase -i) and send an update and add some comments here and there.

git send-email --annotate -6 --to the_mailing@list.org

Updates to specific patches or rework from other people can happen by just sending the patch back.

git send-email --annotate -1 --in-reply-to patch-msgid

Once the set is good, it can be applied to the tree, resulting in a purely linear history that makes going over looking for regression pretty easy.

Where to improve

Pull request based

The weak and the strong point of this method is its web-centricity.

It works quite nicely if you just use the web-mail so is just switching from a tab to another to see exactly what’s going on and reply in detail.

Yet, if your browser isn’t your shell (and you didn’t configure custom actions to auto-fetch the pull requests) you still have lots of back and forth.

Having already continuous integration hooks you can quickly configure is quite nice if the project has already a solid regression and code coverage harness so the reviewer bourden to make sure the code doesn’t break is lighter.

Sending a link to a pull request is easy.

Sadly, new code does not come with tests or tests you should trust the whole point above is half moot: you have to do the whole fetch&test dance.

Reworking sets isn’t exactly perfect, it makes quite hard to a third party to provide input in form of an alternate patch over a set:

you have to fetch the code being discussed
prepare a new pull request
reference it in your comment to the old one

then

the initial proponent has to fetch it
rebase his branch on it
update the pull request accordingly

and so on.

There are desktop-tools trying to bridge web and shell but right now they aren’t an incredible improvement and the churn during the review can be higher on the other side.

Surely is really HARD to forget a pull request open.

Mailing list based

The strong point of the approach is that you have less steps for the most common actions:

sending a set is a single command
fetching a set is two commands
doing a quick review does not require to switch to another application, you just
reply to the email you received.
sending an update or a different approach is always the same git send-email command

It is quite loose so people can have various degrees of integration, but in general the experience as reviewer is as good as your email client, your experience as proponent is as nice as your sendmail configuration.

People with basic email client would even have problems referring to patches by its Message-Id.

The weakest point of the method is the chance of missing a patch, leaving it either unreviewed or uncommitted after the review.

Ideal situation

My ideal solution would include:

Not many compulsory steps, sending a patch for a habitual contributor should take the least amount of time.
A pre-screening of patches, ideally making sure the new code has tests and it passes them on some testing environments.
Reviewing should take the least amount of time.
A mean to track patches and make easy to know if a set is still pending review or it is committed.

Enters plaid

I do enjoy better using the mailing-list approach since it is much quicker for me, I have a decent email client (that still could improve) and I know how to configure my local smtp. If I want to contribute to a new project that uses the approach it is just a matter to find the email address and type git send-email --annotate --to email, github gets unwieldy if I just want to send a couple of fixes.

That said I do see that the mailing-list shortcomings are a limiting factor and while I’m not much concerned as making the initial setup much easier (since federico has already plans for it), I do want to not lose patches and to get some of the nice and nifty features github has without losing the speed in development I do enjoy.

Plaid is my try to improve the situation, right now it is just more or less an easier to deploy patch tracker along the lines of patchwork with a diverging focus.

It emphasizes the concepts of patch tag to provide quick grouping, patch series to ease reviewing a set.

curl http://plaid.libav.org/project/libav/series/50/mbox | git am -s

Is all you need to get all the patches in your working tree.

Right now it works either as stand-alone tracker (right now this test deploy is fed by fetching from the mailing list archives) or as mailbox hook (as patchwork does).

Coming soon

I plan to make it act as postfix filter, so it injects in the email an useful link to the patch. It will provide a mean to send emails directly from it so it can doubles as nicer email client for those that are more web-centric and gets annoyed because gmail and the likes aren’t good for the purpose.

More views such as a per-submitter view and a search view will appear as well.

Document your project!

After discussing how to track your bugs and your contributions, let see what we have about documentation

Pain and documentation

An healthy Open Source project needs mainly contributors, contributors are usually your users. You get users if the project is known and useful. (and you do not have parasitic entities syphoning your work abusing git-merge, best luck to io.js and markdown-it to not have this experience, switching name is enough of a pain without it).

In order to gain mindshare, the best thing is making what you do easier to use and that requires documenting what you did! The process is usually boring, time consuming and every time you change something you have to make sure the documentation still matches reality.

In the opensource community we have multiple options on the kind of documentation we produce and how to produce.

Wiki

When you need to keep some structure, but you want to have an easy way to edit it wiki can be a good choice and it can lead to nice results. The information present is usually correct and if enough people keep editing it up to date.

Pros:

The wiki is quick to edit and you can have people contribute by just using a browser.
The documentation is indexed by the search engines easily
It can be restricted to a number of trusted people

Cons

The information is detached from the actual code and it could desync easily
Even if kept up to date, what applies to the current release is not what your poor user might have
Usually keeping versioned content is not that simple

Forum

Even if usually they are noisy forums are a good source of information plenty of time.
Personally I try to move interesting bits to a wiki page when I found something that is not completely transient.

Pros:

Usually everything require less developer interaction
User can share solutions to their problem effectively

Cons

The information can get stale even quicker that what you have in the wiki
Since it is mainly user-generate the solutions proposed might be suboptimal
Being highly interactive it requires more dedicated people to take care of unruly users

Manuals

There are lots of good toolchain to write full manuals as we have in Gentoo.

The old style xml/docbook tend to have a really steep learning curve, not to mention the even more quirky and wonderful monster such as LaTeX (and the lesser texinfo). ReStructuredText, asciidoc and some flavour of markdown seem to be a better tool for the task if you need speed and get contributors up to speed.

Pros:

A proper manual can be easily pinned to a specific release
It can be versioned using git
Some people still like something they can print and has a proper index

Cons

With the old tools it is a pain to start it
The learning curve can be still unbearable for most contributors
It requires some additional dedication to keep it up to date

What to use and why

Usually for small projects the manual is the README, once it grows usually a wiki is the best place to put notes from multiple people. If you are good at it a manual is a boon for all your users.

Tools to have documentation-in-code such as doxygen or docurium can help a lot if your project is having a single codebase.

If you need to unify a LOT of different information, like we have in Gentoo. The problems usually get much more annoying since you have contents written in multiple markups, living in multiple places and usually moving it from one place to another requires a serious editing effort (like moving from our guidexml to the current semantic wiki).

Markup suggestion

Markdown/CommonMark/Kramdown

I do like a lot CommonMark and I even started to port and extend it to be used in docutils since I find ReStructuredText too confusing for the normal users. The best quality of it is the natural flow, it is most annoying defect is that there are too many parser discrepancies and sometimes implementations disagrees. Still is better to have many good implementation than one subpar in everything (hi texinfo, I hate your toolchain).

Asciidoc

The markup is quite nice (up to a point) and the toolchain is sort of nice even if it feels like a Rube Goldberg machine. To my knowledge there is a single implementation of it and that makes me MUCH wary of using it in new projects.

ReStructuredText

The markup is not as intuitive as Asciidoc, thus quite far from Markdown immediate-use feeling, but it has great toolchain (if you like python) and it gets extended to produce lots of different well formatted documents.
It comes with loads markup features that Markdown core lacks: include directive, table of contents, pluggable generic block and span directives, 3 different flavours of tables.

Good if you can come to terms with its complexity all in all.

What’s next

Hopefully during this year among my many smaller and bigger projects, I’ll find time to put together something nice for documentation as well.

Track your issues

Issue Trackers

If you are not aware of a problem, you cannot fix it.

Having full awareness of the issues and managing it is the key of success for any kind of project (not just software).

For an open-source project it is essential that the issue tracker focuses on at least 3 areas:
Ease of use: You get reports mainly by casual users, they must spend the least amount of time to understand the tool and to provide the information.
Loudness: It must make problems easy to spot.
Data Mining: It should provide tools to query details, aggregate bugs and manipulate them.

What’s available

Right now I tried in different projects many issue trackers, sadly almost none fit the bill, they usually are actually the opposite: limited, cumbersome, hard to configure and horrible to use either to fill bugs or to actually manage them.

Bugzilla

It is by far the least bad, it has plugins to provide near-instant access thanks to Mozilla Persona, it has a rich rpc system that could be leveraged to have irc notifiers or side site statistics, importing-exporting data is almost there. As we know in Gentoo, it requires some deep manipulation and if there is nobody around to do that you can get fallouts like this when a single stubborn (and probably distracted) developer (vapier) manages to spoil the result of the goodwill of another and makes the Project overall more frail.

Mantis

It is still too rich of confusing option but its default splash views are a boon if you are wondering what’s the status of your project. No open-id/persona/single-sign-on integration sadly.

Redmine/Trac

Usually not good enough on the reporting side and, even if they are much simpler than Bugzilla, still not good for the untrained user. They integrate with the source repository view and knowledge base (aka wiki) so they can be a good starting point for small organizations.

Github/GitLab/Gogs

They have a more encompassing approach than redmine and trac, their issue tracker component is too simple in some cases (with Github not having even support for attachments and gogs not really managing tags yet) or a little too rough (no bug dependencies). But, with its immediate UI and the label-oriented approach, it is already pretty good for a large deal of projects. Sadly not Libav: we do need proper attachments.

RT

Request Tracker is overwhelming. No other words. Do not use it if you do not need to. It is too complex to configure on the admin side and is too annoying to use on the developer side. For users the interface is usually a mailbox so you can’t go wrong. Perfect if you have to manage a huge number of paying customer and you want to have detailed billing and other extremely advanced features.

Brimir

New kid of the block, it is quite simple, way too simple. Its mail rendering makes it not really great but is pretty much a nice concept waiting to bloom. (Will it?)

Suggestion welcome

Do you know any better opensource issue tracker? Please comment down =)