Tags in git

mini-post about using tags in git commits.

Usage

When you are looking at the history using git log having tags helps a lot digging out old commits, for example: you remember some commit added some timeout system in something related to the component foo.

git log --oneline | grep foo:

Would help figuring out the commit.

This usage is the best when working with not well structured codebase, since alternatively you can do

git log --oneline module/component

If you use separate directories for each module and component within the module.

PS: This is one of the reasons plaid focuses a lot on tags and I complain a lot when tags are not used.

Patches and Plaid

This is part of the better tools series.

Sometimes you should question the tools you are using and try to see if there is something better out there. Or build it yourself.

Juggling patches

It is quite common when interacting with people to send back and forth the changes to the shared codebase you are working on.

This post tries to analyze two commonly used models and explain why they can be improved and which are the good tools for it (existing or not).

The two models

The focus is on git, github-like web-mediated pull-requests and mailinglist-oriented workflows.

The tools in use are always:

a web browser
an editor
a shell
an email client

Some people might have all in one in a way or another making one of the two model already incredibly more effective. Below I assume you do not have such tightly integrated environments.

Pull requests

Github made quite easy to propose patches in the form of ephemeral branches that can be reviewed and merged with a single click on your browser.

The patchset can be part of your master tree or a brand new branch pushed on your repository for this purpose: first you push your changes on github and then you go to your browser to send the PullRequest (also known as merge request or proposed changeset).

You can get email notification that a pull request is available and then move to your browser to review it.

You might have a continuous integration report out of it and if you trust it you may skip fetching the changes and test them locally.

If something does not work exactly as it should you can notify the proponents and they might get an email that they have comments and they have to go to the browser to see them in detail.

Then the changes have to be pushed to the right branch and github helpfully updates it.

Then the reviewer has to get back to the browser and check again.

Once that is done you have your main tree with lots of merge artifacts and possibly some fun time if you want to bisect the history.

Mailing-list mediated

The mailing-list mediated is sort of popular because Linux does use it and git does provide tools for it out of box.

Once you have a set of patches (say 5) you are happy with you can simply issue

git send-email --compose -5 --to the_mailing@list.org

And if you have a local mailer working that’s it.

If you do not you end up having to configure it (e.g. configuring gmail with a specific access token not to have to type the password all the time is sort of easy)

The people in the mailing-list then receive your set in their mailbox as is and they can use git-am to test it (first saving the thread using their email client then using git am over it) locally and push to something like oracle if they like the set but they aren’t completely sure it won’t break everything.

If they have comments can just reply to the specific patch email (using the email Message-Id).

The proponent can then rework the set (maybe using git rebase -i) and send an update and add some comments here and there.

git send-email --annotate -6 --to the_mailing@list.org

Updates to specific patches or rework from other people can happen by just sending the patch back.

git send-email --annotate -1 --in-reply-to patch-msgid

Once the set is good, it can be applied to the tree, resulting in a purely linear history that makes going over looking for regression pretty easy.

Where to improve

Pull request based

The weak and the strong point of this method is its web-centricity.

It works quite nicely if you just use the web-mail so is just switching from a tab to another to see exactly what’s going on and reply in detail.

Yet, if your browser isn’t your shell (and you didn’t configure custom actions to auto-fetch the pull requests) you still have lots of back and forth.

Having already continuous integration hooks you can quickly configure is quite nice if the project has already a solid regression and code coverage harness so the reviewer bourden to make sure the code doesn’t break is lighter.

Sending a link to a pull request is easy.

Sadly, new code does not come with tests or tests you should trust the whole point above is half moot: you have to do the whole fetch&test dance.

Reworking sets isn’t exactly perfect, it makes quite hard to a third party to provide input in form of an alternate patch over a set:

you have to fetch the code being discussed
prepare a new pull request
reference it in your comment to the old one

then

the initial proponent has to fetch it
rebase his branch on it
update the pull request accordingly

and so on.

There are desktop-tools trying to bridge web and shell but right now they aren’t an incredible improvement and the churn during the review can be higher on the other side.

Surely is really HARD to forget a pull request open.

Mailing list based

The strong point of the approach is that you have less steps for the most common actions:

sending a set is a single command
fetching a set is two commands
doing a quick review does not require to switch to another application, you just
reply to the email you received.
sending an update or a different approach is always the same git send-email command

It is quite loose so people can have various degrees of integration, but in general the experience as reviewer is as good as your email client, your experience as proponent is as nice as your sendmail configuration.

People with basic email client would even have problems referring to patches by its Message-Id.

The weakest point of the method is the chance of missing a patch, leaving it either unreviewed or uncommitted after the review.

Ideal situation

My ideal solution would include:

Not many compulsory steps, sending a patch for a habitual contributor should take the least amount of time.
A pre-screening of patches, ideally making sure the new code has tests and it passes them on some testing environments.
Reviewing should take the least amount of time.
A mean to track patches and make easy to know if a set is still pending review or it is committed.

Enters plaid

I do enjoy better using the mailing-list approach since it is much quicker for me, I have a decent email client (that still could improve) and I know how to configure my local smtp. If I want to contribute to a new project that uses the approach it is just a matter to find the email address and type git send-email --annotate --to email, github gets unwieldy if I just want to send a couple of fixes.

That said I do see that the mailing-list shortcomings are a limiting factor and while I’m not much concerned as making the initial setup much easier (since federico has already plans for it), I do want to not lose patches and to get some of the nice and nifty features github has without losing the speed in development I do enjoy.

Plaid is my try to improve the situation, right now it is just more or less an easier to deploy patch tracker along the lines of patchwork with a diverging focus.

It emphasizes the concepts of patch tag to provide quick grouping, patch series to ease reviewing a set.

curl http://plaid.libav.org/project/libav/series/50/mbox | git am -s

Is all you need to get all the patches in your working tree.

Right now it works either as stand-alone tracker (right now this test deploy is fed by fetching from the mailing list archives) or as mailbox hook (as patchwork does).

Coming soon

I plan to make it act as postfix filter, so it injects in the email an useful link to the patch. It will provide a mean to send emails directly from it so it can doubles as nicer email client for those that are more web-centric and gets annoyed because gmail and the likes aren’t good for the purpose.

More views such as a per-submitter view and a search view will appear as well.

Document your project!

After discussing how to track your bugs and your contributions, let see what we have about documentation

Pain and documentation

An healthy Open Source project needs mainly contributors, contributors are usually your users. You get users if the project is known and useful. (and you do not have parasitic entities syphoning your work abusing git-merge, best luck to io.js and markdown-it to not have this experience, switching name is enough of a pain without it).

In order to gain mindshare, the best thing is making what you do easier to use and that requires documenting what you did! The process is usually boring, time consuming and every time you change something you have to make sure the documentation still matches reality.

In the opensource community we have multiple options on the kind of documentation we produce and how to produce.

Wiki

When you need to keep some structure, but you want to have an easy way to edit it wiki can be a good choice and it can lead to nice results. The information present is usually correct and if enough people keep editing it up to date.

Pros:

The wiki is quick to edit and you can have people contribute by just using a browser.
The documentation is indexed by the search engines easily
It can be restricted to a number of trusted people

Cons

The information is detached from the actual code and it could desync easily
Even if kept up to date, what applies to the current release is not what your poor user might have
Usually keeping versioned content is not that simple

Forum

Even if usually they are noisy forums are a good source of information plenty of time.
Personally I try to move interesting bits to a wiki page when I found something that is not completely transient.

Pros:

Usually everything require less developer interaction
User can share solutions to their problem effectively

Cons

The information can get stale even quicker that what you have in the wiki
Since it is mainly user-generate the solutions proposed might be suboptimal
Being highly interactive it requires more dedicated people to take care of unruly users

Manuals

There are lots of good toolchain to write full manuals as we have in Gentoo.

The old style xml/docbook tend to have a really steep learning curve, not to mention the even more quirky and wonderful monster such as LaTeX (and the lesser texinfo). ReStructuredText, asciidoc and some flavour of markdown seem to be a better tool for the task if you need speed and get contributors up to speed.

Pros:

A proper manual can be easily pinned to a specific release
It can be versioned using git
Some people still like something they can print and has a proper index

Cons

With the old tools it is a pain to start it
The learning curve can be still unbearable for most contributors
It requires some additional dedication to keep it up to date

What to use and why

Usually for small projects the manual is the README, once it grows usually a wiki is the best place to put notes from multiple people. If you are good at it a manual is a boon for all your users.

Tools to have documentation-in-code such as doxygen or docurium can help a lot if your project is having a single codebase.

If you need to unify a LOT of different information, like we have in Gentoo. The problems usually get much more annoying since you have contents written in multiple markups, living in multiple places and usually moving it from one place to another requires a serious editing effort (like moving from our guidexml to the current semantic wiki).

Markup suggestion

Markdown/CommonMark/Kramdown

I do like a lot CommonMark and I even started to port and extend it to be used in docutils since I find ReStructuredText too confusing for the normal users. The best quality of it is the natural flow, it is most annoying defect is that there are too many parser discrepancies and sometimes implementations disagrees. Still is better to have many good implementation than one subpar in everything (hi texinfo, I hate your toolchain).

Asciidoc

The markup is quite nice (up to a point) and the toolchain is sort of nice even if it feels like a Rube Goldberg machine. To my knowledge there is a single implementation of it and that makes me MUCH wary of using it in new projects.

ReStructuredText

The markup is not as intuitive as Asciidoc, thus quite far from Markdown immediate-use feeling, but it has great toolchain (if you like python) and it gets extended to produce lots of different well formatted documents.
It comes with loads markup features that Markdown core lacks: include directive, table of contents, pluggable generic block and span directives, 3 different flavours of tables.

Good if you can come to terms with its complexity all in all.

What’s next

Hopefully during this year among my many smaller and bigger projects, I’ll find time to put together something nice for documentation as well.

Track your issues

Issue Trackers

If you are not aware of a problem, you cannot fix it.

Having full awareness of the issues and managing it is the key of success for any kind of project (not just software).

For an open-source project it is essential that the issue tracker focuses on at least 3 areas:
Ease of use: You get reports mainly by casual users, they must spend the least amount of time to understand the tool and to provide the information.
Loudness: It must make problems easy to spot.
Data Mining: It should provide tools to query details, aggregate bugs and manipulate them.

What’s available

Right now I tried in different projects many issue trackers, sadly almost none fit the bill, they usually are actually the opposite: limited, cumbersome, hard to configure and horrible to use either to fill bugs or to actually manage them.

Bugzilla

It is by far the least bad, it has plugins to provide near-instant access thanks to Mozilla Persona, it has a rich rpc system that could be leveraged to have irc notifiers or side site statistics, importing-exporting data is almost there. As we know in Gentoo, it requires some deep manipulation and if there is nobody around to do that you can get fallouts like this when a single stubborn (and probably distracted) developer (vapier) manages to spoil the result of the goodwill of another and makes the Project overall more frail.

Mantis

It is still too rich of confusing option but its default splash views are a boon if you are wondering what’s the status of your project. No open-id/persona/single-sign-on integration sadly.

Redmine/Trac

Usually not good enough on the reporting side and, even if they are much simpler than Bugzilla, still not good for the untrained user. They integrate with the source repository view and knowledge base (aka wiki) so they can be a good starting point for small organizations.

Github/GitLab/Gogs

They have a more encompassing approach than redmine and trac, their issue tracker component is too simple in some cases (with Github not having even support for attachments and gogs not really managing tags yet) or a little too rough (no bug dependencies). But, with its immediate UI and the label-oriented approach, it is already pretty good for a large deal of projects. Sadly not Libav: we do need proper attachments.

RT

Request Tracker is overwhelming. No other words. Do not use it if you do not need to. It is too complex to configure on the admin side and is too annoying to use on the developer side. For users the interface is usually a mailbox so you can’t go wrong. Perfect if you have to manage a huge number of paying customer and you want to have detailed billing and other extremely advanced features.

Brimir

New kid of the block, it is quite simple, way too simple. Its mail rendering makes it not really great but is pretty much a nice concept waiting to bloom. (Will it?)

Suggestion welcome

Do you know any better opensource issue tracker? Please comment down =)

Fix ALL the BUGS!

Vittorio started (with some help from me) to fix all the issues pointed by Coverity.

Static analysis

Coverity (and scan-build) are quite useful to spot mistakes even if their false-positive ratio tend to be quite high. Even the false-positives are usually interesting since the spot code unnecessarily convoluted. The code should be as simple as possible but not simpler.

The basic idea behind those tools is to try to follow the code-paths while compiling them and spot what could go wrong (e.g. you are feeding a NULL to a function that would deference it).

The problems with this approach are usually two: false positive due to the limited scope of the analyzer and false negatives due shadowing.

False Positives

Coverity might assume certain inputs are valid even if they are made impossible by some initial checks up in the codeflow.

In those case you should spend enough time to make sure Coverity is not right and those faulty inputs aren’t slipping somewhere. NEVER try to just add some checks to the code pointed as first move, you might either hide issues (e.g. if Coverity complains about uninitialized variable do not just initialize it to nothing, check why it happens and if the logic behind is wrong).

If Coverity is confused, your compiler is confused as well and will produce suboptimal executables.
Properly fixing those issues can result in useful speedups. Simpler code is usually faster.

Ever increasing issue count

While fixing issues using those tools you might notice to your surprise that every time you fix something, something new appears out of thin air.

This is not magic but simply that the static analyzers usually keep some limit on how deep they go depending on the issues already present and how much time had been spent already.

That surprise had been fun since apparently some of the time limit is per compilation unit so splitting large files in smaller chunks gets us more results (while speeding up the building process thanks to better parallelism).

Usually fixing some high-impact issue gets us 3 or 5 new small impact issues.

I like solving puzzles so I do not mind having more fun, sadly I did not have much spare time to play this game lately.

Merge ALL the FIXES

Fixing properly all the issues is a lofty goal and as usual having a patch is just 1/2 of the work. Usually two set of eyes work better than one and an additional brain with different expertise can prevent a good chunk of mistakes. The review process is the other, sometimes neglected, half of solving issues.

So far about 100+ patches got piled up over the past weeks and now they are sent in small batches to ease the work of review. (I have something brewing to make reviewing simpler, as you might know)

During the review what probably about 1/10 of the patches will be rejected and the relative coverity report updated with enough information to explain why it is a false positive or the dangerous or strange behaviour pointed is intentional.

The next point release for our 4 maintained major releases: 0.8, 9, 10 and 11. Many thanks to the volunteers that spend their free time keeping all the branches up to date!

Tracking patches

You need good tools to do a good job.

Even the best tool in the hand of a novice is a club.

I’m quite fond in improving the tools I use. And that’s why I started getting involved in Gentoo, Libav, VLC and plenty of other projects.

I already discussed about lldb and asan/valgrind, now my current focus is about patch trackers. In part it is due to the current effort to improve the libav one,

Contributors

Before talking about patches and their tracking I’d digress a little on who produces them. The mythical Contributor: without contributions an opensource project would not exist.

You might have recurring contributions and unique/seldom contributions. Both are quite important.
In general you should make so seldom contributors become recurring contributors.

A recurring contributor can accept to spend some additional time to setup the environment to actually provide its contribution back to the community, a sporadic contributor could be easily put off if the effort required to send his patch is larger than writing the patch itself.

Th project maintainers should make so the life of contributors is as simple as possible.

Patches and Revision Control

Lately most opensource projects saw the light and started to use decentralized source revision control system and thanks to github and many other is the concept of issue pull requests is getting part of our culture and with it comes hopefully a wider acceptance to the fact that the code should be reviewed before it is merged.

Pull Request

In a decentralized development scenario new code is usually developed in topic branches, routinely rebased against the master until the set is ready and then the set of changes (called series or patchset) is reviewed and after some round of fixes eventually merged. Thanks to bitbucket now we have forking, spooning and knifing as part of the jargon.

The review (and merge) step, quite properly, is called knifing (or stabbing): you have to dice, slice and polish the code before merging it.

Reviewing code

During a review bugs are usually spotted as well way to improve are suggested. Patches might be split or merged together and the series reworked and improved a lot.

The process is usually time consuming, even more for an organization made of volunteer: writing code is fun, address issues spotted is not so much, review someone else code is much less even.

Sadly it is a necessary annoyance since otherwise the errors (and horrors) that would slip through would be much bigger and probably much more. If you do not care about code quality and what you are writing is not used by other people you can probably ignore that, if you feel somehow concerned that what you wrote might turn some people life in a sea of pain. (On the other hand some gratitude for such daunting effort is usually welcome).

Pull request management

The old fashioned way to issue a pull request is either poke somebody telling that your branch is ready for merge or just make a set of patches and mail them to whoever is in charge of integrating code to the main branch.

git provides a nifty tool to do that called git send-email and is quite common to send sets of patches (called usually series) to a mailing list. You get feedback by email and you can update the set using the --in-reply-to option and the message id.

Platforms such as github and similar are more web centric and require you to use the web interface to issue and review the request. No additional tools are required beside your git and a browser.

gerrit and reviewboard provide custom scripts to setup ephemeral branches in some staging area then the review process requires a browser again. Every commit gets some tool-specific metadata to ease tracking changes across series revisions. This approach the more setup intensive.

Pro and cons

Mailing list approach

Testing patches from the mailing list is quite simple thanks to git am. And if the reply-to field is used properly updates appear sorted in a good way.

This method is the simplest for the people used to have the email client always open and a console (if they are using a well configured emacs or vim they literally do not move away from the editor).

On the other hand, people using a webmail or using a basic email client might find the approach more cumbersome than a web based one.

If your only method to track contribution is just a mailing list, gets quite easy to forget which is the status of a set. Patches could be neglected and even who wrote them might forget for a long time.

Patchwork approach

Patchwork tracks which patches hit a mailing list and tries to figure out if they are eventually merged automatically.

It is quite basic: it provides an web interface to check the status and provides a mean to just update the patch status. The review must happen in the mailing list and there is no concept of series.

As basic as it is works as a reminder about pending patches but tends to get cluttered easily and keeping it clean requires some effort.

Github approach

The web interface makes much easier spot what is pending and what’s its status, people used to have everything in the browser (chrome and mozilla could be made to work as a decent IDE lately) might like it much better.

Reviewing small series or single patches is usually nicer but the current UIs do not scale for larger (5+) patchsets.

People not living in a browser find quite annoying switch context and it requires additional effort to contribute since you have to register to a website and the process of issuing a patch requires many additional steps while in the email approach just require to type git send-email -1.

Gerrit approach

The gerrit interfaces tend to be richer than the Github counterparts. That can be good or bad since they aren’t as immediate and tend to overwhelm new contributors.

You need to make an additional effort to setup your environment since you need some custom script.

The series are tracked with additional precision, but for all the practical usage is the same as github with the additional bourden for the contributor.

Introducing plaid

Plaid is my attempt to tackle the problem. It is currently unfinished and in dire need of more hands working on it.

It’s basic concept is to be non-intrusive as much as possible, retaining all the pros of the simple git+email workflow like patchwork does.

It provides already additional features such as the ability to manage series of patches and to track updates to it. It sports a view to get a break out of which series require a review and which are pending for a long time waiting for an update.

What’s pending is adding the ability to review it directly in the browser, send the review email for the web to the mailing list and a some more.

Probably I might complete it within the year or next spring, if you like Flask or python contributions are warmly welcome!