October 2014 – Luca Barbato

Fix ALL the BUGS!

Vittorio started (with some help from me) to fix all the issues pointed by Coverity.

Static analysis

Coverity (and scan-build) are quite useful to spot mistakes even if their false-positive ratio tend to be quite high. Even the false-positives are usually interesting since the spot code unnecessarily convoluted. The code should be as simple as possible but not simpler.

The basic idea behind those tools is to try to follow the code-paths while compiling them and spot what could go wrong (e.g. you are feeding a NULL to a function that would deference it).

The problems with this approach are usually two: false positive due to the limited scope of the analyzer and false negatives due shadowing.

False Positives

Coverity might assume certain inputs are valid even if they are made impossible by some initial checks up in the codeflow.

In those case you should spend enough time to make sure Coverity is not right and those faulty inputs aren’t slipping somewhere. NEVER try to just add some checks to the code pointed as first move, you might either hide issues (e.g. if Coverity complains about uninitialized variable do not just initialize it to nothing, check why it happens and if the logic behind is wrong).

If Coverity is confused, your compiler is confused as well and will produce suboptimal executables.
Properly fixing those issues can result in useful speedups. Simpler code is usually faster.

Ever increasing issue count

While fixing issues using those tools you might notice to your surprise that every time you fix something, something new appears out of thin air.

This is not magic but simply that the static analyzers usually keep some limit on how deep they go depending on the issues already present and how much time had been spent already.

That surprise had been fun since apparently some of the time limit is per compilation unit so splitting large files in smaller chunks gets us more results (while speeding up the building process thanks to better parallelism).

Usually fixing some high-impact issue gets us 3 or 5 new small impact issues.

I like solving puzzles so I do not mind having more fun, sadly I did not have much spare time to play this game lately.

Merge ALL the FIXES

Fixing properly all the issues is a lofty goal and as usual having a patch is just 1/2 of the work. Usually two set of eyes work better than one and an additional brain with different expertise can prevent a good chunk of mistakes. The review process is the other, sometimes neglected, half of solving issues.

So far about 100+ patches got piled up over the past weeks and now they are sent in small batches to ease the work of review. (I have something brewing to make reviewing simpler, as you might know)

During the review what probably about 1/10 of the patches will be rejected and the relative coverity report updated with enough information to explain why it is a false positive or the dangerous or strange behaviour pointed is intentional.

The next point release for our 4 maintained major releases: 0.8, 9, 10 and 11. Many thanks to the volunteers that spend their free time keeping all the branches up to date!

Tracking patches

You need good tools to do a good job.

Even the best tool in the hand of a novice is a club.

I’m quite fond in improving the tools I use. And that’s why I started getting involved in Gentoo, Libav, VLC and plenty of other projects.

I already discussed about lldb and asan/valgrind, now my current focus is about patch trackers. In part it is due to the current effort to improve the libav one,

Contributors

Before talking about patches and their tracking I’d digress a little on who produces them. The mythical Contributor: without contributions an opensource project would not exist.

You might have recurring contributions and unique/seldom contributions. Both are quite important.
In general you should make so seldom contributors become recurring contributors.

A recurring contributor can accept to spend some additional time to setup the environment to actually provide its contribution back to the community, a sporadic contributor could be easily put off if the effort required to send his patch is larger than writing the patch itself.

Th project maintainers should make so the life of contributors is as simple as possible.

Patches and Revision Control

Lately most opensource projects saw the light and started to use decentralized source revision control system and thanks to github and many other is the concept of issue pull requests is getting part of our culture and with it comes hopefully a wider acceptance to the fact that the code should be reviewed before it is merged.

Pull Request

In a decentralized development scenario new code is usually developed in topic branches, routinely rebased against the master until the set is ready and then the set of changes (called series or patchset) is reviewed and after some round of fixes eventually merged. Thanks to bitbucket now we have forking, spooning and knifing as part of the jargon.

The review (and merge) step, quite properly, is called knifing (or stabbing): you have to dice, slice and polish the code before merging it.

Reviewing code

During a review bugs are usually spotted as well way to improve are suggested. Patches might be split or merged together and the series reworked and improved a lot.

The process is usually time consuming, even more for an organization made of volunteer: writing code is fun, address issues spotted is not so much, review someone else code is much less even.

Sadly it is a necessary annoyance since otherwise the errors (and horrors) that would slip through would be much bigger and probably much more. If you do not care about code quality and what you are writing is not used by other people you can probably ignore that, if you feel somehow concerned that what you wrote might turn some people life in a sea of pain. (On the other hand some gratitude for such daunting effort is usually welcome).

Pull request management

The old fashioned way to issue a pull request is either poke somebody telling that your branch is ready for merge or just make a set of patches and mail them to whoever is in charge of integrating code to the main branch.

git provides a nifty tool to do that called git send-email and is quite common to send sets of patches (called usually series) to a mailing list. You get feedback by email and you can update the set using the --in-reply-to option and the message id.

Platforms such as github and similar are more web centric and require you to use the web interface to issue and review the request. No additional tools are required beside your git and a browser.

gerrit and reviewboard provide custom scripts to setup ephemeral branches in some staging area then the review process requires a browser again. Every commit gets some tool-specific metadata to ease tracking changes across series revisions. This approach the more setup intensive.

Pro and cons

Mailing list approach

Testing patches from the mailing list is quite simple thanks to git am. And if the reply-to field is used properly updates appear sorted in a good way.

This method is the simplest for the people used to have the email client always open and a console (if they are using a well configured emacs or vim they literally do not move away from the editor).

On the other hand, people using a webmail or using a basic email client might find the approach more cumbersome than a web based one.

If your only method to track contribution is just a mailing list, gets quite easy to forget which is the status of a set. Patches could be neglected and even who wrote them might forget for a long time.

Patchwork approach

Patchwork tracks which patches hit a mailing list and tries to figure out if they are eventually merged automatically.

It is quite basic: it provides an web interface to check the status and provides a mean to just update the patch status. The review must happen in the mailing list and there is no concept of series.

As basic as it is works as a reminder about pending patches but tends to get cluttered easily and keeping it clean requires some effort.

Github approach

The web interface makes much easier spot what is pending and what’s its status, people used to have everything in the browser (chrome and mozilla could be made to work as a decent IDE lately) might like it much better.

Reviewing small series or single patches is usually nicer but the current UIs do not scale for larger (5+) patchsets.

People not living in a browser find quite annoying switch context and it requires additional effort to contribute since you have to register to a website and the process of issuing a patch requires many additional steps while in the email approach just require to type git send-email -1.

Gerrit approach

The gerrit interfaces tend to be richer than the Github counterparts. That can be good or bad since they aren’t as immediate and tend to overwhelm new contributors.

You need to make an additional effort to setup your environment since you need some custom script.

The series are tracked with additional precision, but for all the practical usage is the same as github with the additional bourden for the contributor.

Introducing plaid

Plaid is my attempt to tackle the problem. It is currently unfinished and in dire need of more hands working on it.

It’s basic concept is to be non-intrusive as much as possible, retaining all the pros of the simple git+email workflow like patchwork does.

It provides already additional features such as the ability to manage series of patches and to track updates to it. It sports a view to get a break out of which series require a review and which are pending for a long time waiting for an update.

What’s pending is adding the ability to review it directly in the browser, send the review email for the web to the mailing list and a some more.

Probably I might complete it within the year or next spring, if you like Flask or python contributions are warmly welcome!

VDD14 Discussions: HWAccel2

I took part to the Videolan Dev Days 14 weeks ago, sadly I had been too busy so the posts about it will appear in scattered order and sort of delayed.

Hardware acceleration

In multimedia, video is basically crunching numbers and get pixels or crunching pixels and getting numbers. Most of the operation are quite time consuming on a general purpose CPU and orders of magnitude faster if done using DSP or hardware designed for that purpose.

Availability

Most of the commonly used system have video decoding and encoding capabilities either embedded in the GPU or in separated hardware. Leveraging it spares lots of cpu cycles and lots of battery if we are thinking about mobile.

Capabilities

The usually specialized hardware has the issue of being inflexible and that does clash with the fact most codec evolve quite quickly with additional profiles to extend its capabilities, support different color spaces, use additional encoding strategies and such. Software decoders and encoders are still needed and need badly.

Hardware acceleration support in Libav

HWAccel 1

The hardware acceleration support in Libav grew (like other eldritch-horror tentacular code we have lurking from our dark past) without much direction addressing short term problems and not really documenting how to use it.

As result all the people that dared to use it had to guess, usually used internal symbols that they wouldn’t have to use and all in all had to spend lots of time and
had enough grief when such internals changed.

Usage

Every backend required a quite large deal of boilerplate code to initialize the backend-specific context and to render the hardware surface wrapped in the AVFrame.

The Libav backend interface was quite vague in itself, requiring to override get_format and get_buffer in some ways.

Overall to get the whole thing working the library user was supposed to do about 75% of the work. Not really nice considering people uses libraries to abstract complexity and avoid repetition…

Backend support

As that support was written with just slice-based decoder in mind, it expects that all the backend would require the software decoder to parse the bitstream, prepare slices of the frame and feed the backend with them.

Sadly new backends appeared and they take directly either bitstream or full frames, the approach had been just to take the slice, add back the bitstream markers the backend library expects and be done with that.

Initial HWAccel 2 discussion

Last year since the number of backends I wanted to support were all bitstream-oriented and not fitting the mode at all I started thinking about it and the topic got discussed a bit during VDD 13. Some people that spent their dear time getting hwaccel1 working with their software were quite wary of radical changes so a path of incremental improvements got more or less put down.

HWAccel 1.2

default functions to allocate and free the backend context and make the struct to interface between Libav and the backend extensible without causing breakage.
avconv now can use some hwaccel, providing at least an example on how to use them and a mean to test without having to gut VLC or mpv to experiment.
document better the old-style hwaccels so at least some mistakes could be avoided (and some code that happen to work by sheer look won’t break once the faulty assuptions cease to exist)

The new VDA backend and the update VDPAU backend are examples of it.

HWAccel 1.3

extend the callback system to fit decently bitstream oriented backends.
provide an example of backend directly providing normal AVFrames.

The Intel QSV backend is used as a testbed for hwaccel 1.3.

The future of HWAccel2

Another year, another meeting. We sat down again to figure out how to get further closer to the end result of not having the casual users write boilerplate code to use hwaccel to get at least some performance boost and yet let the power users have the full access to the underpinnings so they can get most of it without having to write everything from scratch.

Simplified usage, hopefully really simple

The user just needs to use AVOption to set specific keys such as hwaccel and optionally hwaccel-device and the library will take care of everything. The frames returned by avcodec_decode_video2 will contain normal system memory and commonly used pixel formats. No further special code will be needed.

Advanced usage, now properly abstracted

All the default initialization, memory/surface allocation and such will remain overridable, with the difference that an additional callback called get_hw_surface will be introduced to separate completely the hwaccel path from the software path and specific functions to hand over the ownership of backend contexts and surfaces will be provided.

The software fallback won’t be anymore automagic in this case, but a specific AVERROR_INPUT_CHANGED will be returned so would be cleaner for the user reset the decoder without losing the display that maybe was sharing the same context. This leads the way to a simpler mean to support multiple hwaccel backends and fall back from one to the other to eventually the software decoding.

Migration path

We try our best to help people move to the new APIs.

Moving from HWAccel1 to HWAccel2 in general would result in less lines of code in the application, the people wanting to keep their callback need to just set them after avcodec_open2 and move the pixel specific get_buffer to get_hw_surface. The presence of av_hwaccel_hand_over_frame and av_hwaccel_hand_over_context will make much simpler managing the backend specific resources.

Expected Time of Arrival

Right now the review is on the HWaccel1.3, I hope to complete this step and add few new backends to test how good/bad that API is before adding the other steps. Probably HWAccel2 will take at least other 6 months.

Help in form of code or just moral support is always welcome!