Multimedia – Page 3 – Luca Barbato

Again on assert()

Since apparently there are still people not reading the fine man page.

If the macro NDEBUG was defined at the moment was last included, the macro assert() generates no code, and hence does nothing at all.
Otherwise, the macro assert() prints an error message to standard error and terminates the program by calling abort(3) if expression is false (i.e., compares equal to zero).
The purpose of this macro is to help the programmer find bugs in his program. The message “assertion failed in file foo.c, function do_bar(), line 1287” is of no help at all to a user.

I guess it is time to return on security and expand a bit which are good practices and which are misguided ideas that should be eradicated to reduce the amount of Deny Of Service waiting to happen.

Security issues

The term “Security issue” covers a lot of different kind of situations. Usually unhanded paths in the code lead to memory corruption, memory leaks, crashes and other less evident problems such as information leaks.

I’m focusing on crashes today, assume the others are usually more annoying or dangerous, it might be true or not depending on the scenarios:

If you are watching a movie and you have a glitch in the bitstream that makes the application leak some memory you would not care at all as long you can enjoy your movie. If the same glitch makes VLC to close suddenly a second before you get to see who is the mastermind behind a really twisted plot… I guess you’ll scream at whoever thought was a good idea to crash there.

If a glitch might get an attacker to run arbitrary code while you are watching your movie probably you’d like better to have your player to just crash instead.

It is a false dichotomy since what you want is to have the glitch handled properly, and keep watching the rest of the movie w/out having VLC crashing w/out any meaningful information for you to know.

Errors must be handled, trading a crash for something else you consider worse is just being naive.

What is assert exactly?

assert is a debugging facility mandated by POSIX and C89 and C99, it is a macro that more or less looks like this

#define assert()                                       \
    if (condition) {                                   \
        do_nothing();                                  \
    } else {                                           \
       fprintf(stderr, "%s %s", __LINE__, __func__);   \
       abort();                                        \
    }

If the condition does not happen crash, here the real-life version from musl

#define assert(x) ((void)((x) || (__assert_fail(#x, __FILE__, __LINE__, __func__),0)))

How to use it

Assert should be use to verify assumptions. While developing they help you to verify if your
assumptions meet reality. If not they tell you that should investigate because something is
clearly wrong. They are not intended to be used in release builds.
– some wise Federico while talking about another language asserts

Usually when you write some code you might do something like this to make sure you aren’t doing anything wrong, you start with

int my_function_doing_difficult_computations(Structure *s)
{
   a = some_computation(s);
   ....
   b = other_operations(a, s);
   ....
   c = some_input(s, b);
   ...
   idx = some_operation(a, b, c);

   return some_lut[idx];
}

Where idx in a signed integer, and so a, b, c are with some ranges that might or not depend on some external input.

You do not want to have idx to be outside the range of the lookup table array some_lut and you are not so sure. How to check that you aren’t getting outside the array?

When you write the code usually you iteratively improve a prototype, you can add tests to make sure every function is returning values within the expected range and you can use assert() as a poor-man C version of proper unit-testing.

If some function depends on values outside your control (e.g. an input file), you usually do validation over them and cleanly error out there. Leaving external inputs unaccounted or, even worse, put an assert() there is really bad.

Unit testing and assert()

We want to make sure our function works fine, let’s make a really tiny test.

void test_some_computation(void)
{
    Structure *s = NULL;
    int i;
    while (input_generator(&s, i)) {
       int a = some_computation(s);
       assert(a > 0 && a <10);
    }
}

It is compact and you can then run your test under gdb and inspect a bit around. Quite good if you are refactoring the innards of some_computation() and you want to be sure you did not consider some corner case.

Here assert() is quite nice since we can pack in a single line the testcase and have a simple report if something went wrong. We could do better since assert does not tell use the value or how we ended up there though.

You might not be that thorough and you can just decide to put the same assert in your function and check there, assuming you cover all the input space properly using regression tests.

To crash or not to crash

The people that consider OK crashing on runtime (remember the sad user that cannot watch his wonderful movie till the end?) suggest to leave the assert enabled at runtime.

If you consider the example above, would be better to crash than to read a random value from the memory? Again this is a false dichotomy!

You can expect failures, e.g. broken bitstreams and you want to just check and return a proper failure message.

In our case some_input() return value should be checked for failures and the return value forwarder further up till the library user that then will decide what to do.

Now remains the access to the lookup table. If you didn’t check sufficiently the other functions you might get a bogus index and if you get a bogus index you will read from random memory (crashing or not depending if the random memory is on an address mapped to the program outside). Do you want to have an assert() there? Or you’d rather ad another normal check with a normal failure path?

An correct answer is to test your code enough so you do not need to add yet another check and, in fact, if the problem arises is wrong to add a check there, or, even worse an assert(), you should just go up in the execution path and fix the problem where it is: a non validated input, a wrong “optimization” or something sillier.

There is open debate on if having assert() enabled is a good or bad practice when talking about defensive design. In C, in my opinion, it is a complete misuse. You if you want to litter your release code with tons of branches you can also spend time to implement something better and make sure to clean up correctly. Calling abort() leaves your input and output possibly in severely inconsistent state.

How to use it the wrong way

I want to trade a crash anytime the alternative is memory corruption
– some misguided guy

Assume you have something like that

int size = some_computation(s);
uint8_t *p;
uint8_t *buf = p = malloc(size);


while (some_related_computations(s)) {
   do_stuff_(s, p);
   p += 4;
}

assert(p - buf == size);

If some_computation() and some_related_computation(s) do not agree, you might write over the allocated buffer! The naive person above starts talking about how the memory is corrupted by do_stuff() and horrible things (e.g. foreign code execution) could happen without the assert() and how even calling return at that point is terrible and would lead to horrible horrible things.

Ok, NO. Stop NOW. Go up and look at how assert is implemented. If you check at that point that something went wrong, you have the corruption already. No matter what you do, somebody could exploit it depending on how naive you had been or unlucky.

Remember: assert() does do I/O, allocates memory, raises a signal and calls functions. All that you would rather not do when your memory is corrupted is done by assert().

You can be less naive.

int size = some_computation(s);
uint8_t *p;
uint8_t *buf = p = malloc(size);

while (some_related_computations(s) && size > 4) {
   do_stuff_(s, p);
   p    += 4;
   size -= 4;
}
assert(size != 0);

But then, instead of the assert you can just add

if (size != 0) {
    msg("Something went really wrong!");
    log("The state is %p", s->some_state);
    cleanup(s);
    goto fail;
}

This way when the “impossible” happens the user gets a proper notification and you can recover cleanly and no memory corruption ever happened.

Better than assert

Albeit being easy to use and portable assert() does not provide that much information, there are plenty of tools that can be leveraged to get better reporting.

Gdb has a some capabilities that you might enjoy.
valgrind can be used in non obvious ways.
asan can get pretty handy with __asan_describe_address.

In Closing

assert() is a really nice debugging tool and it helps a lot to make sure some state remains invariant while refactoring.

Leaving asserts in release code, on the other hand, is quite wrong, it does not give you any additional safety. Please do not buy the fairly tale that assert() saves you from the scary memory corruption issues, it does NOT.

Decoupling an API

This weekend on #libav-devel we discussed again a bit about the problems with the current core avcodec api.

Current situation

Decoding

We have 3 decoding functions for each of the supported kind of media types: Audio, Video and Subtitles.

Subtitles are already a sore thumb since they are not using AVFrame but a specialized structure, let’s ignore it for now. Audio and Video share pretty much the same signature:

int avcodec_decode_something(AVCodecContext *avctx, AVFrame *f, int *got_frame, AVPacket *p)

It takes a context pointer containing the decoder state, consumes a demuxed packet and optionally outputs a decoded frame containing raw data in a certain format (audio samples, a video frame).

The usage model is quite simple it takes packets and whenever it has enough encoded data to emit a frame it emits one, the got_frame pointer signals if a frame is ready or more data is needed.

Problem:

What if 1 AVPacket is near always enough to output 2 or more frames of raw data?

This happens with MVC and other real-world scenarios.

In general our current API cannot cope with it cleanly.

While working with the MediaSDK interface from Intel and now with MMAL for the Rasberry Pi, similar problems arisen due the natural parallelism the underlying hardware has.

Encoding

We have again 3 functions again Subtitles are somehow different, while Audio and Video are sort of nicely uniform.

int avcodec_encode_something(AVCodecContext *avctx, AVPacket *p, const AVFrame *f, int *got_packet)

It is pretty much the dual of the decoding function: the context pointer is the same, a frame of raw data enters and a packet of encoded data. Again we have a pointer to signal if we had enough data and an encoded packet had been outputted.

Problem:

Again we might get multiple AVPacket produced out of a single AVFrame data fed.

This happens when the HEVC “workaround” to encode interlaced content makes the encoder to output the two separate fields as separate encoded frames.

Again, the API cannot cope with it cleanly and threaded or otherwise parallel encoding fit the model just barely.

Decoupling the process

To fix this issue (and make our users life simpler) the idea is to split the feeding data function from the one actually providing the processed data.

int avcodec_decode_push(AVCodecContext *avctx, AVPacket *packet);
int avcodec_decode_pull(AVCodecContext *avctx, AVFrame *frame);
int avcodec_decode_need_data(AVCodecContext *avctx);
int avcodec_decode_have_data(AVCodecContext *avctx);

int avcodec_encode_push(AVCodecContext *avctx, AVFrame *frame);
int avcodec_encode_pull(AVCodecContext *avctx, AVPacket *packet);
int avcodec_encode_need_data(AVCodecContext *avctx);
int avcodec_encode_have_data(AVCodecContext *avctx);

From a single function 4 are provided, why it is simple?

The current workflow is more or less like

while (get_packet_from_demuxer(&pkt)) {
    ret = avcodec_decode_something(avctx, frame, &got_frame, pkt);
    if (got_frame) {
        render_frame(frame);
    }
    if (ret < 0) {
        manage_error(ret);
    }
}

The get_packet_from_demuxer() is a function that dequeues from some queue the encoded data or directly call the demuxer (beware: having your I/O-intensive demuxer function blocking your CPU-intensive decoding function isn’t nice), render_frame() is as well either something directly talking to some kind of I/O-subsystem or enqueuing the data to have the actual rendering (including format conversion, overlaying and scaling) in another thread.

The new API makes much easier to keep the multiple area of concern separated, so they won’t trip each other while the casual user would have something like

while (ret >= 0) {
    while ((ret = avcodec_decode_need_data(avctx)) > 0) {
        ret = get_packet_from_demuxer(&pkt);
        if (ret < 0)
           ...
        ret = avcodec_decode_push(avctx, &pkt);
        if (ret < 0)
           ...
    }
    while ((ret = avcodec_decode_have_data(avctx)) > 0) {
        ret = avcodec_decode_pull(avctx, frame);
        if (ret < 0)
           ...
        render_frame(frame);
    }
}

That has probably few more lines.

Asyncronous API

Since the decoupled API is that simple, is possible to craft something more immediate for the casual user.

typedef struct AVCodecDecodeCallback {
    int (*pull_packet)(void *priv, AVPacket *pkt);
    int (*push_frame)(void *priv, AVFrame *frame);
    void *priv_data;
} AVCodecDecodeCallback;

int avcodec_register_decode_callbacks(AVCodecContext *avctx, AVCodecDecodeCallback *cb);

int avcodec_decode_loop(AVCodecContext *avctx)
{
    AVCodecDecodeCallback *cb = avctx->cb;
    int ret;
    while ((ret = avcodec_decode_need_data(avctx)) > 0) {
        ret = cb->pull_packet(cb->priv_data, &pkt);
        if (ret < 0)
            return ret;
        ret = avcodec_decode_push(avctx, &pkt);
        if (ret < 0)
            return ret;
    }
    while ((ret = avcodec_decode_have_data(avctx)) > 0) {
        ret = avcodec_decode_pull(avctx, frame);
        if (ret < 0)
            return ret;
        ret = cb->push_frame(cb->priv_data, frame);
    }
    return ret;
}

So the actual minimum decoding loop can be just 2 calls:

ret = avcodec_register_decode_callbacks(avctx, cb);
if (ret < 0)
   ...
while ((ret = avcodec_decode_loop(avctx)) >= 0);

Cute, isn’t it?

Theory is simple …

… the practice not so much:
– there are plenty of implementation issues to take in account.
– LOTS of tedious work converting all the codecs to the new API.
– lots of details to iron out (e.g. have_data() and need_data() should block or not?)

We did radical overhauls before, such as introducing reference-counted AVFrames thanks to Anton, so we aren’t much scared of reshaping and cleaning the codebase once more.

If you like the ideas posted above or you want to discuss them more, you can join the Libav irc channel or mailing list to discuss and help.

Demotivation, FUD and why I still contribute to Libav

Libav had been since its start a controversial project, mainly because lots of drama and a huge amount of manure had been thrown against it and even about 4 years after its start there are people spewing this kind of vitriolic comments.

What is Libav

How it started

Libav started when the trademark FFmpeg had been given by the owner of it Fabrice Bellard to the former FFmpeg leader Michael Niedermayer.

Michael Niedermayer managed to get demoted from his leader position by the topmost 18 people involved in FFmpeg by the time due his tendency of not following even the basic project rules. That after weeks from being voted to stay as leader by 15, 5 explicitly stating their vote was conditioned by his behavior and 1 definitely against him.

His demotion is due to acts in full disregard of the policies in place, even those enforced automatically by the svn hooks.

The fact he bullied and belittled volunteers and contributor can be checked by digging the mailing list during the months and the year before the management change and it also added up to the decision.

What aims to do

Being burnt by the unreliable leadership experience the Libav organization focused on rules both for development and for management.

Ideally having a clear set of rules and making every member and contributor abide to the very same rule would prevent abuses.

Rules in summary

All the code must use the same coding style
All the patches require a review.
- Nothing hits the tree before a second pair of eyes read the code.
- No part of the code is a private garden that only selected people modify on their whim.
Everybody must abide to a quite simple code of conduct.
- rude behavior is not welcome.
- flames are not welcome.
- constructive criticism is needed.

Since FFmpeg was a project famous for horrid code quality and sketchy and irregular API and a good chunk of needed changes were vetoed by the now-demoted leader, Libav focused mainly in cleaning up the code and making the API easy to use. This lead to deep overhauls such as reference-counted frames to make the multi-threading much simpler, reference-counted packets to make extended workflow easy and a good chunk of bugs and ancient issues fixed.

Demotivation and FUD

The people involved in Libav mainly focused on code, ignoring most if not all the kind of wild statements fans of Michael Niedermayer dispersed around internet. The idea is that the code should speak by itself.

Did not work as expected. At all.

Apparently news outlets prefer to repeat whatever they find on a blogpost instead of checking the facts by reading the mailing list and the git. Many misconceptions, to use an euphemism, had been spun around and apparently won’t die that quickly if left unaccounted.

The thief theme

“The people behind Libav stole the FFmpeg infrastructure”, some of the admins even got questioned in their workplace if they really stole something. Pity that most of the people related in keeping the infrastructure functional were doing so on their own hardware and co-location, but obviously checking facts is hard, better go help shoveling manure on people.

Needless to say that gets hard keep spending time and resources and get this kind of treatment. At VDD 2014 there were some agreements on at least clear the air about this by having a strong statement about it. So far it did not happen.

The daily merge

“Libav is a non-functional fork of FFmpeg” is quite often repeated all around, Michael started to merge daily everything Libav does more or less since the beginning, making FFmpeg effectively a strict derivative of Libav.

Initially he even use a quite misleading moniker in his merge naming qatar the Libav tree. Now things are a little more fair and at least FFmpeg more or less states that is a derivative of Libav with additional features in their download page.

Few people ended up deciding to leave the Libav project since they did not appreciate the fact their hard work done on their spare time would be used by somebody else to disparage them and on top of it to ask for donation with wild claims that the “80% of all is done in FFmpeg is done by Michael”.

You are helping the evil

New people contributing to Libav quite often get some fans contacting them to enlighten the people about how evil is Libav and the people working on it and how much better FFmpeg is and how better would be to contribute to the real project.

This is a form of harassment, so far the people involved kept sort of quiet, but probably would help being more outspoken since most people around do not like to check the facts. Private emails probably should stay private, but blog post comments can be easily found and linked. Since from the beginning such “jokes” had been spun I let the reader imagine how much patience somebody should muster to keep staying quiet and just write code (this post is me getting fed up enough after 3+ years).

Motivation and Demotivation

I like to write code that is functional and doesn’t pokes you eyes when you read it after a while and I like to write code about multimedia among the other things.

I used to consider the split and the following competition had been a bliss, since the original FFmpeg project was pretty much dying due the kind of environment it had. Having a competitor tempers the bad tendencies quite well.

On the other hand I’m not cool at all in having an unfair competition, with a side piggy-backing on the other like it is happening: everything in Libav is merged inf FFmpeg, enjoying the fact the code is polished and cleaned before. Libav has the above mentioned rules so any patch has to be discussed and usually that leads to a fair amount of changes when what is cherry-picked from FFmpeg hits the mailing list for review a good deal of times is faster and simpler to just write fix covering more issues or rewrite from scratch the feature some user deemed interesting to have.

I probably won’t stop contributing to Libav since I like a lot working with the other people involved and the few people thanking me now and then make me think that giving up would just cause more harm than good in the big picture. I can understand those that decide to stop though, sometimes the amount of nonsense thrown at you by those rabid fans not knowing anything is appalling.

Hopefully writing more about it might help defusing this situation.

PS: You might also read the often-ignored initial remark from Kostya

Hardware acceleration in Libav

Multimedia formats require lots of computation power since they use fairly complex mathematical transformation. Usually most of them can be implemented efficiently in silicon requiring orders of magnitude less power to run while remaining quite fast to execute.

Hardware acceleration

Most of the current platforms, being them desktop, mobile or server, sports some kind of hardware unit to offload decoding and encoding of multimedia formats.

They usually are accessed through platform-specific API, sometimes the API is even codec-specific, making the whole implementation experience quite painful and a lot time consuming.

Depending on the specific hwaccel implementation, it could be bound to the gpu and use the gpu memory, thus requiring to manage non-system memory in specific ways adding additional burden for those that would just to have some quick gain while opening a world of interesting optimization possibilities such as zerocopy processing pipelines for transcoding or in-gpu pixel-format conversion, scaling and blending.

There are some generic wrappers such as vdpau, vaapi, dxva2 and vt that abstract some of the complexity related providing a more uniform interface, but usually there is a need of a proper (and possibly near transparent) fallback for the situations in which the hardware cannot really manage an advanced codec profile so just leveraging the generic abstractions solve just part of the problem but, as decoding goes, it provides a large performance boost while requiring some effort in managing the non-system memory.

For most of the users the learning curve is too steep for being really useful.

Libav and hwaccel

Hardware acceleration support happened to be implemented more or less around a number of specific implementation (with quite non-uniform approach, so vaapi had some hooks in the codecs, vdpau had full decoders until it had been ported to the same interface and made way nicer to use thanks to Rémi) and requiring quite a number of backend specific boilerplate code to set up the implementation specific context and then manage the opaque buffers the decoder outputs.

High level functionality

The hwaccel infrastructure is currently focused on the following items:
– the fallback from hardware to software should be as seamless as possible
– basic hardware decoders must be taken in account (e.g. for h264 some accept single NAL units and can’t parse the bitstream on their own)
– the user must have a mean to control the context setup and the full memory management

In order to do that the normal software decoder is used to parse the bitstream and depending on if the hwaccel is enabled or not, route the parsed data to the software or the hardware decoder and the output frames are then managed by the decoder frame reordering functionality if present.

This way falling back, even from a specific hwaccel to another, is sort of simple at least conceptually: every time a new extradata appears it is parsed, fed to the first hwaccel setup code if not supported optional fallback hwaccel can try and eventually the software decoder is picked.

The decoded frames, no matter if opaque hw-specific gpu-memory or normal system-memory go to the same codepath and the user has a mean to set the video rendering pipeline to take this in account.

Limit of the infrastructure

All the software evolves, since the minimalistic approach to the hardware acceleration requires a huge amount of boilerplate code and deep knowledge of the bitstream formats in use the new APIs for the new hardware tried to improve and be easier to manage for less savvy users.

API such mfx try to abstract completely and just require to get the input bitstream so the hardware input buffers can be filled and produce the frames, in presentation order, once the (parallel) decoding process yelds. It is in pull mode instead of push, so when more data is needed it gets requested and when a frame is ready it gets notified.

For an user most of the headaches related to frame management and elementary stream parsing are gone, or almost gone since some formats have multiple elementary stream representation and only a single one is supported…

For me (and then Anton), having an API like that poses multiple problems.
– having to feed a bitstream requires constructing it back from the software parsing and this is not terrible, it had been done for vda.
– the decoder wants to get the data only when the hardware buffers require more and that would require at least a queue.
– the frames outputted are already in presentation order, requiring to bypass the frame reordering logic.

Fitting a new-style hardware acceleration API in Libav

I tried the following approaches:
– Consider libmfx a normal third party decoder
– Anton complained about removing completely the ability to user hardware memory
– Implement additional hooks to keep the hwaccel interface but avoid the bitstream parsing and frame reordering.
– Anton and others complained about preventing the transparent fallback

Then I let Anton try for himself and in the end we agreed that providing an interim solution that let the user manage the memory and then trying to complete hwaccel2 on a second time would be the best.

Document your project!

After discussing how to track your bugs and your contributions, let see what we have about documentation

Pain and documentation

An healthy Open Source project needs mainly contributors, contributors are usually your users. You get users if the project is known and useful. (and you do not have parasitic entities syphoning your work abusing git-merge, best luck to io.js and markdown-it to not have this experience, switching name is enough of a pain without it).

In order to gain mindshare, the best thing is making what you do easier to use and that requires documenting what you did! The process is usually boring, time consuming and every time you change something you have to make sure the documentation still matches reality.

In the opensource community we have multiple options on the kind of documentation we produce and how to produce.

Wiki

When you need to keep some structure, but you want to have an easy way to edit it wiki can be a good choice and it can lead to nice results. The information present is usually correct and if enough people keep editing it up to date.

Pros:

The wiki is quick to edit and you can have people contribute by just using a browser.
The documentation is indexed by the search engines easily
It can be restricted to a number of trusted people

Cons

The information is detached from the actual code and it could desync easily
Even if kept up to date, what applies to the current release is not what your poor user might have
Usually keeping versioned content is not that simple

Forum

Even if usually they are noisy forums are a good source of information plenty of time.
Personally I try to move interesting bits to a wiki page when I found something that is not completely transient.

Pros:

Usually everything require less developer interaction
User can share solutions to their problem effectively

Cons

The information can get stale even quicker that what you have in the wiki
Since it is mainly user-generate the solutions proposed might be suboptimal
Being highly interactive it requires more dedicated people to take care of unruly users

Manuals

There are lots of good toolchain to write full manuals as we have in Gentoo.

The old style xml/docbook tend to have a really steep learning curve, not to mention the even more quirky and wonderful monster such as LaTeX (and the lesser texinfo). ReStructuredText, asciidoc and some flavour of markdown seem to be a better tool for the task if you need speed and get contributors up to speed.

Pros:

A proper manual can be easily pinned to a specific release
It can be versioned using git
Some people still like something they can print and has a proper index

Cons

With the old tools it is a pain to start it
The learning curve can be still unbearable for most contributors
It requires some additional dedication to keep it up to date

What to use and why

Usually for small projects the manual is the README, once it grows usually a wiki is the best place to put notes from multiple people. If you are good at it a manual is a boon for all your users.

Tools to have documentation-in-code such as doxygen or docurium can help a lot if your project is having a single codebase.

If you need to unify a LOT of different information, like we have in Gentoo. The problems usually get much more annoying since you have contents written in multiple markups, living in multiple places and usually moving it from one place to another requires a serious editing effort (like moving from our guidexml to the current semantic wiki).

Markup suggestion

Markdown/CommonMark/Kramdown

I do like a lot CommonMark and I even started to port and extend it to be used in docutils since I find ReStructuredText too confusing for the normal users. The best quality of it is the natural flow, it is most annoying defect is that there are too many parser discrepancies and sometimes implementations disagrees. Still is better to have many good implementation than one subpar in everything (hi texinfo, I hate your toolchain).

Asciidoc

The markup is quite nice (up to a point) and the toolchain is sort of nice even if it feels like a Rube Goldberg machine. To my knowledge there is a single implementation of it and that makes me MUCH wary of using it in new projects.

ReStructuredText

The markup is not as intuitive as Asciidoc, thus quite far from Markdown immediate-use feeling, but it has great toolchain (if you like python) and it gets extended to produce lots of different well formatted documents.
It comes with loads markup features that Markdown core lacks: include directive, table of contents, pluggable generic block and span directives, 3 different flavours of tables.

Good if you can come to terms with its complexity all in all.

What’s next

Hopefully during this year among my many smaller and bigger projects, I’ll find time to put together something nice for documentation as well.

Pre-made Builds

Had been years I’m maintaining the win32 builds, about 2 weeks ago I had the n-th failure with the box hosting it and since I was busy with some work stuff I could not fix it till this week.

Top-IX graciously provided a better sized system and I’m almost done reconfiguring it. Sadly setting up the host and reconfigure it is a quite time consuming task and not so many (if all) show appreciation for it. (please do cheer for the other people taking care of our other piece of infrastructure from time to time).

The new host is builds.libav.org since it will host builds that are slightly more annoying to get. Probably it will start with just builds for the releases and then if there is interests (and volunteers) it will be extended to nightly builds.

Changes

More Platforms

The first an more apparent is that we’ll try to cover more platforms, soon I’ll start baking some Android builds and then hopefully Apple-oriented stuff will appear in some form.

Building Libav in itself is quite simple and hopefully documented well enough and our build system is quite easy to use for cross building.

Getting some of the external dependencies built, on the other hand, is quite daunting. gnutls/nettle and x265 are currently missing since their build system is terrible for cross compiling and my spare time didn’t allow to get that done within the deadline I set for myself.

Possibly in few weeks we will get at least the frameworks packaging for iOS and Android. Volunteers to help are more than welcome.

New theme

The new theme is due switching to nginx so now thanks to fancy_index is arguably nicer.

More builds

The original builds tried to add almost everything that was deemed useful and thus the whole thing was distributed under gpl. Since I noticed some people might not really need that or might just want less functionality I added a lgpl-distributable set. If somebody feels useful having a version w/out any dependencies, please drop me a line.

Thanks

Thanks again to Top-IX for the support and Gabriele in particular for setting up the new system while he was in a conference in London.

Thanks for Sean and Reinhart for helping with the continuous integration system.

Enjoy the new builds!

Post Scriptum: token of appreciation in form of drinks or just thank you are welcome: writing code is fun, doing sysadmin tasks is not.

Making a new demuxer

Maxim asked me to to check a stream from a security camera that he could not decode with avconv without forcing the format to mjpeg.

Mysterious stream

Since it is served as http the first step had been checking the mime type. Time to use curl -I.

# curl -I "http://host/some.cgi?user=admin&amp;pwd=pwd" | grep Content-Type

Interesting enough it is a multipart/x-mixed-replace

Content-Type: multipart/x-mixed-replace;boundary=object-ipcamera

Basically the cgi sends a jpeg images one after the other, we even have a (old and ugly) muxer for it!

Time to write a demuxer.

Libav demuxers

We already have some documentation on how to write a demuxer, but it is not complete so this blogpost will provide an example.

Basics

Libav code is quite object oriented: every component is a C structure containing a description of it and pointers to a set of functions and there are fixed pattern to make easier to make new code fit in.

Every major library has an all${components}.c in which the components are registered to be used. In our case we talk about libavformat so we have allformats.c.

The components are built according to CONFIG_${name}_${component} variables generated by configure. The actual code reside in the ${component} directory with a pattern such as ${name}.c or ${name}dec.c/${name}enc.c if both demuxer and muxer are available.

The code can be split in multiple files if it starts growing to an excess of 500-1000 LOCs.

Registration

We have some REGISTER_ macros that abstract some logic to make every component selectable at configure time since in Libav you can enable/disable every muxer, demuxer, codec, IO/protocol from configure.

We had already have a muxer for the format.

    REGISTER_MUXER   (MPJPEG,           mpjpeg);

Now we register both in a single line:

    REGISTER_MUXDEMUX(MPJPEG,           mpjpeg);

The all${components} files are parsed by configure to generate the appropriate Makefile and C definitions. The next run we’ll get a new
CONFIG_MPJPEG_DEMUXER variable in config.mak and config.h.

Now we can add to libavformat/Makefile a line like

OBJS-$(CONFIG_MPJPEG_DEMUXER)            += mpjpegdec.o

and put our mpjpegdec.c in libavformat and we are ready to write some code!

Demuxer structure

Usually I start putting down a skeleton file with the bare minimum:

The AVInputFormat and the core _read_probe, _read_header and _read_packet callbacks.

#include "avformat.h"

static int ${name}_read_probe(AVProbeData *p)
{
    return 0;
}

static int ${name}_read_header(AVFormatContext *s)
{
    return AVERROR(ENOSYS);
}

static int ${name}_read_packet(AVFormatContext *s, AVPacket *pkt)
{
    return AVERROR(ENOSYS);
}

AVInputFormat ff_${name}_demuxer = {
    .name           = "${name}",
    .long_name      = NULL_IF_CONFIG_SMALL("Longer ${name} description"),
    .read_probe     = ${name}_read_probe,
    .read_header    = ${name}_read_header,
    .read_packet    = ${name}_read_packet,

I make so that all the functions return a no-op value.

_read_probe

This function will be called by the av_probe_input functions, it receives some probe information in the form of a buffer. The function return a score between 0 and 100; AVPROBE_SCORE_MAX, AVPROBE_SCORE_MIME and AVPROBE_SCORE_EXTENSION are provided to make more evident what is the expected confidence. 0 means that we are sure that the probed stream is not parsable by this demuxer.

_read_header

This function will be called by avformat_open_input. It reads the initial format information (e.g. number and kind of streams) when available, in this function the initial set of streams should be mapped with avformat_new_stream. Must return 0 on success. The skeleton is made to return ENOSYS so it can be run and just exit cleanly.

_read_packet

This function will be called by av_read_frame. It should return an AVPacket containing demuxed data as contained in the bytestream. It will be parsed and collated (or splitted) to a frame-worth amount of data by the optional parsers. Must return 0 on success. The skeleton again returns ENOSYS.

Implementation

Now let’s implement the mpjpeg support! The format in itself is quite simple:
– a boundary line starting with --
– a Content-Type line stating image/jpeg.
– a Content-Length line with the actual buffer length.
– the jpeg data

Probe function

We just want to check if the Content-Type is what we expect basically, so we just
go over the lines (\n\r-separated) and check if there is a tag Content-Type with a value image/jpeg.

static int get_line(AVIOContext *pb, char *line, int line_size)
{
    int i, ch;
    char *q = line;

    for (i = 0; !pb->eof_reached; i++) {
        ch = avio_r8(pb);
        if (ch == 'n') {
            if (q > line && q[-1] == 'r')
                q--;
            *q = '';

            return 0;
        } else {
            if ((q - line) < line_size - 1)
                *q++ = ch;
        }
    }

    if (pb->error)
        return pb->error;
    return AVERROR_EOF;
}

static int split_tag_value(char **tag, char **value, char *line)
{
    char *p = line;

    while (*p != '' && *p != ':')
        p++;
    if (*p != ':')
        return AVERROR_INVALIDDATA;

    *p   = '';
    *tag = line;

    p++;

    while (av_isspace(*p))
        p++;

    *value = p;

    return 0;
}

static int check_content_type(char *line)
{
    char *tag, *value;
    int ret = split_tag_value(&tag, &value, line);

    if (ret < 0)
        return ret;

    if (av_strcasecmp(tag, "Content-type") ||
        av_strcasecmp(value, "image/jpeg"))
        return AVERROR_INVALIDDATA;

    return 0;
}

static int mpjpeg_read_probe(AVProbeData *p)
{
    AVIOContext *pb;
    char line[128] = { 0 };
    int ret;

    pb = avio_alloc_context(p->buf, p->buf_size, 0, NULL, NULL, NULL, NULL);
    if (!pb)
        return AVERROR(ENOMEM);

    while (!pb->eof_reached) {
        ret = get_line(pb, line, sizeof(line));
        if (ret < 0)
            break;

        ret = check_content_type(line);
        if (!ret)
            return AVPROBE_SCORE_MAX;
    }

    return 0;
}

Here we are using avio to be able to reuse get_line later.

Reading the header

The format is pretty much header-less, we just check for the boundary for now and
set up the minimum amount of information regarding the stream: media type, codec id and frame rate. The boundary by specification is less than 70 characters with -- as initial marker.

static int mpjpeg_read_header(AVFormatContext *s)
{
    MPJpegContext *mp = s->priv_data;
    AVStream *st;
    char boundary[70 + 2 + 1];
    int ret;

    ret = get_line(s->pb, boundary, sizeof(boundary));
    if (ret < 0)
        return ret;

    if (strncmp(boundary, "--", 2))
        return AVERROR_INVALIDDATA;

    st = avformat_new_stream(s, NULL);

    st->codec->codec_type = AVMEDIA_TYPE_VIDEO;
    st->codec->codec_id   = AV_CODEC_ID_MJPEG;

    avpriv_set_pts_info(st, 60, 1, 25);

    return 0;
}

Reading packets

Even this function is quite simple, please note that AVFormatContext provides an
AVIOContext. The bulk of the function boils down to reading the size of the frame,
allocate a packet using av_new_packet and write down if using avio_read.

static int parse_content_length(char *line)
{
    char *tag, *value;
    int ret = split_tag_value(&tag, &value, line);
    long int val;

    if (ret < 0)
        return ret;

    if (av_strcasecmp(tag, "Content-Length"))
        return AVERROR_INVALIDDATA;

    val = strtol(value, NULL, 10);
    if (val == LONG_MIN || val == LONG_MAX)
        return AVERROR(errno);
    if (val > INT_MAX)
        return AVERROR(ERANGE);
    return val;
}

static int mpjpeg_read_packet(AVFormatContext *s, AVPacket *pkt)
{
    char line[128];
    int ret, size;

    ret = get_line(s->pb, line, sizeof(line));
    if (ret < 0)
        return ret;

    ret = check_content_type(line);
    if (ret < 0)
        return ret;

    ret = get_line(s->pb, line, sizeof(line));
    if (ret < 0)
        return ret;

    size = parse_content_length(line);
    if (size < 0)
        return size;

    ret = get_line(s->pb, line, sizeof(line));
    if (ret < 0)
        goto fail;

    ret = av_new_packet(pkt, size);
    if (ret < 0)
        return ret;

    ret = avio_read(s->pb, pkt->data, size);
    if (ret < 0)
        goto fail;

    ret = get_line(s->pb, line, sizeof(line));
    if (ret < 0)
        goto fail;

    // Consume the boundary marker
    ret = get_line(s->pb, line, sizeof(line));
    if (ret < 0)
        goto fail;

    return ret;

fail:
    av_free_packet(pkt);
    return ret;
}

What next

For now I walked you through on the fundamentals, hopefully next week I’ll show you some additional features I’ll need to implement in this simple demuxer to make it land in Libav: AVOptions to make possible overriding the framerate and some additional code to be able to do without Content-Length and just use the boundary line.

PS: wordpress support for syntax highlight is quite subpar, if somebody has a blog engine that can use pygments or equivalent please tell me and I’d switch to it.

Fix ALL the BUGS!

Vittorio started (with some help from me) to fix all the issues pointed by Coverity.

Static analysis

Coverity (and scan-build) are quite useful to spot mistakes even if their false-positive ratio tend to be quite high. Even the false-positives are usually interesting since the spot code unnecessarily convoluted. The code should be as simple as possible but not simpler.

The basic idea behind those tools is to try to follow the code-paths while compiling them and spot what could go wrong (e.g. you are feeding a NULL to a function that would deference it).

The problems with this approach are usually two: false positive due to the limited scope of the analyzer and false negatives due shadowing.

False Positives

Coverity might assume certain inputs are valid even if they are made impossible by some initial checks up in the codeflow.

In those case you should spend enough time to make sure Coverity is not right and those faulty inputs aren’t slipping somewhere. NEVER try to just add some checks to the code pointed as first move, you might either hide issues (e.g. if Coverity complains about uninitialized variable do not just initialize it to nothing, check why it happens and if the logic behind is wrong).

If Coverity is confused, your compiler is confused as well and will produce suboptimal executables.
Properly fixing those issues can result in useful speedups. Simpler code is usually faster.

Ever increasing issue count

While fixing issues using those tools you might notice to your surprise that every time you fix something, something new appears out of thin air.

This is not magic but simply that the static analyzers usually keep some limit on how deep they go depending on the issues already present and how much time had been spent already.

That surprise had been fun since apparently some of the time limit is per compilation unit so splitting large files in smaller chunks gets us more results (while speeding up the building process thanks to better parallelism).

Usually fixing some high-impact issue gets us 3 or 5 new small impact issues.

I like solving puzzles so I do not mind having more fun, sadly I did not have much spare time to play this game lately.

Merge ALL the FIXES

Fixing properly all the issues is a lofty goal and as usual having a patch is just 1/2 of the work. Usually two set of eyes work better than one and an additional brain with different expertise can prevent a good chunk of mistakes. The review process is the other, sometimes neglected, half of solving issues.

So far about 100+ patches got piled up over the past weeks and now they are sent in small batches to ease the work of review. (I have something brewing to make reviewing simpler, as you might know)

During the review what probably about 1/10 of the patches will be rejected and the relative coverity report updated with enough information to explain why it is a false positive or the dangerous or strange behaviour pointed is intentional.

The next point release for our 4 maintained major releases: 0.8, 9, 10 and 11. Many thanks to the volunteers that spend their free time keeping all the branches up to date!

Tracking patches

You need good tools to do a good job.

Even the best tool in the hand of a novice is a club.

I’m quite fond in improving the tools I use. And that’s why I started getting involved in Gentoo, Libav, VLC and plenty of other projects.

I already discussed about lldb and asan/valgrind, now my current focus is about patch trackers. In part it is due to the current effort to improve the libav one,

Contributors

Before talking about patches and their tracking I’d digress a little on who produces them. The mythical Contributor: without contributions an opensource project would not exist.

You might have recurring contributions and unique/seldom contributions. Both are quite important.
In general you should make so seldom contributors become recurring contributors.

A recurring contributor can accept to spend some additional time to setup the environment to actually provide its contribution back to the community, a sporadic contributor could be easily put off if the effort required to send his patch is larger than writing the patch itself.

Th project maintainers should make so the life of contributors is as simple as possible.

Patches and Revision Control

Lately most opensource projects saw the light and started to use decentralized source revision control system and thanks to github and many other is the concept of issue pull requests is getting part of our culture and with it comes hopefully a wider acceptance to the fact that the code should be reviewed before it is merged.

Pull Request

In a decentralized development scenario new code is usually developed in topic branches, routinely rebased against the master until the set is ready and then the set of changes (called series or patchset) is reviewed and after some round of fixes eventually merged. Thanks to bitbucket now we have forking, spooning and knifing as part of the jargon.

The review (and merge) step, quite properly, is called knifing (or stabbing): you have to dice, slice and polish the code before merging it.

Reviewing code

During a review bugs are usually spotted as well way to improve are suggested. Patches might be split or merged together and the series reworked and improved a lot.

The process is usually time consuming, even more for an organization made of volunteer: writing code is fun, address issues spotted is not so much, review someone else code is much less even.

Sadly it is a necessary annoyance since otherwise the errors (and horrors) that would slip through would be much bigger and probably much more. If you do not care about code quality and what you are writing is not used by other people you can probably ignore that, if you feel somehow concerned that what you wrote might turn some people life in a sea of pain. (On the other hand some gratitude for such daunting effort is usually welcome).

Pull request management

The old fashioned way to issue a pull request is either poke somebody telling that your branch is ready for merge or just make a set of patches and mail them to whoever is in charge of integrating code to the main branch.

git provides a nifty tool to do that called git send-email and is quite common to send sets of patches (called usually series) to a mailing list. You get feedback by email and you can update the set using the --in-reply-to option and the message id.

Platforms such as github and similar are more web centric and require you to use the web interface to issue and review the request. No additional tools are required beside your git and a browser.

gerrit and reviewboard provide custom scripts to setup ephemeral branches in some staging area then the review process requires a browser again. Every commit gets some tool-specific metadata to ease tracking changes across series revisions. This approach the more setup intensive.

Pro and cons

Mailing list approach

Testing patches from the mailing list is quite simple thanks to git am. And if the reply-to field is used properly updates appear sorted in a good way.

This method is the simplest for the people used to have the email client always open and a console (if they are using a well configured emacs or vim they literally do not move away from the editor).

On the other hand, people using a webmail or using a basic email client might find the approach more cumbersome than a web based one.

If your only method to track contribution is just a mailing list, gets quite easy to forget which is the status of a set. Patches could be neglected and even who wrote them might forget for a long time.

Patchwork approach

Patchwork tracks which patches hit a mailing list and tries to figure out if they are eventually merged automatically.

It is quite basic: it provides an web interface to check the status and provides a mean to just update the patch status. The review must happen in the mailing list and there is no concept of series.

As basic as it is works as a reminder about pending patches but tends to get cluttered easily and keeping it clean requires some effort.

Github approach

The web interface makes much easier spot what is pending and what’s its status, people used to have everything in the browser (chrome and mozilla could be made to work as a decent IDE lately) might like it much better.

Reviewing small series or single patches is usually nicer but the current UIs do not scale for larger (5+) patchsets.

People not living in a browser find quite annoying switch context and it requires additional effort to contribute since you have to register to a website and the process of issuing a patch requires many additional steps while in the email approach just require to type git send-email -1.

Gerrit approach

The gerrit interfaces tend to be richer than the Github counterparts. That can be good or bad since they aren’t as immediate and tend to overwhelm new contributors.

You need to make an additional effort to setup your environment since you need some custom script.

The series are tracked with additional precision, but for all the practical usage is the same as github with the additional bourden for the contributor.

Introducing plaid

Plaid is my attempt to tackle the problem. It is currently unfinished and in dire need of more hands working on it.

It’s basic concept is to be non-intrusive as much as possible, retaining all the pros of the simple git+email workflow like patchwork does.

It provides already additional features such as the ability to manage series of patches and to track updates to it. It sports a view to get a break out of which series require a review and which are pending for a long time waiting for an update.

What’s pending is adding the ability to review it directly in the browser, send the review email for the web to the mailing list and a some more.

Probably I might complete it within the year or next spring, if you like Flask or python contributions are warmly welcome!

VDD14 Discussions: HWAccel2

I took part to the Videolan Dev Days 14 weeks ago, sadly I had been too busy so the posts about it will appear in scattered order and sort of delayed.

Hardware acceleration

In multimedia, video is basically crunching numbers and get pixels or crunching pixels and getting numbers. Most of the operation are quite time consuming on a general purpose CPU and orders of magnitude faster if done using DSP or hardware designed for that purpose.

Availability

Most of the commonly used system have video decoding and encoding capabilities either embedded in the GPU or in separated hardware. Leveraging it spares lots of cpu cycles and lots of battery if we are thinking about mobile.

Capabilities

The usually specialized hardware has the issue of being inflexible and that does clash with the fact most codec evolve quite quickly with additional profiles to extend its capabilities, support different color spaces, use additional encoding strategies and such. Software decoders and encoders are still needed and need badly.

Hardware acceleration support in Libav

HWAccel 1

The hardware acceleration support in Libav grew (like other eldritch-horror tentacular code we have lurking from our dark past) without much direction addressing short term problems and not really documenting how to use it.

As result all the people that dared to use it had to guess, usually used internal symbols that they wouldn’t have to use and all in all had to spend lots of time and
had enough grief when such internals changed.

Usage

Every backend required a quite large deal of boilerplate code to initialize the backend-specific context and to render the hardware surface wrapped in the AVFrame.

The Libav backend interface was quite vague in itself, requiring to override get_format and get_buffer in some ways.

Overall to get the whole thing working the library user was supposed to do about 75% of the work. Not really nice considering people uses libraries to abstract complexity and avoid repetition…

Backend support

As that support was written with just slice-based decoder in mind, it expects that all the backend would require the software decoder to parse the bitstream, prepare slices of the frame and feed the backend with them.

Sadly new backends appeared and they take directly either bitstream or full frames, the approach had been just to take the slice, add back the bitstream markers the backend library expects and be done with that.

Initial HWAccel 2 discussion

Last year since the number of backends I wanted to support were all bitstream-oriented and not fitting the mode at all I started thinking about it and the topic got discussed a bit during VDD 13. Some people that spent their dear time getting hwaccel1 working with their software were quite wary of radical changes so a path of incremental improvements got more or less put down.

HWAccel 1.2

default functions to allocate and free the backend context and make the struct to interface between Libav and the backend extensible without causing breakage.
avconv now can use some hwaccel, providing at least an example on how to use them and a mean to test without having to gut VLC or mpv to experiment.
document better the old-style hwaccels so at least some mistakes could be avoided (and some code that happen to work by sheer look won’t break once the faulty assuptions cease to exist)

The new VDA backend and the update VDPAU backend are examples of it.

HWAccel 1.3

extend the callback system to fit decently bitstream oriented backends.
provide an example of backend directly providing normal AVFrames.

The Intel QSV backend is used as a testbed for hwaccel 1.3.

The future of HWAccel2

Another year, another meeting. We sat down again to figure out how to get further closer to the end result of not having the casual users write boilerplate code to use hwaccel to get at least some performance boost and yet let the power users have the full access to the underpinnings so they can get most of it without having to write everything from scratch.

Simplified usage, hopefully really simple

The user just needs to use AVOption to set specific keys such as hwaccel and optionally hwaccel-device and the library will take care of everything. The frames returned by avcodec_decode_video2 will contain normal system memory and commonly used pixel formats. No further special code will be needed.

Advanced usage, now properly abstracted

All the default initialization, memory/surface allocation and such will remain overridable, with the difference that an additional callback called get_hw_surface will be introduced to separate completely the hwaccel path from the software path and specific functions to hand over the ownership of backend contexts and surfaces will be provided.

The software fallback won’t be anymore automagic in this case, but a specific AVERROR_INPUT_CHANGED will be returned so would be cleaner for the user reset the decoder without losing the display that maybe was sharing the same context. This leads the way to a simpler mean to support multiple hwaccel backends and fall back from one to the other to eventually the software decoding.

Migration path

We try our best to help people move to the new APIs.

Moving from HWAccel1 to HWAccel2 in general would result in less lines of code in the application, the people wanting to keep their callback need to just set them after avcodec_open2 and move the pixel specific get_buffer to get_hw_surface. The presence of av_hwaccel_hand_over_frame and av_hwaccel_hand_over_context will make much simpler managing the backend specific resources.

Expected Time of Arrival

Right now the review is on the HWaccel1.3, I hope to complete this step and add few new backends to test how good/bad that API is before adding the other steps. Probably HWAccel2 will take at least other 6 months.

Help in form of code or just moral support is always welcome!