broken-endian

You wrote your code, you wrote the tests, and everything seems to work.

Then somebody runs your code on a big-endian machine and reports that EVERYTHING is broken.

Usually most data is serialized to disk or to the wire as big-endian, while most CPUs do their computation in little-endian (with MIPS and PowerPC as rare big-endian exceptions). If you assume the relationship between the data on the wire and the data in the CPU registers is always the same, you are bound to have problems (and it gets even worse if you decide to write the data to disk as little-endian because swapping from CPU to disk feels slow: you are doing it wrong).

Checklist

The problem mainly shows up while reading or writing:

  • Sometimes it feels simpler to copy over some packed structure using the equivalent of read(fd, &my_struct, sizeof(struct)). If the struct contains anything other than byte-sized variables it won’t work reliably, so it is safe to say it won’t work at all. It gets even worse if you forget to mark the structure as packed.
  • Writing has the same issue: never write a structure, or even a 16-bit integer, directly without making sure you get the expected endianness right (see the sketch below).
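
For example, the portable approach is to assemble multi-byte values from single bytes instead of casting the buffer; a minimal sketch (the helper names are made up for illustration):

#include <stdint.h>

/* Read a 32-bit big-endian value from a byte buffer,
 * independently of the host endianness. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |
            (uint32_t)p[3];
}

/* Write it back the same way. */
static void write_be32(uint8_t *p, uint32_t v)
{
    p[0] = (v >> 24) & 0xff;
    p[1] = (v >> 16) & 0xff;
    p[2] = (v >>  8) & 0xff;
    p[3] =  v        & 0xff;
}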

Mini-post written to recall what not to do (more examples later).

Rethinking AVFormat – part 1

Container formats should be just a boring exercise in serializing multiple arrays of (timestamp, binary blob) tuples.

Instead they are full of implementation details and provide plenty of fun
and exceedingly annoying ways to lose your sanity.

This post is yet another post about APIs; you can see others here and here.

Current Status

In Libav we have libavformat taking care of general I/O, Muxing, Demuxing.

This blog post will not cover the additional grouping given by Programs, Chapters and such, to avoid making the whole article huge; it will just focus on the basics.

I/O

The AVIO abstraction provides a means to uniformly access content stored in files, available as remote streams (e.g. served through http or rtmp) or provided through custom implementations.

This part of the API is tightly coupled with the Muxer and Demuxer implementations.

It uses the common Context pattern you can find across the rest of Libav, with a few twists:

  • The protocol handler can be guessed from the url provided, e.g. file:///tmp/foo.
  • The functions that allocate a context take, besides the usual options AVDictionary, an extra parameter in the form of a callback function.
  • You can create your own custom protocol easily, as sketched after the prototypes below.
int avio_open2(AVIOContext **s, const char *url, int flags, const AVIOInterruptCB *int_cb, AVDictionary **options)

AVIOContext *avio_alloc_context(unsigned char *buffer, int buffer_size, int write_flag, void *opaque,
                                int(*read_packet)(void *opaque, uint8_t *buf, int buf_size),
                                int(*write_packet)(void *opaque, uint8_t *buf, int buf_size),
                                int64_t(*seek)(void *opaque, int64_t offset, int whence))
int avio_closep(AVIOContext **s);
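
For instance, plugging in a custom data source is mostly a matter of providing a read callback to avio_alloc_context. A minimal sketch, where struct my_source and my_source_read() are hypothetical stand-ins for whatever backs the custom protocol:

#include <libavformat/avio.h>
#include <libavutil/mem.h>

struct my_source; /* hypothetical custom-source state */

static int my_read(void *opaque, uint8_t *buf, int buf_size)
{
    struct my_source *src = opaque;
    /* fill buf with up to buf_size bytes and return the number of bytes
     * actually read, or a negative AVERROR value on failure */
    return my_source_read(src, buf, buf_size);
}

AVIOContext *setup_custom_io(struct my_source *src)
{
    int buffer_size       = 4096;
    unsigned char *buffer = av_malloc(buffer_size);

    if (!buffer)
        return NULL;

    /* write_flag = 0: read-only context, no write or seek callbacks */
    return avio_alloc_context(buffer, buffer_size, 0, src,
                              my_read, NULL, NULL);
}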

The API tries to mimic C stdio, plus lots of API sugar.

# core functions
int avio_read(AVIOContext *s, unsigned char *buf, int size);
void avio_write(AVIOContext *s, const unsigned char *buf, int size);
int64_t avio_seek(AVIOContext *s, int64_t offset, int whence);


# simple integer readers
int          avio_r8  (AVIOContext *s);
uint64_t     avio_rb64(AVIOContext *s);
uint64_t     avio_rl64(AVIOContext *s);
unsigned int avio_rb16(AVIOContext *s);
unsigned int avio_rb24(AVIOContext *s);
unsigned int avio_rb32(AVIOContext *s);
unsigned int avio_rl16(AVIOContext *s);
unsigned int avio_rl24(AVIOContext *s);
unsigned int avio_rl32(AVIOContext *s);

# simple integer writers
void avio_w8(AVIOContext *s, int b);
void avio_wb16(AVIOContext *s, unsigned int val);
void avio_wb24(AVIOContext *s, unsigned int val);
void avio_wb32(AVIOContext *s, unsigned int val);
void avio_wb64(AVIOContext *s, uint64_t val);
void avio_wl16(AVIOContext *s, unsigned int val);
void avio_wl24(AVIOContext *s, unsigned int val);
void avio_wl32(AVIOContext *s, unsigned int val);
void avio_wl64(AVIOContext *s, uint64_t val);


# utf8 and utf16 strings
int avio_get_str(AVIOContext *pb, int maxlen, char *buf, int buflen);

int avio_get_str16le(AVIOContext *pb, int maxlen, char *buf, int buflen);
int avio_get_str16be(AVIOContext *pb, int maxlen, char *buf, int buflen);

int avio_put_str(AVIOContext *s, const char *str);

int avio_put_str16le(AVIOContext *s, const char *str);

... (and more) ...
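
As a tiny example, reading a hypothetical chunk header made of a little-endian fourcc, a little-endian 32-bit size and a big-endian 64-bit timestamp boils down to (pb being the input AVIOContext):

uint32_t tag  = avio_rl32(pb);
uint32_t size = avio_rl32(pb);
uint64_t pts  = avio_rb64(pb);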

Buffering

All the functions use an intermediate buffer to back reads and writes; the buffer can be flushed explicitly, or it gets flushed automatically once a request would fall outside it.

void avio_flush(AVIOContext *s);

A special kind of AVIOContext is the dynamic write buffer: it extends on demand and can be used to build complex recurring patterns once and write them as many times as needed.

int avio_open_dyn_buf(AVIOContext **s);

int avio_close_dyn_buf(AVIOContext *s, uint8_t **pbuffer);
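
A sketch of the typical usage; out stands for the real output AVIOContext, the values written are made up, and the buffer returned by avio_close_dyn_buf is released with av_free:

AVIOContext *dyn;
uint8_t *buf;
int size, ret;

ret = avio_open_dyn_buf(&dyn);
if (ret < 0)
    return ret;

/* build the recurring pattern once */
avio_wb32(dyn, 0xCAFEBABE); /* some tag */
avio_wb32(dyn, 42);         /* some size */

size = avio_close_dyn_buf(dyn, &buf);

/* write it out as many times as needed, then release it */
avio_write(out, buf, size);
avio_write(out, buf, size);
av_free(buf);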

Error handling

An I/O layer has to take into account the fact that the resource being read or written could abruptly disappear or suddenly slow down. This is valid for both local and remote resources.

The internal buffer allocation might fail.

A seek too far could lead to the end of file.

The AVIO approach to errors is quite simplistic:
– A write can silently fail.
– A failing read just returns a zeroed buffer or value.
– All the functions set the error field or the eof_reached field.

It is up to the user to decide when to check for I/O problems, or to leverage the AVIOInterruptCB to implement timeouts or other means to interrupt a read or a write that would otherwise just quietly block until it completes.
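
A minimal timeout sketch built on AVIOInterruptCB; the deadline bookkeeping is just an illustration, the only real requirement is that the callback returns non-zero when the operation should be aborted:

#include <libavformat/avio.h>
#include <libavutil/time.h>

struct timeout_ctx {
    int64_t deadline; /* absolute time in microseconds */
};

static int interrupt_cb(void *opaque)
{
    struct timeout_ctx *t = opaque;
    /* returning non-zero makes the blocking read/write bail out */
    return av_gettime() > t->deadline;
}

int open_with_timeout(AVIOContext **io, const char *url,
                      struct timeout_ctx *t, int64_t timeout_us)
{
    AVIOInterruptCB cb = { .callback = interrupt_cb, .opaque = t };

    t->deadline = av_gettime() + timeout_us;
    /* the callback is checked during the open and later blocking operations */
    return avio_open2(io, url, AVIO_FLAG_READ, &cb, NULL);
}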

Demuxing (and Probing)

The part of AVFormat taking care of input streams can be split in three: probing the data to guess the right demuxer, the actual demuxing, and optionally parsing the demuxed data to fit it into packets containing the information needed by the decoder to decode a frame of video or a matching amount of audio samples. Later on I call this a frame-worth amount of data and I call the process chopping amorphous data streams. It is a colorful expression but it represents the endeavor quite well.

Probing

The probe functions take an arbitrarily big chunk of data (stored in an AVProbeData struct) and figure out which demuxer should be able to actually parse it correctly.

As a rule of thumb probes need to be fast, since all of them have to be run over the data at least once and possibly multiple times: if the result is not really conclusive, increasing the amount of data and trying again is an option.

AVInputFormat *av_probe_input_format2(AVProbeData *pd, int is_opened,
                                      int *score_max);
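
A sketch of probing a buffer directly; probe_buf and probe_buf_size stand for data already read from the input, and the zero-padding requirement (AVPROBE_PADDING_SIZE extra bytes after the buffer) is something to double check against your libavformat version:

AVProbeData pd = { 0 };
int score      = 0;
AVInputFormat *fmt;

pd.filename = "input.bin"; /* may help formats that probe by extension */
pd.buf      = probe_buf;   /* assumed zero-padded past probe_buf_size */
pd.buf_size = probe_buf_size;

fmt = av_probe_input_format2(&pd, 1, &score);
if (fmt)
    av_log(NULL, AV_LOG_INFO, "guessed %s (score %d)\n", fmt->name, score);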

A helper function to probe from an AVIOContext and get the possible input format is provided.

int av_probe_input_buffer(AVIOContext *pb, AVInputFormat **fmt,
                          const char *filename, void *logctx,
                          unsigned int offset, unsigned int max_probe_size);

It is used internally by avformat_open_input to automatically figure out the demuxer to use, and it might look a little confusing.

Demuxing

Once the input format is either guessed or selected, the actual demuxing is conceptually just providing AVPackets
as they are parsed. You might want to reposition within the stream at random times (the infamous seeking, which opens yet another can of worms).

int avformat_open_input(AVFormatContext **ps, const char *filename,
                        AVInputFormat *fmt, AVDictionary **options);

int av_read_frame(AVFormatContext *s, AVPacket *pkt);

void avformat_close_input(AVFormatContext **ps);
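
Putting the three calls together, a minimal demuxing loop looks more or less like this; error handling is trimmed, process_packet() is a hypothetical consumer and av_free_packet() is the packet-releasing call of the libavformat generation discussed here:

AVFormatContext *ctx = NULL;
AVPacket pkt;
int ret;

ret = avformat_open_input(&ctx, "input.nut", NULL, NULL);
if (ret < 0)
    return ret;

while ((ret = av_read_frame(ctx, &pkt)) >= 0) {
    /* pkt.stream_index tells which AVStream the data belongs to */
    process_packet(ctx, &pkt);
    av_free_packet(&pkt);
}

avformat_close_input(&ctx);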

Figuring out the data inside the format

Some container formats keep the information regarding their contents in a global header at the start of the file; others, which can have new data streams appearing at random times, do not.

Since there is no easy means to figure out which kind of data they are storing, the only safe way is to try to decode some packets in order to know which kind of data is available: avformat_find_stream_info.

int avformat_find_stream_info(AVFormatContext *ic, AVDictionary **options);

This apparently simple function does a lot of work behind the scenes: it demuxes and decodes a configurable number of packets before giving up, and keeps all of them in an internal queue so that they will be available for demuxing even if the input stream is not seekable.

Getting the data outside

Containers such as MPEG PS mux data in small fixed-size chunks,
while muxers and decoders usually expect to receive AVPackets containing enough data to produce a frame.

Specific parsers can be inserted automatically to take the amorphous stream of demuxed data and chop it into AVPackets containing the expected amount of data.

This usually happens automatically, so the user does not have to care about it as long as the codec parser is present.

Timestamps

Multimedia data is expected to carry timestamps so that video frames and audio frames (and subtitles) can be presented at the same time.

Some containers provide such timestamps directly, others do not, requiring some amount of guesswork by heuristics that might or might not work depending on the codec at hand.

For example, if the container is not supposed to allow variable frame rates, the implicit timestamp for video can be deduced from the frame number. This might not work as expected if the codec uses B-frames and requires some form
of reordering.

This part of Libav is sort of hidden and often causes a number of problems.

Seeking

Seeking is quite a different and large can of worms.

Ideally seeking just sets the AVIOContext to a certain position and the demuxer keeps working from there.

int av_seek_frame(AVFormatContext *s, int stream_index,
                  int64_t timestamp, int flags);

Depending on the container format and the codec, picking the correct byte offset from the user-provided timestamp can be incredibly simple or really complex, with various degrees of precision.
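
For example, asking for the keyframe at or before the 10 second mark, letting avformat pick the default stream, would look more or less like this:

/* with stream_index set to -1 the timestamp is in AV_TIME_BASE units */
int64_t ts = 10 * AV_TIME_BASE;
int ret    = av_seek_frame(ctx, -1, ts, AVSEEK_FLAG_BACKWARD);
if (ret < 0)
    return ret; /* the container could not satisfy the request */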

Some formats provide a precise index, so a plain lookup is enough; a dichotomic search looking for the closest I-frame is the common case; and in the worst situation a linear search might be required.

In some cases auxiliary indexes are built to speed up seeking within previously parsed areas.

Seeking is not fun at the demuxer level and gets even worse at the codec level if the data provided is not the one expected.

Muxing

Muxing is somewhat simpler than demuxing: the output format is always known and the data always comes in AVPackets matching a frame-worth of raw data, possibly sporting correct timestamps.

API-wise it expects an AVFormatContext with oformat set to the correct AVOutputFormat, pb
set to an allocated AVIOContext, and the AVStreams populated.

Once the AVFormatContext is configured it is possible to write the packets. First the global header should be written, then as many packets as needed are muxed, interleaving audio and video so that demuxing and seeking work correctly.

int avformat_write_header(AVFormatContext *s, AVDictionary **options);

int av_interleaved_write_frame(AVFormatContext *s, AVPacket *pkt);

int av_write_trailer(AVFormatContext *s);
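
A minimal muxing flow then looks like this; the context setup is assumed to be already done on ctx and get_encoded_packet() is a hypothetical source of ready AVPackets:

AVPacket pkt;
int ret;

ret = avformat_write_header(ctx, NULL);
if (ret < 0)
    return ret;

while (get_encoded_packet(&pkt) >= 0) {
    /* the muxer may buffer packets internally to interleave them correctly */
    ret = av_interleaved_write_frame(ctx, &pkt);
    if (ret < 0)
        break;
}

return av_write_trailer(ctx);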

Bitstream filtering

Some codecs have multiple possible representations, e.g. H264 has the AVCC bitstream format and the Annex B bitstream format. Some containers support both, others expect only one or the other. Currently the correct converter from one bitstream format to another must be inserted manually.

Packet interleaving

Certain container formats have quite peculiar muxing rules. This is normally hidden from the user, but in certain cases being able to override it is a boon.

Shortcomings summary

In the next post I will explain how I would improve the situation; today's post is mainly to introduce the structure of AVFormat and start explaining what should be fixed. Here is a short list of what I’d like to fix sooner rather than later.

Non-uniform API

  • There is quite a mixture of av_ and avformat_ namespaces.
  • The muxing and demuxing APIs are confusing enough (and surely I should complete my avformat_open_output to reduce the boilerplate).

Abstractions Leaking the wrong way

  • The demuxing side automagically inserts parsers to chop data streams into frame-worth amounts of data, while the muxing side just fails if the bitstream provided does not match the one required by the container format.
  • There is quite a lot of hidden magic happening in avformat_find_stream_info, and only recently did we add options to at least flush the buffer it keeps to probe for codecs. Having a better function and a better means to control this kind of internal buffer would surely be appreciated by users that need to keep latency low.
  • There is no good means to be notified if the number of streams changes (new streams found or old streams disappearing).

Bad implementations

  • The old muxers sometimes do not even use the now-available internals (e.g. the interleaver helpers) but implement queues and logic internally that should by now be common and shared across all the muxers.
  • While AVCodec now has a fairly uniform means to slice bytes and bits, avformat is not leveraging it besides a few places.

PS: Kostya prefers to provide both the amorphous stream and the chopped packets. It makes sense, since you might have some codec you cannot parse but that you can sort of safely remux if the container is the same.
For the common case I’d rather suggest a set of functions that always insert parsers when they can, for both demuxing and muxing, plus another set of functions to get arbitrary lumps of stream as provided by the container format.

Splitting a library – hashes

libavutil contains a lot that is common to the other libraries that compose Libav. It has grown a lot over the years and it’s time to consider splitting it.

Monolithic vs Modular

There will always be some discussion on which approach is globally better.
– Jumble everything together so you have everything there no matter what: your super hammer supporting screws, bolts, nuts and nails.
– Keep the tools in separate boxes so you carry only the set of spanners you need when you need it.

For software libraries you have this kind of problem all the time and at multiple levels:
– Do you want a single huge header file with every function your library provides, or a set of them organized to keep related functions together?
– Do you want to link a single library, or have the concerns split across multiple ones so you do not have to carry lots of stuff you do not use (storage and memory are still important in some applications)?

Usually modularity comes with the price of additional initial effort (you have to think a little harder about what you are going to use) and maintenance (which library should I update?).

This blog post is about trying to group and split the bunch of unrelated functions present in a library, and trying to get a better API for some of them.

Libavutil

The Libav libraries are written mainly in C and assembly and they focus a lot on being portable. Libavutil is the foundation.

It contains all the code that is common across the libraries, from basics such as memory management to higher level data structures, video- and audio-specific basic manipulation, hashes, cryptographic primitives and lossless compressors.

A lot indeed.

Problems

Irregular Mushroom-API

Some of the highest level parts of the library appeared little by little: first you need md5 and you add it, then it is aes, then you want lzo. Each crypto or hash component exposes direct functions specific to it, making those components non-optional even if you do not need them.

# libavutil/aes.h
struct AVAES;

struct AVAES *av_aes_alloc(void);

int av_aes_init(struct AVAES *a, const uint8_t *key, int key_bits, int decrypt);
void av_aes_crypt(struct AVAES *a, uint8_t *dst, const uint8_t *src, int count, uint8_t *iv, int decrypt);

# libavutil/xtea.h
typedef struct AVXTEA {
    uint32_t key[16];
} AVXTEA;

void av_xtea_init(struct AVXTEA *ctx, const uint8_t key[16]);
void av_xtea_crypt(struct AVXTEA *ctx, uint8_t *dst, const uint8_t *src,
                   int count, uint8_t *iv, int decrypt);

# libavutil/sha.h
struct AVSHA;

struct AVSHA *av_sha_alloc(void);
int av_sha_init(struct AVSHA* context, int bits);
void av_sha_update(struct AVSHA* context, const uint8_t* data, unsigned int len);
void av_sha_final(struct AVSHA* context, uint8_t *digest);

# libavutil/md5.h
struct AVMD5;

struct AVMD5 *av_md5_alloc(void);
void av_md5_init(struct AVMD5 *ctx);
void av_md5_update(struct AVMD5 *ctx, const uint8_t *src, const int len);
void av_md5_final(struct AVMD5 *ctx, uint8_t *dst);
void av_md5_sum(uint8_t *dst, const uint8_t *src, const int len);

As you might notice, lots and lots of exposed, similar-but-non-uniform APIs kept popping out.

And if having a couple of hashes always around was acceptable, it gets not so nice once you have more to add.

Right now libavutil exposes 50 separate headers.

Extending it is painful now

Since we already have that many different components inside it, you think twice about adding more stuff (if you are careful and caring); Libav is fairly modular and people do appreciate that.

In my wishlist I have a few items, such as getting more decompressors natively implemented.

Every new API is a burden to maintain (if you care about legacy and keep maintaining and releasing your older software), so adding or exposing more is always something you should consider carefully.

Abstracting some details always helps; think what the API would be like if each of the supported codecs had its own exposed, non-uniform set of decoding functions.

Ideal structure

Ideally I’d have the following layout:
– libavutil: basic memory abstraction, error, logs and not much else
– libavdata: basic data structures, including refcounted buffers, dictionaries, trees and such
– libavmedia: audio samples, pixel formats, metadata, frames, packets, side data types.
– libavhash: hashes such as md5, sha and the like
– libavcomp: compressors such as lzo
– libavcrypto: aes, blowfish and such

API

I already described my ideal API for the codecs; today I’ll detail the hashes.

As seen above it is common to have init, update,
final and an optional utility function sum (or calc) that takes a whole buffer and returns the hash.

typedef struct AVHashLibrary AVHashLibrary;
typedef struct AVHash AVHash;
typedef struct AVHashContext AVHashContext;

int av_hash_register_all(AVHashLibrary *hashes);

const AVHash *av_hash_get(AVHashLibrary *hashes, const char *name);

AVHashContext *av_hash_open(const AVHash *hash, AVDictionary *opts);

int av_hash_update(AVHashContext *ctx, const uint8_t *src, const int len);

uint8_t *av_hash_final(AVHashContext *ctx, int *len);

uint8_t *av_hash_sum(AVHashContext *ctx, const uint8_t *src, const uint64_t src_len, int *out_len);

void av_hash_close(AVHashContext *ctx);

The structures are fully opaque; the AVHashLibrary contains the list of available hashes and possibly some additional hidden state. In Libav we are trying to remove all the global variables, so the list of hashes is explicit.

The register_all function just populates the list of hashes and possibly creates accessory lookup tables when needed.

The get call lets you look up a hash by name; an additional call could be added to look it up by id.

The open function takes a dictionary for hash-specific configuration.

The update and final functions let you calculate the hash incrementally; the sum function is a simple utility that takes a full buffer (whose size is assumed to fit in a uint64_t) and produces the hash.
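
With the proposed API (which, to be clear, does not exist yet: everything below, including the av_hash_library_alloc constructor and the buf/buf_len input, is hypothetical), hashing a buffer incrementally would look more or less like this:

AVHashLibrary *hashes = av_hash_library_alloc(); /* hypothetical constructor */
const AVHash *md5;
AVHashContext *ctx;
uint8_t *digest;
int digest_len;

av_hash_register_all(hashes);

md5 = av_hash_get(hashes, "md5");
ctx = av_hash_open(md5, NULL);

av_hash_update(ctx, buf, buf_len);
digest = av_hash_final(ctx, &digest_len);

av_hash_close(ctx);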

Kostya’s NihAV will hopefully have a similar API, with TypeLibrary, Type and TypeContext structs.

Again on assert()

Since apparently there are still people not reading the fine man page:

If the macro NDEBUG was defined at the moment <assert.h> was last included, the macro assert() generates no code, and hence does nothing at all.
Otherwise, the macro assert() prints an error message to standard error and terminates the program by calling abort(3) if expression is false (i.e., compares equal to zero).
The purpose of this macro is to help the programmer find bugs in his program. The message “assertion failed in file foo.c, function do_bar(), line 1287” is of no help at all to a user.

I guess it is time to return to security and expand a bit on which are good practices and which are misguided ideas that should be eradicated to reduce the amount of Denial of Service waiting to happen.

Security issues

The term “security issue” covers a lot of different kinds of situations. Usually unhandled paths in the code lead to memory corruption, memory leaks, crashes and other less evident problems such as information leaks.

I’m focusing on crashes today; assume the others are usually more annoying or dangerous. That might be true or not depending on the scenario:

If you are watching a movie and a glitch in the bitstream makes the application leak some memory, you would not care at all as long as you can enjoy your movie. If the same glitch makes VLC close suddenly a second before you get to see who is the mastermind behind a really twisted plot… I guess you’ll scream at whoever thought it was a good idea to crash there.

If a glitch might let an attacker run arbitrary code while you are watching your movie, you’d probably prefer to have your player just crash instead.

It is a false dichotomy, since what you want is to have the glitch handled properly so you can keep watching the rest of the movie, instead of having VLC crash without giving you any meaningful information.

Errors must be handled, trading a crash for something else you consider worse is just being naive.

What is assert exactly?

assert is a debugging facility mandated by POSIX, C89 and C99; it is a macro that more or less looks like this:

#define assert(condition)                              \
    if (condition) {                                   \
        do_nothing();                                  \
    } else {                                           \
       fprintf(stderr, "%d %s", __LINE__, __func__);   \
       abort();                                        \
    }

If the condition does not hold, crash. Here is the real-life version from musl:

#define assert(x) ((void)((x) || (__assert_fail(#x, __FILE__, __LINE__, __func__),0)))

How to use it

Assert should be used to verify assumptions. While developing they help you verify whether your
assumptions meet reality. If not, they tell you that you should investigate because something is
clearly wrong. They are not intended to be used in release builds.
– some wise Federico while talking about asserts in another language

Usually when you write some code you might do something like this to make sure you aren’t doing anything wrong; you start with

int my_function_doing_difficult_computations(Structure *s)
{
   int a, b, c, idx;

   a = some_computation(s);
   ....
   b = other_operations(a, s);
   ....
   c = some_input(s, b);
   ...
   idx = some_operation(a, b, c);

   return some_lut[idx];
}

Where idx is a signed integer, and a, b and c have ranges that might or might not depend on some external input.

You do not want idx to fall outside the range of the lookup table array some_lut, and you are not so sure it can’t. How do you check that you aren’t getting outside the array?

When you write the code you usually iteratively improve a prototype; you can add tests to make sure every function returns values within the expected range, and you can use assert() as a poor man’s C version of proper unit testing.

If some function depends on values outside your control (e.g. an input file), you usually validate them and cleanly error out there. Leaving external inputs unaccounted for or, even worse, putting an assert() there is really bad.

Unit testing and assert()

We want to make sure our function works fine, so let’s write a really tiny test.

void test_some_computation(void)
{
    Structure *s = NULL;
    int i = 0;
    while (input_generator(&s, i++)) {
       int a = some_computation(s);
       assert(a > 0 && a <10);
    }
}

It is compact and you can then run your test under gdb and poke around a bit. Quite good if you are refactoring the innards of some_computation() and you want to be sure you did not miss some corner case.

Here assert() is quite nice since we can pack the testcase into a single line and get a simple report if something goes wrong. We could do better, though, since assert does not tell us the value or how we ended up there.

You might not be that thorough and just decide to put the same assert in your function and check there, assuming you cover all the input space properly with your regression tests.

To crash or not to crash

The people that consider it OK to crash at runtime (remember the sad user that cannot watch his wonderful movie till the end?) suggest leaving the assert enabled at runtime.

If you consider the example above, would it be better to crash than to read a random value from memory? Again, this is a false dichotomy!

You can expect failures, e.g. broken bitstreams, and you want to just check for them and return a proper failure message.

In our case the some_input() return value should be checked for failures and the error forwarded further up to the library user, who will then decide what to do.

What remains is the access to the lookup table. If you did not check the other functions sufficiently you might get a bogus index, and with a bogus index you will read from random memory (crashing or not depending on whether that address is mapped into the program). Do you want to have an assert() there? Or would you rather add another normal check with a normal failure path?

The correct answer is to test your code enough so you do not need to add yet another check; in fact, if the problem arises it is wrong to add a check there, or, even worse, an assert(). You should just go up the execution path and fix the problem where it is: a non-validated input, a wrong “optimization” or something sillier.

There is open debate on whether having assert() enabled is good or bad practice when talking about defensive design. In C, in my opinion, it is a complete misuse. If you want to litter your release code with tons of branches, you can also spend the time to implement something better and make sure to clean up correctly. Calling abort() possibly leaves your input and output in a severely inconsistent state.

How to use it the wrong way

I want to trade a crash anytime the alternative is memory corruption
– some misguided guy

Assume you have something like that

int size = some_computation(s);
uint8_t *p;
uint8_t *buf = p = malloc(size);


while (some_related_computations(s)) {
   do_stuff(s, p);
   p += 4;
}

assert(p - buf == size);

If some_computation() and some_related_computations() do not agree, you might write past the allocated buffer! The naive person above starts talking about how the memory is corrupted by do_stuff() and how horrible things (e.g. foreign code execution) could happen without the assert(), and how even calling return at that point is terrible and would lead to horrible, horrible things.

Ok, NO. Stop NOW. Go up and look at how assert is implemented. If you check at that point that something went wrong, the corruption has already happened. No matter what you do, somebody could exploit it depending on how naive or unlucky you had been.

Remember: assert() does I/O, allocates memory, raises a signal and calls functions. All the things you would rather not do when your memory is corrupted are done by assert().

You can be less naive.

int size = some_computation(s);
uint8_t *p;
uint8_t *buf = p = malloc(size);

while (some_related_computations(s) && size >= 4) {
   do_stuff(s, p);
   p    += 4;
   size -= 4;
}
assert(size == 0);

But then, instead of the assert you can just add

if (size != 0) {
    msg("Something went really wrong!");
    log("The state is %p", s->some_state);
    cleanup(s);
    goto fail;
}

This way when the “impossible” happens the user gets a proper notification, you can recover cleanly, and no memory corruption ever happens.

Better than assert

Albeit easy to use and portable, assert() does not provide that much information; there are plenty of tools that can be leveraged to get better reporting.

In Closing

assert() is a really nice debugging tool and it helps a lot to make sure some state remains invariant while refactoring.

Leaving asserts in release code, on the other hand, is quite wrong: it does not give you any additional safety. Please do not buy the fairy tale that assert() saves you from the scary memory corruption issues. It does NOT.

Decoupling an API

This weekend on #libav-devel we discussed once more the problems with the current core avcodec API.

Current situation

Decoding

We have 3 decoding functions, one for each of the supported kinds of media: Audio, Video and Subtitles.

Subtitles are already a sore thumb since they do not use AVFrame but a specialized structure; let’s ignore them for now. Audio and Video share pretty much the same signature:

int avcodec_decode_something(AVCodecContext *avctx, AVFrame *f, int *got_frame, AVPacket *p)

It takes a context pointer containing the decoder state, consumes a demuxed packet and optionally outputs a decoded frame containing raw data in a certain format (audio samples, a video frame).

The usage model is quite simple: it takes packets, and whenever it has enough encoded data to emit a frame it emits one; the got_frame pointer signals whether a frame is ready or more data is needed.

Problem:

What if 1 AVPacket is nearly always enough to output 2 or more frames of raw data?

This happens with MVC and other real-world scenarios.

In general our current API cannot cope with it cleanly.

While working with the MediaSDK interface from Intel and now with MMAL for the Raspberry Pi, similar problems arose due to the natural parallelism the underlying hardware has.

Encoding

We again have 3 functions; again Subtitles are somewhat different, while Audio and Video are nicely uniform.

int avcodec_encode_something(AVCodecContext *avctx, AVPacket *p, const AVFrame *f, int *got_packet)

It is pretty much the dual of the decoding function: the context pointer is the same, a frame of raw data enters and a packet of encoded data comes out. Again we have a pointer to signal whether we had enough data and an encoded packet has been output.

Problem:

Again we might get multiple AVPackets produced out of a single AVFrame fed in.

This happens when the HEVC “workaround” to encode interlaced content makes the encoder output the two separate fields as separate encoded frames.

Again, the API cannot cope with it cleanly, and threaded or otherwise parallel encoding fits the model only barely.

Decoupling the process

To fix this issue (and make our users’ lives simpler) the idea is to split the function feeding the data from the one actually providing the processed data.

int avcodec_decode_push(AVCodecContext *avctx, AVPacket *packet);
int avcodec_decode_pull(AVCodecContext *avctx, AVFrame *frame);
int avcodec_decode_need_data(AVCodecContext *avctx);
int avcodec_decode_have_data(AVCodecContext *avctx);
int avcodec_encode_push(AVCodecContext *avctx, AVFrame *frame);
int avcodec_encode_pull(AVCodecContext *avctx, AVPacket *packet);
int avcodec_encode_need_data(AVCodecContext *avctx);
int avcodec_encode_have_data(AVCodecContext *avctx);

From a single function we now have 4; why is this simpler?

The current workflow is more or less like

while (get_packet_from_demuxer(&pkt)) {
    ret = avcodec_decode_something(avctx, frame, &got_frame, pkt);
    if (got_frame) {
        render_frame(frame);
    }
    if (ret < 0) {
        manage_error(ret);
    }
}

get_packet_from_demuxer() is a function that dequeues the encoded data from some queue or directly calls the demuxer (beware: having your I/O-intensive demuxer function block your CPU-intensive decoding function isn’t nice); render_frame() is likewise either something directly talking to some kind of I/O subsystem, or something enqueuing the data so that the actual rendering (including format conversion, overlaying and scaling) happens in another thread.

The new API makes it much easier to keep the multiple areas of concern separated so they won’t trip over each other, while the casual user would have something like:

while (ret >= 0) {
    while ((ret = avcodec_decode_need_data(avctx)) > 0) {
        ret = get_packet_from_demuxer(&pkt);
        if (ret < 0)
           ...
        ret = avcodec_decode_push(avctx, &pkt);
        if (ret < 0)
           ...
    }
    while ((ret = avcodec_decode_have_data(avctx)) > 0) {
        ret = avcodec_decode_pull(avctx, frame);
        if (ret < 0)
           ...
        render_frame(frame);
    }
}

That is probably a few more lines.

Asynchronous API

Since the decoupled API is that simple, it is possible to craft something more immediate for the casual user.

typedef struct AVCodecDecodeCallback {
    int (*pull_packet)(void *priv, AVPacket *pkt);
    int (*push_frame)(void *priv, AVFrame *frame);
    void *priv_data;
} AVCodecDecodeCallback;

int avcodec_register_decode_callbacks(AVCodecContext *avctx, AVCodecDecodeCallback *cb);

int avcodec_decode_loop(AVCodecContext *avctx)
{
    AVCodecDecodeCallback *cb = avctx->cb;
    int ret;
    while ((ret = avcodec_decode_need_data(avctx)) > 0) {
        ret = cb->pull_packet(cb->priv_data, &pkt);
        if (ret < 0)
            return ret;
        ret = avcodec_decode_push(avctx, &pkt);
        if (ret < 0)
            return ret;
    }
    while ((ret = avcodec_decode_have_data(avctx)) > 0) {
        ret = avcodec_decode_pull(avctx, frame);
        if (ret < 0)
            return ret;
        ret = cb->push_frame(cb->priv_data, frame);
    }
    return ret;
}

So the actual minimum decoding loop can be just 2 calls:

ret = avcodec_register_decode_callbacks(avctx, cb);
if (ret < 0)
   ...
while ((ret = avcodec_decode_loop(avctx)) >= 0);

Cute, isn’t it?

Theory is simple …

… the practice not so much:
– there are plenty of implementation issues to take into account.
– LOTS of tedious work converting all the codecs to the new API.
– lots of details to iron out (e.g. should have_data() and need_data() block or not?)

We did radical overhauls before, such as introducing reference-counted AVFrames thanks to Anton, so we aren’t too scared of reshaping and cleaning the codebase once more.

If you like the ideas posted above or want to discuss them further, you can join the Libav IRC channel or mailing list to help out.

Demotivation, FUD and why I still contribute to Libav

Libav has been a controversial project since its start, mainly because lots of drama and a huge amount of manure have been thrown against it, and even about 4 years after its start there are people spewing this kind of vitriolic comment.

What is Libav

How it started

Libav started when the FFmpeg trademark was given by its owner, Fabrice Bellard, to the former FFmpeg leader, Michael Niedermayer.

Michael Niedermayer managed to get demoted from his leader position by the 18 most involved people in FFmpeg at the time, due to his tendency of not following even the basic project rules. That happened weeks after he was voted to stay as leader by 15 people, 5 of them explicitly stating their vote was conditioned on his behavior, and 1 definitely against him.

His demotion was due to acts in full disregard of the policies in place, even those enforced automatically by the svn hooks.

The fact that he bullied and belittled volunteers and contributors can be checked by digging through the mailing list during the months and the year before the management change, and it also added to the decision.

What aims to do

Having been burnt by the unreliable-leadership experience, the Libav organization focused on rules, both for development and for management.

Ideally, having a clear set of rules and making every member and contributor abide by the very same rules would prevent abuses.

Rules in summary

  • All the code must use the same coding style
  • All the patches require a review.
    • Nothing hits the tree before a second pair of eyes read the code.
    • No part of the code is a private garden that only selected people modify on their whim.
  • Everybody must abide by a quite simple code of conduct.
    • rude behavior is not welcome.
    • flames are not welcome.
    • constructive criticism is needed.

Since FFmpeg was a project famous for horrid code quality and a sketchy, irregular API, and since a good chunk of needed changes were vetoed by the now-demoted leader, Libav focused mainly on cleaning up the code and making the API easy to use. This led to deep overhauls such as reference-counted frames to make multi-threading much simpler, reference-counted packets to make extended workflows easy, and a good chunk of bugs and ancient issues fixed.

Demotivation and FUD

The people involved in Libav mainly focused on code, ignoring most if not all of the wild statements fans of Michael Niedermayer dispersed around the internet. The idea was that the code should speak for itself.

It did not work as expected. At all.

Apparently news outlets prefer to repeat whatever they find in a blog post instead of checking the facts by reading the mailing list and the git history. Many misconceptions, to use a euphemism, have been spun around and apparently won’t die that quickly if left unaddressed.

The thief theme

“The people behind Libav stole the FFmpeg infrastructure”: some of the admins even got questioned at their workplace about whether they really stole something. Pity that most of the people involved in keeping the infrastructure functional were doing so on their own hardware and co-location, but obviously checking facts is hard; better to go help shoveling manure on people.

Needless to say, it gets hard to keep spending time and resources while getting this kind of treatment. At VDD 2014 there was some agreement to at least clear the air by issuing a strong statement about it. So far it has not happened.

The daily merge

“Libav is a non-functional fork of FFmpeg” is quite often repeated all around. Michael started to merge daily everything Libav does, more or less since the beginning, making FFmpeg effectively a strict derivative of Libav.

Initially he even used a quite misleading moniker in his merges, naming the Libav tree qatar. Now things are a little fairer and at least FFmpeg more or less states on its download page that it is a derivative of Libav with additional features.

A few people ended up deciding to leave the Libav project, since they did not appreciate that their hard work, done in their spare time, would be used by somebody else to disparage them and, on top of it, to ask for donations with wild claims that “80% of all that is done in FFmpeg is done by Michael”.

You are helping the evil

New people contributing to Libav quite often get some fans contacting them to enlighten them about how evil Libav and the people working on it are, how much better FFmpeg is, and how much better it would be to contribute to the real project.

This is a form of harassment. So far the people involved have kept sort of quiet, but it would probably help to be more outspoken, since most people around do not like to check the facts. Private emails probably should stay private, but blog post comments can easily be found and linked. Since such “jokes” have been spun from the beginning, I let the reader imagine how much patience somebody has to muster to keep staying quiet and just write code (this post is me getting fed up enough after 3+ years).

Motivation and Demotivation

I like to write code that is functional and doesn’t poke your eyes when you read it after a while, and I like to write code about multimedia, among other things.

I used to consider the split and the ensuing competition a blessing, since the original FFmpeg project was pretty much dying due to the kind of environment it had. Having a competitor tempers the bad tendencies quite well.

On the other hand I’m not at all fine with an unfair competition, with one side piggy-backing on the other as is happening: everything in Libav is merged into FFmpeg, which enjoys the fact that the code has been polished and cleaned beforehand. Libav has the above-mentioned rules, so any patch has to be discussed, and that usually leads to a fair amount of changes when what is cherry-picked from FFmpeg hits the mailing list for review; a good deal of the time it is faster and simpler to just write a fix covering more issues, or to rewrite from scratch the feature some user deemed interesting to have.

I probably won’t stop contributing to Libav, since I like working with the other people involved a lot, and the few people thanking me now and then make me think that giving up would just cause more harm than good in the big picture. I can understand those that decide to stop, though; sometimes the amount of nonsense thrown at you by those rabid fans not knowing anything is appalling.

Hopefully writing more about it might help defuse this situation.

PS: You might also read the often-ignored initial remark from Kostya

Hardware acceleration in Libav

Multimedia formats require lots of computation power since they use fairly complex mathematical transformations. Most of them can usually be implemented efficiently in silicon, requiring orders of magnitude less power to run while remaining quite fast to execute.

Hardware acceleration

Most current platforms, be they desktop, mobile or server, sport some kind of hardware unit to offload decoding and encoding of multimedia formats.

They are usually accessed through platform-specific APIs; sometimes the API is even codec-specific, making the whole implementation experience quite painful and very time consuming.

Depending on the specific hwaccel implementation, it could be bound to the GPU and use GPU memory, thus requiring non-system memory to be managed in specific ways. This adds an additional burden for those that would just like some quick gain, while opening up a world of interesting optimization possibilities such as zero-copy processing pipelines for transcoding, or in-GPU pixel-format conversion, scaling and blending.

There are some generic wrappers such as vdpau, vaapi, dxva2 and vt that abstract some of the complexity and provide a more uniform interface, but there is usually a need for a proper (and possibly near transparent) fallback for the situations in which the hardware cannot really manage an advanced codec profile, so just leveraging the generic abstractions solves only part of the problem. As decoding goes, though, it provides a large performance boost while requiring some effort in managing the non-system memory.

For most users the learning curve is too steep for it to be really useful.

Libav and hwaccel

Hardware acceleration support happened to be implemented more or less around a number of specific implementations (with a quite non-uniform approach: vaapi had some hooks in the codecs, vdpau had full decoders until it was ported to the same interface and made way nicer to use thanks to Rémi), requiring quite a lot of backend-specific boilerplate code to set up the implementation-specific context and then manage the opaque buffers the decoder outputs.

High level functionality

The hwaccel infrastructure is currently focused on the following items:
– the fallback from hardware to software should be as seamless as possible
– basic hardware decoders must be taken into account (e.g. for h264 some accept single NAL units and can’t parse the bitstream on their own)
– the user must have a means to control the context setup and the full memory management

In order to do that, the normal software decoder is used to parse the bitstream and, depending on whether the hwaccel is enabled or not, the parsed data is routed to the software or the hardware decoder; the output frames are then managed by the decoder’s frame reordering functionality if present.

This way falling back, even from a specific hwaccel to another, is sort of simple, at least conceptually: every time new extradata appears it is parsed and fed to the first hwaccel setup code; if it is not supported, optional fallback hwaccels can try, and eventually the software decoder is picked.

The decoded frames, no matter whether opaque hw-specific GPU memory or normal system memory, go through the same codepath, and the user has a means to set up the video rendering pipeline to take this into account.
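
In the current interface the hardware/software selection is exposed to the user through libavcodec's get_format callback: the decoder presents the list of pixel formats it can output (hardware-backed ones included) and the callback picks one, falling back to a software format when the hwaccel cannot be set up. A minimal sketch, with setup_vdpau() standing in for the backend-specific context creation:

static enum AVPixelFormat pick_format(AVCodecContext *avctx,
                                      const enum AVPixelFormat *fmt)
{
    int i;

    /* the list is ordered by decoder preference and terminated by AV_PIX_FMT_NONE */
    for (i = 0; fmt[i] != AV_PIX_FMT_NONE; i++) {
        if (fmt[i] == AV_PIX_FMT_VDPAU && setup_vdpau(avctx) >= 0)
            return fmt[i]; /* hwaccel context is ready, use it */
    }

    /* nothing usable on the hardware side: fall back to software decoding */
    return avcodec_default_get_format(avctx, fmt);
}

/* installed with avctx->get_format = pick_format; before avcodec_open2() */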

Limit of the infrastructure

All software evolves. Since the minimalistic approach to hardware acceleration requires a huge amount of boilerplate code and deep knowledge of the bitstream formats in use, the new APIs for the new hardware tried to improve on this and be easier to manage for less savvy users.

APIs such as mfx try to abstract everything and just require to be fed the input bitstream, so the hardware input buffers can be filled and the frames produced, in presentation order, once the (parallel) decoding process yields. It works in pull mode instead of push: when more data is needed it gets requested, and when a frame is ready you get notified.

For a user most of the headaches related to frame management and elementary stream parsing are gone, or almost gone, since some formats have multiple elementary stream representations and only a single one is supported…

For me (and then Anton), having an API like that poses multiple problems:
– having to feed a bitstream requires constructing it back from the software parsing; this is not terrible, it had been done for vda.
– the decoder wants to get data only when the hardware buffers require more, and that would require at least a queue.
– the frames output are already in presentation order, requiring the frame reordering logic to be bypassed.

Fitting a new-style hardware acceleration API in Libav

I tried the following approaches:
– Consider libmfx a normal third-party decoder.
  – Anton complained about completely removing the ability to use hardware memory.
– Implement additional hooks to keep the hwaccel interface but avoid the bitstream parsing and frame reordering.
  – Anton and others complained about this preventing the transparent fallback.

Then I let Anton try for himself, and in the end we agreed that the best course is to provide an interim solution that lets the user manage the memory, and then try to complete hwaccel2 at a later time.

More on the topic later 🙂

Bridging Markdown to sphinx

One of my annoying itches is documentation.

I like sphinx a lot as a toolchain, but the underlying rst has quite a steep learning curve and is outright ugly to write in many common situations.

I like kramdown a lot as a syntax, but sadly it is ruby-only, and overall the Markdown implementations for python usually have a good number of shortcomings, including the quite annoying one of not having a full AST for the extensions, which makes proper translation quite a pain (e.g. moin-2 markdown can’t use the extensions supported by the original since they get mangled badly during the process of node matching)…

Enters CommonMark

CommonMark is a cooperative effort to actually build a proper specification of the ubiquitous markdown syntax. The implementations usually provide a full AST, and the python one (derived from the javascript one) is quite easy to understand, fast enough and easy to extend.

I know that somebody else already tried to bridge docutils and markdown; sadly parsley is a tad slow for the purpose. I gutted away the original markdown parser and wired in commonmark-py; the result is a decently fast implementation that maps most of the core syntax to the docutils AST and thus makes it possible to write in markdown and get it converted using the docutils output formats.

What’s left

ReStructuredText is much richer than the CommonMark core; at the very least I should complete my work to support attributes so the manpage output works mostly as intended.

The directive system is quite different from the one currently discussed, and that will cause a good deal of headaches when mapping the sphinx extensions used to document function parameters.

As usual help welcome!

Document your project!

After discussing how to track your bugs and your contributions, let’s see what we have for documentation.

Pain and documentation

A healthy Open Source project mainly needs contributors, and contributors are usually your users. You get users if the project is known and useful (and if you do not have parasitic entities siphoning your work by abusing git-merge; best of luck to io.js and markdown-it in not having this experience, switching name is enough of a pain without it).

In order to gain mindshare, the best thing is making what you do easier to use, and that requires documenting what you did! The process is usually boring and time consuming, and every time you change something you have to make sure the documentation still matches reality.

In the opensource community we have multiple options for the kind of documentation we produce and how to produce it.

Wiki

When you need to keep some structure but you want an easy way to edit it, a wiki can be a good choice and it can lead to nice results. The information present is usually correct and, if enough people keep editing, up to date.

Pros:

  • The wiki is quick to edit and you can have people contribute by just using a browser.
  • The documentation is indexed by the search engines easily
  • It can be restricted to a number of trusted people

Cons:

  • The information is detached from the actual code and it could desync easily
  • Even if kept up to date, what applies to the current release is not what your poor user might have
  • Usually keeping versioned content is not that simple

Forum

Even if they are usually noisy, forums are a good source of information plenty of the time.
Personally I try to move interesting bits to a wiki page when I find something that is not completely transient.

Pros:

  • Usually everything requires less developer interaction
  • Users can share solutions to their problems effectively

Cons:

  • The information can get stale even quicker than what you have in the wiki
  • Since it is mainly user-generated, the proposed solutions might be suboptimal
  • Being highly interactive, it requires more dedicated people to take care of unruly users

Manuals

There are lots of good toolchains to write full manuals, as we have in Gentoo.

The old style xml/docbook tends to have a really steep learning curve, not to mention even more quirky and wonderful monsters such as LaTeX (and the lesser texinfo). ReStructuredText, asciidoc and some flavour of markdown seem to be better tools for the task if you need speed and want to get contributors up to speed.

Pros:

  • A proper manual can be easily pinned to a specific release
  • It can be versioned using git
  • Some people still like something they can print and that has a proper index

Cons:

  • With the old tools it is a pain to get started
  • The learning curve can still be unbearable for most contributors
  • It requires some additional dedication to keep it up to date

What to use and why

Usually for small projects the manual is the README; once the project grows, a wiki is usually the best place to put notes from multiple people. If you are good at it, a manual is a boon for all your users.

Tools for documentation-in-code such as doxygen or docurium can help a lot if your project has a single codebase.

If you need to unify a LOT of different information, like we have in Gentoo, the problems usually get much more annoying: you have content written in multiple markups, living in multiple places, and usually moving it from one place to another requires a serious editing effort (like moving from our guidexml to the current semantic wiki).

Markup suggestion

Markdown/CommonMark/Kramdown

I do like CommonMark a lot and I even started to port and extend it to be used in docutils, since I find ReStructuredText too confusing for normal users. Its best quality is the natural flow; its most annoying defect is that there are too many parser discrepancies and sometimes implementations disagree. Still, it is better to have many good implementations than a single one that is subpar in everything (hi texinfo, I hate your toolchain).

Asciidoc

The markup is quite nice (up to a point) and the toolchain is sort of nice, even if it feels like a Rube Goldberg machine. To my knowledge there is a single implementation of it, and that makes me MUCH more wary of using it in new projects.

ReStructuredText

The markup is not as intuitive as Asciidoc, thus quite far from Markdown’s immediate-use feeling, but it has a great toolchain (if you like python) and it can be extended to produce lots of different, well formatted documents.
It comes with loads of markup features that the Markdown core lacks: an include directive, table of contents, pluggable generic block and span directives, 3 different flavours of tables.

All in all, good if you can come to terms with its complexity.

What’s next

Hopefully during this year, among my many smaller and bigger projects, I’ll find the time to put together something nice for documentation as well.

Track your issues

Issue Trackers

If you are not aware of a problem, you cannot fix it.

Having full awareness of the issues and managing them is the key to success for any kind of project (not just software).

For an open-source project it is essential that the issue tracker focuses on at least 3 areas:
– Ease of use: you get reports mainly from casual users; they must spend the least amount of time to understand the tool and to provide the information.
– Loudness: it must make problems easy to spot.
– Data mining: it should provide tools to query details, aggregate bugs and manipulate them.

What’s available

So far I have tried many issue trackers in different projects; sadly almost none fit the bill. They are usually the exact opposite: limited, cumbersome, hard to configure and horrible to use, either for filing bugs or for actually managing them.

Bugzilla

It is by far the least bad: it has plugins to provide near-instant access thanks to Mozilla Persona, it has a rich rpc system that can be leveraged for irc notifiers or side-site statistics, and importing/exporting data is almost there. As we know in Gentoo, it requires some deep manipulation, and if there is nobody around to do that you can get fallouts like this, when a single stubborn (and probably distracted) developer (vapier) manages to spoil the result of the goodwill of another and makes the Project overall more frail.

Mantis

It is still too rich in confusing options, but its default splash views are a boon if you are wondering what the status of your project is. Sadly, no open-id/persona/single-sign-on integration.

Redmine/Trac

Usually not good enough on the reporting side and, even if they are much simpler than Bugzilla, still not good for the untrained user. They integrate with the source repository view and a knowledge base (aka wiki), so they can be a good starting point for small organizations.

Github/GitLab/Gogs

They have a more encompassing approach than redmine and trac; their issue tracker component is either too simple in some cases (with Github not even having support for attachments, and gogs not really managing tags yet) or a little too rough (no bug dependencies). But, with their immediate UI and the label-oriented approach, they are already pretty good for a great deal of projects. Sadly not for Libav: we do need proper attachments.

RT

Request Tracker is overwhelming. No other words. Do not use it if you do not need to. It is too complex to configure on the admin side and too annoying to use on the developer side. For users the interface is usually a mailbox, so you can’t go wrong. Perfect if you have to manage a huge number of paying customers and you want detailed billing and other extremely advanced features.

Brimir

The new kid on the block, it is quite simple, way too simple. Its mail rendering makes it not really great, but it is pretty much a nice concept waiting to bloom. (Will it?)

Suggestion welcome

Do you know any better opensource issue tracker? Please comment below =)