Over the past month or so I have been helping Vittorio add one of the important missing features to our h264 decoder: Multi View support.
The basic idea of this feature is quite simple: you are shooting a movie from multiple angles, a good part of the picture is bound to be common across them, and you’d like to ensure frame precision.
So what about encoding all the simultaneously captured frames in the same elementary stream, sharing as much as you can across the different layers, and then letting the decoder output the frames somehow?
Since we know that all the containers have problems, it might not be a completely bogus idea to have the codec take care of it. Even better if the resulting aggregated bitstream is more compact than the sum of the single ones.
High level structure
What’s different in h264-mvc compared to normal h264?
Not a lot. In fact the main layer is exactly the same, and a normal decoder can just skip over the additional bits (3 NALs more or less) and decode as usual.
Basically there is a NAL unit to signal which layer we are currently working on, a NAL to store the per-layer SPS and a NAL to keep the actual frame data.
Besides that, everything is exactly the same.
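To make the "a normal decoder can just skip" part concrete, here is a minimal sketch of how the extra NAL units can be told apart from the first header byte. The type values 14 (prefix NAL unit), 15 (subset SPS) and 20 (coded slice extension) come from the H.264 spec, where they are shared with SVC. Mapping them onto the three NALs described above is my reading, and the helper names are made up for illustration; this is not the actual Libav code.

```c
#include <stdint.h>

/* NAL unit types used by the H.264 extensions (shared by MVC and SVC). */
enum {
    NAL_PREFIX     = 14, /* prefix NAL unit: extra header info for the NAL that follows */
    NAL_SUBSET_SPS = 15, /* subset sequence parameter set: the per-layer SPS */
    NAL_SLICE_EXT  = 20, /* coded slice extension: frame data of a non-base layer */
};

/* Hypothetical helper: nal_unit_type is the low 5 bits of the first header byte. */
static int nal_unit_type(uint8_t header)
{
    return header & 0x1f;
}

/* Hypothetical helper: returns 1 for the NALs a plain decoder can skip over. */
static int is_extension_nal(uint8_t header)
{
    switch (nal_unit_type(header)) {
    case NAL_PREFIX:
    case NAL_SUBSET_SPS:
    case NAL_SLICE_EXT:
        return 1;
    default:
        return 0;
    }
}
```

With this, a header byte of 0x65 (an IDR slice of the main layer) is decoded as usual, while 0x74 (type 20) is something a normal decoder simply ignores.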
So why isn’t it already available? You made it look easy!
Sadly, it would be easy if the decoder we have weren’t _that_ convoluted, with many components entangled in a monolithic entity and code that grew over the years to adapt to different needs.
Architectural pain points
Per-slice multithreaded decoding made the code quite hard to follow, since you have a master context h, which in certain functions is actually h0, and a slice-specific copy hx that sometimes becomes h, and so on.
Per-frame multithreaded decoding luckily doesn’t get in the way too much for now.
Having to touch a large file of about 4k lines of code isn’t _so_ nice in itself: split the view as you like for editing, you still end up waiting on a single core of your CPU to do the work.
h264-mvc is a fringe feature for many, and if you care about speed you do not want all the cruft around slowing things down. What is a feature for you is just cruft for many.
- MVC support must be completely optional, or at least not slow down normal decoding at all.
- MVC support must not make the code harder to follow than it is now, so hacking your way through is not an option.
- MVC should give me a pony, purple
First take the low-hanging fruit while you think about the best route to achieve your goal.
The first step is always to refactor and clean up. Just as you, hopefully, do not cook in a dirty kitchen, people shouldn’t write code on top of crufty code.
Split the monster
In Libav everything compiles quite fast except for vc1 (vc1dec.c is 6k loc) and h264 (h264.c was around 6k loc).
New codecs such as vp9 or hevc landed already split into smaller chunks.
Shuffling the code should be simple enough, so we had h264.c split into h264_slice.c, h264_mb.c and such. That shortens (re)build times and makes it easier to focus.
Vittorio worked on removing the dependency on the mpeg12 context in order to make the code easier to follow; it had been one of the pending issues for years. Now h264 doesn’t require mpeg12 in order to build, which will probably make our friends working on Chrome happier, along with everybody else who needs _just_ a few selected features in their build.
Pave the road
Once you have divided the problem into smaller subproblems (parsing the new NALs, storing the information in an appropriate data structure, doing the actual decoding and storing the results somewhere accessible) you can start adapting the code to fit. That means reordering some code, splitting out functions that would be shared and maybe slaying some bugs hidden in the code weeds while at it.
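As an illustration of the first two subproblems, here is a rough sketch of parsing the extra header carried by the new NAL types into a small structure. The field layout follows the NAL unit header MVC extension as I understand it from the H.264 spec (three bytes after the usual header byte, for NAL types 14 and 20). The struct and function names are hypothetical and are not what the Libav patches use.

```c
#include <stdint.h>

/* Hypothetical container for the fields of the MVC NAL header extension. */
typedef struct MVCNalHeader {
    int non_idr_flag;
    int priority_id;
    int view_id;         /* which view (layer) this NAL belongs to */
    int temporal_id;
    int anchor_pic_flag;
    int inter_view_flag; /* may be used as a reference by other views */
} MVCNalHeader;

/*
 * buf points at the first byte of the NAL unit (the usual header byte).
 * Returns 0 on success, -1 if the NAL carries no MVC extension.
 */
static int parse_mvc_nal_header(const uint8_t *buf, int size, MVCNalHeader *out)
{
    uint32_t ext;
    int type;

    if (size < 4)
        return -1;
    type = buf[0] & 0x1f;
    if (type != 14 && type != 20) /* only prefix and slice extension NALs */
        return -1;

    /* 24 bits: svc_extension_flag followed by the 23-bit MVC extension */
    ext = ((uint32_t)buf[1] << 16) | ((uint32_t)buf[2] << 8) | buf[3];
    if (ext & 0x800000)           /* svc_extension_flag set: SVC, not MVC */
        return -1;

    out->non_idr_flag    = (ext >> 22) & 0x1;
    out->priority_id     = (ext >> 16) & 0x3f;
    out->view_id         = (ext >>  6) & 0x3ff;
    out->temporal_id     = (ext >>  3) & 0x7;
    out->anchor_pic_flag = (ext >>  2) & 0x1;
    out->inter_view_flag = (ext >>  1) & 0x1;
    /* bit 0 is reserved_one_bit and is ignored here */
    return 0;
}
```

Feeding it the bytes 0x74 0x40 0x00 0x43 returns 0 and fills in view_id 1 with inter_view_flag set; a regular slice NAL such as 0x65 is rejected with -1, so the normal decoding path never has to look at the structure.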
We are halfway!
We got the frame splitting and NAL parsing pretty much in working shape; it has not been sent for review just because in itself it is not
The frame data decoding is pending some patches from me that try to simplify the slice header parsing so that enough of it can be shared without adding more branches. I hacked it once and I know the approach works.
The code to store multiple views in a single frame has a whole blueprint being evaluated.
Test the actual decoding and hopefully make the frame reference code behave as expected: this will probably be the most annoying and time-consuming task if we are unlucky. That code bites.