Colorspace Conversion and Scaling
Before delving into the library architecture and the outer API, it is probably worth giving a quick summary of what this library is about.
Most multimedia concepts are more or less intuitive:
– encoding is taking some data (e.g. video frames, audio samples) and compressing it by leaving out unimportant details
– muxing is the act of storing such compressed data and timestamps so that audio and video can play back in sync
– demuxing is getting back the compressed data with the timing information stored in the container format
– decoding somehow inflates the data so that video frames can be rendered on screen and the audio played on the speakers
After the decoding step it would seem that all the hard work is done, but since there isn’t a single way to store video pixels or audio samples, you need to process them so they work with your output devices.
That process is usually called resampling for audio; for video we have colorspace conversion, to change the pixel information, and scaling, to change the amount of pixels in the image.
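As a tiny taste of what a colorspace conversion involves, here is the per-pixel math for computing the luma (Y) component from RGB using the BT.601 weights. This is just an illustration of the concept, not AVScale code; the function name is made up.

```c
#include <stdint.h>

/* Compute the luma (Y) component from 8-bit RGB using BT.601 weights:
 * Y = 0.299 R + 0.587 G + 0.114 B, computed in fixed point (scaled by 256).
 * Illustrative only, not part of the AVScale API. */
static uint8_t rgb_to_luma_bt601(uint8_t r, uint8_t g, uint8_t b)
{
    /* 77 + 150 + 29 == 256, so white maps back to 255 */
    return (uint8_t)((77 * r + 150 * g + 29 * b) >> 8);
}
```

A full conversion does this (and more) for every pixel of every frame, which is why the layout of the pixel data matters so much.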
Today I’ll introduce you to the new library for colorspace conversion and scaling we are working on.
The library aims to be as simple as possible and to hide all the gory details from the user: you won’t need to figure out the heads and tails of functions with a rather large number of arguments, nor of special-purpose functions.
Everything is a Kernel
One of the key concepts of AVScale is that the conversion chain is assembled out of different components, separating the concerns.
Those components are called kernels.
The kernels can be conceptually divided into two kinds:
– Conversion kernels, taking an input in a certain format and providing an output in another (e.g. rgb2yuv) without changing any other property.
– Process kernels, modifying the data while keeping the format itself unchanged (e.g. scale).
This pipeline approach provides great flexibility and helps code reuse.
The most common use cases (such as scaling without conversion, or conversion without scaling) can be faster than solutions that try to merge scaling and conversion into a single step.
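The kernel idea can be sketched like this: every stage shares one signature, and a chain is just an ordered list of stages. All the names and types below are illustrative assumptions, not the actual AVScale internals.

```c
/* Hypothetical sketch of the kernel concept; not the real AVScale API. */
typedef struct Frame {
    int width, height;
    const char *format;   /* e.g. "rgb" or "yuv" */
} Frame;

typedef void (*kernel_fn)(Frame *frame);

/* Conversion kernel: changes the format, nothing else. */
static void rgb2yuv(Frame *f) { f->format = "yuv"; }

/* Process kernel: changes the data (here, the dimensions), format untouched. */
static void scale_half(Frame *f) { f->width /= 2; f->height /= 2; }

/* A conversion chain is just the kernels run in order. */
static void run_chain(Frame *f, kernel_fn *chain, int n)
{
    for (int i = 0; i < n; i++)
        chain[i](f);
}
```

Running the chain { rgb2yuv, scale_half } on a 1920×1080 RGB frame produces a 960×540 YUV frame; dropping one kernel from the chain gives you scaling-only or conversion-only for free, which is where the code reuse comes from.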
AVScale works with two kinds of structures:
– AVPixelFormaton: A full description of the pixel format
– AVFrame: The frame data, its dimension and a reference to its format details (aka AVPixelFormaton)
The library will have an AVOption-based system to tune specific options (e.g. selecting the scaling algorithm).
For now only avscale_config and avscale_convert_frame are implemented.
So if the input and output are predetermined, the context can be configured like this:
    AVScaleContext *ctx = avscale_alloc_context();
    if (!ctx)
        ...
    ret = avscale_config(ctx, out, in);
    if (ret < 0)
        ...
But you can skip that step and scale and/or convert from an input to an output directly, like this:
    AVScaleContext *ctx = avscale_alloc_context();
    if (!ctx)
        ...
    ret = avscale_convert_frame(ctx, out, in);
    if (ret < 0)
        ...
    avscale_free(&ctx);
The context gets lazily configured on the first call.
avscale_free() takes a pointer to a pointer, to make sure the context pointer does not stay dangling.
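The pointer-to-pointer pattern behind avscale_free() can be sketched in isolation; the struct and function names here are made up for illustration.

```c
#include <stdlib.h>

/* Illustration of the free(&ptr) pattern: taking a pointer to the caller's
 * pointer lets the function NULL it out, so the caller cannot accidentally
 * reuse a dangling context. Names are illustrative, not the AVScale API. */
typedef struct Context { int configured; } Context;

static void context_free(Context **pctx)
{
    if (!pctx || !*pctx)
        return;               /* safe to call on NULL, like av_*_free() */
    free(*pctx);
    *pctx = NULL;             /* caller's pointer no longer dangles */
}
```

After context_free(&ctx) the caller's ctx is NULL, and a second call is a harmless no-op, which makes cleanup paths much simpler to write.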
As said, the API is really simple and essential.
Kostya kindly provided an initial proof of concept, and Vittorio, Anton, and I prepared this preview in our spare time. There is plenty left to do; if you like the idea (many people kept telling us they would love a swscale replacement), we even have a fundraiser.