AVScale – part1

swscale is one of the most annoying part of Libav, after a couple of years since the initial blueprint we have something almost functional you can play with.

Colorspace conversion and Scaling

Before delving in the library architecture and the outher API probably might be good to make a extra quick summary of what this library is about.

Most multimedia concepts are more or less intuitive:
encoding is taking some data (e.g. video frames, audio samples) and compress it by leaving out unimportant details
muxing is the act of storing such compressed data and timestamps so that audio and video can play back in sync
demuxing is getting back the compressed data with the timing information stored in the container format
decoding inflates somehow the data so that video frames can be rendered on screen and the audio played on the speakers

After the decoding step would seem that all the hard work is done, but since there isn’t a single way to store video pixels or audio samples you need to process them so they work with your output devices.

That process is usually called resampling for audio and for video we have colorspace conversion to change the pixel information and scaling to change the amount of pixels in the image.

Today I’ll introduce you to the new library for colorspace conversion and scaling we are working on.

AVScale

The library aims to be as simple as possible and hide all the gory details from the user, you won’t need to figure the heads and tails of functions with a quite large amount of arguments nor special-purpose functions.

The API itself is modelled after avresample and approaches the problem of conversion and scaling in a way quite different from swscale, following the same design of NAScale.

Everything is a Kernel

One of the key concept of AVScale is that the conversion chain is assembled out of different components, separating the concerns.

Those components are called kernels.

The kernels can be conceptually divided in two kinds:
Conversion kernels, taking an input in a certain format and providing an output in another (e.g. rgb2yuv) without changing any other property.
Process kernels, modifying the data while keeping the format itself unchanged (e.g. scale)

This pipeline approach gets great flexibility and helps code reuse.

The most common use-cases (such as scaling without conversion or conversion with out scaling) can be faster than solutions trying to merge together scaling and conversion in a single step.

API

AVScale works with two kind of structures:
AVPixelFormaton: A full description of the pixel format
AVFrame: The frame data, its dimension and a reference to its format details (aka AVPixelFormaton)

The library will have an AVOption-based system to tune specific options (e.g. selecting the scaling algorithm).

For now only avscale_config and avscale_convert_frame are implemented.

So if the input and output are pre-determined the context can be configured like this:

AVScaleContext *ctx = avscale_alloc_context();

if (!ctx)
    ...

ret = avscale_config(ctx, out, in);
if (ret < 0)
    ...

But you can skip it and scale and/or convert from a input to an output like this:

AVScaleContext *ctx = avscale_alloc_context();

if (!ctx)
    ...

ret = avscale_convert_frame(ctx, out, in);
if (ret < 0)
    ...

avscale_free(&ctx);

The context gets lazily configured on the first call.

Notice that avscale_free() takes a pointer to a pointer, to make sure the context pointer does not stay dangling.

As said the API is really simple and essential.

Help welcome!

Kostya kindly provided an initial proof of concept and me, Vittorio and Anton prepared this preview on the spare time. There is plenty left to do, if you like the idea (since many kept telling they would love a swscale replacement) we even have a fundraiser.

Leave a Reply

Your email address will not be published.