Decoupling an API – Part II

During the VDD we had lots of discussions and I enjoyed reviewing the initial NihAV implementation. Kostya already wrote some more about the decoupled API that I described at high level here.

This article is about some possible implementation details, at least another will follow.

The new API requires some additional data structures, mainly something to keep the data that is being consumed/produced, additional implementation-callbacks in AVCodec and possibly a mean to skip the queuing up completely.

Data Structures

AVPacketQueue and AVFrameQueue

In the previous post I considered as given some kind of Queue.

Ideally the API for it could be really simple:

typedef struct AVPacketQueue;

AVPacketQueue *av_packet_queue_alloc(int size);
int av_packet_queue_put(AVPacketQueue *q, AVPacket *pkt);
int av_packet_queue_get(AVPacketQueue *q, AVPacket *pkt);
int av_packet_queue_size(AVPacketQueue *q);
void av_packet_queue_free(AVPacketQueue **q);

typedef struct AVFrameQueue;

AVFrameQueue *av_frame_queue_alloc(int size);
int av_frame_queue_put(AVFrameQueue *q, AVPacket *pkt);
int av_frame_queue_get(AVFrameQueue *q, AVPacket *pkt);
int av_frame_queue_size(AVFrameQueue *q);
void av_frame_queue_free(AVFrameQueue **q);

Internally it leverages the ref-counted API (av_packet_move_ref and av_frame_move_ref) and any data structure that could fit the queue-usage. It will be used in a multi-thread scenario so a form of Lock has to be fit into it.

We have already something specific for AVPlay, using a simple Linked List and a FIFO for some other components that have a near-constant maximum number of items (e.g. avconv, NVENC, QSV).

Possibly also a Tree could be used to implement something such as av_packet_queue_insert_by_pts and have some forms of reordering happen on the fly. I’m not a fan of it, but I’m sure someone will come up with the idea..

The Queues are part of AVCodecContext.

typedef struct AVCodecContext {
    // ...

    AVPacketQueue *packet_queue;
    AVFrameQueue *frame_queue;

    // ...
} AVCodecContext;

Implementation Callbacks

In Libav the AVCodec struct describes some specific codec features (such as the supported framerates) and hold the actual codec implementation through callbacks such as init, decode/encode2, flush and close.
The new model obviously requires additional callbacks.

Once the data is in a queue it is ready to be processed, the actual decoding or encoding can happen in multiple places, for example:

In avcodec_*_push or avcodec_*_pull, once there is enough data. In that case the remaining functions are glorified proxies for the matching queue function.
somewhere else such as a separate thread that is started on avcodec_open or the first avcodec_decode_push and is eventually stopped once the context related to it is freed by avcodec_close. This is what happens under the hood when you have certain hardware acceleration.

Common

typedef struct AVCodec {
    // ... previous fields
    int (*need_data)(AVCodecContext *avctx);
    int (*has_data)(AVCodecContext *avctx);
    // ...
} AVCodec;

Those are used by both the encoder and decoder, some implementations such as QSV have functions that can be used to probe the internal state in this regard.

Decoding

typedef struct AVCodec {
    // ... previous fields
    int (*decode_push)(AVCodecContext *avctx, AVPacket *packet);
    int (*decode_pull)(AVCodecContext *avctx, AVFrame *frame);
    // ...
} AVCodec;

Those two functions can take a portion of the work the current decode function does, for example:
– the initial parsing and dispatch to a worker thread can happen in the _push.
– reordering and blocking until there is data to output can happen on _pull.

Assuming the reordering does not happen outside the pull callback in some generic code.

Encoding

typedef struct AVCodec {
    // ... previous fields
    int (*encode_push)(AVCodecContext *avctx, AVFrame *frame);
    int (*encode_pull)(AVCodecContext *avctx, AVPacket *packet);
} AVCodec;

As per the Decoding callbacks, encode2 workload is split. the _push function might just keep queuing up until there are enough frames to complete the initial the analysis, while, for single thread encoding, the rest of the work happens at the _pull.

Yielding data directly

So far the API mainly keeps some queue filled and let some magic happen under the hood, let see some usage examples first:

Simple Usage

Let’s expand the last example from the previous post: register callbacks to pull/push the data and have some simple loops.

Decoding

typedef struct DecodeCallback {
    int (*pull_packet)(void *priv, AVPacket *pkt);
    int (*push_frame)(void *priv, AVFrame *frame);
    void *priv_data_pull, *priv_data_push;
} DecodeCallback;

Two pointers since you pull from a demuxer+parser and you push to a splitter+muxer.

int decode_loop(AVCodecContext *avctx, DecodeCallback *cb)
{
    AVPacket *pkt  = av_packet_alloc();
    AVFrame *frame = av_frame_alloc();
    int ret;
    while ((ret = avcodec_decode_need_data(avctx)) > 0) {
        ret = cb->pull_packet(cb->priv_data_pull, pkt);
        if (ret < 0)
            goto end;
        ret = avcodec_decode_push(avctx, pkt);
        if (ret < 0)
            goto end;
    }
    while ((ret = avcodec_decode_have_data(avctx)) > 0) {
        ret = avcodec_decode_pull(avctx, frame);
        if (ret < 0)
            goto end;
        ret = cb->push_frame(cb->priv_data_push, frame);
        if (ret < 0)
            goto end;
    }

end:
    av_frame_free(&frame);
    av_packet_free(&pkt);
    return ret;
}

Encoding

For encoding something quite similar can be done:

typedef struct EncodeCallback {
    int (*pull_frame)(void *priv, AVFrame *frame);
    int (*push_packet)(void *priv, AVPacket *packet);
    void *priv_data_push, *priv_data_pull;
} EncodeCallback;

The loop is exactly the same beside the data types swapped.

int encode_loop(AVCodecContext *avctx, EncodeCallback *cb)
{
    AVPacket *pkt  = av_packet_alloc();
    AVFrame *frame = av_frame_alloc();
    int ret;
    while ((ret = avcodec_encode_need_data(avctx)) > 0) {
        ret = cb->pull_frame(cb->priv_data_pull, frame);
        if (ret < 0)
            goto end;
        ret = avcodec_encode_push(avctx, frame);
        if (ret < 0)
            goto end;
    }
    while ((ret = avcodec_encode_have_data(avctx)) > 0) {
        ret = avcodec_encode_pull(avctx, pkt);
        if (ret < 0)
            goto end;
        ret = cb->push_packet(cb->priv_data_push, pkt);
        if (ret < 0)
            goto end;
    }

end:
    av_frame_free(&frame);
    av_packet_free(&pkt);
    return ret;
}

Transcoding

Transcoding, the naive way, could be something such as

int transcode(AVFormatContext *mux,
              AVFormatContext *dem,
              AVCodecContext *enc,
              AVCodecContext *dec)
{
    DecodeCallbacks dcb = {
        get_packet,
        av_frame_queue_put,
        dem, enc->frame_queue };
    EncodeCallbacks ecb = {
        av_frame_queue_get,
        push_packet,
        enc->frame_queue, mux };
    int ret = 0;

    while (ret > 0) {
        if ((ret = decode_loop(dec, &dcb)) > 0)
            ret = encode_loop(enc, &ecb);
    }
}

One loop feeds the other throught the queue. get_packet and push_packet are muxing and demuxing functions, they might end up being other two Queue functions once the AVFormat layer gets a similar overhaul.

Advanced usage

From the examples above you would notice that in some situation you would possibly do better,
all the loops pull data from a queue push it immediately to another:

why not feeding right queue immediately once you have the data ready?
why not doing some processing before feeding the decoded data to the encoder, such as conver the pixel format?

Here some additional structures and functions to enable advanced users:

typedef struct AVFrameCallback {
    int (*yield)(void *priv, AVFrame *frame);
    void *priv_data;
} AVFrameCallback;

typedef struct AVPacketCallback {
    int (*yield)(void *priv, AVPacket *pkt);
    void *priv_data;
} AVPacketCallback;

typedef struct AVCodecContext {
// ...

AVFrameCallback *frame_cb;
AVPacketCallback *packet_cb;

// ...

} AVCodecContext;

int av_frame_yield(AVFrameCallback *cb, AVFrame *frame)
{
    return cb->yield(cb->priv_data, frame);
}

int av_packet_yield(AVPacketCallback *cb, AVPacket *packet)
{
    return cb->yield(cb->priv_data, packet);
}

Instead of using directly the Queue API, would be possible to use yield functions and give the user a mean to override them.

Some API sugar could be something along the lines of this:

int avcodec_decode_yield(AVCodecContext *avctx, AVFrame *frame)
{
    int ret;

    if (avctx->frame_cb) {
        ret = av_frame_yield(avctx->frame_cb, frame);
    } else {
        ret = av_frame_queue_put(avctx->frame_queue, frame);
    }

    return ret;
}

Whenever a frame (or a packet) is ready it could be passed immediately to another function, depending on your threading model and cpu it might be much more efficient skipping some enqueuing+dequeuing steps such as feeding directly some user-queue that uses different datatypes.

This approach might work well even internally to insert bitstream reformatters after the encoding or before the decoding.

Open problems

The callback system is quite powerful but you have at least a couple of issues to take care of:
– Error reporting: when something goes wrong how to notify what broke?
– Error recovery: how much the user have to undo to fallback properly?

Probably this part should be kept for later, since there is already a huge amount of work.

What’s next

Muxing and demuxing

Ideally the container format layer should receive the same kind of overhaul, I’m not even halfway documenting what should
change, but from this blog post you might guess the kind of changes. Spoiler: The I/O layer gets spun in a separate library.

Proof of Concept

Soon^WNot so late I’ll complete a POC out of this and possibly hack avplay so that either it uses QSV or videotoolbox as test-case (depending on which operating system I’m playing with when I start), probably I’ll see which are the limitations in this approach soon.

If you like the ideas posted above or you want to discuss them more, you can join the Libav irc channel or mailing list to discuss and help.

2 thoughts on “Decoupling an API – Part II”

Kostya says:

October 2, 2015 at 2:54 pm

Mind you, I’ve written about different system (and different implementation). Try writing an example of muxing output of two demuxers into one file (and don’t forget you can have several streams there). I suspect you’ll have a fragile web of callbacks in the end.

lu_zero says:

October 2, 2015 at 3:18 pm

I wouldn’t use the callbacks to do that.
As said it might be beneficial just on some specific situations, anything more would just be problematic.

Fragile probably describes it better, since on this kind of setup reporting errors and reacting to them gets hairy.

Data Structures

AVPacketQueue and AVFrameQueue

Implementation Callbacks

Common

Decoding

Encoding

Yielding data directly

Simple Usage

Decoding

Encoding

Transcoding

Advanced usage

Open problems

What’s next

Muxing and demuxing

Proof of Concept

2 thoughts on “Decoupling an API – Part II”

Leave a Reply to Kostya Cancel reply