{"id":469,"date":"2015-06-06T19:13:42","date_gmt":"2015-06-06T19:13:42","guid":{"rendered":"http:\/\/blogs.gentoo.org\/lu_zero\/?p=469"},"modified":"2015-08-02T10:05:49","modified_gmt":"2015-08-02T10:05:49","slug":"rethinking-avformat-part-1","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/lu_zero\/2015\/06\/06\/rethinking-avformat-part-1\/","title":{"rendered":"Rethinking AVFormat &#8211; part 1"},"content":{"rendered":"<p>Container formats should be just a boring exercise in serializing multiple arrays of (timestamp, binary blob) tuples.<\/p>\n<p>Instead there are tons of implementation details and there are <a href=\"http:\/\/en.wikipedia.org\/wiki\/Real-time_Transport_Protocol\">fun<\/a><br \/>\nand <a href=\"http:\/\/en.wikipedia.org\/wiki\/Material_Exchange_Format\">exceedingly<\/a> <a href=\"http:\/\/en.wikipedia.org\/wiki\/MPEG_program_stream\">annoying<\/a> means to lose your sanity.<\/p>\n<p>This is yet another post about <a href=\"https:\/\/blogs.gentoo.org\/lu_zero\/tag\/api\/\">APIs<\/a>; you can see others <a href=\"https:\/\/blogs.gentoo.org\/lu_zero\/2015\/03\/23\/decoupling-an-api\/\">here<\/a> and <a href=\"https:\/\/blogs.gentoo.org\/lu_zero\/2015\/05\/02\/splitting-a-library-hashes\/\">here<\/a>.<\/p>\n<h2>Current Status<\/h2>\n<p>In <a href=\"http:\/\/libav.org\">Libav<\/a> we have <a href=\"https:\/\/libav.org\/doxygen\/master\/group__libavf.html\">libavformat<\/a> taking care of general <b>I\/O<\/b>, <b>Muxing<\/b> and <b>Demuxing<\/b>.<\/p>\n<p>To keep the article from growing huge, this blog post does not cover the additional grouping given by <strong>Programs<\/strong>, <strong>Chapters<\/strong> and such, and focuses on the basics.<\/p>\n<h3>I\/O<\/h3>\n<p>The AVIO abstraction provides a means to uniformly access content stored in files, available as remote streams (e.g. 
served through http or rtmp) or through custom implementations.<\/p>\n<p>This part of the API is tightly coupled with the Muxer and Demuxer implementation.<\/p>\n<p>It uses the common <code>Context<\/code> pattern you can find across the rest of Libav with a few twists:<\/p>\n<ul>\n<li>The protocol handler can be guessed from the URL provided, e.g. <code>file:\/\/\/tmp\/foo<\/code>.<\/li>\n<li>The functions that allocate a context take an extra parameter besides the usual <code>options<\/code> AVDictionary, in the form of a callback function.<\/li>\n<li>You can create your own custom protocol easily.<\/li>\n<\/ul>\n<div class=\"codehilite\">\n<pre><span class=\"kt\">int<\/span> <span class=\"nf\">avio_open2<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">**<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">url<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">flags<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"n\">AVIOInterruptCB<\/span> <span class=\"o\">*<\/span><span class=\"n\">int_cb<\/span><span class=\"p\">,<\/span> <span class=\"n\">AVDictionary<\/span> <span class=\"o\">**<\/span><span class=\"n\">options<\/span><span class=\"p\">)<\/span>\n\n<span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">avio_alloc_context<\/span><span class=\"p\">(<\/span><span class=\"kt\">unsigned<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">buffer<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">buffer_size<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">write_flag<\/span><span class=\"p\">,<\/span> <span class=\"kt\">void<\/span> <span class=\"o\">*<\/span><span class=\"n\">opaque<\/span><span 
class=\"p\">,<\/span>\n                                <span class=\"kt\">int<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">read_packet<\/span><span class=\"p\">)(<\/span><span class=\"kt\">void<\/span> <span class=\"o\">*<\/span><span class=\"n\">opaque<\/span><span class=\"p\">,<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">buf<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">buf_size<\/span><span class=\"p\">),<\/span>\n                                <span class=\"kt\">int<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">write_packet<\/span><span class=\"p\">)(<\/span><span class=\"kt\">void<\/span> <span class=\"o\">*<\/span><span class=\"n\">opaque<\/span><span class=\"p\">,<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">buf<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">buf_size<\/span><span class=\"p\">),<\/span>\n                                <span class=\"kt\">int64_t<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">seek<\/span><span class=\"p\">)(<\/span><span class=\"kt\">void<\/span> <span class=\"o\">*<\/span><span class=\"n\">opaque<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int64_t<\/span> <span class=\"n\">offset<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">whence<\/span><span class=\"p\">))<\/span>\n<span class=\"kt\">int<\/span> <span class=\"n\">avio_closep<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">**<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<\/pre>\n<\/div>\n<p>The api tries to mimic the C stdio plus lots of API sugar.<\/p>\n<div class=\"codehilite\">\n<pre><span class=\"cp\"># core functions<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">avio_read<\/span><span 
class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"kt\">unsigned<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">buf<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">size<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">avio_write<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">unsigned<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">buf<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">size<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">int64_t<\/span> <span class=\"nf\">avio_seek<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int64_t<\/span> <span class=\"n\">offset<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">whence<\/span><span class=\"p\">);<\/span>\n\n\n<span class=\"cp\"># simple integer readers<\/span>\n<span class=\"kt\">int<\/span>          <span class=\"nf\">avio_r8<\/span>  <span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">uint64_t<\/span>     <span class=\"nf\">avio_rb64<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">uint64_t<\/span>     <span class=\"nf\">avio_rl64<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span 
class=\"p\">);<\/span>\n<span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"nf\">avio_rb16<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"nf\">avio_rb24<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"nf\">avio_rb32<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"nf\">avio_rl16<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"nf\">avio_rl24<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"nf\">avio_rl32<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n\n<span class=\"cp\"># simple integer writers<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">avio_w8<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">b<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">avio_wb16<\/span><span class=\"p\">(<\/span><span 
class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">val<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">avio_wb24<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">val<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">avio_wb32<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">val<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">avio_wb64<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"kt\">uint64_t<\/span> <span class=\"n\">val<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">avio_wl16<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">val<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">avio_wl24<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">val<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">avio_wl32<\/span><span 
class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">val<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">avio_wl64<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"kt\">uint64_t<\/span> <span class=\"n\">val<\/span><span class=\"p\">);<\/span>\n\n\n<span class=\"cp\"># utf8 and utf16 strings<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">avio_get_str<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">pb<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">maxlen<\/span><span class=\"p\">,<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">buf<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">buflen<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">avio_get_str16le<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">pb<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">maxlen<\/span><span class=\"p\">,<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">buf<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">buflen<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">avio_get_str16be<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">pb<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">maxlen<\/span><span class=\"p\">,<\/span> <span 
class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">buf<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">buflen<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">avio_put_str<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">str<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">avio_put_str16le<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">str<\/span><span class=\"p\">);<\/span>\n\n<span class=\"p\">...<\/span> <span class=\"p\">(<\/span><span class=\"n\">and<\/span> <span class=\"n\">more<\/span><span class=\"p\">)<\/span> <span class=\"p\">...<\/span>\n<\/pre>\n<\/div>\n<h4>Buffering<\/h4>\n<p>All the functions use an intermediate buffer to back reads and writes; the buffer can be flushed explicitly, or it gets flushed automatically once a request would end outside it.<\/p>\n<div class=\"codehilite\">\n<pre><span class=\"kt\">void<\/span> <span class=\"nf\">avio_flush<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<\/pre>\n<\/div>\n<p>A special kind of AVIOContext is the dynamic write buffer: it grows on demand and can be used to build complex recurring patterns once and write them as many times as needed.<\/p>\n<div class=\"codehilite\">\n<pre><span class=\"kt\">int<\/span> <span class=\"nf\">avio_open_dyn_buf<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span 
class=\"o\">**<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">avio_close_dyn_buf<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">**<\/span><span class=\"n\">pbuffer<\/span><span class=\"p\">);<\/span>\n<\/pre>\n<\/div>\n<h4>Error handling<\/h4>\n<p>An I\/O layer has to take into account the fact that the resource being read or written could abruptly disappear or suddenly slow down. This is valid for both local and remote resources.<\/p>\n<p>The internal buffer allocation might fail.<\/p>\n<p>A seek too far could hit the end of the file.<\/p>\n<p>AVIO&#8217;s approach to errors is quite simplistic:<\/p>\n<ul>\n<li>A write can silently fail.<\/li>\n<li>A failing read just returns a <code>0<\/code>-ed buffer or value.<\/li>\n<li>All the functions set the <code>error<\/code> field or the <code>eof_reached<\/code> field.<\/li>\n<\/ul>\n<p>It is up to the user to decide when to check for I\/O problems, or to leverage the <code>AVIOInterruptCB<\/code> to implement timeouts or other means to interrupt a read or a write that would otherwise just quietly block until it completes.<\/p>\n<h3>Demuxing (and Probing)<\/h3>\n<p>The AVFormat part taking care of input streams can be split in three: <strong>Probing<\/strong> the data to guess the right demuxer, the actual <strong>Demuxing<\/strong>, and optionally parsing the demuxed data to fit it into packets containing the information the decoder needs to decode a frame of video or a matching amount of audio samples. Later on I call this a frame-worth amount of data, and I call the process <b>chopping<\/b> amorphous data streams. 
The expression is colorful, but it represents the endeavor quite well.<\/p>\n<h4>Probing<\/h4>\n<p>The Probe functions take an arbitrarily big chunk of data (stored in an <code>AVProbeData<\/code> struct) and figure out which demuxer should be able to actually parse it correctly.<\/p>\n<p>As a rule of thumb probes need to be fast, since all of them have to be run over the data at least once and possibly multiple times: if the result is not really conclusive, increasing the amount of data and trying again is an option.<\/p>\n<div class=\"codehilite\">\n<pre><span class=\"n\">AVInputFormat<\/span> <span class=\"o\">*<\/span><span class=\"nf\">av_probe_input_format2<\/span><span class=\"p\">(<\/span><span class=\"n\">AVProbeData<\/span> <span class=\"o\">*<\/span><span class=\"n\">pd<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">is_opened<\/span><span class=\"p\">,<\/span>\n                                      <span class=\"kt\">int<\/span> <span class=\"o\">*<\/span><span class=\"n\">score_max<\/span><span class=\"p\">);<\/span>\n<\/pre>\n<\/div>\n<p>A helper function to probe from an <code>AVIOContext<\/code> and get the possible input format is provided.<\/p>\n<div class=\"codehilite\">\n<pre><span class=\"kt\">int<\/span> <span class=\"nf\">av_probe_input_buffer<\/span><span class=\"p\">(<\/span><span class=\"n\">AVIOContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">pb<\/span><span class=\"p\">,<\/span> <span class=\"n\">AVInputFormat<\/span> <span class=\"o\">**<\/span><span class=\"n\">fmt<\/span><span class=\"p\">,<\/span>\n                          <span class=\"k\">const<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">filename<\/span><span class=\"p\">,<\/span> <span class=\"kt\">void<\/span> <span class=\"o\">*<\/span><span class=\"n\">logctx<\/span><span class=\"p\">,<\/span>\n                          <span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span 
class=\"n\">offset<\/span><span class=\"p\">,<\/span> <span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">max_probe_size<\/span><span class=\"p\">);<\/span>\n<\/pre>\n<\/div>\n<p>It is used internally by <code>avformat_open_input<\/code> to automatically figure out the demuxer to use, and it might look a little confusing.<\/p>\n<h4>Demuxing<\/h4>\n<p>Once the input format is either guessed or selected, the actual demuxing is conceptually just providing <code>AVPacket<\/code>s as they are parsed. You might want to reposition within the stream at random times (the infamous <strong>seeking<\/strong>, which opens yet another can of worms).<\/p>\n<div class=\"codehilite\">\n<pre><span class=\"kt\">int<\/span> <span class=\"nf\">avformat_open_input<\/span><span class=\"p\">(<\/span><span class=\"n\">AVFormatContext<\/span> <span class=\"o\">**<\/span><span class=\"n\">ps<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">filename<\/span><span class=\"p\">,<\/span>\n                        <span class=\"n\">AVInputFormat<\/span> <span class=\"o\">*<\/span><span class=\"n\">fmt<\/span><span class=\"p\">,<\/span> <span class=\"n\">AVDictionary<\/span> <span class=\"o\">**<\/span><span class=\"n\">options<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">av_read_frame<\/span><span class=\"p\">(<\/span><span class=\"n\">AVFormatContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"n\">AVPacket<\/span> <span class=\"o\">*<\/span><span class=\"n\">pkt<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"nf\">avformat_close_input<\/span><span class=\"p\">(<\/span><span class=\"n\">AVFormatContext<\/span> <span class=\"o\">**<\/span><span class=\"n\">ps<\/span><span class=\"p\">);<\/span>\n<\/pre>\n<\/div>\n<h5>Figuring out the 
data inside the format<\/h5>\n<p>Some container formats keep the information regarding their contents in a global header at the start of the file; others, which can have new data streams appearing at random times, do not.<\/p>\n<p>Since there is no easy means to figure out which kind of data such formats are storing, the only safe way to find out is to try to decode some packets; this is the job of <code>avformat_find_stream_info<\/code>.<\/p>\n<div class=\"codehilite\">\n<pre><span class=\"kt\">int<\/span> <span class=\"nf\">avformat_find_stream_info<\/span><span class=\"p\">(<\/span><span class=\"n\">AVFormatContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">ic<\/span><span class=\"p\">,<\/span> <span class=\"n\">AVDictionary<\/span> <span class=\"o\">**<\/span><span class=\"n\">options<\/span><span class=\"p\">);<\/span>\n<\/pre>\n<\/div>\n<p>The apparently simple function does a lot of work behind the scenes: it demuxes and <strong>decodes<\/strong> a settable number of packets before giving up, and keeps all of them in an internal queue so that they are still available for demuxing even if the input stream is not seekable.<\/p>\n<h5>Getting the data outside<\/h5>\n<p>Containers such as <a href=\"http:\/\/en.wikipedia.org\/wiki\/MPEG_program_stream\">MPEG PS<\/a> mux data in small fixed-size chunks, while muxers and decoders usually expect to receive AVPackets containing enough data to produce a frame.<\/p>\n<p>Specific parsers can be inserted automatically to take an amorphous stream of demuxed data and chop AVPackets containing the expected amount of data out of it.<\/p>\n<p>This usually happens automatically, so the user does not have to care about it as long as the codec parser is present.<\/p>\n<h5>Timestamps<\/h5>\n<p>Multimedia data is expected to carry timestamps so that matching video frames and audio frames (and subtitles) are presented at the same time.<\/p>\n<p>Some containers provide such timestamps directly, others do not, 
requiring some amount of guesswork through heuristics that might or might not work depending on the codec at hand.<\/p>\n<p>For example, if the container is not supposed to allow variable frame rate, the implicit timestamp for video can be deduced from the frame number. This might not work as expected if the codec uses B-frames and requires some form of reordering.<\/p>\n<p>This part of Libav is sort of hidden and often causes a number of problems.<\/p>\n<h5>Seeking<\/h5>\n<p>Seeking is quite a different and large can of worms.<\/p>\n<p>Ideally seeking just sets the AVIOContext to a certain position and the demuxer keeps working from there.<\/p>\n<div class=\"codehilite\">\n<pre><span class=\"kt\">int<\/span> <span class=\"nf\">av_seek_frame<\/span><span class=\"p\">(<\/span><span class=\"n\">AVFormatContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">stream_index<\/span><span class=\"p\">,<\/span>\n                  <span class=\"kt\">int64_t<\/span> <span class=\"n\">timestamp<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">flags<\/span><span class=\"p\">);<\/span>\n<\/pre>\n<\/div>\n<p>Depending on the container format and the codec, picking the correct byte offset from the user-provided timestamp can be incredibly simple or really complex, with various degrees of precision.<\/p>\n<p>Some formats provide a precise index, so a plain lookup is enough; a dichotomic search looking for the closest I-frame is the common case; and in the worst situation a linear search might be required.<\/p>\n<p>In some cases auxiliary indexes are built to speed up seeking within previously parsed areas.<\/p>\n<p>Seeking is not fun at the demuxer level and gets even worse at the codec level if the data provided is not what is expected.<\/p>\n<h3>Muxing<\/h3>\n<p>Muxing is sort of simpler than demuxing. 
The output format is always known and the data always comes in AVPackets matching a frame-worth of raw data and possibly sporting correct timestamps.<\/p>\n<p>API-wise it expects an <code>AVFormatContext<\/code> with the <code>oformat<\/code> set to the correct <code>AVOutputFormat<\/code>, the <code>pb<\/code> set to an allocated <code>AVIOContext<\/code>, and populated <code>AVStream<\/code>s.<\/p>\n<p>Once the <code>AVFormatContext<\/code> is configured it is possible to write the packets. First the global header should be written, then as many packets as needed are muxed, interleaving audio and video so that <strong>demuxing<\/strong> and <strong>seeking<\/strong> work correctly, and finally the trailer is written.<\/p>\n<div class=\"codehilite\">\n<pre><span class=\"kt\">int<\/span> <span class=\"nf\">avformat_write_header<\/span><span class=\"p\">(<\/span><span class=\"n\">AVFormatContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"n\">AVDictionary<\/span> <span class=\"o\">**<\/span><span class=\"n\">options<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">av_interleaved_write_frame<\/span><span class=\"p\">(<\/span><span class=\"n\">AVFormatContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"n\">AVPacket<\/span> <span class=\"o\">*<\/span><span class=\"n\">pkt<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">av_write_trailer<\/span><span class=\"p\">(<\/span><span class=\"n\">AVFormatContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<\/pre>\n<\/div>\n<h4>Bitstream filtering<\/h4>\n<p>Some codecs have multiple possible representations, e.g. <a href=\"http:\/\/en.wikipedia.org\/wiki\/H.264\/MPEG-4_AVC\">H264<\/a> has the AVCC bitstream format and the Annex B bitstream format. Some containers support both, others expect only one or the other. 
Currently the correct converter from one bitstream to another must be inserted manually.<\/p>\n<h4>Packet interleaving<\/h4>\n<p>Certain container formats have quite peculiar muxing rules. This is normally hidden from the user, but in certain cases being able to override it is a boon.<\/p>\n<h2>Shortcomings summary<\/h2>\n<p>In the next post I will explain how I would improve the situation; today&#8217;s post is mainly meant to introduce the structure of AVFormat and start explaining what should be fixed. Here is a short list of what I&#8217;d like to fix sooner rather than later.<\/p>\n<h3>Non-uniform API<\/h3>\n<ul>\n<li>There is quite a mixture of <code>av_<\/code> and <code>avformat_<\/code> namespaces.<\/li>\n<li>The muxing and demuxing APIs are sufficiently confusing (and surely I should complete my <code>avformat_open_output<\/code> to reduce the boilerplate).<\/li>\n<\/ul>\n<h3>Abstractions Leaking the wrong way<\/h3>\n<ul>\n<li>\nThe <strong>demuxing<\/strong> side automagically inserts parsers to chop data streams into frame-worth amounts of data, while the <strong>muxing<\/strong> side just fails if the bitstream provided does not match the one required by the container format.\n<\/li>\n<li>\nThere is quite a lot of hidden magic happening in <code>avformat_find_stream_info<\/code>, and only recently we added options to at least flush the buffer it keeps to probe for codecs. A better function and a better means to control this kind of internal buffer would surely be appreciated by users who need to keep latency low.\n<\/li>\n<li>\nThere is no good means to be notified if the number of streams changes (new streams found or old streams disappearing).\n<\/li>\n<\/ul>\n<h3>Bad implementations<\/h3>\n<ul>\n<li>The old muxers sometimes do not even use the now-available internals (e.g. 
the interleaver helpers) but internally implement queues and logic that should now be common and shared across all the muxers.<\/li>\n<li>While AVCodec has (now) quite a uniform means to slice bytes and bits, avformat is not leveraging it besides a few places.<\/li>\n<\/ul>\n<blockquote><p>\n<strong>PS<\/strong>: <a href=\"http:\/\/codecs.multimedia.cx\/\">Kostya<\/a> <a href=\"http:\/\/codecs.multimedia.cx\/?p=971\">prefers<\/a> to provide both the amorphous stream and the chopped packets. It makes sense, since you might have some codec you cannot parse but can sort of safely remux if the container is the same.<br \/>\nFor the common case I&#8217;d rather suggest using a set of functions that always insert parsers when they can, for both demuxing and muxing, and providing another set of functions to get arbitrary lumps of stream as provided by the container format.\n<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Container formats should be just a boring application of serialization of multiple arrays of tuples timestamp-binary blob. Instead there are tons of implementation details and there are fun and exceedingly annoying means to lose your sanity. This post is yet another post about APIs you can see other here and here. 
Current Status In Libav &hellip; <a href=\"https:\/\/blogs.gentoo.org\/lu_zero\/2015\/06\/06\/rethinking-avformat-part-1\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Rethinking AVFormat &#8211; part 1<\/span><\/a><\/p>\n","protected":false},"author":10,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[14,1],"tags":[19],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1aGWH-7z","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/469"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/comments?post=469"}],"version-history":[{"count":7,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/469\/revisions"}],"predecessor-version":[{"id":498,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/469\/revisions\/498"}],"wp:attachment":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/media?parent=469"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/categories?post=469"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/tags?post=469"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}