{"id":454,"date":"2015-05-02T20:54:30","date_gmt":"2015-05-02T20:54:30","guid":{"rendered":"http:\/\/blogs.gentoo.org\/lu_zero\/?p=454"},"modified":"2015-06-05T11:02:15","modified_gmt":"2015-06-05T11:02:15","slug":"splitting-a-library-hashes","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/lu_zero\/2015\/05\/02\/splitting-a-library-hashes\/","title":{"rendered":"Splitting a library &#8211; hashes"},"content":{"rendered":"<p><a href=\"https:\/\/libav.org\/doxygen\/release\/11\/group__lavu.html\">libavutil<\/a> contains a lot of code that is shared with the other libraries that compose <a href=\"https:\/\/libav.org\">Libav<\/a>. It has grown a lot over the years and it&#8217;s time to consider splitting it.<\/p>\n<h2>Monolithic vs Modular<\/h2>\n<p>There will always be some discussion on which approach is globally better.<br \/>\n&#8211; Jumbling everything together so that everything is there and, no matter what, you have your super <em>hammer<\/em> supporting screws, bolts, nuts and nails.<br \/>\n&#8211; Keeping the tools in separate boxes so you carry only the set of spanners you need when you need it.<\/p>\n<p>For software libraries you have this kind of problem all the time and at multiple levels:<br \/>\n&#8211; Do you want to have a single huge <code>header file<\/code> with every function your library provides or a set of them organized to keep related functions together?<br \/>\n&#8211; Do you want to link a single library or have the concerns split across several so you do not have to carry lots of stuff you do not use (storage and memory are still important in some applications)?<\/p>\n<p>Usually modularity comes at the price of additional initial effort (you have to <strong>think<\/strong> about what you are going to use a little harder) and maintenance (which library should I <strong>update<\/strong>?).<\/p>\n<p>This blog post is about grouping and splitting a bunch of unrelated functions present in a library and trying to get a 
better API for some of them.<\/p>\n<h2>Libavutil<\/h2>\n<p>The Libav libraries are written mainly in C and assembly, and they focus a lot on being portable. Libavutil is the foundation.<\/p>\n<p>It contains all the code that is <strong>common<\/strong> across libraries from the basics such as <a href=\"https:\/\/libav.org\/doxygen\/release\/11\/group__lavu__mem.html\">memory management<\/a> to higher level <a href=\"https:\/\/libav.org\/doxygen\/release\/11\/group__lavu__data.html\">data structures<\/a>, to video and <a href=\"https:\/\/libav.org\/doxygen\/release\/11\/group__lavu__audio.html\">audio-specific<\/a> basic manipulation and <a href=\"https:\/\/libav.org\/doxygen\/release\/11\/group__lavu__crypto.html\">hashes, cryptographic primitives and lossless compressors<\/a>.<\/p>\n<p><strong>A lot indeed<\/strong>.<\/p>\n<h2>Problems<\/h2>\n<h3>Irregular Mushroom-API<\/h3>\n<p>Some of the highest-level parts of the library appeared little by little: first you need <code>md5<\/code> and you add it, then it is <code>aes<\/code>, then you want <code>lzo<\/code>. 
Every crypto and hash component exposes functions specific to that algorithm, making those components non-optional even if you do not need them.<\/p>\n<div class=\"codehilite\">\n<pre><span class=\"cp\"># libavutil\/aes.h<\/span>\n<span class=\"k\">struct<\/span> <span class=\"n\">AVAES<\/span><span class=\"p\">;<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"n\">AVAES<\/span> <span class=\"o\">*<\/span><span class=\"nf\">av_aes_alloc<\/span><span class=\"p\">(<\/span><span class=\"kt\">void<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">av_aes_init<\/span><span class=\"p\">(<\/span><span class=\"k\">struct<\/span> <span class=\"n\">AVAES<\/span> <span class=\"o\">*<\/span><span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">key<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">key_bits<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">decrypt<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">av_aes_crypt<\/span><span class=\"p\">(<\/span><span class=\"k\">struct<\/span> <span class=\"n\">AVAES<\/span> <span class=\"o\">*<\/span><span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">dst<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">src<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">count<\/span><span class=\"p\">,<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">iv<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">decrypt<\/span><span class=\"p\">);<\/span>\n\n<span class=\"cp\"># libavutil\/xtea.h<\/span>\n<span 
class=\"k\">typedef<\/span> <span class=\"k\">struct<\/span> <span class=\"n\">AVXTEA<\/span> <span class=\"p\">{<\/span>\n    <span class=\"kt\">uint32_t<\/span> <span class=\"n\">key<\/span><span class=\"p\">[<\/span><span class=\"mi\">16<\/span><span class=\"p\">];<\/span>\n<span class=\"p\">}<\/span> <span class=\"n\">AVXTEA<\/span><span class=\"p\">;<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"nf\">av_xtea_init<\/span><span class=\"p\">(<\/span><span class=\"k\">struct<\/span> <span class=\"n\">AVXTEA<\/span> <span class=\"o\">*<\/span><span class=\"n\">ctx<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"n\">key<\/span><span class=\"p\">[<\/span><span class=\"mi\">16<\/span><span class=\"p\">]);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">av_xtea_crypt<\/span><span class=\"p\">(<\/span><span class=\"k\">struct<\/span> <span class=\"n\">AVXTEA<\/span> <span class=\"o\">*<\/span><span class=\"n\">ctx<\/span><span class=\"p\">,<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">dst<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">src<\/span><span class=\"p\">,<\/span>\n                   <span class=\"kt\">int<\/span> <span class=\"n\">count<\/span><span class=\"p\">,<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">iv<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">decrypt<\/span><span class=\"p\">);<\/span>\n\n<span class=\"cp\"># libavutil\/sha.h<\/span>\n<span class=\"k\">struct<\/span> <span class=\"n\">AVSHA<\/span><span class=\"p\">;<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"n\">AVSHA<\/span> <span class=\"o\">*<\/span><span class=\"nf\">av_sha_alloc<\/span><span class=\"p\">(<\/span><span class=\"kt\">void<\/span><span 
class=\"p\">);<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">av_sha_init<\/span><span class=\"p\">(<\/span><span class=\"k\">struct<\/span> <span class=\"n\">AVSHA<\/span><span class=\"o\">*<\/span> <span class=\"n\">context<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">bits<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">av_sha_update<\/span><span class=\"p\">(<\/span><span class=\"k\">struct<\/span> <span class=\"n\">AVSHA<\/span><span class=\"o\">*<\/span> <span class=\"n\">context<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">uint8_t<\/span><span class=\"o\">*<\/span> <span class=\"n\">data<\/span><span class=\"p\">,<\/span> <span class=\"kt\">unsigned<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">len<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">av_sha_final<\/span><span class=\"p\">(<\/span><span class=\"k\">struct<\/span> <span class=\"n\">AVSHA<\/span><span class=\"o\">*<\/span> <span class=\"n\">context<\/span><span class=\"p\">,<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">digest<\/span><span class=\"p\">);<\/span>\n\n<span class=\"cp\"># libavutil\/md5.h<\/span>\n<span class=\"k\">struct<\/span> <span class=\"n\">AVMD5<\/span><span class=\"p\">;<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"n\">AVMD5<\/span> <span class=\"o\">*<\/span><span class=\"nf\">av_md5_alloc<\/span><span class=\"p\">(<\/span><span class=\"kt\">void<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">av_md5_init<\/span><span class=\"p\">(<\/span><span class=\"k\">struct<\/span> <span class=\"n\">AVMD5<\/span> <span class=\"o\">*<\/span><span class=\"n\">ctx<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">av_md5_update<\/span><span class=\"p\">(<\/span><span 
class=\"k\">struct<\/span> <span class=\"n\">AVMD5<\/span> <span class=\"o\">*<\/span><span class=\"n\">ctx<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">src<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">len<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">av_md5_final<\/span><span class=\"p\">(<\/span><span class=\"k\">struct<\/span> <span class=\"n\">AVMD5<\/span> <span class=\"o\">*<\/span><span class=\"n\">ctx<\/span><span class=\"p\">,<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">dst<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">av_md5_sum<\/span><span class=\"p\">(<\/span><span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">dst<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">src<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">len<\/span><span class=\"p\">);<\/span>\n<\/pre>\n<\/div>\n<p>As you might notice, it ended up with lots and lots of exposed, similar-but-non-uniform APIs popping up.<\/p>\n<p>And while having a couple of hashes always around was acceptable, it gets less pleasant when you have more to add.<\/p>\n<p>Right now <code>libavutil<\/code> exposes 50 separate headers.<\/p>\n<h3>Extending it is painful now<\/h3>\n<p>Since we already have that many different components inside it, you think twice about adding <em>more<\/em> stuff (if you are careful and caring). Libav is fairly modular and people do appreciate that.<\/p>\n<p>In my wishlist I have a few items, such as getting more decompressors natively implemented.<\/p>\n<p>Every new API is a burden to maintain (if you care about 
legacy and keep maintaining and releasing your older software), so adding or exposing more is always something you should consider carefully.<\/p>\n<div class=\"alert alert-success\">\n<b>Abstracting<\/b> some details always helps: think what the API would look like if each of the supported codecs had its own exposed, <b>non-uniform<\/b> set of decoding functions.\n<\/div>\n<h2>Ideal structure<\/h2>\n<p>Ideally I&#8217;d have the following layout:<br \/>\n&#8211; libavutil: basic memory abstraction, errors, logs and not much else<br \/>\n&#8211; libavdata: basic data structures, including refcounted buffers, dictionaries, trees and such<br \/>\n&#8211; libavmedia: audio samples, pixel formats, metadata, frames, packets, side data types<br \/>\n&#8211; libavhash: hashes such as md5 and sha<br \/>\n&#8211; libavcomp: compressors such as lzo<br \/>\n&#8211; libavcrypto: aes, blowfish and such<\/p>\n<h3>API<\/h3>\n<p>I already <a href=\"https:\/\/blogs.gentoo.org\/lu_zero\/2015\/03\/23\/decoupling-an-api\/\">described<\/a> my ideal API for the <strong>codecs<\/strong>; today I&#8217;ll detail the <strong>hashes<\/strong>.<\/p>\n<p>As seen above it is common to have <code>init<\/code>, <code>update<\/code>, <code>final<\/code> and an optional utility function <code>sum<\/code> (or <code>calc<\/code>) that takes a whole buffer and returns the hash.<\/p>\n<div class=\"codehilite\">\n<pre><span class=\"k\">typedef<\/span> <span class=\"k\">struct<\/span> <span class=\"n\">AVHashLibrary<\/span> <span class=\"n\">AVHashLibrary<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">typedef<\/span> <span class=\"k\">struct<\/span> <span class=\"n\">AVHash<\/span> <span class=\"n\">AVHash<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">typedef<\/span> <span class=\"k\">struct<\/span> <span class=\"n\">AVHashContext<\/span> <span class=\"n\">AVHashContext<\/span><span class=\"p\">;<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">av_hash_register_all<\/span><span class=\"p\">(<\/span><span class=\"n\">AVHashLibrary<\/span> <span class=\"o\">*<\/span><span 
class=\"n\">hashes<\/span><span class=\"p\">);<\/span>\n\n<span class=\"k\">const<\/span> <span class=\"n\">AVHash<\/span> <span class=\"o\">*<\/span><span class=\"n\">av_hash_get<\/span><span class=\"p\">(<\/span><span class=\"n\">AVHashLibrary<\/span> <span class=\"o\">*<\/span><span class=\"n\">hashes<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">char<\/span> <span class=\"o\">*<\/span><span class=\"n\">name<\/span><span class=\"p\">);<\/span>\n\n<span class=\"n\">AVHashContext<\/span> <span class=\"o\">*<\/span><span class=\"nf\">avhash_open<\/span><span class=\"p\">(<\/span><span class=\"n\">AVHash<\/span> <span class=\"o\">*<\/span><span class=\"n\">hash<\/span><span class=\"p\">,<\/span> <span class=\"n\">AVDictionary<\/span> <span class=\"o\">*<\/span><span class=\"n\">opts<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">av_hash_update<\/span><span class=\"p\">(<\/span><span class=\"n\">AVHashContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">ctx<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">src<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">len<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"nf\">av_hash_final<\/span><span class=\"p\">(<\/span><span class=\"n\">AVHashContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">ctx<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"o\">*<\/span><span class=\"n\">len<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"nf\">av_hash_sum<\/span><span class=\"p\">(<\/span><span class=\"n\">AVHashContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">ctx<\/span><span class=\"p\">,<\/span> <span 
class=\"k\">const<\/span> <span class=\"kt\">uint8_t<\/span> <span class=\"o\">*<\/span><span class=\"n\">src<\/span><span class=\"p\">,<\/span> <span class=\"k\">const<\/span> <span class=\"kt\">uint64_t<\/span> <span class=\"n\">src_len<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"o\">*<\/span><span class=\"n\">out_len<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"nf\">avhash_close<\/span><span class=\"p\">(<\/span><span class=\"n\">AVHashContext<\/span> <span class=\"o\">*<\/span><span class=\"n\">hash<\/span><span class=\"p\">);<\/span>\n<\/pre>\n<\/div>\n<p>The structures are fully opaque; the <code>AVHashLibrary<\/code> contains the list of available hashes and possibly some additional hidden state. In Libav we are trying to remove all the global variables, so the list of hashes is explicit.<\/p>\n<p>The <code>register_all<\/code> function just populates the list of hashes and possibly creates accessory lookup tables when needed.<\/p>\n<p>The <code>get<\/code> call lets you look up the hash by name; additional calls can be added to look it up by id.<\/p>\n<p>The <code>open<\/code> function takes a dictionary for hash-specific configuration.<\/p>\n<p>The <code>update<\/code> and <code>final<\/code> functions let you calculate the hash incrementally; the <code>sum<\/code> function is a simple utility that takes a full buffer (whose length is assumed to fit in a <code>uint64_t<\/code>) and produces the hash.<\/p>\n<p>The <a href=\"http:\/\/codecs.multimedia.cx\/?cat=25\">NihAV<\/a> from Kostya hopefully will have a similar API with <code>TypeLibrary<\/code>, <code>Type<\/code> and <code>TypeContext<\/code> structs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>libavutil contains a lot of code that is shared with the other libraries that compose Libav. It has grown a lot over the years and it&#8217;s time to consider splitting it. 
Monolithic vs Modular There will always be some discussion on which approach is globally better. &#8211; Jumbling everything together so you have everything there and doesn&#8217;t matter what, &hellip; <a href=\"https:\/\/blogs.gentoo.org\/lu_zero\/2015\/05\/02\/splitting-a-library-hashes\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Splitting a library &#8211; hashes<\/span><\/a><\/p>\n","protected":false},"author":10,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[14,6],"tags":[19],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1aGWH-7k","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/454"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/comments?post=454"}],"version-history":[{"count":5,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/454\/revisions"}],"predecessor-version":[{"id":466,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/454\/revisions\/466"}],"wp:attachment":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/media?parent=454"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/categories?post=454"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/tags?post=454"}],"curies":[{"name":"wp","href":"https:\/\/api
.w.org\/{rel}","templated":true}]}}