{"id":332,"date":"2014-10-11T14:47:41","date_gmt":"2014-10-11T14:47:41","guid":{"rendered":"http:\/\/blogs.gentoo.org\/lu_zero\/?p=332"},"modified":"2014-10-18T08:42:10","modified_gmt":"2014-10-18T08:42:10","slug":"vdd14-discussions-hwaccel2","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/lu_zero\/2014\/10\/11\/vdd14-discussions-hwaccel2\/","title":{"rendered":"VDD14 Discussions: HWAccel2"},"content":{"rendered":"<div class=\"alert alert-success\">\nI took part to the <b>Videolan Dev Days 14<\/b> weeks ago, sadly I had been too busy so the posts about it will appear in scattered order and sort of delayed.\n<\/div>\n<h2>Hardware acceleration<\/h2>\n<p>In multimedia, video is basically crunching numbers and get pixels or crunching pixels and getting numbers. Most of the operation are quite time consuming on a general purpose CPU and orders of magnitude faster if done using DSP or hardware designed for that purpose.<\/p>\n<h3>Availability<\/h3>\n<p>Most of the commonly used system have video decoding and encoding capabilities either embedded in the GPU or in separated hardware. Leveraging it spares lots of <strong>cpu cycles<\/strong> and lots of <strong>battery<\/strong> if we are thinking about mobile.<\/p>\n<h3>Capabilities<\/h3>\n<p>The usually specialized hardware has the issue of being <strong>inflexible<\/strong> and that does clash with the fact most codec evolve quite quickly with additional <strong>profiles<\/strong> to extend its capabilities, support different color spaces, use additional encoding strategies and such. 
<strong>Software<\/strong> decoders and encoders are still needed, and badly.<\/p>\n<h2>Hardware acceleration support in Libav<\/h2>\n<h3>HWAccel 1<\/h3>\n<p>The hardware acceleration support in Libav grew (like other eldritch-<strong>horror<\/strong> tentacular code lurking from our dark past) without much direction, addressing short-term problems and not really documenting how to use it.<\/p>\n<p>As a result, all the people who <strong>dared<\/strong> to use it had to guess, usually relied on internal symbols they should not have had to touch, and all in all spent lots of time and got plenty of grief whenever such internals changed.<\/p>\n<h4>Usage<\/h4>\n<p>Every backend required a rather large amount of boilerplate code to initialize the backend-specific context and to render the hardware surface wrapped in the AVFrame.<\/p>\n<p>The Libav backend interface itself was quite vague, requiring the user to override <code>get_format<\/code> and <code>get_buffer<\/code> in specific ways.<\/p>\n<p>Overall, to get the whole thing working, the library user was supposed to do about 75% of the work. 
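<\/p>\n<p>As a rough illustration (a sketch only: the exact setup varies per backend, and VDPAU is just picked as an example), the old-style boilerplate revolves around a <code>get_format<\/code> override along these lines:<\/p>\n<pre><code>#include &lt;libavcodec\/avcodec.h&gt;\n\n\/* Pick the VDPAU hardware pixel format when the decoder offers it;\n * real code must also create the backend-specific context (and\n * override get_buffer) before returning the hardware format. *\/\nstatic enum AVPixelFormat hw_get_format(AVCodecContext *avctx,\n                                        const enum AVPixelFormat *fmt)\n{\n    const enum AVPixelFormat *p;\n    for (p = fmt; *p != AV_PIX_FMT_NONE; p++)\n        if (*p == AV_PIX_FMT_VDPAU)\n            return *p;  \/* hardware decoding is possible *\/\n    return fmt[0];      \/* first offered format: software path *\/\n}\n\n\/* ... before avcodec_open2(): *\/\navctx-&gt;get_format = hw_get_format;\n<\/code><\/pre>\n<p>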
Not really nice, considering people use libraries to <strong>abstract<\/strong> complexity and avoid <strong>repetition<\/strong>&#8230;<\/p>\n<h4>Backend support<\/h4>\n<p>As that support was written with just slice-based decoders in mind, it expects that every backend requires the software decoder to parse the bitstream, prepare <strong>slices<\/strong> of the frame and feed them to the backend.<\/p>\n<p>Sadly, new backends appeared that directly take either the bitstream or full frames; the approach so far had been to take the slice, add back the bitstream markers the backend library expects, and be done with it.<\/p>\n<h3>Initial HWAccel 2 discussion<\/h3>\n<p>Last year, since the backends I wanted to support were all bitstream-oriented and did not fit the model at all, I started thinking about it, and the topic got discussed a bit during <strong>VDD 13<\/strong>. Some people who had spent their dear time getting <strong>hwaccel1<\/strong> working with their software were quite wary of radical changes, so a path of incremental improvements got more or less laid down.<\/p>\n<h4>HWAccel 1.2<\/h4>\n<ul>\n<li>provide default functions to allocate and free the backend context, and make the struct interfacing between Libav and the backend extensible without causing <a href=\"https:\/\/wiki.libav.org\/CompatibilityBreak\">breakage<\/a>.<\/li>\n<li><strong>avconv<\/strong> can now use some hwaccels, providing at least an example of how to use them and a means to test without having to gut <strong>VLC<\/strong> or <strong>mpv<\/strong> to experiment.<\/li>\n<li>better document the old-style hwaccels, so at least some mistakes can be avoided (and code that happens to work by sheer luck won&#8217;t break once the faulty assumptions cease to hold).<\/li>\n<\/ul>\n<p>The new VDA backend and the updated VDPAU backend are examples of it.<\/p>\n<h4>HWAccel 1.3<\/h4>\n<ul>\n<li>extend the callback system to decently fit bitstream-oriented backends.<\/li>\n<li>provide an 
example of a backend directly providing normal AVFrames.<\/li>\n<\/ul>\n<p>The Intel QSV backend is used as a testbed for hwaccel 1.3.<\/p>\n<h2>The future of HWAccel2<\/h2>\n<p>Another year, another meeting. We sat down again to figure out how to get closer to the end goal: casual users should not have to write boilerplate code to get at least some performance boost out of a hwaccel, while power users should keep full access to the underpinnings so they can get the most out of them without having to write everything from scratch.<\/p>\n<h3>Simplified usage, hopefully really simple<\/h3>\n<p>The user just needs to set specific <strong>AVOption<\/strong> keys such as <strong>hwaccel<\/strong> and, optionally, <strong>hwaccel-device<\/strong>, and the library will take care of everything. The frames returned by <code>avcodec_decode_video2<\/code> will live in normal system memory and use common pixel formats. No further special code will be needed.<\/p>\n<h3>Advanced usage, now properly abstracted<\/h3>\n<p>All the default initialization, memory\/surface allocation and such will remain overridable, with the difference that an additional callback called <code>get_hw_surface<\/code> will be introduced to completely separate the hwaccel path from the software path, and specific functions to hand over the ownership of backend contexts and surfaces will be provided.<\/p>\n<p>The software <strong>fallback<\/strong> won&#8217;t be automagic anymore in this case; instead a specific <code>AVERROR_INPUT_CHANGED<\/code> will be returned, making it cleaner for the user to reset the decoder without losing the display that might be sharing the same context. 
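<\/p>\n<p>A sketch of what the non-automagic fallback could look like (<code>AVERROR_INPUT_CHANGED<\/code> is the error code proposed above, not a released API, and <code>reconfigure_decoder<\/code> is a hypothetical application helper):<\/p>\n<pre><code>\/* Sketch: instead of the library silently falling back to software,\n * the application is told the backend can no longer handle the\n * stream and decides itself how to reconfigure. *\/\nint ret = avcodec_decode_video2(avctx, frame, &amp;got_frame, &amp;pkt);\nif (ret == AVERROR_INPUT_CHANGED) {\n    \/* switch to another hwaccel or to software decoding without\n     * tearing down the display that shares the same context *\/\n    reconfigure_decoder(avctx);  \/* hypothetical helper *\/\n}\n<\/code><\/pre>\n<p>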
This paves the way to a simpler means of supporting multiple hwaccel backends and falling back from one to another, and eventually to software decoding.<\/p>\n<h2>Migration path<\/h2>\n<blockquote><p>\n  We try our <a href=\"https:\/\/wiki.libav.org\/Migration\">best<\/a> to help people move to the new APIs.\n<\/p><\/blockquote>\n<p>Moving from <strong>HWAccel1<\/strong> to <strong>HWAccel2<\/strong> will in general result in fewer lines of code in the application; people wanting to keep their callbacks just need to set them after <code>avcodec_open2<\/code> and move the pixel-specific <code>get_buffer<\/code> to <code>get_hw_surface<\/code>. The presence of <code>av_hwaccel_hand_over_frame<\/code> and <code>av_hwaccel_hand_over_context<\/code> will make managing the backend-specific resources much simpler.<\/p>\n<h2>Expected Time of Arrival<\/h2>\n<p>Right now the review is focused on <strong>HWAccel1.3<\/strong>; I hope to complete this step and add a few new backends to test how good\/bad that API is before moving on to the next steps. HWAccel2 will probably take at least another six months.<\/p>\n<p>Help in the form of code or just moral support is always welcome!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I took part in Videolan Dev Days 14 some weeks ago; sadly I have been too busy, so the posts about it will appear delayed and in scattered order. Hardware acceleration In multimedia, video is basically crunching numbers to get pixels, or crunching pixels to get numbers. 
Most of these operations are quite &hellip; <a href=\"https:\/\/blogs.gentoo.org\/lu_zero\/2014\/10\/11\/vdd14-discussions-hwaccel2\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">VDD14 Discussions: HWAccel2<\/span><\/a><\/p>\n","protected":false},"author":10,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[14],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1aGWH-5m","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/332"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/comments?post=332"}],"version-history":[{"count":5,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/332\/revisions"}],"predecessor-version":[{"id":338,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/332\/revisions\/338"}],"wp:attachment":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/media?parent=332"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/categories?post=332"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/tags?post=332"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}