{"id":665,"date":"2018-06-14T18:43:28","date_gmt":"2018-06-14T18:43:28","guid":{"rendered":"http:\/\/blogs.gentoo.org\/lu_zero\/?p=665"},"modified":"2018-06-14T18:43:28","modified_gmt":"2018-06-14T18:43:28","slug":"video-compression-bounty-hunters","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/lu_zero\/2018\/06\/14\/video-compression-bounty-hunters\/","title":{"rendered":"Video Compression Bounty Hunters"},"content":{"rendered":"<p>In this post, we (Luca Barbato and <a href=\"https:\/\/medium.com\/@luc.trudeau\">Luc Trudeau<\/a>) join forces to talk about the awesome work we\u2019ve been doing on <a href=\"https:\/\/en.wikipedia.org\/wiki\/AltiVec\">Altivec\/VSX<\/a> optimizations for the <a href=\"https:\/\/www.webmproject.org\/code\/\">libvpx<\/a> library. You can read it here or on <a href=\"https:\/\/medium.com\/@luc.trudeau\">Luc&#8217;s Medium<\/a>.<\/p>\n<p>Both of us were in Brussels for <a href=\"https:\/\/fosdem.org\/2018\/\">FOSDEM 2018<\/a>: Luca <a href=\"https:\/\/blogs.gentoo.org\/lu_zero\/2018\/02\/14\/rust-av-rust-and-multimedia\/\">presented his work on rust-av<\/a> and Luc was there to <a href=\"https:\/\/twitter.com\/trudluc\/status\/960531177479254016\">hack on rav1e<\/a> \u2013 <a href=\"https:\/\/github.com\/xiph\/rav1e\">an experimental AV1 video encoder in Rust<\/a>.<\/p>\n<p>Luca joined the rav1e team and gave hints about how to leverage Rust effectively. Together, we worked on <a href=\"https:\/\/github.com\/xiph\/rav1e\/blob\/master\/src\/predict.rs\">AV1 intra prediction code<\/a>, among other things.<\/p>\n<p><b>Luc Trudeau<\/b>: I was finishing up my work on <a href=\"https:\/\/vimeo.com\/269083327\">Chroma from Luma in AV1<\/a>, and wanted to stay involved in royalty free open source video codecs. 
When Luca talked to me about <a href=\"https:\/\/www.bountysource.com\/trackers\/46650841-lu-zero-libvpx\">libvpx bounties on Bountysource<\/a>, I was immediately intrigued.<\/p>\n<p><b>Luca Barbato<\/b>: Luc had just <a href=\"https:\/\/twitter.com\/trudluc\/status\/979349356544278530\">finished implementing the Neon version of his CfL<\/a> work and I wondered how that code could work using VSX. I prepared some of the machinery that was missing in libaom and <a href=\"https:\/\/twitter.com\/trudluc\/status\/981736758630109184\">Luc tried his hand at Altivec<\/a>. We still had some pending libvpx work sponsored by IBM and I asked him if he wanted to join in.<\/p>\n<h1>What\u2019s libvpx?<\/h1>\n<p>For those less familiar, libvpx is the official Google implementation of the <a href=\"https:\/\/en.wikipedia.org\/wiki\/VP9\">VP9<\/a> video format. VP9 is most notably used by <a href=\"https:\/\/youtube-eng.googleblog.com\/2015\/04\/vp9-faster-better-buffer-free-youtube.html\">YouTube<\/a> and <a href=\"https:\/\/medium.com\/netflix-techblog\/more-efficient-mobile-encodes-for-netflix-downloads-625d7b082909\">Netflix<\/a>. 
VP9 playback is available in some browsers, including <a href=\"https:\/\/caniuse.com\/#search=vp9\">Chrome, Edge and Firefox<\/a>, and on Android devices, covering 75.31% of the global user base.<\/p>\n<p><a href=\"http:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/cfaaaf4b-f0f5-480b-b85f-5654258371cb.png\"><img loading=\"lazy\" class=\"alignnone size-full wp-image-668\" src=\"http:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/cfaaaf4b-f0f5-480b-b85f-5654258371cb.png\" alt=\"\" width=\"2296\" height=\"518\" srcset=\"https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/cfaaaf4b-f0f5-480b-b85f-5654258371cb.png 2296w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/cfaaaf4b-f0f5-480b-b85f-5654258371cb-300x68.png 300w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/cfaaaf4b-f0f5-480b-b85f-5654258371cb-768x173.png 768w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/cfaaaf4b-f0f5-480b-b85f-5654258371cb-1024x231.png 1024w\" sizes=\"(max-width: 2296px) 100vw, 2296px\" \/><\/a><\/p>\n<p>Ref: <a href=\"https:\/\/caniuse.com\/#search=vp9\">caniuse.com VP9 support in browsers.<\/a><\/p>\n<h1>Why use VP9, when the de facto video format is H.264\/AVC?<\/h1>\n<p>Because VP9 is <a href=\"https:\/\/www.cnet.com\/news\/google-reaches-deal-with-mpeg-la-over-its-vp8-video-codec\/\">royalty free<\/a> and the <a href=\"https:\/\/medium.com\/netflix-techblog\/more-efficient-mobile-encodes-for-netflix-downloads-625d7b082909\">bandwidth savings are substantial when compared to H.264<\/a> wherever playback is available (<a href=\"https:\/\/ngcodec.com\/news\/2017\/10\/21\/why-we-are-supporting-vp9-and-av1\">an estimated 3.3B devices support VP9<\/a>). 
In other words, having VP9 as a secondary codec can pay for itself in bandwidth savings by not having to send H.264 to most users.<\/p>\n<p><a href=\"http:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/1kB-gfYfn9v34nm11ouXGQg.png\"><img loading=\"lazy\" class=\"alignnone size-full wp-image-669\" src=\"http:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/1kB-gfYfn9v34nm11ouXGQg.png\" alt=\"\" width=\"870\" height=\"588\" srcset=\"https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/1kB-gfYfn9v34nm11ouXGQg.png 870w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/1kB-gfYfn9v34nm11ouXGQg-300x203.png 300w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/1kB-gfYfn9v34nm11ouXGQg-768x519.png 768w\" sizes=\"(max-width: 870px) 100vw, 870px\" \/><\/a><\/p>\n<p>Ref: <a href=\"https:\/\/medium.com\/netflix-techblog\/more-efficient-mobile-encodes-for-netflix-downloads-625d7b082909\">Netflix VP9 compression analysis.<\/a><\/p>\n<h1>Why care about libvpx on Power?<\/h1>\n<p>Dynamic adaptive streaming formats like <a href=\"https:\/\/en.wikipedia.org\/wiki\/HTTP_Live_Streaming\">HLS<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Dynamic_Adaptive_Streaming_over_HTTP\">MPEG DASH<\/a> have completely changed the game of streaming video over the internet. Streaming hardware and custom multimedia servers are being replaced by web servers.<\/p>\n<p>From the servers\u2019 perspective, streaming video is akin to serving small video files; lots of small video files! 
To cover all clients and most network conditions, a considerable number of video files must be encoded, stored and distributed.<\/p>\n<p>Things are changing fast: while the total cost of ownership of video content for previous generation video formats, like H.264, was mostly made up of bandwidth and hosting, encoding costs are growing with more complex video formats like <a href=\"https:\/\/en.wikipedia.org\/wiki\/High_Efficiency_Video_Coding\">HEVC<\/a> and VP9.<\/p>\n<p>This complexity is <a href=\"https:\/\/code.facebook.com\/posts\/253852078523394\/av1-beats-x264-and-libvpx-vp9-in-practical-use-case\/\">reported to have grown exponentially with the upcoming AV1 video format<\/a>, a format built on the libvpx code base by the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Alliance_for_Open_Media\">Alliance for Open Media<\/a>, of which <a href=\"https:\/\/aomedia.org\/membership\/members\/\">IBM is a founding member<\/a>.<\/p>\n<p><a href=\"http:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/30179634_213060259453407_4307818495279628288_n.jpg\"><img loading=\"lazy\" class=\"alignnone size-full wp-image-670\" src=\"http:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/30179634_213060259453407_4307818495279628288_n.jpg\" alt=\"\" width=\"1635\" height=\"662\" srcset=\"https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/30179634_213060259453407_4307818495279628288_n.jpg 1635w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/30179634_213060259453407_4307818495279628288_n-300x121.jpg 300w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/30179634_213060259453407_4307818495279628288_n-768x311.jpg 768w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/30179634_213060259453407_4307818495279628288_n-1024x415.jpg 1024w\" sizes=\"(max-width: 1635px) 100vw, 1635px\" \/><\/a><\/p>\n<p><i>Ref: <a href=\"https:\/\/code.facebook.com\/posts\/253852078523394\/av1-beats-x264-and-libvpx-vp9-in-practical-use-case\/\">Facebook\u2019s AV1 complexity 
analysis<\/a><\/i><\/p>\n<p>At the same time, IBM and its partners in the <a href=\"https:\/\/en.wikipedia.org\/wiki\/OpenPOWER_Foundation\">OpenPower Foundation<\/a> are releasing some very impressive hardware with the new <a href=\"https:\/\/en.wikipedia.org\/wiki\/POWER9\">Power9<\/a> processor lineup. Big Iron Power9 systems, like the <a href=\"https:\/\/www.phoronix.com\/scan.php?page=news_item&amp;px=Raptor-Talos-2-Lite\">Talos II<\/a> from <a href=\"https:\/\/www.raptorcs.com\/TALOSII\/prerelease.php\">Raptor Computing Systems<\/a> and the collaboration between <a href=\"https:\/\/cloudplatform.googleblog.com\/2016\/10\/introducing-Zaius-Google-and-Rackspaces-open-server-running-IBM-POWER9.html\">Google<\/a> and <a href=\"https:\/\/blog.rackspace.com\/barreleye-g2-zaius-pvt-entry-and-motherboard-at-openpower-opencompute-and-ibm-think\">Rackspace<\/a> on Zaius\/Barreleye servers, are ideal solutions to tackle the growing complexity of video format encoding.<\/p>\n<p>However, these awesome machines are currently at a disadvantage when encoding video. Without the platform specific optimizations that their competitors enjoy, the Power9 architecture can\u2019t be fully utilized. 
This is clearly illustrated in the x264 benchmark released in a <a href=\"https:\/\/www.phoronix.com\/scan.php?page=article&amp;item=power9-epyc-xeon&amp;num=2\">recent Phoronix article<\/a>.<\/p>\n<p><a href=\"http:\/\/openbenchmarking.org\/prospect\/1804049-AR-POWERTALO23\/df04c74f45043edbffc448ab1bab90caabb093af\"><img src=\"\/\/openbenchmarking.org\/embed.php?i=1804049-AR-POWERTALO23&amp;sha=1721bb3&amp;c=df04c74f45043edbffc448ab1bab90caabb093af&amp;p=1\" \/><\/a><\/p>\n<p>Ref: <a href=\"https:\/\/www.phoronix.com\/scan.php?page=article&amp;item=power9-epyc-xeon&amp;num=2\">Phoronix x264 server benchmark.<\/a><\/p>\n<p>Thanks to the optimization bounties sponsored by IBM, we are hard at work bridging the gap in libvpx.<\/p>\n<h1>Optimization bounties?<\/h1>\n<p>Just like <a href=\"https:\/\/en.wikipedia.org\/wiki\/Bug_bounty_program\">bug bounty programs<\/a>, optimizations make for great bounties. Companies that see benefit in platform specific optimizations for video codecs can sponsor our bounties on the <a href=\"https:\/\/www.bountysource.com\/trackers\/46650841-lu-zero-libvpx\">Bountysource platform<\/a>.<\/p>\n<p>Multiple companies can sponsor the same bounty, thus sharing the cost of more important bounties. Furthermore, bounties are a minimal risk investment for sponsors, as they are only paid out when the work is completed (and peer reviewed by libvpx maintainers).<\/p>\n<p>Not only is the Bountysource platform a win for companies that directly benefit from the bounties they are sponsoring, it\u2019s also a win for developers (like us) who can get paid to work on free and open source projects that we are passionate about. 
Optimization bounties are a source of sustainability in the free and open source software ecosystem.<\/p>\n<h1>How do you choose bounties?<\/h1>\n<p>Since we\u2019re a small team of bounty hunters (<a href=\"https:\/\/github.com\/lu-zero\">Luca Barbato<\/a>, <a href=\"https:\/\/github.com\/sasshka\">Alexandra H\u00e1jkov\u00e1<\/a>, <a href=\"https:\/\/github.com\/rafaeldelucena\">Rafael de Lucena Valle<\/a> and <a href=\"https:\/\/github.com\/luctrudeau\">Luc Trudeau<\/a>), we need to play it smart and maximize the impact of our work. We\u2019ve identified two common use cases related to streaming on the Power architecture: YouTube-like encodes and real time (a.k.a. low latency) encodes.<\/p>\n<p>By profiling libvpx under these conditions, we can determine the key functions to optimize. The following charts show the percentage of time spent in the top 20 functions of the libvpx encoder (without Altivec\/VSX optimisations) on a Power8 system, for both YouTube-like and real time settings.<\/p>\n<p><a href=\"http:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/chart-1.png\"><img loading=\"lazy\" class=\"alignnone size-full wp-image-671\" src=\"http:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/chart-1.png\" alt=\"\" width=\"924\" height=\"1036\" srcset=\"https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/chart-1.png 924w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/chart-1-268x300.png 268w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/chart-1-768x861.png 768w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/chart-1-913x1024.png 913w\" sizes=\"(max-width: 924px) 100vw, 924px\" \/><\/a><\/p>\n<p><a href=\"http:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/chart-2.png\"><img loading=\"lazy\" class=\"alignnone size-full wp-image-672\" src=\"http:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/chart-2.png\" alt=\"\" width=\"1077\" height=\"1019\" srcset=\"https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/chart-2.png 1077w, 
https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/chart-2-300x284.png 300w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/chart-2-768x727.png 768w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/chart-2-1024x969.png 1024w\" sizes=\"(max-width: 1077px) 100vw, 1077px\" \/><\/a><\/p>\n<p>It\u2019s interesting to see that the top 20 functions make up about 80% of the encoding time. That\u2019s similar in spirit to the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Pareto_principle\">Pareto principle<\/a>, in that we don\u2019t have to optimize the whole encoder to make the Power architecture competitive for video encoding.<\/p>\n<p>We see a similar distribution between YouTube-like encoding settings and real time video encoding. In other words, optimization bounties for libvpx benefit both Video on Demand (VOD) and live broadcast services.<\/p>\n<p>We add <a href=\"https:\/\/www.bountysource.com\/trackers\/46650841-lu-zero-libvpx\">bounties on the Bountysource platform<\/a> grouped by theme, such as convolution, sum of absolute differences (SAD) and variance. Companies interested in libvpx optimization can go and fund these bounties.<\/p>\n<h1>What\u2019s the impact of this project so far?<\/h1>\n<p>So far, we have delivered multiple libvpx bounties including:<\/p>\n<ul>\n<li>Convolution<\/li>\n<li>Sum of absolute differences (SAD)<\/li>\n<li>Quantization<\/li>\n<li>Inverse transforms<\/li>\n<li>Intra prediction<\/li>\n<li>etc.<\/li>\n<\/ul>\n<p>To see the benefit of our work, we compiled the latest version of libvpx with and without VSX optimizations and ran it on a Power8 machine. Note that the C-only build can still produce Altivec\/VSX code via <a href=\"https:\/\/en.wikipedia.org\/wiki\/Automatic_vectorization\">auto vectorization<\/a>. 
The results, in frames per minute, are shown below for both YouTube-like encoding and real time encoding.<\/p>\n<p><a href=\"http:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/encoding-time.png\"><img loading=\"lazy\" class=\"alignnone size-full wp-image-673\" src=\"http:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/encoding-time.png\" alt=\"\" width=\"1022\" height=\"852\" srcset=\"https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/encoding-time.png 1022w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/encoding-time-300x250.png 300w, https:\/\/blogs.gentoo.org\/lu_zero\/files\/2018\/06\/encoding-time-768x640.png 768w\" sizes=\"(max-width: 1022px) 100vw, 1022px\" \/><\/a><\/p>\n<p>Our current VSX optimizations give approximately a 40% and 30% boost in encoding speed for YouTube-like and real time encoding respectively. Encoding speed increases in the range of 10 to 14 frames per minute can considerably reduce cloud encoding costs for Power architecture users.<\/p>\n<p>In the context of real time encoding, the time saved by the platform optimization can be put to good use to improve compression efficiency. Concretely, a real time encoder always encodes at real time speed, but speeding up the encoder allows operators to enable more <a href=\"https:\/\/developers.google.com\/media\/vp9\/settings\/vod\/\">coding tools<\/a>, resulting in better quality for viewers and bandwidth savings for operators.<\/p>\n<h1>What\u2019s next?<\/h1>\n<p>We\u2019re energized by the impact that our small team of bounty hunters is having on libvpx performance for the Power architecture and we wanted to share it in this blog post. We look forward to getting even more performance from libvpx on the Power architecture. 
Expect considerable performance improvements for the Power architecture in the next libvpx release (1.8).<\/p>\n<p>As IBM targets its Power9 line of systems at heavy cloud computations, it seems natural to also aim all that power at tackling the growing costs of AV1 encodes. This won\u2019t happen without platform specific optimizations and the time to start is now; as the AV1 format is being finalized, everyone is still in the early phases of optimization. We are currently working with our sponsors to set up AV1 bounties, so stay tuned for an upcoming post.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post, we (Luca Barbato and Luc Trudeau) join forces to talk about the awesome work we\u2019ve been doing on Altivec\/VSX optimizations for the libvpx library. You can read it here or on Luc&#8217;s Medium. Both of us were in Brussels for FOSDEM 2018: Luca presented his work on rust-av and Luc was there &hellip; <a href=\"https:\/\/blogs.gentoo.org\/lu_zero\/2018\/06\/14\/video-compression-bounty-hunters\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Video Compression Bounty 
Hunters<\/span><\/a><\/p>\n","protected":false},"author":10,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[6],"tags":[30,28,27,29],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1aGWH-aJ","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/665"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/comments?post=665"}],"version-history":[{"count":4,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/665\/revisions"}],"predecessor-version":[{"id":675,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/665\/revisions\/675"}],"wp:attachment":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/media?parent=665"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/categories?post=665"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/tags?post=665"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}