{"id":636,"date":"2017-08-12T19:16:42","date_gmt":"2017-08-12T19:16:42","guid":{"rendered":"http:\/\/blogs.gentoo.org\/lu_zero\/?p=636"},"modified":"2017-08-12T19:16:43","modified_gmt":"2017-08-12T19:16:43","slug":"optimizing-rust","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/lu_zero\/2017\/08\/12\/optimizing-rust\/","title":{"rendered":"Optimizing rust"},"content":{"rendered":"<p>After the post about <a href=\"https:\/\/codecs.multimedia.cx\/2017\/08\/rust-optimising-decoder-experience\/\">optimization<\/a>, Kostya and many commenters (me included) discussed whether there are better ways to optimize that loop without using unsafe code.<\/p>\n<p>Kostya provided me with a test function and several of his own implementations, and I polished and benchmarked the whole thing.<\/p>\n<h2>The code<\/h2>\n<p>I put the code in a simple <a href=\"https:\/\/github.com\/lu-zero\/rust-optimization-example\">project<\/a>; initially it was a single <code>main.rs<\/code>, and then it grew a little.<\/p>\n<p>It all started with this function:<\/p>\n<div class=\"codehilite\">\n<pre><span><\/span><span class=\"k\">pub<\/span><span class=\"w\"> <\/span><span class=\"k\">fn<\/span> <span class=\"nf\">recombine_plane_reference<\/span><span class=\"p\">(<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">src<\/span>: <span class=\"kp\">&amp;<\/span><span class=\"p\">[<\/span><span class=\"kt\">i16<\/span><span class=\"p\">],<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">sstride<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">dst<\/span>: <span class=\"kp\">&amp;<\/span><span class=\"nc\">mut<\/span><span class=\"w\"> <\/span><span class=\"p\">[<\/span><span class=\"kt\">u8<\/span><span class=\"p\">],<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">dstride<\/span>: 
<span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">w<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">h<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">idx0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"mi\">0<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">idx1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">w<\/span><span class=\"w\"> <\/span><span class=\"o\">\/<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">idx2<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">h<\/span><span class=\"w\"> <\/span><span class=\"o\">\/<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"n\">sstride<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span 
class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">idx3<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">idx2<\/span><span class=\"w\"> <\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"n\">idx1<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">oidx0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"mi\">0<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">oidx1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">dstride<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n\n<span class=\"w\">    <\/span><span class=\"k\">for<\/span><span class=\"w\"> <\/span><span class=\"n\">_<\/span><span class=\"w\"> <\/span><span class=\"k\">in<\/span><span class=\"w\"> <\/span><span class=\"mi\">0<\/span><span class=\"p\">..(<\/span><span class=\"n\">h<\/span><span class=\"w\"> <\/span><span class=\"o\">\/<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"k\">for<\/span><span class=\"w\"> <\/span><span class=\"n\">x<\/span><span class=\"w\"> <\/span><span class=\"k\">in<\/span><span class=\"w\"> <\/span><span class=\"mi\">0<\/span><span class=\"p\">..(<\/span><span class=\"n\">w<\/span><span class=\"w\"> <\/span><span class=\"o\">\/<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span 
class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">p0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">src<\/span><span class=\"p\">[<\/span><span class=\"n\">idx0<\/span><span class=\"w\"> <\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"n\">x<\/span><span class=\"p\">];<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">p1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">src<\/span><span class=\"p\">[<\/span><span class=\"n\">idx1<\/span><span class=\"w\"> <\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"n\">x<\/span><span class=\"p\">];<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">p2<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">src<\/span><span class=\"p\">[<\/span><span class=\"n\">idx2<\/span><span class=\"w\"> <\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"n\">x<\/span><span class=\"p\">];<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">p3<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">src<\/span><span class=\"p\">[<\/span><span class=\"n\">idx3<\/span><span class=\"w\"> <\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"n\">x<\/span><span class=\"p\">];<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">s0<\/span><span 
class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">p0<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"n\">p2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">d0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">p0<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_sub<\/span><span class=\"p\">(<\/span><span class=\"n\">p2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">s1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">p1<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"n\">p3<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">d1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">p1<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_sub<\/span><span class=\"p\">(<\/span><span class=\"n\">p3<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">o0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">s0<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"n\">s1<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">2<\/span><span 
class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">o1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">d0<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"n\">d1<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">o2<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">s0<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_sub<\/span><span class=\"p\">(<\/span><span class=\"n\">s1<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">o3<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">d0<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_sub<\/span><span class=\"p\">(<\/span><span class=\"n\">d1<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"n\">dst<\/span><span class=\"p\">[<\/span><span class=\"n\">oidx0<\/span><span class=\"w\"> <\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"n\">x<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"w\"> 
<\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"mi\">0<\/span><span class=\"p\">]<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">clip8<\/span><span class=\"p\">(<\/span><span class=\"n\">o0<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_shr<\/span><span class=\"p\">(<\/span><span class=\"mi\">2<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">128<\/span><span class=\"p\">));<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"n\">dst<\/span><span class=\"p\">[<\/span><span class=\"n\">oidx0<\/span><span class=\"w\"> <\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"n\">x<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"w\"> <\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"mi\">1<\/span><span class=\"p\">]<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">clip8<\/span><span class=\"p\">(<\/span><span class=\"n\">o1<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_shr<\/span><span class=\"p\">(<\/span><span class=\"mi\">2<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">128<\/span><span class=\"p\">));<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"n\">dst<\/span><span class=\"p\">[<\/span><span class=\"n\">oidx1<\/span><span class=\"w\"> <\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"n\">x<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"w\"> <\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"mi\">0<\/span><span 
class=\"p\">]<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">clip8<\/span><span class=\"p\">(<\/span><span class=\"n\">o2<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_shr<\/span><span class=\"p\">(<\/span><span class=\"mi\">2<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">128<\/span><span class=\"p\">));<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"n\">dst<\/span><span class=\"p\">[<\/span><span class=\"n\">oidx1<\/span><span class=\"w\"> <\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"n\">x<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"w\"> <\/span><span class=\"o\">+<\/span><span class=\"w\"> <\/span><span class=\"mi\">1<\/span><span class=\"p\">]<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">clip8<\/span><span class=\"p\">(<\/span><span class=\"n\">o3<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_shr<\/span><span class=\"p\">(<\/span><span class=\"mi\">2<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">128<\/span><span class=\"p\">));<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"n\">idx0<\/span><span class=\"w\"> <\/span><span class=\"o\">+=<\/span><span class=\"w\"> <\/span><span class=\"n\">sstride<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"n\">idx1<\/span><span class=\"w\"> <\/span><span class=\"o\">+=<\/span><span class=\"w\"> <\/span><span class=\"n\">sstride<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span 
class=\"w\">        <\/span><span class=\"n\">idx2<\/span><span class=\"w\"> <\/span><span class=\"o\">+=<\/span><span class=\"w\"> <\/span><span class=\"n\">sstride<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"n\">idx3<\/span><span class=\"w\"> <\/span><span class=\"o\">+=<\/span><span class=\"w\"> <\/span><span class=\"n\">sstride<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"n\">oidx0<\/span><span class=\"w\"> <\/span><span class=\"o\">+=<\/span><span class=\"w\"> <\/span><span class=\"n\">dstride<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"n\">oidx1<\/span><span class=\"w\"> <\/span><span class=\"o\">+=<\/span><span class=\"w\"> <\/span><span class=\"n\">dstride<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<\/pre>\n<\/div>\n<h2>Benchmark<\/h2>\n<p>Kostya used <a href=\"https:\/\/perf.wiki.kernel.org\/index.php\/Main_Page\">perf<\/a> to count the samples the function takes over a large number of iterations. I wanted to make the benchmark a little more portable, so I used <a href=\"https:\/\/doc.rust-lang.org\/time\/time\/struct.PreciseTime.html\">time::PreciseTime<\/a> from the <code>time<\/code> crate to measure something a little coarser, but good enough for our purposes.<\/p>\n<p>We want to see whether rewriting the loop using <strong>unsafe<\/strong> pointers or <strong>high-level<\/strong> iterators provides a decent speedup, so there is no need to be overly precise.<\/p>\n<p>NB: I decided not to use the 
<strong>bencher<\/strong> utility provided with nightly Rust, to make the code even easier to use.<\/p>\n<div class=\"codehilite\">\n<pre><span><\/span><span class=\"k\">fn<\/span> <span class=\"nf\">benchme<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">F<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">name<\/span>: <span class=\"kp\">&amp;<\/span><span class=\"kt\">str<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">n<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">f<\/span>: <span class=\"nc\">F<\/span><span class=\"p\">)<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"k\">where<\/span><span class=\"w\"> <\/span><span class=\"n\">F<\/span><span class=\"w\"> <\/span>: <span class=\"nb\">FnMut<\/span><span class=\"p\">()<\/span><span class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">start<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">PreciseTime<\/span>::<span class=\"n\">now<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"k\">for<\/span><span class=\"w\"> <\/span><span class=\"n\">_<\/span><span class=\"w\"> <\/span><span class=\"k\">in<\/span><span class=\"w\"> <\/span><span class=\"mi\">0<\/span><span class=\"p\">..<\/span><span class=\"n\">n<\/span><span class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"n\">f<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span 
class=\"w\">    <\/span><span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">end<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">PreciseTime<\/span>::<span class=\"n\">now<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">println<\/span><span class=\"o\">!<\/span><span class=\"p\">(<\/span><span class=\"s\">&quot;Runtime {} {}&quot;<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">name<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">start<\/span><span class=\"p\">.<\/span><span class=\"n\">to<\/span><span class=\"p\">(<\/span><span class=\"n\">end<\/span><span class=\"p\">));<\/span><span class=\"w\"><\/span>\n<span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<\/pre>\n<\/div>\n<div class=\"codehilite\">\n<pre><span><\/span># cargo run --release\n<\/pre>\n<\/div>\n<h2>Unsafe code<\/h2>\n<p>Both Kostya and I have a C background, so for both of us it was sort of natural to embrace <code>unsafe {}<\/code> and use raw pointers like we are used to.<\/p>\n<div class=\"codehilite\">\n<pre><span><\/span><span class=\"k\">pub<\/span><span class=\"w\"> <\/span><span class=\"k\">fn<\/span> <span class=\"nf\">recombine_plane_unsafe<\/span><span class=\"p\">(<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">src<\/span>: <span class=\"kp\">&amp;<\/span><span class=\"p\">[<\/span><span class=\"kt\">i16<\/span><span class=\"p\">],<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">sstride<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    
<\/span><span class=\"n\">dst<\/span>: <span class=\"kp\">&amp;<\/span><span class=\"nc\">mut<\/span><span class=\"w\"> <\/span><span class=\"p\">[<\/span><span class=\"kt\">u8<\/span><span class=\"p\">],<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">dstride<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">w<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">h<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"k\">unsafe<\/span><span class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">hw<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">w<\/span><span class=\"w\"> <\/span><span class=\"o\">\/<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"kt\">isize<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">band0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">src<\/span><span class=\"p\">.<\/span><span class=\"n\">as_ptr<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span 
class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">band1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">band0<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"n\">hw<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">band2<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">band0<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(((<\/span><span class=\"n\">h<\/span><span class=\"w\"> <\/span><span class=\"o\">\/<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"n\">sstride<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"kt\">isize<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">band3<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">band2<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"n\">hw<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">dst0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">dst<\/span><span class=\"p\">.<\/span><span 
class=\"n\">as_mut_ptr<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">dst1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">dst0<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"n\">dstride<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"kt\">isize<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">hh<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">h<\/span><span class=\"w\"> <\/span><span class=\"o\">\/<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"kt\">isize<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"k\">for<\/span><span class=\"w\"> <\/span><span class=\"n\">_<\/span><span class=\"w\"> <\/span><span class=\"k\">in<\/span><span class=\"w\"> <\/span><span class=\"mi\">0<\/span><span class=\"p\">..<\/span><span class=\"n\">hh<\/span><span class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">b0_ptr<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">band0<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span 
class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">b1_ptr<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">band1<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">b2_ptr<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">band2<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">b3_ptr<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">band3<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">d0_ptr<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">dst0<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">d1_ptr<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">dst1<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"k\">for<\/span><span class=\"w\"> <\/span><span class=\"n\">_<\/span><span class=\"w\"> <\/span><span class=\"k\">in<\/span><span class=\"w\"> <\/span><span class=\"mi\">0<\/span><span class=\"p\">..<\/span><span class=\"n\">hw<\/span><span 
class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">p0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"n\">b0_ptr<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">p1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"n\">b1_ptr<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">p2<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"n\">b2_ptr<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">p3<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"n\">b3_ptr<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">s0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">p0<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"n\">p2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">s1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span 
class=\"n\">p1<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"n\">p3<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">d0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">p0<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_sub<\/span><span class=\"p\">(<\/span><span class=\"n\">p2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">d1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">p1<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_sub<\/span><span class=\"p\">(<\/span><span class=\"n\">p3<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">o0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">s0<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"n\">s1<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">o1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">d0<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"n\">d1<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span 
class=\"p\">(<\/span><span class=\"mi\">2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">o2<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">s0<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_sub<\/span><span class=\"p\">(<\/span><span class=\"n\">s1<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">o3<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">d0<\/span><span class=\"p\">.<\/span><span class=\"n\">wrapping_sub<\/span><span class=\"p\">(<\/span><span class=\"n\">d1<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"o\">*<\/span><span class=\"n\">d0_ptr<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">clip8<\/span><span class=\"p\">((<\/span><span class=\"n\">o0<\/span><span class=\"w\"> <\/span><span class=\"o\">&gt;&gt;<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">128<\/span><span class=\"p\">));<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"o\">*<\/span><span class=\"n\">d0_ptr<\/span><span 
class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">clip8<\/span><span class=\"p\">((<\/span><span class=\"n\">o1<\/span><span class=\"w\"> <\/span><span class=\"o\">&gt;&gt;<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">128<\/span><span class=\"p\">));<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"o\">*<\/span><span class=\"n\">d1_ptr<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">clip8<\/span><span class=\"p\">((<\/span><span class=\"n\">o2<\/span><span class=\"w\"> <\/span><span class=\"o\">&gt;&gt;<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">128<\/span><span class=\"p\">));<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"o\">*<\/span><span class=\"n\">d1_ptr<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">clip8<\/span><span class=\"p\">((<\/span><span class=\"n\">o3<\/span><span class=\"w\"> <\/span><span class=\"o\">&gt;&gt;<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">).<\/span><span class=\"n\">wrapping_add<\/span><span class=\"p\">(<\/span><span class=\"mi\">128<\/span><span class=\"p\">));<\/span><span 
class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"n\">b0_ptr<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">b0_ptr<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"n\">b1_ptr<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">b1_ptr<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"n\">b2_ptr<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">b2_ptr<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"n\">b3_ptr<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">b3_ptr<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"n\">d0_ptr<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">d0_ptr<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"mi\">2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">                <\/span><span class=\"n\">d1_ptr<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">d1_ptr<\/span><span 
class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"mi\">2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"n\">band0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">band0<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"n\">sstride<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"kt\">isize<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"n\">band1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">band1<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"n\">sstride<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"kt\">isize<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"n\">band2<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">band2<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"n\">sstride<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"kt\">isize<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"n\">band3<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">band3<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">(<\/span><span class=\"n\">sstride<\/span><span class=\"w\"> <\/span><span 
class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"kt\">isize<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"n\">dst0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">dst0<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">((<\/span><span class=\"n\">dstride<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"kt\">isize<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"n\">dst1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">dst1<\/span><span class=\"p\">.<\/span><span class=\"n\">offset<\/span><span class=\"p\">((<\/span><span class=\"n\">dstride<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">as<\/span><span class=\"w\"> <\/span><span class=\"kt\">isize<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<\/pre>\n<\/div>\n<p>The function is faster than baseline:<\/p>\n<div class=\"codehilite\">\n<pre><span><\/span>    Runtime reference   PT1.598052169S\n    Runtime unsafe      PT1.222646190S\n<\/pre>\n<\/div>\n<h3>Explicit upcasts<\/h3>\n<p>Kostya noticed that telling rust to use i32 instead of i16 gave some performance boost.<\/p>\n<div class=\"codehilite\">\n<pre><span><\/span>    Runtime reference       
PT1.601846926S\n    Runtime reference 32bit PT1.371876242S\n    Runtime unsafe          PT1.223115917S\n    Runtime unsafe 32bit    PT1.124667021S\n<\/pre>\n<\/div>\n<p>I&#8217;ll keep both the i16 and the i32 variants to see when the difference matters and when it does not.<\/p>\n<blockquote><p>\nNote: Making code generic over primitive types is currently pretty painful; hopefully that will be fixed in the future.\n<\/p><\/blockquote>\n<h2>High level abstractions<\/h2>\n<p>Most of the comments on Kostya&#8217;s original post were about leveraging high level abstractions to help the compiler understand the code better.<\/p>\n<h3>Use Iterators<\/h3>\n<p>Rust is able to omit the bounds checks if there is a guarantee that the code cannot go outside the array boundaries. Using iterators instead of for loops that index through an external variable should do the trick.<\/p>\n<h4>Use <code>Chunks<\/code><\/h4>\n<p><code>chunks<\/code> and <code>chunks_mut<\/code> take a slice and provide a nice iterator that gets you at-most-N-sized pieces of the input slice.<\/p>\n<p>Since the code works line by line, using them is quite natural.<\/p>\n<h4>Use <code>split_at<\/code><\/h4>\n<p><code>split_at<\/code> and <code>split_at_mut<\/code> get you independent slices, even mutable ones. The code writes two lines at a time, so the ability to mutably access two regions of the frame is a boon.<\/p>\n<p>The (read-only) input is divided into bands and the output is produced 2 lines at a time. 
<code>split_at<\/code> is much better than hand-made slicing and <code>split_at_mut<\/code> is perfect for writing the even and the odd line at the same time.<\/p>\n<p>All together:<\/p>\n<div class=\"codehilite\">\n<pre><span><\/span><span class=\"k\">pub<\/span><span class=\"w\"> <\/span><span class=\"k\">fn<\/span> <span class=\"nf\">recombine_plane_chunks_32<\/span><span class=\"p\">(<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">src<\/span>: <span class=\"kp\">&amp;<\/span><span class=\"p\">[<\/span><span class=\"kt\">i16<\/span><span class=\"p\">],<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">sstride<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">dst<\/span>: <span class=\"kp\">&amp;<\/span><span class=\"nc\">mut<\/span><span class=\"w\"> <\/span><span class=\"p\">[<\/span><span class=\"kt\">u8<\/span><span class=\"p\">],<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">dstride<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">w<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">h<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">hw<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">w<\/span><span class=\"w\"> <\/span><span class=\"o\">\/<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">;<\/span><span 
class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">hh<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">h<\/span><span class=\"w\"> <\/span><span class=\"o\">\/<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">src1<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">src2<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">src<\/span><span class=\"p\">.<\/span><span class=\"n\">split_at<\/span><span class=\"p\">(<\/span><span class=\"n\">sstride<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"n\">hh<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">src1i<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">src1<\/span><span class=\"p\">.<\/span><span class=\"n\">chunks<\/span><span class=\"p\">(<\/span><span class=\"n\">sstride<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">src2i<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">src2<\/span><span class=\"p\">.<\/span><span class=\"n\">chunks<\/span><span class=\"p\">(<\/span><span class=\"n\">sstride<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\"> 
   <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">dstch<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">dst<\/span><span class=\"p\">.<\/span><span class=\"n\">chunks_mut<\/span><span class=\"p\">(<\/span><span class=\"n\">dstride<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"k\">for<\/span><span class=\"w\"> <\/span><span class=\"n\">_<\/span><span class=\"w\"> <\/span><span class=\"k\">in<\/span><span class=\"w\"> <\/span><span class=\"mi\">0<\/span><span class=\"p\">..<\/span><span class=\"n\">hh<\/span><span class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">s1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">src1i<\/span><span class=\"p\">.<\/span><span class=\"n\">next<\/span><span class=\"p\">().<\/span><span class=\"n\">unwrap<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">s2<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">src2i<\/span><span class=\"p\">.<\/span><span class=\"n\">next<\/span><span class=\"p\">().<\/span><span class=\"n\">unwrap<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">d<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span 
class=\"n\">dstch<\/span><span class=\"p\">.<\/span><span class=\"n\">next<\/span><span class=\"p\">().<\/span><span class=\"n\">unwrap<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">d0<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">d1<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">d<\/span><span class=\"p\">.<\/span><span class=\"n\">split_at_mut<\/span><span class=\"p\">(<\/span><span class=\"n\">dstride<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">b0<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">b1<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">s1<\/span><span class=\"p\">.<\/span><span class=\"n\">split_at<\/span><span class=\"p\">(<\/span><span class=\"n\">hw<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">b2<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">b3<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">s2<\/span><span class=\"p\">.<\/span><span class=\"n\">split_at<\/span><span class=\"p\">(<\/span><span class=\"n\">hw<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span 
class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">di0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">d0<\/span><span class=\"p\">.<\/span><span class=\"n\">iter_mut<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">di1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">d1<\/span><span class=\"p\">.<\/span><span class=\"n\">iter_mut<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">bi0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">b0<\/span><span class=\"p\">.<\/span><span class=\"n\">iter<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">bi1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">b1<\/span><span class=\"p\">.<\/span><span class=\"n\">iter<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">bi2<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">b2<\/span><span class=\"p\">.<\/span><span class=\"n\">iter<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span 
class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">bi3<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">b3<\/span><span class=\"p\">.<\/span><span class=\"n\">iter<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"k\">for<\/span><span class=\"w\"> <\/span><span class=\"n\">_<\/span><span class=\"w\"> <\/span><span class=\"k\">in<\/span><span class=\"w\"> <\/span><span class=\"mi\">0<\/span><span class=\"p\">..<\/span><span class=\"n\">hw<\/span><span class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">p0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">bi0<\/span><span class=\"p\">.<\/span><span class=\"n\">next<\/span><span class=\"p\">().<\/span><span class=\"n\">unwrap<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">p1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">bi1<\/span><span class=\"p\">.<\/span><span class=\"n\">next<\/span><span class=\"p\">().<\/span><span class=\"n\">unwrap<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">p2<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">bi2<\/span><span class=\"p\">.<\/span><span class=\"n\">next<\/span><span class=\"p\">().<\/span><span class=\"n\">unwrap<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span 
class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">p3<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">bi3<\/span><span class=\"p\">.<\/span><span class=\"n\">next<\/span><span class=\"p\">().<\/span><span class=\"n\">unwrap<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"n\">recombine_core_32<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">p0<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"n\">p1<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"n\">p2<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"n\">p3<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"o\">&amp;<\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">di0<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"o\">&amp;<\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">di1<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<\/pre>\n<\/div>\n<p>It is a good improvement over the reference baseline, but still not as fast as unsafe.<\/p>\n<div class=\"codehilite\">\n<pre><span><\/span>    Runtime reference       PT1.621158410S\n    Runtime reference 32bit PT1.467441931S\n    Runtime unsafe          PT1.226046003S\n    Runtime unsafe 32bit    PT1.126615305S\n    Runtime chunks          PT1.349947181S\n    Runtime chunks 32bit    PT1.350027322S\n<\/pre>\n<\/div>\n<h3>Use of <code>zip<\/code> or <code>izip<\/code><\/h3>\n<p>Using 
<code>next().unwrap()<\/code> feels clumsy and forces the iterators to be explicitly mutable. The loop can be written in a nicer way using the standard library <code>zip<\/code> and the <code>itertools<\/code>-provided <code>izip<\/code>.<\/p>\n<p><code>zip<\/code> works fine for 2 iterators; beyond that you start piling up (so, (many, (tuples, (that, (feels, lisp))))) (or <code>(feels (lisp, '(so, many, tuples)))<\/code> according to a reader). <code>izip<\/code> flattens the result, so it is somewhat nicer.<\/p>\n<div class=\"codehilite\">\n<pre><span><\/span><span class=\"k\">pub<\/span><span class=\"w\"> <\/span><span class=\"k\">fn<\/span> <span class=\"nf\">recombine_plane_zip_16<\/span><span class=\"p\">(<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">src<\/span>: <span class=\"kp\">&amp;<\/span><span class=\"p\">[<\/span><span class=\"kt\">i16<\/span><span class=\"p\">],<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">sstride<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">dst<\/span>: <span class=\"kp\">&amp;<\/span><span class=\"nc\">mut<\/span><span class=\"w\"> <\/span><span class=\"p\">[<\/span><span class=\"kt\">u8<\/span><span class=\"p\">],<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">dstride<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">w<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"n\">h<\/span>: <span class=\"kt\">usize<\/span><span class=\"p\">,<\/span><span class=\"w\"><\/span>\n<span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span 
class=\"n\">hw<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">w<\/span><span class=\"w\"> <\/span><span class=\"o\">\/<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">hh<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">h<\/span><span class=\"w\"> <\/span><span class=\"o\">\/<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">;<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">src1<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">src2<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">src<\/span><span class=\"p\">.<\/span><span class=\"n\">split_at<\/span><span class=\"p\">(<\/span><span class=\"n\">sstride<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"n\">hh<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">src1i<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">src1<\/span><span class=\"p\">.<\/span><span class=\"n\">chunks<\/span><span class=\"p\">(<\/span><span class=\"n\">sstride<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">src2i<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">src2<\/span><span class=\"p\">.<\/span><span 
class=\"n\">chunks<\/span><span class=\"p\">(<\/span><span class=\"n\">sstride<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">dstch<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">dst<\/span><span class=\"p\">.<\/span><span class=\"n\">chunks_mut<\/span><span class=\"p\">(<\/span><span class=\"n\">dstride<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"w\"> <\/span><span class=\"mi\">2<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"k\">for<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">s1<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">s2<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"k\">in<\/span><span class=\"w\"> <\/span><span class=\"n\">src1i<\/span><span class=\"p\">.<\/span><span class=\"n\">zip<\/span><span class=\"p\">(<\/span><span class=\"n\">src2i<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">d<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">dstch<\/span><span class=\"p\">.<\/span><span class=\"n\">next<\/span><span class=\"p\">().<\/span><span class=\"n\">unwrap<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">d0<\/span><span class=\"p\">,<\/span><span 
class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">d1<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">d<\/span><span class=\"p\">.<\/span><span class=\"n\">split_at_mut<\/span><span class=\"p\">(<\/span><span class=\"n\">dstride<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">b0<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">b1<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">s1<\/span><span class=\"p\">.<\/span><span class=\"n\">split_at<\/span><span class=\"p\">(<\/span><span class=\"n\">hw<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">b2<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">b3<\/span><span class=\"p\">)<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">s2<\/span><span class=\"p\">.<\/span><span class=\"n\">split_at<\/span><span class=\"p\">(<\/span><span class=\"n\">hw<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">di0<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">d0<\/span><span class=\"p\">.<\/span><span class=\"n\">iter_mut<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span 
class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">di1<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">d1<\/span><span class=\"p\">.<\/span><span class=\"n\">iter_mut<\/span><span class=\"p\">();<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"kd\">let<\/span><span class=\"w\"> <\/span><span class=\"n\">iterband<\/span><span class=\"w\"> <\/span><span class=\"o\">=<\/span><span class=\"w\"> <\/span><span class=\"n\">b0<\/span><span class=\"p\">.<\/span><span class=\"n\">iter<\/span><span class=\"p\">().<\/span><span class=\"n\">zip<\/span><span class=\"p\">(<\/span><span class=\"n\">b1<\/span><span class=\"p\">.<\/span><span class=\"n\">iter<\/span><span class=\"p\">().<\/span><span class=\"n\">zip<\/span><span class=\"p\">(<\/span><span class=\"n\">b2<\/span><span class=\"p\">.<\/span><span class=\"n\">iter<\/span><span class=\"p\">().<\/span><span class=\"n\">zip<\/span><span class=\"p\">(<\/span><span class=\"n\">b3<\/span><span class=\"p\">.<\/span><span class=\"n\">iter<\/span><span class=\"p\">())));<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"k\">for<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">p0<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">p1<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"p\">(<\/span><span class=\"n\">p2<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"n\">p3<\/span><span class=\"p\">)))<\/span><span class=\"w\"> <\/span><span class=\"k\">in<\/span><span class=\"w\"> <\/span><span class=\"n\">iterband<\/span><span class=\"w\"> <\/span><span class=\"p\">{<\/span><span class=\"w\"><\/span>\n<span class=\"w\">            <\/span><span class=\"n\">recombine_core_16<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span 
class=\"n\">p0<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"n\">p1<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"n\">p2<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"o\">*<\/span><span class=\"n\">p3<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"o\">&amp;<\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">di0<\/span><span class=\"p\">,<\/span><span class=\"w\"> <\/span><span class=\"o\">&amp;<\/span><span class=\"k\">mut<\/span><span class=\"w\"> <\/span><span class=\"n\">di1<\/span><span class=\"p\">);<\/span><span class=\"w\"><\/span>\n<span class=\"w\">        <\/span><span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<span class=\"w\">    <\/span><span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<span class=\"p\">}<\/span><span class=\"w\"><\/span>\n<\/pre>\n<\/div>\n<p>How would they fare?<\/p>\n<div class=\"codehilite\">\n<pre><span><\/span>    Runtime reference        PT1.614962959S\n    Runtime reference 32bit  PT1.369636641S\n    Runtime unsafe           PT1.223157417S\n    Runtime unsafe 32bit     PT1.125534521S\n    Runtime chunks           PT1.350069795S\n    Runtime chunks 32bit     PT1.381841742S\n    Runtime zip              PT1.249227707S\n    Runtime zip 32bit        PT1.094282423S\n    Runtime izip             PT1.366320546S\n    Runtime izip 32bit       PT1.208708213S\n<\/pre>\n<\/div>\n<p>Pretty well.<\/p>\n<p><code>izip<\/code> currently seems a little more wasteful than <code>zip<\/code>, so it looks like we have a winner \ud83d\ude42<\/p>\n<h2>Conclusions<\/h2>\n<ul>\n<li>Compared to common imperative programming patterns, using the high-level abstractions does lead to a nice speedup: use iterators when you can!<\/li>\n<li>Not all abstractions are zero-cost: <code>zip<\/code> made the overall code faster while <code>izip<\/code> 
led to a speed regression.<\/li>\n<li>Do benchmark your time-critical code. <strong>nightly<\/strong> has a facility for it, but it is not great for micro-benchmarks.<\/li>\n<\/ul>\n<p>Overall I&#8217;m enjoying writing code in Rust a lot.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>After the post about optimization, Kostya and many commenters (me included) discussed a bit about if there are better ways to optimize that loop without using unsafe code. Kostya provided me with a test function and multiple implementations from him and I polished and benchmarked the whole thing. The code I put the code in &hellip; <a href=\"https:\/\/blogs.gentoo.org\/lu_zero\/2017\/08\/12\/optimizing-rust\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Optimizing rust<\/span><\/a><\/p>\n","protected":false},"author":10,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[24],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1aGWH-ag","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/636"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/comments?post=636"}],"version-history":[{"count":2,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/636\/revisions"}],"predecessor-version":[{"id":639,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/posts\/636\/revisions\/639"}],"wp:att
achment":[{"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/media?parent=636"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/categories?post=636"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/lu_zero\/wp-json\/wp\/v2\/tags?post=636"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}