{"id":277,"date":"2014-03-05T23:01:02","date_gmt":"2014-03-05T22:01:02","guid":{"rendered":"https:\/\/blogs.gentoo.org\/mgorny\/?p=277"},"modified":"2014-03-05T23:09:50","modified_gmt":"2014-03-05T22:09:50","slug":"reducing-squashfs-delta-size-through-partial-decompression","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/mgorny\/2014\/03\/05\/reducing-squashfs-delta-size-through-partial-decompression\/","title":{"rendered":"Reducing SquashFS delta size through partial decompression"},"content":{"rendered":"<p>In a\u00a0previous article titled \u2018using deltas to speed up SquashFS ebuild repository updates\u2019, the\u00a0author has considered benefits of\u00a0using binary deltas to update SquashFS images. The\u00a0proposed method has proven very efficient in\u00a0terms of\u00a0disk I\/O, memory and\u00a0CPU time use. However, the\u00a0relatively large size of\u00a0deltas made network bandwidth a\u00a0bottleneck.<\/p>\n<p>The\u00a0rough estimations done at the\u00a0time proved that this is not a\u00a0major issue for a\u00a0common client with a\u00a0moderate-bandwidth link such as\u00a0ADSL.  Nevertheless, the\u00a0size is an\u00a0inconvenience both to clients and\u00a0to mirror providers. Assuming that there is an\u00a0upper bound on disk space consumed by\u00a0snapshots, the\u00a0extra size reduces the\u00a0number of\u00a0snapshots stored on\u00a0mirrors, and\u00a0therefore shortens the\u00a0supported update period.<\/p>\n<p>The\u00a0most likely cause for the\u00a0excessive delta size is the\u00a0complexity of\u00a0correlation between input and\u00a0compressed output.  
Changes in\u00a0input files are likely to cause much larger changes in\u00a0the\u00a0SquashFS output that the\u00a0tested delta algorithms fail to express efficiently.<\/p>\n<p>For\u00a0example, in\u00a0the\u00a0LZ family of\u00a0compression algorithms, a\u00a0change in\u00a0the\u00a0input stream may affect the\u00a0contents of the\u00a0dictionary and\u00a0therefore the\u00a0output stream following it. In\u00a0block-based compressors such as\u00a0bzip2, a\u00a0change in\u00a0input may shift all the\u00a0following data, moving it across block boundaries. As a\u00a0result, the\u00a0contents of\u00a0all the\u00a0blocks following it change, and\u00a0therefore the\u00a0compressed output for\u00a0each of\u00a0them.<\/p>\n<p>Since SquashFS splits the\u00a0input into multiple blocks that are compressed separately, the\u00a0scope of\u00a0this issue is much smaller than in\u00a0plain tarballs. Nevertheless, small changes occurring in\u00a0multiple blocks can make the\u00a0delta two to\u00a0four times as\u00a0large as it would be if\u00a0the\u00a0data were not compressed. In this paper, the\u00a0author explores the\u00a0possibility of\u00a0introducing transparent decompression into the\u00a0delta generation process to\u00a0reduce the\u00a0delta size.<\/p>\n<p><a rel='external' href='http:\/\/dev.gentoo.org\/~mgorny\/articles\/reducing-squashfs-delta-size-through-partial-decompression.pdf'>Read on\u2026 [PDF]<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In a\u00a0previous article titled \u2018using deltas to speed up SquashFS ebuild repository updates\u2019, the\u00a0author has considered benefits of\u00a0using binary deltas to update SquashFS images. The\u00a0proposed method has proven very efficient in\u00a0terms of\u00a0disk I\/O, memory and\u00a0CPU time use. However, the\u00a0relatively large size of\u00a0deltas made network bandwidth a\u00a0bottleneck. 
The\u00a0rough estimations done at the\u00a0time proved that this is &hellip; <a href=\"https:\/\/blogs.gentoo.org\/mgorny\/2014\/03\/05\/reducing-squashfs-delta-size-through-partial-decompression\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Reducing SquashFS delta size through partial decompression&#8221;<\/span><\/a><\/p>\n","protected":false},"author":137,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[5],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/277"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/users\/137"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/comments?post=277"}],"version-history":[{"count":2,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/277\/revisions"}],"predecessor-version":[{"id":279,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/277\/revisions\/279"}],"wp:attachment":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/media?parent=277"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/categories?post=277"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/tags?post=277"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}