Using deltas to speed up SquashFS updates

The ebuild repository format used by Gentoo generally fits well into the developer and power-user workflow. It has a simple design that makes reading, modifying and adding ebuilds easy. However, the large number of separate small files with many similarities makes it space-inefficient and often hurts performance. The update (rsync) mechanism is relatively slow compared to distributions like Arch Linux, and is only moderately bandwidth efficient.
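
As a rough illustration of why many small files waste space, the sketch below rounds each file up to whole filesystem blocks and reports the share of allocated space that holds no data. The 4096-byte block size and the sample file sizes are assumptions chosen for illustration, not measurements of the Gentoo tree, and the model ignores inode metadata as well as filesystems that pack small files inline.

```python
def allocated_bytes(file_size: int, block_size: int = 4096) -> int:
    """Space a file occupies when the filesystem allocates whole blocks."""
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks * block_size


def slack_ratio(sizes, block_size: int = 4096) -> float:
    """Fraction of the allocated space that holds no file data."""
    used = sum(sizes)
    allocated = sum(allocated_bytes(s, block_size) for s in sizes)
    return (allocated - used) / allocated


# Hypothetical sizes (in bytes) standing in for small ebuilds and metadata files.
sample_sizes = [600, 1200, 2500, 300, 5000]
print(f"slack: {slack_ratio(sample_sizes):.1%}")
```

A compressed, packed image such as SquashFS avoids this per-file rounding, which is one reason it can be much smaller than the same tree on a block-based filesystem.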

There have been various attempts at solving at least some of those issues. Various filesystems have been used to reduce space consumption and improve performance. Delta updates were introduced through the emerge-delta-webrsync tool to save bandwidth. Sadly, those solutions usually introduce other inconveniences.

Using a separate filesystem for the repositories involves additional maintenance. Using a read-only filesystem makes updates time-consuming. Similarly, the delta update mechanism, while saving bandwidth, usually takes more time than a plain rsync update.

In this article, the author proposes a new solution that aims both to save disk space and reduce update time significantly, bringing Gentoo closer to the features of binary distributions. The ultimate goal of this project would be to make it possible to use the package manager efficiently without having to perform additional administrative tasks such as designating an extra partition.

Read on… [PDF]

8 thoughts on “Using deltas to speed up SquashFS updates”

  1. I’ve recently briefly used squashfs snapshots + zsync from https://binhost.ossdl.de/ and I find that way superior to recent rsync performance, even though the snapshots are only made daily.

    Squashfs is definitely a good idea (especially with recent improvements), and the aforementioned delta problem is pretty much the only hurdle left.

  2. Hi, I appreciate your work in this field.

    One question:
    Is loop-mounting a Btrfs filesystem, located in a 600 MB file, onto /usr/portage the best choice under the current circumstances?
    Well, /usr/portage/distfiles has to be handled separately (and /usr was probably not the best idea at all).

    Small nagging comment: figure 1 is, IMO, missing the ext4 filesystem; furthermore, the annotation and the mix of files and filesystems confused me.

    1. Well, I think the loop-mounted btrfs approach would work for some time. However, I’d go for a larger filesystem since we can expect the tree to grow and the extra space may be helpful during rsync.

      As for the repository location, it has been discussed to death already. We all agree that /usr/portage is a bad choice; it’s just a matter of finally agreeing on what to use instead. In fact, I think that almost every power user has already moved the directories somewhere else.

      ext4 was not placed in the figure because it is simply unfit for this use. With its static inode tables, you need at least to raise the number of inodes beyond the default, and this makes it hard to get an accurate measure (since the net result depends on the number of inodes). Even with a fairly optimal choice, the filesystem use was over 1G. Placing it in the figure would shrink the other bars too much.
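
The inode constraint mentioned above can be made concrete with a little arithmetic. The sketch below assumes the commonly shipped mke2fs defaults (one inode per 16384 bytes of filesystem, 256-byte inodes) and a purely hypothetical count of 150,000 files and directories in the tree; the function names and that count are illustrative and do not come from the article. Real mke2fs rounds inode counts per block group, so the figures are approximate.

```python
GIB = 1024 ** 3


def default_inode_count(fs_bytes: int, bytes_per_inode: int = 16384) -> int:
    """Approximate number of inodes mke2fs creates for a filesystem size."""
    return fs_bytes // bytes_per_inode


def inode_table_bytes(inode_count: int, inode_size: int = 256) -> int:
    """Space reserved up front for the static inode tables."""
    return inode_count * inode_size


# Hypothetical number of files and directories in the repository.
HYPOTHETICAL_TREE_ENTRIES = 150_000

inodes = default_inode_count(GIB)
print("default inodes on 1 GiB:", inodes)
print("enough for the tree:", inodes >= HYPOTHETICAL_TREE_ENTRIES)

# A denser ratio (assumed 4096 bytes per inode) fits the tree, but the
# inode tables themselves then take a fixed share of the filesystem.
dense = default_inode_count(GIB, bytes_per_inode=4096)
print("inode table size in MiB:", inode_table_bytes(dense) // (1024 ** 2))
```

Because the inode tables are sized when the filesystem is created, the measured disk use depends on the chosen ratio as much as on the data, which is the measurement problem described above.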

      1. I created a 1.5 GB file /var/lib/portage.fs with btrfs and moved /usr/portage into it. Before that, I moved and symlinked the distfiles:
        /usr/portage/distfiles -> /var/lib/distfiles

        With the following entry in /etc/fstab:
        /var/lib/portage.fs /usr/portage btrfs auto,noatime,compress=lzo
        it is now mounted automatically.

        What I forgot in the first step was to mount the btrfs file with the compress option, so 718 MB of the 1500 MB are now filled. From now on, however, all subsequent rsyncs will compress the changes to the tree.
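
The setup described in this comment can be sketched as a small dry-run script. It only prints the commands and the fstab entry rather than running them, since they need root. The paths and mount options are taken from the comment; the `truncate` and `mkfs.btrfs` steps, the explicit `loop` option, the trailing `0 0` fstab fields and the function names are assumptions about how such an image would typically be created.

```python
import shlex


def loop_btrfs_plan(image, mountpoint, size="1500M",
                    options="noatime,compress=lzo"):
    """Return the commands (as argv lists) for a loop-mounted btrfs image."""
    return [
        ["truncate", "-s", size, image],   # create a sparse image file
        ["mkfs.btrfs", image],             # format the image as btrfs
        # Mount with compression from the start so the first copy of the
        # tree is compressed as well (assumed explicit 'loop' option).
        ["mount", "-o", f"loop,{options}", image, mountpoint],
    ]


def fstab_line(image, mountpoint, options="auto,noatime,compress=lzo"):
    """Build an /etc/fstab entry; dump and pass fields are assumed to be 0."""
    return f"{image} {mountpoint} btrfs {options} 0 0"


# Dry run: print the plan instead of executing it.
for argv in loop_btrfs_plan("/var/lib/portage.fs", "/usr/portage"):
    print(shlex.join(argv))
print(fstab_line("/var/lib/portage.fs", "/usr/portage"))
```

Data written before compression was enabled stays uncompressed until it is rewritten; recompressing it in place, for example with `btrfs filesystem defragment -r -clzo /usr/portage`, should reclaim that space, though this was not tested in the thread.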

        1. The results are better than expected, especially given the use of an external USB HD: the rsync now often completes within a minute or so, whereas before it took much longer.

      2. Can you please provide a pointer to that discussion? I am not a dev, so I have never heard anything about it. I put portage in /portage directly, since it’s separate from /usr, and it’s easy to look for packages by just using ls /portage/category. I think that /portage also makes sense if you are mounting it as a separate filesystem, which I do. I’ve kept distfiles on /home for a long time, but I now just keep them with portage itself in /portage/distfiles. I think that we should have a “Portage Best Practices” article on the wiki to let people know of these tricks, as I think many users are unaware of them.

        Best,

          1. Cool. Thanks for the pointers! After reading all that, I think I’m going to use the following setup from now on:

            PORTDIR=/var/cache/portage/tree
            DISTDIR=/var/cache/portage/distfiles
            PKGDIR=/var/cache/portage/packages

            and /var/cache/portage/overlays for overlays (even though I don’t use any at the moment). Maybe you could suggest the setup above to the other devs on the list? Also, what is the current status on this? I noticed that recently portage started to complain if PORTDIR wasn’t set in make.conf, so I assume this is going somewhere…

            Best,
