Reducing SquashFS delta size through partial decompression

In a previous article titled ‘using deltas to speed up SquashFS ebuild repository updates’, the author has considered benefits of using binary deltas to update SquashFS images. The proposed method has proven very efficient in terms of disk I/O, memory and CPU time use. However, the relatively large size of deltas made network bandwidth a bottleneck.

The rough estimations done at the time proved that this is not a major issue for a common client with a moderate-bandwidth link such as ADSL. Nevertheless, the size is an inconvenience both to clients and to mirror providers. Assuming that there is an upper bound on disk space consumed by snapshots, the extra size reduces the number of snapshots stored on mirrors, and therefore shortens the supported update period.

The most likely cause for the excessive delta size is the complexity of correlation between input and compressed output. Changes in input files are likely to cause much larger changes in the SquashFS output that the tested delta algorithms fail to express efficiently.

For example, in the LZ family of compression algorithms, a change in input stream may affect the contents of the dictionary and therefore the output stream following it. In block-based compressors such as bzip2, a change in input may shift all the following data moving it across block boundaries. As a result, the contents of all the blocks following it change, and therefore the compressed output for each of them.

Since SquashFS splits the input into multiple blocks that are compressed separately, the scope of this issue is much smaller than in plain tarballs. Nevertheless, small changes occurring in multiple blocks are able to grow delta two to four times as large as it would be if the data was not compressed. In this paper, the author explores the possibility of introducing a transparent decompression in the delta generation process to reduce the delta size.

Read on… [PDF]

A few words on lzip compressor

(update on 2016-01-28: please note that this post is outdated and no longer adequately expresses my attitude towards the two competing formats)

Some of you may already have noticed that sys-apps/ed and sys-fs/ddrescue packages started pulling in lzip archiver. «Is this some new fancy archiver?» you may ask. The answer is «no. It’s been around for a very long time, and it never got any real interest.»

You can read some of the background story in New Options in the World of File Compression Linux Gazette article. Long story short, lzip was created before xz as a response to the limitations of .lzma format used by lzma-utils. However, it never got any real attention and when xz-utils was released as a direct successor to lzma-utils it became practically redundant. And the two projects co-existed silently until lately…

Over the past five years, Antonio Diaz Diaz, lzip’s author, and a few project supporters were trying to convince the community that the lzip format is superior to xz. However, they were never able to provide any convincing arguments to the community, and while xz gained popularity lzip stayed in the shadow. And it was used mostly by the projects Diaz was member of.

It seems that he has finally decided that advocacy will not help his pet project in gaining popularity. Instead, he decided to take advantage of his administrator position in the mentioned GNU projects and discontinue providing non-.lz tarballs. As he says, «surely every user of ddrescue would like to know about lzip […]».

So, Gentoo user, would you like to know about lzip? Let’s try to get a few fair points here.

Continue reading “A few words on lzip compressor”

Using deltas to speed up SquashFS updates

The ebuild repository format that is used by Gentoo generally fits well in the developer and power user work flow. It has a simple design that makes reading, modifying and adding ebuilds easy. However, the large number of separate small files with many similarities do not make it very space efficient and often impacts performance. The update (rsync) mechanism is relatively slow compared to distributions like Arch Linux, and is only moderately bandwidth efficient.

There were various attempts at solving at least some of those issues. Various filesystems were used in order to reduce the space consumption and improve performance. Delta updates were introduced through the emerge-delta-webrsync tool to save bandwidth. Sadly, those solutions usually introduce other inconveniences.

Using a separate filesystem for the repositories involves additional maintenance. Using a read-only filesystem makes updates time-consuming. Similarly, the delta update mechanism — while saving bandwidth — usually takes more time than plain rsync update.

In this article, the author proposes a new solution that aims both to save disk space and reduce update time significantly, bringing Gentoo closer to the features of binary distributions. The ultimate goal of this project would be to make it possible to use the package manager efficiently without having to perform additional administrative tasks such as designating an extra partition.

Read on… [PDF]

Keeping the majority happy

One of the major problems that I have faced in Gentoo is that whatever I was doing on a larger scale made some people really unhappy. I would even say that the specifics of Gentoo make it even possible for users to get outraged (and obviously, let all the world know how outraged they are) by a few files they don’t like having installed.

This brings the question whether we should struggle to keep all of our users happy, or whether keeping majority of our users satisfied is sufficient.

I don’t believe that it is actually possible to keep everyone happy. It’s a kind of never-ending struggle that consumes valuable time and demands sacrifices. You work on making one user happy, the other one doesn’t like the result. You work towards the second one, third one is unhappy. You satisfy all three of them, now fourth comes outraged with the net result.

Continue reading “Keeping the majority happy”

Local device access — from plugdev to logind

One of the more curious problems of running Linux on desktops is handling the local device access. The idea is usually quite simple: local users should have access to devices such as removable media (floppies, pendrives), scanners, speakers, webcams and ability to power off or reboot the computer. At the same time, remote users should have that access restricted.

Why? I think the main rationale behind this is that those users have physical access to those functions (yes, you could say that they have physical hard drive access too). They can insert the floppies, plug the pendrive, press the power button or just pull out the plug. They usually suffer the speaker noises and scare in front of the webcam.

At the same time, remote (or inactive users) shouldn’t be given the right to shut down the system unexpectedly, shout into the speakers, stream the user’s webcam or install Windows to his pendrive. I think that doesn’t need explaining.

I would like to shortly describe a few attempts to solve the problem and the issues with them.

Continue reading “Local device access — from plugdev to logind”