The Council and the Community

A new Council election is in progress and we have a few candidates. Most of them have written a manifesto. For some of them, this is one of the few mails they have sent to the public mailing lists recently. For one of them, it is the only one. Do we want to elect people who do not participate actively in the Community? Does such an election even make sense?


Inlining -march=native for distcc

-march=native is a gcc flag that enables auto-detection of the CPU architecture and its properties. Not only does it spare you from finding the correct value of -march=, but it also enables instruction sets that do not fit any standard CPU profile and detects the cache sizes.

Sadly, -march=native itself can’t really work well with distcc. Since the detection is performed at compile time, remote gcc invocations would use the architecture of the distcc host rather than that of the client. Therefore, the resulting executables would be a mix of code built for the different architectures of the hosts used by distcc.

You may also find -march=native a bit opaque. For example, we had multiple bug reports about LLVM failing to build with -march=atom. However, some of the reporters were using -march=native, so we weren’t able to immediately identify the duplicates.

In this article, I will briefly guide you through replacing -march=native with expanded compiler flags, for the benefit of distcc compatibility and more explicit build logs.
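As a rough illustration of what "expanded compiler flags" means, one common way to see what -march=native resolves to is to run `gcc -march=native -E -v -` and read the cc1 command line it prints. The Python sketch below parses that line and keeps the -m* and --param options. It is not necessarily the procedure the full article follows, and the helper names `query_gcc` and `extract_native_flags` are hypothetical.

```python
import subprocess


def query_gcc(compiler="gcc"):
    """Return the verbose driver output for an empty input (printed on stderr)."""
    result = subprocess.run(
        [compiler, "-march=native", "-E", "-v", "-"],
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stderr


def extract_native_flags(verbose_output):
    """Pick the -m* and --param options out of the cc1 command line."""
    for line in verbose_output.splitlines():
        tokens = line.split()
        # The line of interest is the one that invokes the cc1 binary.
        if not tokens or not tokens[0].endswith("cc1"):
            continue
        flags = []
        i = 1
        while i < len(tokens):
            token = tokens[i]
            if token == "--param" and i + 1 < len(tokens):
                # "--param name=value" is passed as two separate words.
                flags.extend([token, tokens[i + 1]])
                i += 2
                continue
            if token.startswith("--param=") or token.startswith("-m"):
                flags.append(token)
            i += 1
        return flags
    return []
```

On a machine with gcc installed, `" ".join(extract_native_flags(query_gcc()))` gives a flag string that could replace -march=native in CFLAGS. The exact set of options depends on the gcc version and the CPU, so the result is worth reviewing by hand before use.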


Reducing SquashFS delta size through partial decompression

In a previous article titled ‘using deltas to speed up SquashFS ebuild repository updates’, the author considered the benefits of using binary deltas to update SquashFS images. The proposed method proved very efficient in terms of disk I/O, memory and CPU time use. However, the relatively large size of the deltas made network bandwidth a bottleneck.

The rough estimates made at the time showed that this is not a major issue for a common client with a moderate-bandwidth link such as ADSL. Nevertheless, the size is an inconvenience both to clients and to mirror providers. Assuming that there is an upper bound on the disk space consumed by snapshots, the extra size reduces the number of snapshots stored on mirrors, and therefore shortens the supported update period.

The most likely cause of the excessive delta size is the complexity of the correlation between input and compressed output. Changes in input files are likely to cause much larger changes in the SquashFS output, which the tested delta algorithms fail to express efficiently.

For example, in the LZ family of compression algorithms, a change in the input stream may affect the contents of the dictionary and therefore the output stream following it. In block-based compressors such as bzip2, a change in input may shift all the following data, moving it across block boundaries. As a result, the contents of all the blocks following it change, and so does the compressed output for each of them.
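The effect on an LZ-family compressor can be observed with a small experiment. The following Python sketch uses zlib (DEFLATE) as a stand-in, not the compressors or delta tools evaluated in the paper, and the generated sample data is purely hypothetical. It flips one bit in the input and counts how many byte positions differ before and after compression.

```python
import zlib


def count_differences(a, b):
    """Count byte positions where a and b differ, plus any length difference."""
    differing = sum(x != y for x, y in zip(a, b))
    return differing + abs(len(a) - len(b))


def make_sample(size=20000):
    """Build deterministic, compressible sample data (hypothetical input)."""
    lines = [f"entry {i}: value {i * i % 97}\n" for i in range(size // 20)]
    return "".join(lines).encode("ascii")


original = make_sample()
modified = bytearray(original)
modified[len(modified) // 2] ^= 0x01  # flip one bit in a single byte
modified = bytes(modified)

raw_diff = count_differences(original, modified)
compressed_diff = count_differences(
    zlib.compress(original), zlib.compress(modified)
)

print(f"uncompressed: {raw_diff} byte(s) differ")
print(f"compressed:   {compressed_diff} byte(s) differ")
```

A position-by-position comparison is a crude measure, since real delta tools can also express shifted data, but it shows that a single-byte edit does not stay a single-byte edit once the stream is compressed.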

Since SquashFS splits the input into multiple blocks that are compressed separately, the scope of this issue is much smaller than in plain tarballs. Nevertheless, small changes occurring in multiple blocks can make the delta two to four times as large as it would be if the data were not compressed. In this paper, the author explores the possibility of introducing transparent decompression into the delta generation process to reduce the delta size.

Read on… [PDF]

A few words on lzip compressor

Some of you may already have noticed that the sys-apps/ed and sys-fs/ddrescue packages started pulling in the lzip archiver. «Is this some new fancy archiver?» you may ask. The answer is «no. It’s been around for a very long time, and it never got any real interest.»

You can read some of the background story in the Linux Gazette article ‘New Options in the World of File Compression’. Long story short, lzip was created before xz as a response to the limitations of the .lzma format used by lzma-utils. However, it never got any real attention, and when xz-utils was released as a direct successor to lzma-utils, it became practically redundant. The two projects co-existed silently until lately…

Over the past five years, Antonio Diaz Diaz, lzip’s author, and a few project supporters have been trying to convince the community that the lzip format is superior to xz. However, they were never able to provide any convincing arguments, and while xz gained popularity, lzip stayed in the shadows. It was used mostly by the projects Diaz was a member of.

It seems that he has finally decided that advocacy will not help his pet project gain popularity. Instead, he decided to take advantage of his administrator position in the mentioned GNU projects and stop providing non-.lz tarballs. As he says, «surely every user of ddrescue would like to know about lzip […]».

So, Gentoo user, would you like to know about lzip? Let’s try to get a few fair points here.


Using deltas to speed up SquashFS updates

The ebuild repository format that is used by Gentoo generally fits well in the developer and power user workflow. It has a simple design that makes reading, modifying and adding ebuilds easy. However, the large number of separate small files with many similarities does not make it very space efficient and often impacts performance. The update (rsync) mechanism is relatively slow compared to distributions like Arch Linux, and is only moderately bandwidth efficient.

There were various attempts at solving at least some of those issues. Various filesystems were used in order to reduce the space consumption and improve performance. Delta updates were introduced through the emerge-delta-webrsync tool to save bandwidth. Sadly, those solutions usually introduce other inconveniences.

Using a separate filesystem for the repositories involves additional maintenance. Using a read-only filesystem makes updates time-consuming. Similarly, the delta update mechanism, while saving bandwidth, usually takes more time than a plain rsync update.

In this article, the author proposes a new solution that aims both to save disk space and reduce update time significantly, bringing Gentoo closer to the features of binary distributions. The ultimate goal of this project would be to make it possible to use the package manager efficiently without having to perform additional administrative tasks such as designating an extra partition.

Read on… [PDF]