Portability of tar features

The tar format is one of the oldest archive formats in use. It comes as no surprise that it is ugly — built as layers of hacks on the older format versions to overcome their limitations. However, given the POSIX standarization in late 80s and the popularity of GNU tar, you would expect the interoperability problems to be mostly resolved nowadays.

This article is directly inspired by my proof-of-concept work on new binary package format for Gentoo. My original proposal used volume label to provide user- and file(1)-friendly way of distinguish our binary packages. While it is a GNU tar extension, it falls within POSIX ustar implementation-defined file format and you would expect that non-compliant implementations would extract it as regular files. What I did not anticipate is that some implementation reject the whole archive instead.

This naturally raised more questions on how portable various tar formats actually are. To verify that, I have decided to analyze the standards for possible incompatibility dangers and build a suite of test inputs that could be used to check how various implementations cope with that. This article describes those points and provides test results for a number of implementations.

Please note that this article is focused merely on read-wise format compatibility. In other words, it establishes how tar files should be written in order to achieve best probability that it will be read correctly afterwards. It does not investigate what formats the listed tools can write and whether they can correctly create archives using specific features.

Continue reading