Gentoo – Page 4 – Michał Górny

My thin wrapper for emerge(1)

I’ve recently written a thin wrapper over emerge that I use in my development environment. It does the following:

set the tmux pane title to the first package argument (so you can roughly see what’s emerging in every pane),
beep meaningfully when emerge finishes (two beeps for success, three for failure),
run pip check after a successful run to check for mismatched Python dependencies.
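As a rough illustration, here is a minimal Python sketch of the three behaviours listed above. It is not the author’s actual wrapper. It assumes that being inside tmux is detected via the `$TMUX` environment variable, that `tmux select-pane -T` sets the pane title, that the terminal bell character counts as a beep, and that `pip check` is run through the same interpreter (`python -m pip check`). The function names are hypothetical.

```python
from __future__ import annotations

import os
import subprocess
import sys
import time


def first_package_arg(args: list[str]) -> str | None:
    """Return the first argument that does not start with "-".

    Naive heuristic: option values such as the "4" in "--jobs 4" are not
    recognized as option values, which is tolerable for a pane title.
    """
    for arg in args:
        if not arg.startswith("-"):
            return arg
    return None


def beep_count(returncode: int) -> int:
    """Two beeps for success, three for failure."""
    return 2 if returncode == 0 else 3


def beep(times: int, interval: float = 0.3) -> None:
    """Ring the terminal bell `times` times, pausing so the beeps stay distinct."""
    for _ in range(times):
        sys.stdout.write("\a")
        sys.stdout.flush()
        time.sleep(interval)


def main(args: list[str]) -> int:
    package = first_package_arg(args)
    # $TMUX is set inside a tmux session; select-pane -T sets the pane title.
    if package is not None and os.environ.get("TMUX"):
        subprocess.run(["tmux", "select-pane", "-T", package], check=False)

    result = subprocess.run(["emerge", *args])
    if result.returncode == 0:
        # Report mismatched Python dependencies after a successful run.
        subprocess.run([sys.executable, "-m", "pip", "check"])
    beep(beep_count(result.returncode))
    return result.returncode
```

To use it, call `sys.exit(main(sys.argv[1:]))` from a small entry-point script. Whether a failing `pip check` should also change the beep pattern is left open here.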

Problems faced when downstream testing Python packages

Downstream testing refers to the testing of software packages done by their redistributors, such as Linux distributions. It can be done by distro-specific CI systems, by package maintainers or, as is frequently the case with Gentoo, even by distribution users.

What makes downstream testing really useful is that it serves a different purpose than upstream testing. In short, upstream testing aims to ensure that the current code of the package works in one or more reference environments and meets the quality standards set by the package authors. Downstream testing, on the other hand, aims to ensure that a particular version of the package (possibly an old one) works in the environment it will be used in, or in one that closely resembles it.

To put it another way, downstream testing may differ from upstream testing by:

testing a different (possibly old) package version
using a different (possibly newer) Python version
testing against different dependency versions
testing against an environment with additional packages installed (that may interfere unexpectedly)
testing on a different operating system, architecture, hardware, setup

While these differences may seem inconvenient and sometimes cause false positives, they have proven in the past to detect issues that went unnoticed upstream and could have broken production setups. Downstream testing is important.

Unfortunately, many test suites make assumptions that cause problems for downstream testers. Some of these can be worked around easily, others cannot. In this article, I’d like to discuss a number of these issues.

The inconsistencies around Python package naming and the new policy

For a long time, the dev-python category in Gentoo did not follow any specific naming policy. Usually we went with whatever made the ebuild easier: the GitHub project name if we happened to be using GitHub archives as distfiles, or the PyPI project name when using source distributions from PyPI. However, this was inconvenient for users, who had a hard time finding specific packages. Historically, we even had cases of developers independently adding a second copy of the same package under a different name.

This is why I eventually started researching the standards for Python package naming and drafting a new policy. The package name policy can now be found in the Gentoo Python Guide. In this post, I’d like to summarize the research that led to it, and the problems we have yet to face.
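The excerpt does not spell out the policy itself. Python’s own standard for comparing project names is the normalization from PEP 503: runs of `-`, `_` and `.` are collapsed into a single `-` and the result is lowercased. Assuming that is the kind of normalization a naming policy would build on, it can be sketched as follows (the function name is hypothetical):

```python
import re


def normalize(name: str) -> str:
    """PEP 503 name normalization: collapse runs of -, _ and . into "-", lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()
```

With this rule, `Flask_SQLAlchemy` and `flask-sqlalchemy` name the same PyPI project. That is the sort of spelling variation that makes ad hoc package naming hard to search.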

Handy commands to clean up old ~arch-only packages

Here’s a bunch of handy commands that I’ve come up with to semi-automatically remove old versions of packages that do not have stable keywords (and are therefore not subject to the post-stabilization cleanups I normally do).
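The selection criterion alone can be illustrated with a small Python sketch, which is not one of the author’s commands. It takes a package’s versions mapped to their KEYWORDS and treats a keyword as stable when it has neither a `~` nor a `-` prefix. If no version has a stable keyword, it proposes every version except the newest for removal. Keeping only the newest version is an assumed policy. The version parser is deliberately simplified to plain dotted numbers with an optional `-rN` revision. Real Gentoo version comparison also handles suffixes such as `_pre` and should come from Portage itself. All names are hypothetical.

```python
from __future__ import annotations

import re


def is_stable_keyword(keyword: str) -> bool:
    # "amd64" is stable, "~amd64" is testing, "-x86" / "-*" mark unsupported.
    return not keyword.startswith(("~", "-"))


def version_key(version: str) -> tuple[tuple[int, ...], int]:
    # Simplified: only "1.2.3" or "1.2.3-r1"; anything else raises ValueError.
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:-r(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version: {version}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    return numbers, int(match.group(2) or 0)


def cleanup_candidates(versions: dict[str, str]) -> list[str]:
    """Given {version: KEYWORDS}, list old versions of a ~arch-only package."""
    if any(is_stable_keyword(kw) for kws in versions.values() for kw in kws.split()):
        return []  # has stable keywords: left to post-stabilization cleanup
    ordered = sorted(versions, key=version_key)
    return ordered[:-1]  # everything but the newest version, for manual review
```

For example, `cleanup_candidates({"1.0": "~amd64 ~x86", "1.10": "~amd64", "1.2-r1": "~amd64"})` proposes `1.0` and `1.2-r1`, because numeric comparison puts `1.10` last.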

.tar sorting vs .xz compression ratio

It is fairly common knowledge that the ordering of members within an archive can affect the compression ratio. I’ve done some quick testing, and the results somewhat surprised me. Firstly, the simplest lexical sorting by name (path) gave the best result. Secondly, the difference between that and sorting by size was as large as 8%.

Note that this is a rather specific source archive, so results may vary. Test details and commands are in the remainder of the post.
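The author’s test commands are not part of this excerpt. As a rough, self-contained way to run a similar comparison, the Python sketch below builds an uncompressed tar in memory for each sort order and measures its size after `lzma.compress()` at the default preset. It makes several assumptions: only regular files are stored (no directory entries), ties are broken by path, “suffix” means the file extension, and the whole archive fits in memory. Absolute sizes will not match the author’s measurements. All names are hypothetical.

```python
from __future__ import annotations

import io
import lzma
import os
import tarfile
from collections.abc import Callable
from pathlib import Path

# Each key maps (relative path, size in bytes) to a comparable value.
SORT_KEYS: dict[str, Callable[[tuple[str, int]], tuple]] = {
    "name": lambda item: (item[0],),
    "suffix": lambda item: (os.path.splitext(item[0])[1], item[0]),
    "size (smallest first)": lambda item: (item[1], item[0]),
    "size (largest first)": lambda item: (-item[1], item[0]),
    "suffix + size": lambda item: (os.path.splitext(item[0])[1], item[1], item[0]),
}


def list_files(root: Path) -> list[tuple[str, int]]:
    """Return (relative path, size) for every regular file under root."""
    return [
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    ]


def xz_size(root: Path, order: list[tuple[str, int]]) -> int:
    """Write a tar with members in the given order, return its .xz-compressed size."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for rel_path, _size in order:
            tar.add(root / rel_path, arcname=rel_path, recursive=False)
    return len(lzma.compress(buffer.getvalue()))


def compare(root: Path) -> dict[str, int]:
    """Compressed size for every sort order in SORT_KEYS."""
    files = list_files(root)
    return {name: xz_size(root, sorted(files, key=key)) for name, key in SORT_KEYS.items()}
```

Calling `compare(Path("some-source-tree"))` returns one compressed size per sort order. Dividing each by the smallest gives a “relative to best” column.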

Compression results per sort order
Sort order               Size (bytes)    Relative to best (name)
name                     108 011 756     100.00%
suffix                   108 573 612     100.52%
size (smallest first)    116 797 440     108.13%
size (largest first)     116 645 940     107.99%
suffix + size            111 709 128     103.42%

The conclusion? Sorting can affect the compression ratio more than I had anticipated. However, all the “obvious” optimizations made the result worse than plain lexical sorting. Perhaps it’s just a matter of well-organized source code keeping similar files in the same directories. Perhaps there is a way to optimize it even further (and beat sorting by name). One interesting option would be to group files into size buckets and then sort by name within each bucket.
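The size-bucket idea can be expressed as a sort key. In this hypothetical sketch, buckets are powers of two (the `int.bit_length()` of the file size), smaller buckets come first, and files within a bucket are ordered by path. Both the bucket width and the direction are free parameters, and no compression results are claimed for this order.

```python
from __future__ import annotations


def size_bucket(size: int) -> int:
    # Power-of-two buckets: 0, 1, 2-3, 4-7, 8-15, ... bytes.
    return size.bit_length()


def bucket_then_name(item: tuple[str, int]) -> tuple[int, str]:
    """Sort key for (path, size): size bucket first, then lexical path."""
    path, size = item
    return size_bucket(size), path


files = [("src/main.c", 5000), ("README", 700), ("src/util.c", 4200), ("docs/intro.txt", 800)]
ordered = sorted(files, key=bucket_then_name)
```

Here `README` and `docs/intro.txt` share the 512–1023 byte bucket and `src/main.c` and `src/util.c` share the 4096–8191 byte bucket, so each pair stays in name order.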

Special thanks to Adrien Nader and Lasse Collin from #tukaani for inspiring me to do this.
