Problems faced when downstream testing Python packages

Downstream testing refers to the testing of software packages done by their redistributors, such as Linux distributions. It may be done by distro-specific CI systems, by package maintainers or, as is frequently the case with Gentoo, even by distribution users.

What makes downstream testing really useful is that it serves a different purpose than upstream testing does. In short, upstream testing aims to ensure that the current code of the package works in one or more reference environments and meets the quality standards set by the package authors. Downstream testing, on the other hand, aims to ensure that a particular version of the package (possibly an old one) works in the environment it will be used in, or one that closely resembles it.

To put it another way, downstream testing may differ from upstream testing by:

  • testing a different (possibly old) package version
  • using a different (possibly newer) Python version
  • testing against different dependency versions
  • testing against an environment with additional packages installed (that may interfere unexpectedly)
  • testing on a different operating system, architecture, hardware, setup
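When a downstream failure has to be compared with upstream's reference environment, it helps to record exactly which of the factors above differ. The sketch below collects them using only the standard library (`platform` and `importlib.metadata`). The function name `environment_snapshot` and the chosen fields are illustrative assumptions, not part of any Gentoo or upstream tooling.

```python
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Collect environment facts that commonly differ between
    upstream CI and a downstream test run (illustrative sketch)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed here, although upstream may test against it.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
        "dependencies": versions,
        # Number of installed distributions: a rough hint at extra
        # packages that could interfere with the test suite.
        "installed_distributions": sum(1 for _ in metadata.distributions()),
    }
```

For example, `environment_snapshot(["pytest"])` reports the installed pytest version, or `None` if pytest is not installed.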

While these differences may be inconvenient and sometimes cause false positives, they have in the past detected issues that went unnoticed by upstream and could have broken production setups. Downstream testing is important.

Unfortunately, many test suites make assumptions that cause problems for downstream testers. Some of them can be worked around easily, others cannot. In this article I’d like to discuss a number of these issues.


The inconsistencies around Python package naming and the new policy

For a long time, the dev-python category in Gentoo did not follow any specific naming policy. Usually we went for whatever made the ebuild easier: the GitHub project name if we happened to be using GitHub archives as distfiles, or the PyPI project name when using source distributions from PyPI. However, this was inconvenient for users, who had a hard time finding specific packages. Historically, we even had cases of developers independently adding a second copy of the same package under a different name.

This is why I eventually started researching the standards for Python package naming and drafting a new policy. The package name policy can now be found in the Gentoo Python Guide. In this post, I’d like to summarize the research that led to it, and the problems that we have yet to face.
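One of the standards any such policy has to reckon with is the project name normalization defined in PEP 503: names compare case-insensitively, and runs of `-`, `_` and `.` count as a single `-`. A minimal sketch of that rule follows; the helper names `normalize` and `same_project` are illustrative, and whether the Gentoo policy applies exactly this rule is left to the Python Guide.

```python
import re

# PEP 503: collapse runs of "-", "_" and "." into a single "-",
# then lowercase the result.
_SEPARATORS = re.compile(r"[-_.]+")


def normalize(name: str) -> str:
    """Return the PEP 503 normalized form of a project name."""
    return _SEPARATORS.sub("-", name).lower()


def same_project(a: str, b: str) -> bool:
    """True if two spellings refer to the same PyPI project."""
    return normalize(a) == normalize(b)


print(normalize("Flask_SQLAlchemy"))     # flask-sqlalchemy
print(normalize("zope.interface"))       # zope-interface
print(same_project("PyYAML", "pyyaml"))  # True
```

A check like `same_project()` is the kind of comparison that would have caught two developers adding the same package under differently spelled names.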


.tar sorting vs .xz compression ratio

It is fairly common knowledge that the ordering of members within an archive can affect the compression ratio. I’ve done some quick testing and the results somewhat surprised me. Firstly, it turned out that the simplest lexical sorting by name (path) gave the best result. Secondly, the difference between that and sorting by size was as large as 8%.

Note that this is a pretty specific source archive, so results may vary. Test details and commands are in the remainder of the post.

Compression results per sort order
Sort order               Size in bytes   Compared to best
name                       108 011 756            100.00%
suffix                     108 573 612            100.52%
size (smallest first)      116 797 440            108.13%
size (largest first)       116 645 940            107.99%
suffix + size              111 709 128            103.42%
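The post’s own test commands are not reproduced here, but the sort orders in the table can be approximated with a short sketch using Python’s standard `tarfile` module. This is a stand-in under stated assumptions, not the method used for the results above: it adds only regular files (no directory entries), breaks ties by path, and relies on `tarfile`’s default xz preset, so absolute sizes will not match the table.

```python
import io
import os
import tarfile
from pathlib import Path


def list_files(root):
    """Return (relative path, size) for every regular file under root."""
    root = Path(root)
    return [
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    ]


def suffix(path):
    # Extension of the final path component, e.g. ".c", or "" if none.
    return os.path.splitext(path)[1]


# Sort keys mirroring the table rows; ties are broken by path (assumption).
SORT_KEYS = {
    "name": lambda f: f[0],
    "suffix": lambda f: (suffix(f[0]), f[0]),
    "size (smallest first)": lambda f: (f[1], f[0]),
    "size (largest first)": lambda f: (-f[1], f[0]),
    "suffix + size": lambda f: (suffix(f[0]), f[1], f[0]),
}


def xz_tar_size(root, files):
    """Build an in-memory .tar.xz with members in the given order
    and return its compressed size in bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for rel, _size in files:
            tar.add(os.path.join(root, rel), arcname=rel, recursive=False)
    return len(buf.getvalue())


def compare(root):
    """Compress root once per sort order and print sizes relative to the best."""
    files = list_files(root)
    sizes = {
        order: xz_tar_size(root, sorted(files, key=key))
        for order, key in SORT_KEYS.items()
    }
    best = min(sizes.values())
    for order, size in sorted(sizes.items(), key=lambda kv: kv[1]):
        print(f"{order:24} {size:>12,} {100 * size / best:7.2f}%")
    return sizes
```

Calling `compare("path/to/unpacked/source")` (a placeholder path) prints each order with its compressed size and its ratio to the smallest result, in the same form as the “Compared to best” column.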

The conclusion? Sorting can affect the compression ratio more than I had anticipated. However, all the “obvious” optimizations made the result worse than plain lexical sorting. Perhaps it’s just a matter of well-organized source code keeping similar files in the same directories. Perhaps there is a way to optimize it even further (and beat sorting by name). One interesting option would be to group files into size buckets, and then sort by name within each bucket.
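The bucket idea is untested in the post, but a sort key for it is easy to sketch. Here the buckets are assumed to be power-of-two size classes (an arbitrary choice; other boundaries may compress better or worse), and the key works with Python’s `sorted()` on `(path, size)` pairs. The helper names are hypothetical.

```python
def size_bucket(size: int) -> int:
    """Power-of-two size class: 0 for empty files, 1 for 1 byte,
    2 for 2-3 bytes, 3 for 4-7 bytes, and so on."""
    return size.bit_length()


def bucketed_name_key(entry):
    """Sort key: group by size bucket first, then by path name.
    `entry` is a (path, size) tuple."""
    path, size = entry
    return (size_bucket(size), path)


files = [
    ("src/main.c", 5000),
    ("README", 300),
    ("src/util.c", 4200),
    ("docs/intro.txt", 260),
    ("LICENSE", 1100),
]
for path, size in sorted(files, key=bucketed_name_key):
    print(size_bucket(size), path)
# 9 README
# 9 docs/intro.txt
# 11 LICENSE
# 13 src/main.c
# 13 src/util.c
```

Within a bucket, files keep their lexical order, so similar files from the same directory stay adjacent as long as their sizes fall into the same class.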

Special thanks to Adrien Nader and Lasse Collin from #tukaani for inspiring me to do this.


Clang in Gentoo now sets default runtimes via config file

The upcoming clang 16 release features substantial improvements to configuration file support. Notably, it adds support for specifying multiple files and better default locations. This enabled Gentoo to finally replace the default-* flags used on sys-devel/clang, giving our users the ability to change the defaults without rebuilding the whole of clang.
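A clang configuration file is essentially a list of command-line options that clang applies as if they had been passed to it, with `#` starting a comment. To illustrate what replacing the default-* flags amounts to, the sketch below renders such a file from the choices those flags used to make. The mapping (default-compiler-rt selecting compiler-rt and libunwind, default-libcxx selecting libc++, default-lld selecting lld) is our reading of the former flags, and the function name is hypothetical; the actual Gentoo file names and contents are described in the full post.

```python
def render_runtime_config(compiler_rt=False, libcxx=False, lld=False):
    """Render clang config file text selecting default runtimes.

    Each argument mirrors one of the former sys-devel/clang flags
    (default-compiler-rt, default-libcxx, default-lld); the mapping
    is an illustrative assumption."""
    lines = ["# Default runtimes for clang (hypothetical example)"]
    if compiler_rt:
        lines += ["--rtlib=compiler-rt", "--unwindlib=libunwind"]
    if libcxx:
        lines.append("-stdlib=libc++")
    if lld:
        lines.append("-fuse-ld=lld")
    return "\n".join(lines) + "\n"


print(render_runtime_config(compiler_rt=True, libcxx=True), end="")
# # Default runtimes for clang (hypothetical example)
# --rtlib=compiler-rt
# --unwindlib=libunwind
# -stdlib=libc++
```

Because the defaults then live in a plain text file instead of being compiled in, changing them means editing that file rather than rebuilding clang.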

This change has also been partially backported to clang 15.0.2 in Gentoo and, unless major problems are reported, will be part of the stable clang 15.x release (currently planned for the upcoming 15.0.3).

In this post, I’d like to briefly describe the new configuration file features, how much of them has been backported to 15.x in Gentoo, and how defaults are going to be selected from now on.