portage API now provides an asyncio event loop policy

In portage-2.3.30, portage’s Python API provides an asyncio event loop policy via a DefaultEventLoopPolicy class. For example, here’s a little program that uses portage’s DefaultEventLoopPolicy to do the same thing as emerge --regen, using portage’s async_iter_completed function to implement the --jobs and --load-average options:

#!/usr/bin/env python

from __future__ import print_function

import argparse
import functools
import multiprocessing
import operator

import portage
from portage.util.futures.iter_completed import (
    async_iter_completed,
)
from portage.util.futures.unix_events import (
    DefaultEventLoopPolicy,
)


def handle_result(cpv, future):
    metadata = dict(zip(portage.auxdbkeys, future.result()))
    print(cpv)
    for k, v in sorted(metadata.items(),
        key=operator.itemgetter(0)):
        if v:
            print('\t{}: {}'.format(k, v))
    print()


def future_generator(repo_location, loop=None):

    portdb = portage.portdb

    for cp in portdb.cp_all(trees=[repo_location]):
        for cpv in portdb.cp_list(cp, mytree=repo_location):
            future = portdb.async_aux_get(
                cpv,
                portage.auxdbkeys,
                mytree=repo_location,
                loop=loop,
            )

            future.add_done_callback(
                functools.partial(handle_result, cpv))

            yield future


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--repo',
        action='store',
        default='gentoo',
    )
    parser.add_argument(
        '--jobs',
        action='store',
        type=int,
        default=multiprocessing.cpu_count(),
    )
    parser.add_argument(
        '--load-average',
        action='store',
        type=float,
        default=multiprocessing.cpu_count(),
    )
    args = parser.parse_args()

    try:
        repo_location = portage.settings.repositories.\
            get_location_for_name(args.repo)
    except KeyError:
        parser.error('unknown repo: {}\navailable repos: {}'.format(
            args.repo,
            ' '.join(sorted(repo.name
                for repo in portage.settings.repositories))))

    policy = DefaultEventLoopPolicy()
    loop = policy.get_event_loop()

    try:
        for future_done_set in async_iter_completed(
            future_generator(repo_location, loop=loop),
            max_jobs=args.jobs,
            max_load=args.load_average,
            loop=loop):
            loop.run_until_complete(future_done_set)
    finally:
        loop.close()



if __name__ == '__main__':
    main()
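async_iter_completed is part of portage. To show the general idea without portage, here is a minimal standard-library sketch. It is not portage’s implementation: run_bounded is a hypothetical name. It keeps at most max_jobs coroutines in flight and holds back new jobs while the 1-minute load average (os.getloadavg(), Unix only) is at or above max_load, while always allowing at least one job to run so that progress is guaranteed.

```python
import asyncio
import os


async def run_bounded(job_factories, max_jobs, max_load=None):
    """Hypothetical helper: run the coroutines produced by job_factories
    with at most max_jobs in flight, and avoid starting new ones while
    the 1-minute load average is at or above max_load. Results are
    returned in completion order."""
    factories = iter(job_factories)
    pending = set()
    results = []
    exhausted = False
    while True:
        # Fill free job slots, unless the system is already too busy.
        while not exhausted and len(pending) < max_jobs:
            if (max_load is not None and pending
                    and os.getloadavg()[0] >= max_load):
                break  # back off; re-check after the next job finishes
            try:
                factory = next(factories)
            except StopIteration:
                exhausted = True
                break
            pending.add(asyncio.ensure_future(factory()))
        if not pending:
            return results
        # Wait for at least one job to finish, then refill the slots.
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        results.extend(task.result() for task in done)
```

Each job is passed as a zero-argument callable (for example `lambda n=n: job(n)`) so that a coroutine is only created once a slot is free. The load average is re-checked only when a job completes, which is simpler than polling but reacts more slowly to load changes.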

Adapting regular iterators to asynchronous iterators in python

For I/O-bound tasks, Python coroutines make a nice replacement for threads. Unfortunately, asyncio has no asynchronous API for reading files, as discussed in the “Best way to read/write files with AsyncIO” thread on the python-tulip mailing list.

Meanwhile, it is essential that a long-running coroutine contain some asynchronous calls, since otherwise it will run all the way to completion before any other event loop tasks are allowed to run. For a long-running coroutine that needs to call a conventional iterator (rather than an asynchronous iterator), I’ve found this converter class to be useful:

import asyncio


class AsyncIteratorExecutor:
    """
    Converts a regular iterator into an asynchronous
    iterator, by executing the iterator in a thread.
    """
    def __init__(self, iterator, loop=None, executor=None):
        self.__iterator = iterator
        self.__loop = loop or asyncio.get_event_loop()
        self.__executor = executor

    def __aiter__(self):
        return self

    async def __anext__(self):
        value = await self.__loop.run_in_executor(
            self.__executor, next, self.__iterator, self)
        if value is self:
            raise StopAsyncIteration
        return value

For example, it can be used to asynchronously read lines of a text file as follows:

async def cat_file_async(filename):
    with open(filename, 'rt') as f:
        async for line in AsyncIteratorExecutor(f):
            print(line.rstrip())

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(
            cat_file_async('/path/of/file.txt'))
    finally:
        loop.close()
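On Python 3.9 and later, asyncio.to_thread() can stand in for loop.run_in_executor() with the default executor. A minimal sketch of the same converter under that assumption might look like this (ThreadedAsyncIterator and collect are hypothetical names). It uses a private sentinel object rather than self to detect exhaustion, but otherwise follows the same approach: each call to next() runs in a worker thread.

```python
import asyncio


class ThreadedAsyncIterator:
    """Hypothetical Python 3.9+ variant of AsyncIteratorExecutor:
    each next() call on the wrapped iterator runs in a worker thread."""

    _SENTINEL = object()  # returned by next() once the iterator is exhausted

    def __init__(self, iterable):
        self._iterator = iter(iterable)

    def __aiter__(self):
        return self

    async def __anext__(self):
        value = await asyncio.to_thread(next, self._iterator, self._SENTINEL)
        if value is self._SENTINEL:
            raise StopAsyncIteration
        return value


async def collect(iterable):
    """Gather all items from an ordinary iterable without blocking the loop."""
    return [item async for item in ThreadedAsyncIterator(iterable)]
```

Used with a file object, `async for line in ThreadedAsyncIterator(f)` reads lines exactly as cat_file_async does above, and asyncio.run() replaces the explicit loop management.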

socket-burst-dampener – An inetd-like daemon for handling bursts of connections

Suppose that you host a Gentoo rsync mirror on your company intranet, and you want it to gracefully handle bursts of many connections from clients, queuing connections for as long as necessary for all of the clients to be served (if they don’t time out first). However, you don’t want to allow an unlimited number of rsync processes, since that would risk overloading your server. To solve this problem, I’ve created socket-burst-dampener, an inetd-like daemon for handling bursts of connections.

It’s a very simple program, which only takes command-line arguments (no configuration file). For example:

socket-burst-dampener 873 \
--backlog 8192 --processes 128 --load-average 8 \
-- rsync --daemon

This will allow up to 128 concurrent rsync processes, while automatically holding back on starting new processes whenever the load average exceeds 8. Meanwhile, the --backlog 8192 setting means that the kernel will queue up to 8192 connections (until they are served or they time out). For the kernel to actually queue that many connections, you also need to raise the net.core.somaxconn sysctl, since net.core.somaxconn defaults to 128 connections (check with cat /proc/sys/net/core/somaxconn).
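The reason the sysctl matters is that Linux silently caps a listen() backlog that is larger than net.core.somaxconn. A minimal sketch for checking what a requested backlog will really be is shown below (read_somaxconn and effective_backlog are hypothetical helper names, not part of socket-burst-dampener). It assumes a Linux system with /proc mounted.

```python
SOMAXCONN_PATH = "/proc/sys/net/core/somaxconn"


def read_somaxconn(path=SOMAXCONN_PATH):
    """Read the current net.core.somaxconn value (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


def effective_backlog(requested, somaxconn):
    """Linux silently caps a listen() backlog larger than
    net.core.somaxconn, so this is the queue length actually granted."""
    return min(requested, somaxconn)
```

For example, `effective_backlog(8192, read_somaxconn())` returns 128 on a system still using a somaxconn of 128, which means --backlog 8192 only takes full effect after something like `sysctl -w net.core.somaxconn=8192`. The default value has varied across kernel versions, so it is worth checking rather than assuming.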

tardelta – Generate a tarball of differences between two tarballs

I’ve created a utility called tardelta (ebuild available) that people using containers may be interested in. Here’s the README:

It is possible to optimize Docker containers so that multiple containers are based on a single copy of a common base image. If containers are constructed from tarballs, then it can be useful to create a delta tarball that contains the differences between a base image and a derived image. The delta tarball can then be layered on top of the base image using a Dockerfile like the following:

FROM base
ADD delta.tar.xz /

Many different types of containers can thus be derived from a common base image, while sharing a single copy of the base image. This saves disk space, and can also reduce memory consumption since it avoids having duplicate copies of base image data in the kernel’s buffer cache.
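As a rough illustration of the idea (not tardelta’s own implementation; tar_delta and _member_digest are hypothetical names), the following standard-library sketch copies every member of the derived tarball that is missing from the base tarball or differs from it in type, mode, link target or SHA-256 content. Under these simplifying assumptions it ignores ownership and timestamps, expects member names to be spelled identically in both tarballs, and does not record deletions, so a file removed in the derived image would still be present after layering the delta on the base.

```python
import hashlib
import tarfile


def _member_digest(tar, member):
    """Identify a member by type, mode, link target and (for regular
    files) a SHA-256 hash of its content."""
    digest = None
    if member.isfile():
        digest = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
    return (member.type, member.mode, member.linkname, digest)


def tar_delta(base_path, derived_path, delta_path):
    """Write to delta_path (xz-compressed) every member of derived_path
    that is absent from base_path or differs from it. Deletions are
    NOT represented in the delta."""
    with tarfile.open(base_path) as base:
        base_index = {m.name: _member_digest(base, m)
                      for m in base.getmembers()}
    with tarfile.open(derived_path) as derived, \
            tarfile.open(delta_path, "w:xz") as delta:
        for member in derived.getmembers():
            if base_index.get(member.name) == _member_digest(derived, member):
                continue  # unchanged relative to the base image
            fileobj = derived.extractfile(member) if member.isfile() else None
            delta.addfile(member, fileobj)
```

The resulting delta.tar.xz is what the ADD line in the Dockerfile above would unpack on top of the base image.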

Experimental EAPI 5-hdepend

In portage-2.1.11.22 and 2.2.0_alpha133 there’s support for experimental EAPI 5-hdepend, which adds the HDEPEND variable to represent build-time host dependencies. For build-time target dependencies, use DEPEND (if the host is the target, then both HDEPEND and DEPEND will be installed on it). There’s a special “targetroot” USE flag that is automatically enabled for packages that are built for installation into a target ROOT, and automatically disabled otherwise. This flag may be used to control conditional dependencies, and ebuilds that use it need to add it to IUSE unless it happens to be included in the profile’s IUSE_IMPLICIT variable.
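To make the division of labor concrete, here is a toy model of the rules described in the previous paragraph (hdepend_model is a hypothetical name, and this is not portage code). It reports whether the targetroot flag would be enabled for a given ROOT and where each class of build-time dependency ends up.

```python
def hdepend_model(root="/"):
    """Toy model of EAPI 5-hdepend as described above: the state of the
    targetroot USE flag and the install location of each class of
    build-time dependency, for a given ROOT."""
    targetroot = root != "/"  # building for installation into a target ROOT
    return {
        "targetroot": targetroot,
        "HDEPEND": "/",  # build-time host dependencies: on the build host
        "DEPEND": root,  # build-time target dependencies: in the target ROOT
    }
```

With ROOT="/" the host is the target, so both dependency classes land in "/" and targetroot is off; with a cross ROOT such as a sysroot directory, only DEPEND goes there and targetroot is on.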

For those who may not be familiar with the history of HDEPEND, it was originally suggested in bug #317337. That was in 2010, and later that year there was some discussion about it on the chromium-os-dev mailing list. Recently, I suggested on the gentoo-dev mailing list that it be included in EAPI 5, but it didn’t make it in. Since then, there’s been some renewed effort, and now the patch is included in mainline Portage.

preserve-libs now available in Portage 2.1 branch

EAPI 5 includes support for automatic rebuilds via slot-operator dependencies and sub-slots, which has the potential to make @preserved-rebuild unnecessary (see Diego’s blog post regarding symbol collisions and bug #364425 for some examples of @preserved-rebuild shortcomings). Since this support for automatic rebuilds could greatly improve the user-friendliness of preserve-libs, I have decided to make preserve-libs available in the 2.1 branch of portage (beginning with portage-2.1.11.20). It’s not enabled by default, so you’ll have to set FEATURES="preserve-libs" in make.conf if you want to enable it. After EAPI 5 and automatic rebuilds have gained widespread adoption, I might consider enabling preserve-libs by default.

Experimental EAPI 5_pre1

In portage-2.1.11.13 and 2.2.0_alpha124 there’s support for EAPI 5_pre1, which implements all of the features that are currently in the eapi-5 branch of PMS (including the features from EAPI 4-slot-abi, which I’ve blogged about before). For additional references about the upcoming EAPI 5, see the “EAPI 5 tentative features” wiki page.

If you’d like to experiment with EAPI 5_pre1, refer to the corresponding portage documentation, and note that you may need to pay special attention to the new “Profile IUSE Injection” feature. Since the profiles aren’t configured for this feature yet, you’ll have to configure these variables yourself if your experimental ebuilds reference special flags (like x86, kernel_linux, elibc_glibc, and userland_GNU) without listing them explicitly in IUSE. Here’s an abbreviated example of what the variables should look like, which you can put in make.conf:

IUSE_IMPLICIT="prefix selinux"
USE_EXPAND="ELIBC KERNEL USERLAND"
USE_EXPAND_UNPREFIXED="ARCH"
USE_EXPAND_IMPLICIT="ARCH ELIBC KERNEL USERLAND"
USE_EXPAND_VALUES_ARCH="amd64 ppc ppc64 x86 x86-fbsd x86-solaris"
USE_EXPAND_VALUES_ELIBC="FreeBSD glibc"
USE_EXPAND_VALUES_KERNEL="FreeBSD linux SunOS"
USE_EXPAND_VALUES_USERLAND="BSD GNU"

I have not populated all of the above variables exhaustively, but these values should be enough to get you started. If you need a more complete set of ARCH values to list in USE_EXPAND_VALUES_ARCH, then you can grab the exhaustive set of values from arch.list.
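The way these variables combine can be modeled as follows. This is a sketch of the profile IUSE injection rules as summarized here, not a portage function (implicit_iuse is a hypothetical name). Flags in IUSE_IMPLICIT are used directly. Each variable named in USE_EXPAND_IMPLICIT contributes its USE_EXPAND_VALUES_* entries: unprefixed if the variable is in USE_EXPAND_UNPREFIXED, and otherwise prefixed with the lowercased variable name and an underscore if it is in USE_EXPAND.

```python
def implicit_iuse(conf):
    """Flags made implicitly valid by profile IUSE injection (sketch).
    conf maps variable names to whitespace-separated values, as in
    make.conf."""
    flags = set(conf.get("IUSE_IMPLICIT", "").split())
    use_expand = set(conf.get("USE_EXPAND", "").split())
    unprefixed = set(conf.get("USE_EXPAND_UNPREFIXED", "").split())
    for var in conf.get("USE_EXPAND_IMPLICIT", "").split():
        values = conf.get("USE_EXPAND_VALUES_" + var, "").split()
        if var in unprefixed:
            # e.g. ARCH values are used as-is: x86, amd64, ...
            flags.update(values)
        elif var in use_expand:
            # e.g. ELIBC value glibc becomes the flag elibc_glibc
            flags.update(var.lower() + "_" + value for value in values)
    return flags


# The abbreviated make.conf example from above.
MAKE_CONF = {
    "IUSE_IMPLICIT": "prefix selinux",
    "USE_EXPAND": "ELIBC KERNEL USERLAND",
    "USE_EXPAND_UNPREFIXED": "ARCH",
    "USE_EXPAND_IMPLICIT": "ARCH ELIBC KERNEL USERLAND",
    "USE_EXPAND_VALUES_ARCH": "amd64 ppc ppc64 x86 x86-fbsd x86-solaris",
    "USE_EXPAND_VALUES_ELIBC": "FreeBSD glibc",
    "USE_EXPAND_VALUES_KERNEL": "FreeBSD linux SunOS",
    "USE_EXPAND_VALUES_USERLAND": "BSD GNU",
}
```

With these settings, the special flags mentioned above (x86, kernel_linux, elibc_glibc and userland_GNU) all appear in `implicit_iuse(MAKE_CONF)`, so an ebuild can reference them without listing them in IUSE.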

Automatic rebuilds with experimental EAPI 4-slot-abi

In response to recent discussion on the gentoo-dev mailing list, I’ve added experimental support for EAPI “4-slot-abi” in portage-2.1.11.1 and 2.2.0_alpha112. This EAPI makes it possible for packages to be rebuilt automatically when necessary, so that you don’t have to do it manually (or with a helper such as revdep-rebuild or perl-cleaner). Hopefully this feature will soon make it into an official future EAPI. Here are some example usage scenarios from portage’s unit tests for EAPI “4-slot-abi”.

Here’s an example dev-libs/icu upgrade:

$ emerge -pu dev-libs/icu

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild U ] dev-libs/icu-49 [4.8]
[ebuild R ] dev-libs/libxml2-2.7.8

As you can see, emerge detects that libxml2 has been built against an older version of dev-libs/icu, so it automatically rebuilds it against the newer version. Without this EAPI support, a user typically handles this kind of rebuild manually, by running the revdep-rebuild helper to detect the breakage (or, similarly, by emerging @preserved-rebuild with portage-2.2_alpha).

Here’s an example sys-libs/db upgrade:

$ emerge -pu sys-libs/db

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild NS ] sys-libs/db-4.8 [4.7]
[ebuild R ] app-office/libreoffice-3.5.4.2

In this case, emerge detects that libreoffice has been built against an older slot of sys-libs/db, so it automatically rebuilds it to link against the newer slot. This type of rebuild is not strictly necessary, unless you’d like emerge --depclean to be able to remove the older slot. If you want to avoid this kind of optional rebuild, you can use the emerge --rebuild-if-new-slot-abi=n option, or use --exclude=app-office/libreoffice if you want to be more specific (these options are documented in the emerge man page).
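The difference between the icu case and the db case can be summarized with a simplified model. This is not portage’s resolver: SlotAbi, rebuild_kind and should_rebuild are hypothetical names, and the model ignores dependency atoms, USE flags and whether the new slot actually satisfies the consumer’s dependency. A provider replaced within the same slot with a different ABI forces a rebuild, while a provider that arrives in a new slot leaves the old slot installed, so the rebuild is optional and controlled by the rebuild-if-new-slot-abi setting.

```python
from typing import NamedTuple, Optional


class SlotAbi(NamedTuple):
    """Hypothetical record of a provider's SLOT and slot ABI."""
    slot: str
    abi: str


def rebuild_kind(built_against: SlotAbi, candidate: SlotAbi) -> Optional[str]:
    """Classify the rebuild a consumer needs when its provider changes."""
    if candidate.slot == built_against.slot:
        # Same slot: the old version is replaced, so an ABI change would
        # break the consumer unless it is rebuilt.
        return "required" if candidate.abi != built_against.abi else None
    # New slot: the old slot stays installed and the consumer keeps
    # working; rebuilding only lets --depclean remove the old slot.
    return "optional"


def should_rebuild(built_against, candidate, rebuild_if_new_slot_abi=True):
    kind = rebuild_kind(built_against, candidate)
    return kind == "required" or (kind == "optional" and rebuild_if_new_slot_abi)
```

In these terms, libxml2 built against icu in slot 0 with ABI 4.8 gets a required rebuild when icu-49 arrives in the same slot, while libreoffice built against db slot 4.7 gets an optional one when db-4.8 is installed in a new slot.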

Here’s an example dev-libs/glib upgrade:

$ emerge -pu dev-libs/glib

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild U ] dev-libs/glib-2.32.3 [2.30.2]
[ebuild R ] dev-libs/dbus-glib-0.98

In this case, emerge detects that dbus-glib has been built against an older version of dev-libs/glib, so it automatically rebuilds it against the newer version. Without this EAPI support, a user typically handles this kind of rebuild manually, after being prompted by an ewarn message (revdep-rebuild does not handle this case, as noted in bug #297483).

If you are interested in experimenting with EAPI “4-slot-abi”, then please refer to the corresponding HTML documentation that is installed by >=sys-apps/portage-2.1.11 with USE=doc, and also to the emerge(1) man page for information about the related --ignore-built-slot-abi-deps and --rebuild-if-new-slot-abi options.

Update 2012-07-01
There’s now an overlay available for testing EAPI 4-slot-abi. Please refer to bug 424429 for details.