marecki – Alice in Penguinland

Console-bound systemd services, the right way

Let’s say that you need to run on your system some sort of server software which, instead of daemonising, has a command console permanently attached to standard input. Let us also say that said console is the only way for the administrator to interact with the service, including requesting its orderly shutdown – whoever has written it has not implemented any sort of signal handling so sending SIGTERM to the service process causes it to simply drop dead, potentially losing data in the process. And finally, let us say that the server in question is proprietary software so it isn’t really possible for you to fix any of the above in the source code (yes, I am talking about a specific piece of software – which by the way is very much alive and kicking as of late 2020). What do you do?

According to the collective wisdom of the World Wide Web, the answer to this question is “use a terminal multiplexer like tmux or screen”, or at the very least a stripped-down variant of the same such as dtach. OK, that sort of works – what if you want to run it as a proper system-managed service under e.g. OpenRC? The answer of the Stack Exchange crowd: have your init script invoke the terminal multiplexer. Oooooookay, how about under systemd, which actually prefers services it manages not to daemonise by themselves? Nope, still “use a terminal multiplexer”.

What follows is my attempt to run a service like this under systemd more efficiently and elegantly, or at least with no extra dependencies beyond basic Unix shell commands.

Let us have a closer look at what systemd does with standard I/O of processes it spawns. The man page systemd.exec(5) tells us that what happens here is controlled by the directives StandardInput, StandardOutput and StandardError. By default the first of these is assigned to null while the other two get piped to the journal; there are, however, quite a few other options. According to the documentation, here is what systemd allows us to connect to standard input:

- we are not interested in null (for obvious reasons) or any of the tty options (the whole point of this exercise is to run fully detached from any terminals);
- data would work if we needed to feed some commands to the service when it starts but is useless for triggering a shutdown;
- file looks promising – just point it to a FIFO on the file system and we’re all set – but it doesn’t actually take care of creating the FIFO for us. While we could in theory work around that by invoking mkfifo (and possibly chown if the service is to run as a specific user) in ExecStartPre, let’s see if we can find a better option;
- socket “is valid in socket-activated services only” and the corresponding socket unit must “have Accept=yes set”. What we want is the opposite, i.e. for the service to create its socket;
- finally, there is fd – which seems to be exactly what we need. According to the documentation all we have to do is write a socket unit creating a FIFO with appropriate ownership and permissions, make it a dependency of our service using the Sockets directive, and assign the corresponding named file descriptor to standard input.
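Before moving on to the units themselves, here is a small sketch of the underlying mechanism, i.e. a process whose standard input is a FIFO rather than a terminal. Everything in it is made up for illustration: fake_server is a stand-in for the real server, the commands "status" and "stop" are placeholders, and the FIFO lives in a temporary directory. The reader opens the FIFO read-write so that it does not see end-of-file every time a writer closes its end; this works on Linux but is not guaranteed by POSIX.

```shell
#!/bin/sh
# Stand-in for the console-bound server (hypothetical): echoes every command
# it reads from standard input and exits when it sees "stop".
fake_server() {
    while IFS= read -r cmd; do
        echo "got: $cmd"
        if [ "$cmd" = "stop" ]; then
            break
        fi
    done
}

fifo_demo() {
    dir=$(mktemp -d) || return 1
    fifo="$dir/pcd.control"
    mkfifo -m 0600 "$fifo" || return 1

    # "<>" opens the FIFO for both reading and writing, so the reader keeps
    # a writer open itself and never gets EOF between two commands.
    fake_server <> "$fifo" > "$dir/log" &
    server_pid=$!

    # Each of these opens the FIFO, writes one line and closes it again,
    # just like an administrator (or a stop script) would.
    echo status > "$fifo"
    echo stop > "$fifo"

    wait "$server_pid"
    cat "$dir/log"
    rm -rf "$dir"
}

fifo_demo
```

Run as is, the script should print one "got:" line per command written to the FIFO.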

Let’s try it out. To begin with, our socket unit “proprietarycrapd.socket”. Note that I have successfully managed to get this to work using unit templates as well, %i expansion works fine both here and while specifying unit or file-descriptor names in the service unit – but in order to avoid any possible confusion caused by the fact socket-activated services explicitly require being defined with templates, I have based my example on static units:

[Unit]
Description=Command FIFO for proprietarycrapd

[Socket]
ListenFIFO=/run/proprietarycrapd/pcd.control
DirectoryMode=0700
SocketMode=0600
SocketUser=pcd
SocketGroup=pcd
RemoveOnStop=true

Apart from the fact the unit in question has got no [Install] section (which makes sense given we want this socket to only be activated by the corresponding service, not by systemd itself), nothing out of the ordinary here. Note that since we haven’t used the directive FileDescriptorName, systemd will apply default behaviour and give the file descriptor associated with the FIFO the name of the socket unit itself.

And now, our service unit “proprietarycrapd.service”:

[Unit]
Description=proprietarycrap daemon
After=network.target

[Service]
User=pcd
Group=pcd
Sockets=proprietarycrapd.socket
StandardInput=socket
StandardOutput=journal
StandardError=journal
ExecStart=/opt/proprietarycrap/bin/proprietarycrapd
ExecStop=/usr/local/sbin/proprietarycrapd-stop

[Install]
WantedBy=multi-user.target

StandardInput=socket??? Whatever’s happened to StandardInput=fd:proprietarycrapd.socket??? Here is an odd thing. If I use the latter on my system, the service starts fine and gets the FIFO attached to its standard input – but when I try to stop the service the journal shows “Failed to load a named file descriptor: No such file or directory”, the ExecStop command is not run and systemd immediately fires a SIGTERM at the process. No idea why. Anyway, through trial and error I have found out that StandardInput=socket not only works fine in spite of being used in a service that is not socket-activated but actually does exactly what I wanted to achieve – so that is what I have ended up using.

Which brings us to the final topic, the ExecStop command. There are three reasons why I have opted for putting all the commands required to shut the server down in a shell script:

- first and foremost, writing the shutdown command to the FIFO will return right away even if the service takes time to shut down. systemd sends SIGTERM to the unit process as soon as the last ExecStop command has exited so we have to follow the echo with something that waits for the server process to finish (see below);
- systemd does not execute Exec commands in a shell so simply running echo > /run/proprietarycrapd/pcd.control doesn’t work; we would have to wrap the echo call in an explicit invocation of a shell;
- between the aforementioned two reasons and the fact the particular service for which I have created these units actually requires several commands in order to execute an orderly shutdown, I have decided that putting all those commands in a script file instead of cramming them into the unit would be much cleaner.

The shutdown script itself is mostly unremarkable so I’ll only quote the bit responsible for waiting for the server to actually shut down. At present I am still looking for a way of doing it in a blocking fashion without adding more dependencies (wait only works on child processes of the current shell, the server in question does not create any lock files to which I could attach inotifywait, and attaching the latter to the relevant directory in /proc does not work) but in the meantime, the loop

while kill -0 "${MAINPID}" 2> /dev/null; do
    sleep 1s
done

keeps the script ticking along until either the process has exited or the script has timed out (see the TimeoutStopSec directive in systemd.service(5)) and systemd has killed both it and the service itself.
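For completeness, here is a minimal sketch of how the whole stop script could be put together. It is not the script I actually use: the console command "stop" is a placeholder for whatever your server expects, and the FIFO path is simply the one from the socket unit above, overridable through a hypothetical FIFO environment variable. The one thing it does rely on is that systemd exports MAINPID to ExecStop commands when it knows the main PID of the unit.

```shell
#!/bin/sh
# Sketch of a stop helper in the spirit of /usr/local/sbin/proprietarycrapd-stop.
# Assumption: the server shuts down cleanly on the console command "stop".

# Path taken from the socket unit; FIFO is a hypothetical override.
FIFO=${FIFO:-/run/proprietarycrapd/pcd.control}

# Write a single console command to the control FIFO.
send_command() {
    printf '%s\n' "$1" > "$FIFO"
}

# Poll once a second until the process with the given PID is gone.
wait_for_exit() {
    while kill -0 "$1" 2> /dev/null; do
        sleep 1
    done
}

# Only act when invoked by systemd with a known main PID.
if [ -n "${MAINPID:-}" ]; then
    send_command "stop"
    wait_for_exit "$MAINPID"
fi
```

Keeping the two steps in separate functions makes it easy to add further console commands before the final one, which is what the service I wrote this for needs.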

Acknowledgements: with many thanks to steelman for having figured out the StandardInput=socket bit in particular and having let me bounce my ideas off him in general.

Case label for Pocket Science Lab V5

tl;dr: Here (PDF, 67 kB) is a case label for Pocket Science Lab version 5 that is compatible with the design for a laser-cut case published by FOSSAsia.

In case you haven’t heard about it, Pocket Science Lab [1] is a really nifty board developed by the FOSSAsia community which combines a multichannel, megahertz-range oscilloscope, a multimeter, a logic probe, several voltage sources and a current source, several wave generators, UART and I2C interfaces… and all of this in the form factor of an Arduino Mega, i.e. only somewhat larger than that of a credit card. Hook it up over USB to a PC or an Android device running the official (free and open source, of course) app and you are all set.

Well, not quite set yet. What you get for your 50-ish EUR is just the board itself. You will quite definitely need a set of probe cables (sadly, I have yet to find even an unofficial adaptor allowing one to equip PSLab with standard industry oscilloscope probes using BNC connectors) but if you expect to lug yours around anywhere you go, you will quite definitely want to invest in a case of some sort. While FOSSAsia does not to my knowledge sell PSLab cases, they provide a design for one [2]. It is meant to be laser-cut but I have successfully managed to 3D-print it as well, and for the more patient among us it shouldn’t be too difficult to hand-cut one with a jigsaw either.

Of course in addition to making sure your Pocket Science Lab is protected against accidental damage it would also be nice to have all the connectors clearly labelled. Documentation bundled with PSLab software does show quite a few “how to connect instrument X” diagrams but unfortunately said diagrams picture version 4 of the board and the current major version, V5, features a radically different pinout (compare [3] with [4]/[5] and you will see immediately what I mean), not to mention that having to stare at a screen while wiring your circuit isn’t always optimal. Now, all versions of the board feature a complete set of header labels (along with LEDs showing the device is active) on the front side and at least the more recent ones additionally show more detailed descriptions on the back, clearly suggesting the optimal way to go is to make your case out of transparent material. But what if looking at the provided labels directly is not an option, for instance because you have gone eco-friendly and made your case out of wood? Probably stick a label to the front of the case… which brings us back to the problem of the case label from [5] not being compatible with recent versions of the board.

Which brings me to my take on adapting the design from [5] to match the header layout and labels of PSLab V5.1 as well as the laser-cut case design from [2]. It could probably be more accurate but having tried it out, it is close enough. Bluetooth and ICSP-programmer connectors near the centre of the board are not included because the current case design does not provide access to them and indeed, they haven’t even got headers soldered in. Licence and copyright: same as the original.

[1] https://pslab.io/

[2] https://github.com/fossasia/pslab-case

[3] https://github.com/fossasia/pslab-hardware/raw/master/docs/images/PSLab_v5_top.png

[4] https://github.com/fossasia/pslab-hardware/raw/master/docs/images/pslab_version_previews/PSLab_v4.png

[5] https://github.com/fossasia/pslab-hardware/raw/master/docs/images/pslabdesign.png

Randomness in virtual machines

I always felt that entropy available to the operating system must be affected by running said operating system in a virtual environment – after all, unpredictable phenomena used to feed the entropy pool are commonly based on hardware and in a VM most hardware either is simulated or has the hypervisor mediate access to it. While looking for something tangentially related to the subject, I have recently stumbled upon a paper commissioned by the German Federal Office for Information Security which covers this subject, with particular emphasis on entropy sources used by the standard Linux random-number generator (i.e. what feeds /dev/random and /dev/urandom), in extreme detail:

https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/Studien/ZufallinVMS/Randomness-in-VMs.pdf?__blob=publicationFile&v=3

Personally I have found this paper very interesting but since not everyone can stomach a 142-page technical document, here are some of the main points made by its authors regarding entropy.

- As a reminder – the RNG in recent versions of Linux uses five sources of random noise, three of which do not require dedicated hardware (emulated or otherwise): Human-Interface Devices, rotational block devices, and interrupts.
- Running in a virtual machine does not affect entropy sources implemented purely in hardware or purely in software. However, it does affect hybrid sources – which in case of Linux essentially means all of the default ones.
- Surprisingly enough, virtualisation seems to have no negative effect on the quality of delivered entropy. This is at least in part due to additional CPU execution-time jitter introduced by the hypervisor compensating for increased predictability of certain emulated devices.
- On the other hand, virtualisation can strongly affect the quantity of produced entropy. More on this below.
- Low quantity of available entropy means among other things that it takes virtual machines visibly longer to bring /dev/urandom to a usable state. This is a problem if /dev/urandom is used by services started at boot time because they can be initialised using low-quality random data.

Why exactly is the quantity of entropy low in virtual machines? The problem is that in a lot of configurations, only the last of the three standard noise sources will be active. On the one hand, even physical servers tend to be fairly inactive on the HID front. On the other, the block-device source does nothing unless directly backed by a rotational device – which has been becoming less and less likely, especially when we talk about large cloud providers who, chances are, hold your persistent storage on distributed networked file systems which are miles away from actual hard drives. This leaves interrupts as the only available noise source. Now take a machine configured this way and have it run a VPN endpoint, an HTTPS server, a DNSSEC-enabled DNS server… You get the idea.

But wait, it gets worse. Chances are many devices in your VM, especially ones like network cards which are expected to be particularly active and therefore the main source of interrupts, are in fact paravirtualised rather than fully emulated. Will such devices still generate enough interrupts? That depends on the underlying hypervisor, or to be precise on the paravirtualisation framework it uses. The BSI paper discusses this for KVM/QEMU, Oracle VirtualBox, Microsoft Hyper-V, and VMware ESXi:

- the former two use the VirtIO framework, which is integrated with the Linux interrupt-handling code;
- VMware drivers trigger the interrupt handler similarly to physical devices;
- Hyper-V drivers use a dedicated communication channel called VMBus which does not invoke the LRNG interrupt handler. This means it is entirely feasible for a Linux Hyper-V guest to have all noise sources disabled, and reenabling the interrupt one comes at a cost of performance loss caused by the use of emulated rather than paravirtualised devices.

All in all, the paper in question has surprised me with the news of unchanged quality of entropy in VMs and confirmed my suspicions regarding its quantity. It also briefly mentions (which is how I have ended up finding it) the way I typically work around this problem, i.e. by employing VirtIO-RNG – a paravirtualised hardware RNG in KVM/VirtualBox guests which interfaces with a source of entropy (typically /dev/random, although other options are possible too) on the host. Combine that with haveged on the host, sprinkle a bit of rate limiting on top (I typically set it to 1 kB/s per guest, even though in theory haveged is capable of acquiring entropy at a rate several orders of magnitude higher) and chances are you will never have to worry about lack of entropy in your virtual machines again.
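If you want to see how a particular guest is doing, the kernel exposes both the current entropy estimate and the name of the active hardware RNG. The following is only a rough sketch, assuming a Linux guest with procfs and sysfs mounted in the usual places; the function name and the optional root-prefix argument are my own invention, the latter being there purely so that the function can be exercised against a fake directory tree. Note that on recent kernels the entropy estimate may be less informative than it used to be.

```shell
#!/bin/sh
# Report the kernel's entropy estimate and the active hardware RNG, if any.
# $1 (optional): prefix prepended to /proc and /sys, empty on a real system.
entropy_report() {
    root=${1:-}
    avail_file="$root/proc/sys/kernel/random/entropy_avail"
    rng_file="$root/sys/class/misc/hw_random/rng_current"

    if [ -r "$avail_file" ]; then
        echo "entropy_avail: $(cat "$avail_file")"
    else
        echo "entropy_avail: unknown"
    fi

    # With VirtIO-RNG in use one would expect a virtio device to show up here.
    if [ -r "$rng_file" ]; then
        echo "hw_rng: $(cat "$rng_file" 2> /dev/null)"
    else
        echo "hw_rng: none"
    fi
}

entropy_report
```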

Gentoo Linux in a Docker container

I have been using Docker for ebuild development for quite a while and absolutely love it, mostly because of how easy it is to manipulate filesystem state with it. Work on several separate ebuilds in parallel? Just spin up several containers. Clean up once I’m done? Happens automatically when I close the container. Come back to something later? One docker commit invocation and I’m done. I could of course do something similar with virtual machines (and indeed I have to for cross-platform work) – but for native amd64 it is extremely convenient.

There is, however, one catch. By default processes running in a Docker container are fairly restricted privilege-wise and the Gentoo sandbox uses ptrace(). Result? By default, certain ebuilds (sys-libs/glibc and dev-libs/gobject-introspection, to name just two) will fail to emerge. One can of course set FEATURES="-sandbox -usersandbox" for such ebuilds but it is an absolute no-no for both new ebuilds and any stabilisation work.

In the past working around this issue required messing with Docker security policies, which at least I found rather awkward. Fortunately since version 1.13.0 there has been a considerably easier way – simply pass

--cap-add=SYS_PTRACE

to docker-run. Done! Sandbox can now use ptrace() to its heart’s content.

Big Fat Warning: The reason why by default Docker restricts CAP_SYS_PTRACE is that a malicious program can use ptrace() to break out of the container it runs in. Do not grant this capability to containers unless you know what you are doing. Seriously.
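To double-check from inside a container that the capability has really been granted, one can look at the effective capability mask of the current process. The snippet below is a sketch rather than anything official: has_cap and effective_set are names I have made up, and it assumes a Linux /proc as well as a shell whose arithmetic understands hexadecimal constants. CAP_SYS_PTRACE is capability number 19 in linux/capability.h.

```shell
#!/bin/sh
# Start the container with the extra capability, for instance:
#   docker run --rm -it --cap-add=SYS_PTRACE gentoo/stage3-amd64-hardened /bin/bash
# then run this script inside it.

# Capability number of CAP_SYS_PTRACE as defined in <linux/capability.h>.
CAP_SYS_PTRACE=19

# has_cap MASK NUMBER: succeed if bit NUMBER is set in the hexadecimal MASK.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Print the effective capability mask of the current process.
effective_set() {
    awk '/^CapEff:/ { print $2 }' /proc/self/status
}

if [ -r /proc/self/status ]; then
    if has_cap "$(effective_set)" "$CAP_SYS_PTRACE"; then
        echo "CAP_SYS_PTRACE available"
    else
        echo "CAP_SYS_PTRACE missing"
    fi
fi
```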

Unfortunately the above is not the end of the story because at least as of version 1.13.0, Docker does not allow enhancing the capabilities of a docker-build job. Why is this a problem? For my own work I use a custom image which extends somewhat the official gentoo/stage3-amd64-hardened. One of the things my Dockerfile does is rsync the Portage tree and update @world so that my image contains a fully up-to-date stage3 even when the official base image does not. You can guess what happens when Docker tries to emerge an ebuild requiring the sandbox to use ptrace()… and remember, one of the packages containing such ebuilds is sys-libs/glibc. To my current knowledge the only way around this is to spin up a ptrace-enabled container using the latest good intermediate image left behind by docker-build and execute the remaining build steps manually. Not fun… Hope they will fix this some day.

Changing the passphrase for SSH keys in gpg-agent

Possibly the simplest way of changing the passphrase protecting an SSH key imported into gpg-agent is to use the Assuan passwd command:

echo passwd foo | gpg-connect-agent

where foo is the keygrip of your SSH key, which one can obtain from the file $GNUPGHOME/sshcontrol [1]. So far so good – but how does one know which of the keys listed in that file is the right one, especially if your sshcontrol list is fairly long? Here are the options I am aware of at this point:

Use the key comment. If you remember the contents of the comment field of the SSH key in question you can simply grep for it in all the files stored in $GNUPGHOME/private-keys-v1.d/ . Take the name of the file that matches, strip .key from the end and you’re set! Note that these are binary files so make sure your grep variant does not skip over them.

Use the MD5 fingerprint and the key comment. If for some reason you would rather not do the above you can take advantage of the fact that for SSH keys imported into gpg-agent the normal way, each keygrip line in sshcontrol is preceded by comment lines containing, among other things, the MD5 fingerprint of the imported key. Just tell ssh-add to print MD5 fingerprints for keys known to the agent instead of the default SHA256 ones:

ssh-add -E md5 -l

locate the fingerprint corresponding to the relevant key comment, then find the corresponding keygrip in sshcontrol .

Use the MD5 fingerprint and the public key. A slightly more complex variant of the above can be used if your SSH key pair in question has no comment but you still have the public key lying around. Start by running

ssh-add -L

and note the number of the line in which the public key in question shows up. The output of ssh-add -L and ssh-add -l is in the same order so you should have no trouble locating the corresponding MD5 fingerprint.
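The fingerprint-to-keygrip lookup described above is easy enough to script. What follows is a minimal sketch which relies only on the layout mentioned earlier, i.e. comment lines containing the MD5 fingerprint followed by the keygrip line; the function name is made up, and the leading exclamation mark which I assume marks a disabled key is stripped from the result.

```shell
#!/bin/sh
# keygrip_for_md5 FINGERPRINT SSHCONTROL
# Print the keygrip from the first non-comment line following a comment
# line which contains FINGERPRINT (colon-separated hex, as printed by
# "ssh-add -E md5 -l" without the "MD5:" prefix).
keygrip_for_md5() {
    awk -v fp="$1" '
        # Comment line: remember whether the fingerprint has been seen.
        /^#/ { if (index($0, fp)) found = 1; next }
        # First keygrip line after a match: print it and stop.
        NF && found { sub(/^!/, "", $1); print $1; exit }
    ' "$2"
}

# Possible usage (not run here):
#   grip=$(keygrip_for_md5 aa:bb:cc:... "${GNUPGHOME:-$HOME/.gnupg}/sshcontrol")
#   echo "passwd $grip" | gpg-connect-agent
```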

Bottom line: use meaningful comments for your SSH keys. It can really simplify key management in the long run.

[1] https://lists.gnupg.org/pipermail/gnupg-users/2007-July/031482.html