systemd – Alice in Penguinland

Let’s say that you need to run on your system some sort server software which instead of daemonising, has a command console permanently attached to standard input. Let us also say that said console is the only way for the administrator to interact with the service, including requesting its orderly shutdown – whoever has written it has not implemented any sort of signal handling so sending SIGTERM to the service process causes it to simply drop dead, potentially losing data in the process. And finally, let us say that the server in question is proprietary software so it isn’t really possible for you to fix any of the above in the source code (yes, I am talking about a specific piece of software – which by the way is very much alive and kicking as of late 2020). What do you do?

According to the collective wisdom of World Wide Web, the answer to this question is “use a terminal multiplexer like tmux or screen“, or at the very least a stripped-down variant of same such as dtach. OK, that sort of works – what if you want to run it as a proper system-managed service under e.g. OpenRC? The answer of the Stack Exchange crowd: have your init script invoke the terminal multiplexer. Oooooookay, how about under systemd, which actually prefers services it manages not to daemonise by itself? Nope, still “use a terminal multiplexer”.

What follows is my attempt to run a service like this under systemd more efficiently and elegantly, or at least with no extra dependencies beyond basic Unix shell commands.

Let us have a closer look at what systemd does with standard I/O of processes it spawns. The man page systemd.exec(5) tells us that what happens here is controlled by the directives StandardInput, StandardOutput and StandardError. By default the former is assigned to null while the latter two get piped to the journal, there are however quite a few other options here. According to the documentation, here is what systemd allows us to connect to standard input:

- we are not interested in null (for obvious reasons) or any of the tty options (the whole point of this exercise is to run fully detached from any terminals);
- data would work if we needed to feed some commands to the service when it starts but is useless for triggering a shutdown;
- file looks promising – just point it to a FIFO on the file system and we’re all set – but it doesn’t actually take care of creating the FIFO for us. While we could in theory work around that by invoking mkfifo (and possibly chown if the service is to run as a specific user) in ExecStartPre, let’s see if we can find a better option
- socket “is valid in socket-activated services only” and the corresponding socket unit must “have Accept=yes set”. What we want is the opposite, i.e. for the service to create its socket
- finally, there is fd – which seems to be exactly what we need. According to the documentation all we have to do is write a socket unit creating a FIFO with appropriate ownership and permissions, make it a dependency of our service using the Sockets directive, and assign the corresponding named file descriptor to standard input.

Let’s try it out. To begin with, our socket unit “proprietarycrapd.socket”. Note that I have successfully managed to get this to work using unit templates as well, %i expansion works fine both here and while specifying unit or file-descriptor names in the service unit – but in order to avoid any possible confusion caused by the fact socket-activated services explicitly require being defined with templates, I have based my example on static units:

[Unit]
Description=Command FIFO for proprietarycrapd

[Socket]
ListenFIFO=/run/proprietarycrapd/pcd.control
DirectoryMode=0700
SocketMode=0600
SocketUser=pcd
SocketGroup=pcd
RemoveOnStop=true

Apart from the fact the unit in question has got no [Install] section (which makes sense given we want this socket to only be activated by the corresponding service, not by systemd itself), nothing out of the ordinary here. Note that since we haven’t used the directive FileDescriptorName, systemd will apply default behaviour and give the file descriptor associated with the FIFO the name of the socket unit itself.

And now, our service unit “proprietarycrapd.service”:

[Unit]
Description=proprietarycrap daemon
After=network.target

[Service]
User=pcd
Group=pcd
Sockets=proprietarycrapd.socket
StandardInput=socket
StandardOutput=journal
StandardError=journal
ExecStart=/opt/proprietarycrap/bin/proprietarycrapd
ExecStop=/usr/local/sbin/proprietarycrapd-stop

[Install]
WantedBy=multi-user.target

StandardInput=socket??? Whatever’s happened to StandardInput=fd:proprietarycrapd.socket??? Here is an odd thing. If I use the latter on my system, the service starts fine and gets the FIFO attached to its standard input – but when I try to stop the service the journal shows “Failed to load a named file descriptor: No such file or directory”, the ExecStop command is not run and systemd immediately fires a SIGTERM at the process. No idea why. Anyway, through trial and error I have found out that StandardInput=socket not only works fine in spite of being used in a service that is not socket-activated but actually does exactly what I wanted to achieve – so that is what I have ended up using.

Which brings us to the final topic, the ExecStop command. There are three reasons why I have opted for putting all the commands required to shut the server down in a shell script:

- first and foremost, writing the shutdown command to the FIFO will return right away even if the service takes time to shut down. systemd sends SIGTERM to the unit process as soon as the last ExecStop command has exited so we have to follow the echo with something that waits for the server process to finish (see below)
- systemd does not execute Exec commands in a shell so simply running echo > /run/proprietarycrapd/pcd.control doesn’t work, we would have to wrap the echo call in an explicit invocation of a shell
- between the aforementioned two reasons and the fact the particular service for which I have created these units actually requires several commands in order to execute an orderly shutdown, I have decided that putting all those command in a script file instead of cramming them into the unit would be much cleaner.

The shutdown script itself is mostly unremarkable so I’ll only quote the bit responsible for waiting for the server to actually shut down. At present I am still looking for doing it in blocking fashion without adding more dependencies (wait only works on child processes of the current shell, the server in question does not create any lock files to which I could attach inotifywait, and attaching the latter to the relevant directory in /proc does not work) but in the meantime, the loop

while kill -0 “${MAINPID}” 2> /dev/null; do
sleep 1s
done

keeps the script ticking along until either the process has exited or the script has timed out (see the TimeoutStopSec directive in systemd.service(5)) and systemd has killed both it and the service itself.

Acknowledgements: with many thanks to steelman for having figured out the StandardInput=socket bit in particular and having let me bounce my ideas off him in general.

Tag: systemd

Console-bound systemd services, the right way