Randomness in virtual machines

I have always felt that the entropy available to an operating system must be affected by running that system in a virtual environment – after all, the unpredictable phenomena used to feed the entropy pool are commonly based on hardware, and in a VM most hardware is either simulated or has the hypervisor mediate access to it. While looking for something tangentially related to the subject, I recently stumbled upon a paper commissioned by the German Federal Office for Information Security which covers this subject in extreme detail, with particular emphasis on the entropy sources used by the standard Linux random-number generator (i.e. what feeds /dev/random and /dev/urandom):

https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/Studien/ZufallinVMS/Randomness-in-VMs.pdf?__blob=publicationFile&v=3

Personally I have found this paper very interesting but since not everyone can stomach a 142-page technical document, here are some of the main points made by its authors regarding entropy.

  1. As a reminder – the RNG in recent versions of Linux uses five sources of random noise, three of which do not require dedicated hardware (emulated or otherwise): Human-Interface Devices, rotational block devices, and interrupts.
  2. Running in a virtual machine does not affect entropy sources implemented purely in hardware or purely in software. However, it does affect hybrid sources – which in the case of Linux essentially means all of the default ones.
  3. Surprisingly enough, virtualisation seems to have no negative effect on the quality of delivered entropy. This is at least in part due to additional CPU execution-time jitter introduced by the hypervisor compensating for increased predictability of certain emulated devices.
  4. On the other hand, virtualisation can strongly affect the quantity of produced entropy. More on this below.
  5. Low quantity of available entropy means, among other things, that it takes virtual machines visibly longer to bring /dev/urandom to a usable state. This is a problem if /dev/urandom is used by services started at boot time, because they can end up being initialised with low-quality random data (a quick way of observing this is shown below).

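One rough way of observing the problem is to watch the kernel's entropy estimate in an otherwise idle guest; the procfs path below is the standard Linux interface for this:

cat /proc/sys/kernel/random/entropy_avail
watch -n 1 cat /proc/sys/kernel/random/entropy_avail

On a reasonably busy physical machine this value tends to recover quickly after being drained; in an entropy-starved virtual machine it is likely to stay low for much longer.
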
Why exactly is the quantity of entropy low in virtual machines? The problem is that in a lot of configurations, only the last of the three standard noise sources will be active. On the one hand, even physical servers tend to be fairly inactive on the HID front. On the other, the block-device source does nothing unless directly backed by a rotational device – which is becoming less and less likely, especially when we talk about large cloud providers who, chances are, hold your persistent storage on distributed networked file systems which are miles away from actual hard drives. This leaves interrupts as the only available noise source. Now take a machine configured this way and have it run a VPN endpoint, an HTTPS server, a DNSSEC-enabled DNS server… You get the idea.

But wait, it gets worse. Chances are many devices in your VM, especially ones like network cards which are expected to be particularly active and therefore the main source of interrupts, are in fact paravirtualised rather than fully emulated. Will such devices still generate enough interrupts? That depends on the underlying hypervisor, or to be precise on the paravirtualisation framework it uses. The BSI paper discusses this for KVM/QEMU, Oracle VirtualBox, Microsoft Hyper-V, and VMWare ESXi:

  • the first two use the VirtIO framework, which is integrated with the Linux interrupt-handling code;
  • VMWare drivers trigger the interrupt handler similarly to physical devices;
  • Hyper-V drivers use a dedicated communication channel called VMBus which does not invoke the LRNG interrupt handler. This means it is entirely feasible for a Linux Hyper-V guest to have all noise sources disabled, and re-enabling the interrupt source comes at the cost of a performance loss caused by the use of emulated rather than paravirtualised devices.


All in all, the paper in question has surprised me with the news of unchanged quality of entropy in VMs and confirmed my suspicions regarding its quantity. It also briefly mentions (which is how I ended up finding it) the way I typically work around this problem, i.e. by employing VirtIO-RNG – a paravirtualised hardware RNG available to KVM and VirtualBox guests which interfaces with a source of entropy on the host (typically /dev/random, although other options are possible too). Combine that with haveged on the host, sprinkle a bit of rate limiting on top (I typically set it to 1 kB/s per guest, even though in theory haveged is capable of acquiring entropy at a rate several orders of magnitude higher) and chances are you will never have to worry about lack of entropy in your virtual machines again.
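For reference, this is roughly what the relevant fragment of a QEMU command line looks like; the object and device names are standard QEMU options, while the rate-limiting values simply encode the 1 kB/s example mentioned above (1024 bytes per 1000 ms period):

qemu-system-x86_64 [usual options] \
  -object rng-random,id=rng0,filename=/dev/random \
  -device virtio-rng-pci,rng=rng0,max-bytes=1024,period=1000

When using libvirt, the same device can be configured through the <rng> element of the domain XML.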

Gentoo Linux in a Docker container

I have been using Docker for ebuild development for quite a while and absolutely love it, mostly because of how easy it is to manipulate filesystem state with it. Work on several separate ebuilds in parallel? Just spin up several containers. Clean up once I’m done? Happens automatically when I close the container. Come back to something later? One docker commit invocation and I’m done. I could of course do something similar with virtual machines (and indeed I have to for cross-platform work) – but for native amd64 work it is extremely convenient.
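In practice the workflow boils down to something like the following (the container name and the tag are just examples):

docker run -it --name ebuild-work gentoo/stage3-amd64-hardened /bin/bash
docker commit ebuild-work ebuild-work:snapshot

The first command drops you into a throwaway stage3 environment; the second preserves its current state as an image you can come back to later.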

There is, however, one catch. By default, processes running in a Docker container are fairly restricted privilege-wise, and the Gentoo sandbox uses ptrace(). The result? By default, certain ebuilds (sys-libs/glibc and dev-libs/gobject-introspection, to name just two) will fail to emerge. One can of course set FEATURES="-sandbox -usersandbox" for such ebuilds but that is an absolute no-no for both new ebuilds and any stabilisation work.

In the past, working around this issue required messing with Docker security policies, which I for one found rather awkward. Fortunately, since version 1.13.0 there has been a considerably easier way – simply pass

--cap-add=SYS_PTRACE

to docker-run. Done! Sandbox can now use ptrace() to its heart’s content.
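For example, an interactive container with the capability enabled can be started along these lines (the image name is just an example):

docker run -it --cap-add=SYS_PTRACE gentoo/stage3-amd64-hardened /bin/bash
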

Big Fat Warning: The reason why by default Docker restricts CAP_SYS_PTRACE is that a malicious program can use ptrace() to break out of the container it runs in. Do not grant this capability to containers unless you know what you are doing. Seriously.

Unfortunately the above is not the end of the story because, at least as of version 1.13.0, Docker does not allow enhancing the capabilities of a docker-build job. Why is this a problem? For my own work I use a custom image which somewhat extends the official gentoo/stage3-amd64-hardened image. One of the things my Dockerfile does is rsync the Portage tree and update @world so that my image contains a fully up-to-date stage3 even when the official base image does not. You can guess what happens when Docker tries to emerge an ebuild requiring the sandbox to use ptrace()… and remember, one of the packages containing such ebuilds is sys-libs/glibc. To my current knowledge the only way around this is to spin up a ptrace-enabled container using the latest good intermediate image left behind by docker-build and execute the remaining build steps manually. Not fun… Hope they will fix this some day.
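The manual workaround looks roughly like this (the image ID is whatever intermediate image the failed docker-build run last reported; the names below are purely illustrative):

docker run -it --cap-add=SYS_PTRACE <intermediate-image-id> /bin/bash

Execute the remaining Dockerfile steps by hand inside that container, then preserve the result:

docker commit <container-name-or-id> my-custom-stage3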


Changing the passphrase for SSH keys in gpg-agent

Possibly the simplest way of changing the passphrase protecting an SSH key imported into gpg-agent is to use the Assuan passwd command:

echo passwd foo | gpg-connect-agent

where foo is the keygrip of your SSH key, which one can obtain from the file $GNUPGHOME/sshcontrol [1]. So far so good – but how does one know which of the keys listed in that file is the right one, especially if your sshcontrol list is fairly long? Here are the options I am aware of at this point:

Use the key comment. If you remember the contents of the comment field of the SSH key in question, you can simply grep for it in all the files stored in $GNUPGHOME/private-keys-v1.d/. Take the name of the file that matches, strip .key from the end and you’re set! Note that these are binary files, so make sure your grep variant does not skip over them.
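With GNU grep this could look as follows (the comment string is of course just an example):

grep -l -a 'my key comment' "$GNUPGHOME"/private-keys-v1.d/*.key

The -a switch forces grep to treat the binary key files as text, while -l prints only the names of the matching files.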

Use the MD5 fingerprint and the key comment. If for some reason you would rather not do the above you can take advantage of the fact that for SSH keys imported into gpg-agent the normal way, each keygrip line in sshcontrol is preceded by comment lines containing, among other things, the MD5 fingerprint of the imported key. Just tell ssh-add to print MD5 fingerprints for keys known to the agent instead of the default SHA256 ones:

ssh-add -E md5 -l

locate the fingerprint next to the relevant key comment, then find the matching keygrip in sshcontrol.
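Since, as mentioned above, the fingerprint appears in the comment lines preceding the keygrip, something along these lines should show the keygrip just below the match (the fingerprint is a placeholder; adjust the -A value if needed):

grep -A 3 'de:ad:be:ef' "$GNUPGHOME/sshcontrol"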

Use the MD5 fingerprint and the public key. A slightly more complex variant of the above can be used if the SSH key pair in question has no comment but you still have the public key lying around. Start by running

ssh-add -L

and note the number of the line in which the public key in question shows up. The output of ssh-add -L and ssh-add -l is in the same order so you should have no trouble locating the corresponding MD5 fingerprint.
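Piping both listings through nl makes matching up the line numbers trivial:

ssh-add -L | nl
ssh-add -E md5 -l | nl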

Bottom line: use meaningful comments for your SSH keys. It can really simplify key management in the long run.

[1] https://lists.gnupg.org/pipermail/gnupg-users/2007-July/031482.html