How Debuggers Work: Getting and Setting x86 Registers, Part 2: XSAVE

In the previous part of this article, I described the basic methods of getting and setting the baseline registers of 32-bit and 64-bit x86 CPUs. I covered General Purpose Registers, baseline Floating-Point Registers and Debug Registers, along with their ptrace(2) interface.

In the second part, I would like to discuss the XSAVE family of instructions. I will describe the different variants of these instructions, and explain the differences between them and their limitations. Afterwards, I will compare the ptrace(2) APIs used to access their data on Linux, FreeBSD and NetBSD. Other systems such as OpenBSD or DragonFly BSD do not provide requests to retrieve or set extended registers, so the comparison may help them design their own APIs.


How Debuggers Work: Getting and Setting x86 Registers, Part 1

In this article, I would like to briefly describe the methods used to dump and restore the different kinds of registers on 32-bit and 64-bit x86 CPUs. The first part will focus on General Purpose Registers, Debug Registers and Floating-Point Registers up to the XMM registers provided by the SSE extension. I will explain how their values can be obtained via the ptrace(2) interface.

The ptrace(2) API is commonly used in all modern BSD systems and Linux, as all of them derive it from the original form designed and implemented in 4.3BSD. The primary focus in this article is on the FreeBSD and NetBSD systems. Nevertheless, the users of other Operating Systems such as OpenBSD, DragonFly BSD or Linux can still benefit from this article as the basic principles are the same and the code examples are intended to be easily adapted to other platforms.

A single CPU (in modern hardware: CPU core or CPU thread, if hyperthreading is available) can execute only one program thread at a time. In order to be able to run multiple processes and threads quasi-simultaneously, the Operating System must perform context switching: it periodically suspends the currently running thread, saves its state, restores the saved state of another thread and resumes it. Saving and restoring the values of the processor’s registers play an important part in context switching. It is important that this process is fully transparent to the thread being switched, and in a properly implemented kernel there should be no side effects that are perceptible to the program.
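
The save/restore cycle described above can be illustrated with a toy model. This is only a sketch and not how any real kernel does it: the `Registers`, `Thread` and `Cpu` classes, and the three-register "register file", are invented for illustration. A real kernel saves the full register set using dedicated instructions.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class Registers:
    """Hypothetical, heavily simplified register file."""
    ip: int = 0  # instruction pointer
    sp: int = 0  # stack pointer
    ax: int = 0  # a single general-purpose register


@dataclass
class Thread:
    """A thread together with its saved register state."""
    name: str
    saved: Registers = field(default_factory=Registers)


class Cpu:
    """A single CPU: one live register file, one running thread."""

    def __init__(self) -> None:
        self.regs = Registers()
        self.current: Optional[Thread] = None

    def switch_to(self, nxt: Thread) -> None:
        # Save the live registers into the outgoing thread...
        if self.current is not None:
            self.current.saved = replace(self.regs)
        # ...and load a copy of the incoming thread's saved registers.
        self.regs = replace(nxt.saved)
        self.current = nxt


def demo():
    cpu = Cpu()
    a = Thread("A", Registers(ip=0x1000, sp=0x7000, ax=1))
    b = Thread("B", Registers(ip=0x2000, sp=0x8000, ax=2))

    cpu.switch_to(a)
    cpu.regs.ax += 41   # thread A does some work
    cpu.switch_to(b)    # A's state is saved, B's is restored
    cpu.regs.ax += 1    # thread B does some work
    cpu.switch_to(a)    # A resumes and sees its own value again
    return cpu.regs.ax, b.saved.ax


if __name__ == "__main__":
    print(demo())
```

From thread A's point of view, the value in `ax` is unaffected by whatever thread B did in between, which is the transparency property mentioned above.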


DISTUTILS_USE_SETUPTOOLS, QA spam and… more QA spam?

Update: the information provided in this post is out of date. As of today, Python 3.7 is no longer relevant from the DISTUTILS_USE_SETUPTOOLS perspective, and ‘rdepend’ is no longer valid when entry points are used.

I suppose that most of the Gentoo developers have seen at least one of the ‘uses a probably incorrect DISTUTILS_USE_SETUPTOOLS value’ bugs by now. Over 350 have been filed so far, and new ones are filed practically daily. The truth is, I never intended for this QA check to result in bugs being filed against packages, and certainly not that many bugs.

This is not an important problem to be fixed immediately. The vast majority of Python packages depend on setuptools at build time (this is why the build-time dependency is the eclass’ default), and being able to unmerge setuptools is not a likely scenario. The underlying idea was that the QA check would make it easier to update DISTUTILS_USE_SETUPTOOLS when bumping packages.

Nobody has asked me for my opinion, and now we have hundreds of bugs that are not very helpful. In fact, the effort involved in going through all the bugmail, updating packages and closing the bugs greatly exceeds the negligible gain. Nevertheless, some people actually did it. I have bad news for them: setuptools upstream has changed the entry point mechanism, and most of the values will have to change again. Let me elaborate on that.

Speeding up emerge depgraph calculation using PyPy3

WARNING: Some of the respondents were not able to reproduce my results. It is possible that this is dependent on the hardware or even a specific emerge state. Please do not rely on my claims that PyPy3 runs faster, and verify it on your system before switching permanently.

If you have used Gentoo for some time, you’ve probably noticed that emerge is getting slower and slower. Before I switched to an SSD, my emerge could take as long as 10 minutes before it figured out what to do! Even now it’s pretty normal for the dependency calculation to take 2 minutes. Georgy Yakovlev recently tested PyPy3 on PPC64, and noticed a great speedup, apparently due to very poor optimization of CPython on that platform. I’ve attempted the same on amd64, and measured a 35% speedup nevertheless.

Sleeping and waking up

Time to write something more personal, for a change. I find it somewhat curious how my sleeping habits have changed over the years, as well as the level of sophistication of the way I have been waking up. Let me run a short recollection of how a teenager who tried to squeeze every single minute out of (morning) sleep turned into a young man who tried to optimize his sleep, and finally into a man who does not mind waking up much earlier than strictly necessary.