Inlining -march=native for distcc

-march=native is a gcc flag that enables auto-detection of CPU architecture and properties. Not only does it spare you from finding the correct value of -march=, it also enables instruction sets that do not fit any standard CPU profile and detects the cache sizes.

Sadly, -march=native doesn’t work well with distcc. Since the detection is performed at compile time, remote gcc invocations use the architecture of the distcc host rather than that of the client. As a result, the executables end up as a mix of code built for the different architectures of the hosts distcc happened to use.

You may also find -march=native a bit opaque. For example, we had multiple bug reports about LLVM failing to build with -march=atom. However, some of the reporters were using -march=native, so we weren’t able to immediately identify the duplicates.

In this article, I will briefly guide you through replacing -march=native with expanded compiler flags, for the benefit of distcc compatibility and more explicit build logs.

Obtaining the native flags from gcc

The first step towards replacing -march=native is to determine which flags are enabled by it. Various people suggest multiple ways of obtaining -march=native flags. For example, you can use the following call:

$ gcc -### -march=native -x c -
Using built-in specs.
Target: x86_64-pc-linux-gnu
Thread model: posix
gcc version 4.8.3 (Gentoo 4.8.3 p1.1, pie-0.5.9) 
 /usr/libexec/gcc/x86_64-pc-linux-gnu/4.8.3/cc1 -quiet - "-march=k8-sse3" -mcx16 -msahf -mno-movbe -mno-aes -mno-pclmul -mno-popcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mno-avx -mno-avx2 -mno-sse4.2 -mno-sse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mno-xsave -mno-xsaveopt --param "l1-cache-size=64" --param "l1-cache-line-size=64" --param "l2-cache-size=512" "-mtune=k8" -quiet -dumpbase - -auxbase - -fstack-protector -o /tmp/cckZDyUR.s

For those more curious, a similar call can be made with -x c++ for the C++ compiler flags. The expanded optimization flags can be found in the cc1 (or cc1plus in case of C++) command line. The relevant ones are usually the various -m flags and the --param options related to caches.
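
Picking the flags out of that long line by hand is error-prone, so a small filter can help. The following is only a minimal sketch: extract_native_flags is a hypothetical helper name (it is not part of gcc or distcc), and it assumes the cc1 line is formatted like the one shown above, which other gcc versions may not match exactly.

```shell
# Hypothetical helper: read `gcc -###` output on stdin and print, one per
# line, the -m* flags (including -march/-mtune) and the --param settings
# found on the cc1/cc1plus command line.
extract_native_flags() {
    grep -E '/cc1(plus)? ' |      # keep only the compiler proper invocation
        tr -d '"' |               # drop the quoting added by -###
        tr -s ' ' '\n' |          # one word per line
        awk '
            $0 == "--param" { want_value = 1; next }  # value is the next word
            want_value      { print "--param " $0; want_value = 0; next }
            /^--param=/     { print; next }           # single-word spelling
            /^-m/           { print }
        '
}

# Usage (gcc prints the -### output on stderr, hence the 2>&1):
#   gcc -### -march=native -x c - 2>&1 | extract_native_flags
```

The output still contains every -mno-* flag, so it is a starting point rather than the final flag list.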

You may also notice -fstack-protector there. This is because nowadays Gentoo enables it by default. If you are using a non-Gentoo distcc host (why would you have a non-Gentoo host in the first place?), you may want to pass it explicitly as well.

You may find the above output overly verbose. While this technically isn’t a problem, it clutters the build logs. So, let’s filter it a bit.

Filtering out redundant flags

Most of the -m flags listed above are redundant, being either equivalent to the defaults or enabled implicitly by -march. For example, on the host providing the example output, none of the -mno-* flags was actually required, and -mfxsr was enabled implicitly.

You can safely assume that in Gentoo all -m flags are disabled by default. To find out which flags are implied by -march, let's look at the gcc sources.

$ tar -xf /var/cache/portage/distfiles/gcc-4.8.3.tar.bz2
$ find gcc-4.8.3/gcc/config -name '*.c' -exec grep k8-sse3 {} +
gcc-4.8.3/gcc/config/i386/i386.c:      {"k8-sse3", PROCESSOR_K8, CPU_K8,
gcc-4.8.3/gcc/config/i386/driver-i386.c:	cpu = "k8-sse3";

The first file has what we're looking for. Inside, you can find:

      {"k8-sse3", PROCESSOR_K8, CPU_K8,
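
The line quoted above is only the start of the table entry; the feature bits (the PTA_* constants in the gcc sources) follow it. As a rough sketch, the whole entry could be printed with a helper like the one below. show_march_entry is a hypothetical name, and the helper assumes that the entry begins with {"name" and ends at the first closing brace, which holds for this table layout but is not guaranteed for every gcc version.

```shell
# Hypothetical helper: print the table entry for a given -march name,
# from the line containing {"NAME" up to the first line containing }.
# Usage: show_march_entry FILE NAME
show_march_entry() {
    awk -v name="$2" '
        index($0, "{\"" name "\"") { show = 1 }   # entry starts here
        show                       { print }
        show && index($0, "}")     { show = 0 }   # entry ends here
    ' "$1"
}

# Usage:
#   show_march_entry gcc-4.8.3/gcc/config/i386/i386.c k8-sse3
```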

So -march=k8-sse3 would enable -mmmx, -m3dnow, -msse and so on. If you compare this list with the output obtained before, you'd notice that the -march option didn't enable any flags that would need to be disabled explicitly, so all -mno-* flags can be omitted. Similarly, -mfxsr is redundant. But -mcx16 and -msahf seem relevant since the former is not listed there at all, and the latter is disabled by default.
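
The filtering described above can be sketched as a small shell function. This is an illustration under stated assumptions, not a general tool: filter_redundant_flags is a hypothetical name, it blindly drops every -mno-* flag (safe only when, as on this host, -march does not enable anything that has to be turned off again), and the list of flags implied by -march has to be supplied by hand after reading the gcc sources.

```shell
# Hypothetical helper: read one flag per line on stdin and print only the
# flags worth keeping. Arguments are the flags already implied by the
# chosen -march value.
filter_redundant_flags() {
    implied=" $* "
    while IFS= read -r flag; do
        case "$flag" in
            -mno-*) continue ;;          # assumed equal to the defaults
        esac
        case "$implied" in
            *" $flag "*) continue ;;     # already implied by -march
        esac
        printf '%s\n' "$flag"
    done
}

# Usage: -mfxsr is implied by -march=k8-sse3, and -march=X implies -mtune=X.
printf '%s\n' -march=k8-sse3 -mcx16 -msahf -mno-aes -mfxsr -mtune=k8 |
    filter_redundant_flags -mfxsr -mtune=k8
```

On a host whose CPU lacks a feature that the selected -march profile enables, the corresponding -mno-* flag must be kept, so the first case branch would need to be narrowed there.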

After filtering out the unnecessary flags, we can create both distcc- and eye-friendly CFLAGS like:

CFLAGS='-O2 -pipe -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=512'