MAKEOPTS="-j${core} +1" is NOT the best optimization

Many times, when setting up make.conf on systems with unusual architectures, I have wondered what the best --jobs value is.
The handbook suggests ${core} + 1, but since I’m curious I wanted to test it myself to be sure this is right.

To make a good test we need a package whose build system respects make parallelization and that takes at least a few minutes to compile. With packages that compile in a few seconds, the real difference is impossible to measure.
kde-base/kdelibs is, in my opinion, perfect.

If you are on an architecture where kde-base/kdelibs is unavailable, just switch to another cmake-based package.

Now, download best_makeopts from my overlay. Below is an explanation of what the script does, along with various suggestions.

  • You need to compile the package on a tmpfs filesystem; I’m assuming you have /tmp mounted as a tmpfs too.
  • You need to have the tarball of the package on a tmpfs as well, because a slow disk may add time to the build.
  • You need to switch your governor to performance.
  • You need to be sure you don’t have strange EMERGE_DEFAULT_OPTS.
  • You need to add ‘-B’ because we don’t want to include the installation time.
  • You need to drop the existing cache before compiling.

As you can see, the for loop emerges the same package with MAKEOPTS from -j1 to -j10. If you have, for example, a single-core machine, running the loop from 1 to 4 is enough.
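The script itself is not reproduced in this post, so the following is only a rough Python sketch of the steps listed above, not the actual best_makeopts. The helper names (`build_env`, `emerge_command`, `drop_caches`, `run_benchmark`) and the `/tmp/distfiles` path are assumptions made for illustration. It must run as root, and switching the governor to performance is left out and has to be done beforehand.

```python
import os
import subprocess
import time

PACKAGE = "kde-base/kdelibs"  # the package suggested in the post


def build_env(jobs, tmpdir="/tmp"):
    """Environment for one run; only MAKEOPTS differs between runs."""
    env = dict(os.environ)
    env["MAKEOPTS"] = f"-j{jobs}"
    env["EMERGE_DEFAULT_OPTS"] = ""   # override anything set in make.conf
    env["PORTAGE_TMPDIR"] = tmpdir    # build on tmpfs (assumed mount point)
    env["DISTDIR"] = os.path.join(tmpdir, "distfiles")  # assumed tarball location
    return env


def emerge_command(package=PACKAGE):
    """-B (--buildpkgonly) builds the package without installing it."""
    return ["emerge", "-B", package]


def drop_caches():
    """Flush dirty pages, then drop page cache, dentries and inodes (needs root)."""
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def run_benchmark(max_jobs=10):
    """Time one build per -j value and return {jobs: seconds}."""
    # Fetch the tarball once up front so download time is never measured.
    subprocess.run(["emerge", "--fetchonly", PACKAGE],
                   env=build_env(1), check=True)
    results = {}
    for jobs in range(1, max_jobs + 1):
        drop_caches()
        start = time.monotonic()
        subprocess.run(emerge_command(), env=build_env(jobs), check=True)
        results[jobs] = time.monotonic() - start
    return results
```

Calling `run_benchmark(4)` would cover the single-core case mentioned above. Nothing runs on import, so the helpers can be inspected without starting a build.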

Please don’t use the CPU for other purposes during the test and, if you can, stop all services and run the test from a tty; you will see the time for every merge.

The following is an example from my machine:
-j1 : real 29m56.527s
-j2 : real 15m24.287s
-j3 : real 13m57.370s
-j4 : real 12m48.465s
-j5 : real 12m55.894s
-j6 : real 13m5.421s
-j7 : real 13m13.322s
-j8 : real 13m23.414s
-j9 : real 13m26.657s

The hardware is:
Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz, which has 2 cores and 4 threads.
Past -j4 you can see the regression: every additional job makes the build slower.
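To compare runs without doing the arithmetic by eye, the timings can be converted to seconds and ranked. This is a small sketch, not part of the original script; `parse_timings` and `best_jobs` are hypothetical helper names, and the input is simply the i3 540 listing copied from above.

```python
import re

# Timings copied from the i3 540 run above.
RAW = """\
-j1 : real 29m56.527s
-j2 : real 15m24.287s
-j3 : real 13m57.370s
-j4 : real 12m48.465s
-j5 : real 12m55.894s
-j6 : real 13m5.421s
-j7 : real 13m13.322s
-j8 : real 13m23.414s
-j9 : real 13m26.657s
"""

LINE = re.compile(r"-j(\d+)\s*:\s*real\s+(\d+)m([\d.]+)s")


def parse_timings(text):
    """Return {jobs: seconds} from lines like '-j4 : real 12m48.465s'."""
    timings = {}
    for match in LINE.finditer(text):
        jobs, minutes, seconds = match.groups()
        timings[int(jobs)] = int(minutes) * 60 + float(seconds)
    return timings


def best_jobs(timings):
    """The -j value with the shortest wall-clock time."""
    return min(timings, key=timings.get)


timings = parse_timings(RAW)
best = best_jobs(timings)
print(f"fastest: -j{best} ({timings[best]:.3f}s)")
for jobs in sorted(timings):
    slowdown = timings[jobs] / timings[best] - 1
    print(f"-j{jobs}: {slowdown:+.1%} vs fastest")
```

On this data the penalty for going past -j4 is small but consistent: about 1% at -j5, growing to about 5% at -j9.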

Another example, from an Intel Itanium with 4 CPUs:
-j1 : real 4m24.930s
-j2 : real 2m27.854s
-j3 : real 1m47.462s
-j4 : real 1m28.082s
-j5 : real 1m29.497s

I tested this script on ~20 different machines and in the majority of cases the best value was ${core} or, more exactly, the ${threads} of your CPU.
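If you want ${threads} as a starting value before running your own measurements, the thread count can be queried rather than looked up. A minimal sketch, with `makeopts_line` as a hypothetical helper name; it only prints a candidate line for make.conf and does not replace the timing test.

```python
import os


def makeopts_line(threads=None):
    """Format a make.conf line from the number of logical CPUs (threads)."""
    if threads is None:
        # os.cpu_count() reports logical CPUs, i.e. hardware threads,
        # and may return None when the count cannot be determined.
        threads = os.cpu_count() or 1
    return f'MAKEOPTS="-j{threads}"'


print(makeopts_line())
```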

Conclusion:
From the handbook:

A good choice is the number of CPUs (or CPU cores) in your system plus one, but this guideline isn’t always perfect.

I don’t know who, years ago, suggested ${core} + 1 in the handbook, and I don’t want to start a flame war. I’m just saying that ${core} + 1 is not the best optimization for me, and the test confirms the part that says “but this guideline isn’t always perfect”.

In all cases ${threads} + ${X} is slower than ${threads} alone, so don’t use -j20 if you have a dual-core CPU.

Also, I’m not telling you to use ${threads}; I’m just saying feel free to run your own tests to see which value is the best optimization.

If you have suggestions to improve the script, or you think the script is wrong, feel free to comment or send me an email.
