Benchmarking the power5

Yesterday our good friend over at the OSL [cshields@osuosl] forwarded me a forum posting from penguinppc.org about the powerpc-cpu optimizations.

Gentoo PPC64 has glibc-2.3.4.20041102-r2 marked stable, so I started by patching that. While talking to another developer [vapier@gentoo], I found out that he had already integrated the powerpc-cpu optimizations into our glibc-2.4-r3, so I stopped patching 2.3.x.
Time for a few benchmarks. nbench was not keyworded for PPC64, so I passed that info along to our PPC64 team and [ranger@gentoo] keyworded it for future use.

Here are the results.

Base PPC64 stable.
gcc-3.4.4 glibc-2.3.4.20041102-r2

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          581.52  :      14.91  :       4.90
STRING SORT         :            96.4  :      43.07  :       6.67
BITFIELD            :      1.2933e+08  :      22.19  :       4.63
FP EMULATION        :          36.512  :      17.52  :       4.04
FOURIER             :          8689.8  :       9.88  :       5.55
ASSIGNMENT          :          7.3297  :      27.89  :       7.23
IDEA                :          1503.4  :      22.99  :       6.83
HUFFMAN             :          589.62  :      16.35  :       5.22
NEURAL NET          :          14.411  :      23.15  :       9.74
LU DECOMPOSITION    :           556.8  :      28.85  :      20.83
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 22.153
FLOATING-POINT INDEX: 18.757
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 8 CPU
L2 Cache            : 
OS                  : Linux 2.6.5-7.97-pseries64
C compiler          : 3.4.4
libc                : 
MEMORY INDEX        : 6.069
INTEGER INDEX       : 5.154
FLOATING-POINT INDEX: 10.403
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

glibc-2.4.x requires gcc-4, so I compiled that first and then recompiled nbench to see what difference the compiler makes on its own.
gcc-4.1.1 with glibc-2.3.4.20041102-r2

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          586.56  :      15.04  :       4.94
STRING SORT         :          102.48  :      45.79  :       7.09
BITFIELD            :      1.3137e+08  :      22.54  :       4.71
FP EMULATION        :          39.625  :      19.01  :       4.39
FOURIER             :            8742  :       9.94  :       5.58
ASSIGNMENT          :          10.188  :      38.77  :      10.06
IDEA                :          1750.9  :      26.78  :       7.95
HUFFMAN             :          775.66  :      21.51  :       6.87
NEURAL NET          :          16.845  :      27.06  :      11.38
LU DECOMPOSITION    :          605.04  :      31.34  :      22.63
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 25.276
FLOATING-POINT INDEX: 20.354
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 8 CPU
L2 Cache            : 
OS                  : Linux 2.6.5-7.97-pseries64
C compiler          : 4.1.1
libc                : 
MEMORY INDEX        : 6.948
INTEGER INDEX       : 5.866
FLOATING-POINT INDEX: 11.289
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
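To make the gcc-3.4.4 vs. gcc-4.1.1 difference easier to eyeball, here is a small sketch that computes the per-test change in iterations/sec between the two runs. The numbers are copied from the nbench output above; the names `GCC_344`, `GCC_411`, `percent_change` and `compare` are just made up for this example and are not part of nbench.

```python
# Iterations/sec copied from the two nbench runs above.
# Dictionary and function names are hypothetical, chosen for this sketch only.
GCC_344 = {
    "NUMERIC SORT": 581.52,
    "STRING SORT": 96.4,
    "BITFIELD": 1.2933e+08,
    "FP EMULATION": 36.512,
    "FOURIER": 8689.8,
    "ASSIGNMENT": 7.3297,
    "IDEA": 1503.4,
    "HUFFMAN": 589.62,
    "NEURAL NET": 14.411,
    "LU DECOMPOSITION": 556.8,
}

GCC_411 = {
    "NUMERIC SORT": 586.56,
    "STRING SORT": 102.48,
    "BITFIELD": 1.3137e+08,
    "FP EMULATION": 39.625,
    "FOURIER": 8742.0,
    "ASSIGNMENT": 10.188,
    "IDEA": 1750.9,
    "HUFFMAN": 775.66,
    "NEURAL NET": 16.845,
    "LU DECOMPOSITION": 605.04,
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def compare(base, new):
    """Map each test name to its percent change in iterations/sec."""
    return {name: percent_change(base[name], new[name]) for name in base}


if __name__ == "__main__":
    gains = compare(GCC_344, GCC_411)
    # Print the biggest winners first.
    for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<20}: {gain:+6.1f}%")
```

Going by these numbers, every test got faster with gcc-4.1.1, with ASSIGNMENT (roughly 39%) and HUFFMAN (roughly 32%) gaining the most. Keep in mind this is a single run of each, so small differences may just be noise.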

Sadly, the kernel that’s running on the OSL box is a SuSE one without the support needed. I kept getting FATAL: kernel too old while building glibc. Guess I’ll have to save the powerpc-cpu optimizations testing for another day.

cvs2git/parsecvs gentoo-x86 conversion

Recently, thanks to the Oregon State University Open Source Lab, I’ve been given access to an IBM OpenPower 720. This thing is a beast like no other box I have access to. The specs are simply amazing.

Anyway, I noticed that Alec Warner/antarus@gentoo was having problems running a cvs2git conversion of the gentoo-x86 tree: every box he attempted it on ran out of memory. I figured, ok, I’ve got access to the mothership and should not have any problems doing a run for him. We talked for a little while and he provided me with a quick little script to fire off the conversion process.

Well, it took 21 hrs, consumed 100% of the CPU the entire time, and then failed towards the end with an Out of Memory: Killed process 14671 (parsecvs) error. Right before it died it had consumed 70.1G of virtual memory and 30G RSS, as well as all the swap. The gentoo-x86 tree is about 1.4G worth of CVS data, and the parsecvs util had managed to convert that into 4.1G of git data before it got killed. Gotta say, from an admin/infra point of view, going from a 1.4G to a 4.1G+ backend repo leaves a lot to be desired.

Either way, I don’t see us switching to git any time soon unless the backend conversion tools get a rewrite/update so they can process the full repo in incremental parts or learn to use the existing memory more efficiently.

In this graph you can see that I started at about 21:00 and ran until about 18:00 the following day. At about 14:00 the basic conversion process was done and parsecvs started allocating memory pretty quickly for another 4 hours. The final spike is where it started swapping to disk before it got killed.

24h CPU Usage

Unfortunately, the snmpd version running on the box does not appear to support 64-bit counters, so all the memory graphs came out empty.

In the end I had fun helping him with this, and it really gave the power5 a workout 🙂
Sometime later this week I’ll start setting up ppc64 binrepos, cross compilers, etc.