Here is a short bullet list about doing benchmarks:
- Reproducibility: your numbers are worth nothing if nobody can reproduce them, so you have to provide a script or a detailed description of what you did along with them.
- Statistics: outliers and other artefacts may skew your results, so make sure your script is run enough times.
- Indices and values: if you want to prove something you need hard numbers, ideally something everybody can understand easily and cannot misunderstand. CPU cycles, time to complete, and memory usage are quite good, provided you aren't testing on particularly different architectures or systems.
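Two of the hard numbers mentioned above, time to complete and memory usage, can be collected in a few lines. A minimal sketch in Python, where `workload` is a hypothetical stand-in for whatever you are actually measuring:

```python
import time
import tracemalloc

def workload():
    # Hypothetical stand-in for the code under test.
    return sum(i * i for i in range(100_000))

# Track Python-level allocations and wall-clock time around the run.
tracemalloc.start()
start = time.perf_counter()
result = workload()
elapsed = time.perf_counter() - start
_, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"time: {elapsed:.4f} s, peak allocations: {peak_bytes} bytes")
```

`time.perf_counter` is a monotonic high-resolution timer, so it is not affected by system clock adjustments; `tracemalloc` only sees allocations made through the Python allocator, so for native code you would reach for a tool like `perf` or `valgrind --tool=massif` instead.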
Quite short, isn't it? Still, many people just state their values by inference (like "it ought to do 2x the syscalls, thus it should be twice as slow"), or benchmark something that isn't what they actually want to test (like "glxgears is slower now, so mesa is slower at rendering complex scenes"), or use quite different settings and configurations (think of comparing your application with a minimal configuration against the same one with a larger one).
Usually the best way to get a meaningful benchmark is to prepare a script, give instructions on how to use it (including which versions of companion software you are using), and then provide the numbers (mean with variance if you are so inclined) and a summary of the system, so that others can play with it and try it themselves. This is quite useful since the optimization you are working on may be great with gcc 4.3 on PowerPC but problematic on x86 with gcc 2.95.
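The "run it enough times and report mean with variance" part of such a script can be sketched like this; again `workload` is a hypothetical placeholder for the code being compared:

```python
import statistics
import time

def bench(fn, runs=30):
    """Run fn several times and return the per-run wall-clock timings."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings

def workload():
    # Hypothetical placeholder for the code being benchmarked.
    sum(i * i for i in range(50_000))

timings = bench(workload)
print(f"mean: {statistics.mean(timings):.6f} s  "
      f"variance: {statistics.variance(timings):.2e}")
```

Reporting the spread alongside the mean is what lets a reader tell a real improvement from run-to-run noise; if the variance is large, looking at the median or discarding warm-up runs may be warranted.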
Other times you just want people to compare something that is _quite_ influenced by the surrounding system or is annoying to set up; in those cases having a full system image is quite a boon, since _everything_ can be the same. And given how easy it is to use virtualization/emulation software nowadays, it just takes a bit of bandwidth.