cvs2git/parsecvs gentoo-x86 conversion

Recently, thanks to the Oregon State University Open Source Lab, I’ve been given access to an IBM OpenPower 720. This thing is a beast like no other box I have access to; the specs are simply amazing. Anyway, I noticed that Alec Warner (antarus@gentoo) was having problems running a cvs2git conversion of the gentoo-x86 tree: every box he attempted it on ran out of memory. I figured, OK, I’ve got access to the mothership and should not have any problems doing a run for him. We talked for a little while and he provided me with a quick little script to fire off the conversion process. Well, it took 21 hours, consumed 100% of the CPU the entire time, and then failed right at the end with an “Out of Memory: Killed process 14671 (parsecvs)” error. By then it had consumed 70.1G of virtual memory and 30G RSS, as well as all the swap. The gentoo-x86 tree is about 1.4G worth of CVS data, and parsecvs had managed to turn that into 4.1G of git data before it got killed. Gotta say, from an admin/infra point of view, going from a 1.4G to a 4.1G+ backend repo leaves a lot to be desired.

Nonetheless, I don’t see us switching to git any time soon unless the backend conversion tools get a rewrite/update so they can process the full repo in incremental parts or learn to use the existing memory more efficiently.

In this graph you can see that I started at about 21:00 and ran until about 18:00 the following day. At about 14:00 the basic conversion process was done, and parsecvs then started allocating memory pretty quickly for another 4 hours. The final spike is when it started swapping to disk before it got killed.

24h CPU Usage

Unfortunately the snmpd version running on the box does not appear to support 64-bit counters, so all the memory graphs are nil.
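With the SNMP memory graphs empty, one fallback would have been to sample the process’s memory straight from /proc while it ran. Here is a minimal sketch, assuming a Linux box where /proc/<pid>/status reports VmSize and VmRSS in kB. The function names are hypothetical, and this is not something that was run during the conversion.

```python
import re


def parse_vm_fields(status_text):
    """Pull VmSize and VmRSS (in kB) out of the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"^(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_vm_fields(pid):
    """Read the live values for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())


def kb_to_gib(kb):
    """Convert kB as reported by /proc into GiB for easier reading."""
    return kb / (1024 * 1024)
```

Calling read_vm_fields() on the parsecvs PID from a cron job or a sleep loop and appending the result to a log would be enough to rebuild the memory curve afterwards.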

In the end I had fun helping him with this, and it really gave the power5 a workout 🙂
Sometime later this week I’ll start setting up ppc64 binrepos, cross compilers, etc.

3 thoughts on “cvs2git/parsecvs gentoo-x86 conversion”

  1. I don’t know how cvs2git works, but maybe you should try tailor (http://packages.gentoo.org/packages/?category=dev-util;name=tailor). I think it works incrementally, reproducing one patchset at a time (which should be runnable on “normal” machines) and allowing for bidirectional sync between repositories. It has a lot of backends, so you can try other VCSes instead of git to see how much space the same tree takes up in each one.

    Good luck!

  2. We got Alec Warner access to another power5 and he’s been finishing out his testing on his own. I know one of the tools he was testing with, ‘tailor’, was looking like it was going to take the better part of ~30 days to complete.
