After getting some sense about memcpy and h264 (ok, my sample was short enough to make relevant some optimizations that apply just on codec init, thus meaningless) I eventually got something in that seems to be relevant enough and tool quite few lines: I enabled prefetch.
It is pretty much a single asm line and in certain cases it meant a 10% of overall decoding time shaved away. Before I tried using altivec prefetch and it didn’t show a great result so I just removed it, 2 days ago I implemented it with the generic instruction and the result was pleasant enough.
If you happen to have non G4 systems please try to benchmark mpegvideo and h264 decoding for me and report results, the commit revision is 6669
Hopefully I’ll try to provide a snapshot for gentoo in this weekend.