Alive, hopefully

You may wonder what I’ve been doing, since it has been a long time since the latest blog item. Well, I was busy trying to do too many things: searching, traveling and so on.

Here is a summary:

– I eventually released feng as you can see on
That involved getting the website up, writing lots of documentation (that hopefully someone will read), hacking the code into the right shape and making the whole bundle bearable for people with less understanding of autotools and dependencies… I hope the first release isn’t that ugly, and I thank Dario and Alessandro for their help =)

– The ffmpeg bug tracker is finally taking shape. Hacking roundup isn’t the simplest thing in the world, mostly because examples and alternate templates aren’t available; the documentation saves the day most of the time anyway. You may see it on

– On the cell side I started hacking the build system a bit in order to have it working both for me (using gentoo, standard paths and the stock gcc toolchain) and for those using the IBM sdk/fedora (bogus paths, shortened prefixes). I hope the people in charge of deciding the standard for writing and running spu code will provide a sane default. Hopefully once I have more time I’ll start writing something of my own; so far I’m just testing patches and contributions by others ^^;

– The vorbis and theora RFCs are progressing, and currently feng and gst are interoperable. I hope to complete the standardization and move on to something else, it’s taking too much time!

– My altivec work on cairo is still on hold. I hope to get enough time to push an update (since the ibm/sony mathlib has an implementation of vector integer division, I could rip it and add some more vector ops in pixman).

– The SoC with ffmpeg has already started. So far I’m receiving some good feedback from my student, and I’m trying to find the time to reread the dirac spec in order to follow him better.

That’s more or less all. The keyword of the whole document is TIME: lately the lscube involvement took a bit too much of it, mostly because you cannot manage your time well if your plans get spoiled every time by unexpected priorities appearing out of the blue.

Drive failure, ps3, other news…

It was quite a busy week: we (Dario and I) ended up gaining full access to our lab and took over all the duties our previous mentor and colleagues had at lscube.

We went to the UN (ONU) in Geneva for our first webcast. The place was quite nice and the people were absolutely great =). Once we got back to Italy we spent some time preparing the feed for storage and preparing a new release of fenice (did we say that 1.12 was the last? well, not really, since felix was missing…) and of felix, the live feeder.

Hopefully tomorrow I’ll set up everything for a proper release and then move on to fenice-ng and/or libnemesi. While we were travelling I finally fixed the h264 packetizer, so now it can stream h264 and mp3 correctly =) Once libnemesi supports it too I’ll do at least an rc snapshot. Dario worked quite hard to improve the scheduler so that it doesn’t choke on certain bad behaviours from a certain well known client…

Now I have big news: I finally got a ps3 =) It is a Japanese model that Sony graciously lent me for a while ^^ Sadly the label on it says 100V~3.8A 50/60Hz, and that means I have to get a voltage converter… Luckily Geert pointed me to a nice German shop with good prices. I hope what I ordered will arrive next week, since I want to check the new livecd myself and maybe complete the step by step docs. (I corrected a product mismatch at the last minute; I hope the order change email reached them, otherwise I’d get a 110-to-220 converter, which is exactly the opposite of what I need…)

I finally completed the fbcompose altivectorization in cairo (check the mailing list for the patch), and hopefully now there is enough to start benchmarking it…

Now the sad news: my Alubook hd died. I’m trying to recover as much as possible and will then send it in for repair (this time I have full applecare and I’m going to use it fully ^^ ), so I won’t be as available as I was before for more or less 2-3 weeks, hopefully.

I guess that’s more or less all.

One thing there

After getting some sense about memcpy and h264 (ok, my sample was short enough to make some optimizations look relevant that apply only to codec init, and are thus meaningless) I finally got something in that seems to be relevant enough and took quite few lines: I enabled prefetch.

It is pretty much a single asm line, and in certain cases it shaved 10% off the overall decoding time. I had tried the altivec prefetch before, but it didn’t show a great result so I just removed it; 2 days ago I implemented it with the generic instruction and the result was pleasant enough.
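To give an idea of what such a one-line prefetch looks like, here is a minimal sketch. It is not the ffmpeg commit: the post does not say which instruction was used, so the sketch assumes GCC’s `__builtin_prefetch` as a portable stand-in for the asm line, and the function name `sum_block_prefetch` and its parameters are made up for illustration. While one row of a block is being processed, the next row is hinted into the cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: sums `height` rows of
 * `width` bytes each, with rows `stride` bytes apart, the way a motion
 * compensation routine walks a reference picture.
 */
uint32_t sum_block_prefetch(const uint8_t *src, size_t stride,
                            size_t width, size_t height)
{
    uint32_t sum = 0;

    assert(src != NULL);

    for (size_t y = 0; y < height; y++) {
        const uint8_t *row = src + y * stride;

        /* The "single line": hint that the next row will be read soon.
         * Arguments: address, 0 = read access, 0 = low temporal locality.
         * It is only a hint and never changes the computed result. */
        if (y + 1 < height)
            __builtin_prefetch(row + stride, 0, 0);

        for (size_t x = 0; x < width; x++)
            sum += row[x];
    }
    return sum;
}
```

Whether a hint like this pays off depends on the cpu and on the access pattern, which is why benchmarks on different machines are needed.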

If you happen to have non-G4 systems, please try to benchmark mpegvideo and h264 decoding for me and report the results; the commit revision is 6669.

Hopefully I’ll provide a snapshot for gentoo this weekend.

ffmpeg, what’s missing?

Ok, the title is misleading on purpose. As you can see from the previous post I got some requests about ffmpeg+ppc (power, cell, plain ppc). In the case of h264 I’m afraid all the useful bits are already vectorized, and the little that is left will be useful but isn’t really top priority (obviously I’ll try to be on par with the i386 optimizations, so the weight and loop filter funcs will get their altivec counterparts sooner or later). Other codecs are missing a lot vectorization-wise, say the vp{3,5,6} family that many could like/need because of flash embedded videos. Some quick asm bits could also be quite useful for our lil embedded ppcs, the same way they are already implemented and useful for arm.

My plan for the next week is to keep reordering code and putting it back in arch specific dirs, so it can be implemented in a more agile way (see what I did for the mathops or what I’m about to try for the bitstream read/write functions). Hopefully I’ll complete and commit some altivec optimizations like the mdct (even if I should check whether altivec makes a difference there or not), the vp idct variant or the latest h264 bits.

I’ll be unavailable for the week end, see you monday =)

oprofiling ffh264

Recently I got some inquiries about h264 and altivec: just testing the decode time was disappointing to one user.

I did my own test, and on my g4 1.6 I got about double the speed he experienced on his g5 2.4.

time nice --20 ./ffmpeg -i ~ryan/bluesky_HD_CAVLC_JM93_217f.264 -f rawvideo - > /dev/null
real 0m47.685s
user 0m44.304s
sys 0m3.220s

cat /proc/cpuinfo
processor : 0
cpu : ppc970, altivec supported
clock : 2400.000000MHz
revision : 4.0 (pvr 0070 0400)

time nice --20 ./ffmpeg -i /tmp/bluesky_HD_CAVLC_JM93_217f.264 -f rawvideo - > /dev/null
real 0m25.877s
user 0m23.768s
sys 0m1.904s

cat /proc/cpuinfo
processor : 0
cpu : 7447A, altivec supported
clock : 1666.666000MHz
revision : 0.5 (pvr 8003 0105)

The ffmpeg code is the same; I hadn’t used anything but the stock cflags, same for him.
I was expecting quite a different result, time to hunt for the slow gear!
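To put the two runs side by side, here is a small sketch of the arithmetic. The helper names are made up for illustration, and the cycle count is only a rough estimate that assumes each cpu ran at its nominal clock for the whole run.

```c
#include <assert.h>

/* CPU time actually spent by the process: user + sys as reported by `time`. */
double cpu_seconds(double user_s, double sys_s)
{
    return user_s + sys_s;
}

/* Rough cycle estimate: CPU seconds times the clock from /proc/cpuinfo (MHz). */
double approx_cycles(double cpu_s, double clock_mhz)
{
    return cpu_s * clock_mhz * 1e6;
}

/* How many times larger `slow` is than `fast`. */
double slowdown(double slow, double fast)
{
    assert(fast > 0.0);
    return slow / fast;
}
```

Fed with the numbers above, `slowdown(cpu_seconds(44.304, 3.220), cpu_seconds(23.768, 1.904))` comes out at roughly 1.85, and comparing `approx_cycles` at 2400 MHz against 1666.666 MHz gives roughly 2.7: clock for clock, the G5 run looks even worse than the wall time suggests.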

I used oprofile: I just started it before the ffmpeg call and stopped it afterwards, and then asked opreport to compute some statistics about symbols.

An excerpt:

CPU: PowerPC G4, speed 1666.67 MHz (estimated)
Counted CYCLES events (Cycles) with a unit mask of 0x00 (No unit mask) count 100000
samples % image name symbol name
60355 23.2602 _wordcopy_fwd_aligned
13572 5.2305 ffmpeg_g put_h264_chroma_mc8_altivec
13417 5.1708 ffmpeg_g filter_mb
11379 4.3853 ffmpeg_g put_h264_qpel16_h_lowpass_altivec
9700 3.7383 ffmpeg_g fill_caches
9332 3.5965 ffmpeg_g hl_decode_mb
8201 3.1606 vmlinux __flush_dcache_icache

Looks like I’ll have to replace something… or start thinking about an optimized glibc…
(mine is built targeting my cpu and is pretty recent, I wonder if the G5 isn’t running an older or generically built glibc…)

Poking in Mesa…

Yesterday I had a look at the current mesa sources…

And I found out that x86 and amd64 had plenty of optimized code, mostly hand made assembly!
What about ppc, you’d ask? Well, there is a nearly empty dir with just the code to find out whether the cpu has altivec or not =_=…
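For the curious, detecting altivec does not take much. The sketch below is not Mesa’s code: it is just one simple Linux-only approach, assuming the kernel reports `altivec supported` on the cpu line of /proc/cpuinfo as in the output quoted in the oprofiling post. Both function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a chunk of /proc/cpuinfo text
 * carries the "altivec supported" marker, 0 otherwise. */
int cpuinfo_text_has_altivec(const char *text)
{
    assert(text != NULL);
    return strstr(text, "altivec supported") != NULL;
}

/* Hypothetical helper: scans /proc/cpuinfo line by line.
 * Returns 0 when the file is missing (e.g. not on Linux), i.e. it
 * conservatively assumes there is no altivec. */
int cpu_has_altivec(void)
{
    char line[256];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = cpuinfo_text_has_altivec(line);
    fclose(f);
    return found;
}
```

A check like this would run once at startup, with the result used to pick between the plain C routines and the vectorized ones.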

I should study for my last exams, so I won’t do much about it in the short term =/