Something new, well, not so much

Just a quick update:

– got the PS3 running and, well, I must say that it's quite cool
– got swamped with many other things, so I'm slowly configuring my network to get the PS3 online; in the process I managed to brick my first access point (a Belkin that I'll now have to hack a bit to recover, if possible…)
– the lscube tasks are proceeding nicely; we finally ironed out some bugs related to mp3 and the compositing layer. For those not following: I'm working on a new streaming server called Feng. Currently it streams h264 and mp3 and lets you either just point a URL at a container and automagically get an SDP or, more interestingly, define a special editlist: you provide a simple text file listing the files with their start and stop times and get aggregated streams on the fly. Pretty interesting if we manage to complete it and then make it usable =)
– I still have to fix B-frame support in h264 and then move on to improving those framers, or maybe implementing Vorbis and Theora; the gst crew beat me to it, but I'd like to be at least a close second ^^;
– I hope to clean up the Roundup setup and the site restyle for lscube soonish, once the previous tasks are addressed, since I'd like to get more people involved and the current framework still has some rough edges…

– I'll probably start hacking on the bfin (Blackfin) because of a course I'm attending. I can't say I really like the arch since it's a bit irregular, but it's still much nicer than x86 (expect ffmpeg patches for it soon ^^)

– last but not least, I finally have my laptop back!
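Feng's actual editlist syntax isn't described here beyond "a simple text file with the files and the start and stop time". So this is only a sketch of the idea. It assumes one `path start stop` entry per line, with times in seconds and paths without spaces. The struct and function names are hypothetical. The length of the aggregated stream is just the sum of the selected windows.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical editlist entry: one source file and the time window to play. */
struct edit_entry {
    char path[256];
    double start; /* seconds */
    double stop;  /* seconds */
};

/* Parse one assumed "path start stop" line.
 * Returns 0 on success, -1 on a malformed line or an empty/negative window. */
static int parse_edit_line(const char *line, struct edit_entry *e)
{
    if (sscanf(line, "%255s %lf %lf", e->path, &e->start, &e->stop) != 3)
        return -1;
    if (e->start < 0 || e->stop <= e->start)
        return -1;
    return 0;
}

/* Duration of the aggregated stream: the selected windows played back to back. */
static double total_duration(const struct edit_entry *list, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += list[i].stop - list[i].start;
    return total;
}
```

For example, the lines `intro.mp4 0 10.5` and `talk.mp4 120 300` would give a 190.5 second aggregated stream.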

That said, I guess you can see why I'm not very responsive on bugs (I promise I'll at least try the blender ebuild and provide updates to the ffmpeg and mplayer ones over the weekend) and less than lively…

PS: Cocoa programming isn’t that nice…

Drive failure, PS3, other news…

It was quite a busy week: we (Dario and I) finally ended up with full access to our lab and took over all the duties our previous mentor and colleagues had at lscube.

We were at the UN (ONU) in Geneva for our first webcast. The place was quite nice and the people were absolutely great =). Once we got back to Italy we spent some time preparing the feed for storage and preparing a new release of fenice (did we say 1.12 was the last? Well, not really, since felix was missing…) and felix, the live feeder.

Hopefully tomorrow I'll set up everything for a proper release and then move on to fenice-ng (fe.ng) and libnemesi. While we were travelling I finally fixed the h264 packetizer, so now fe.ng can stream h264 and mp3 correctly =) Once libnemesi supports it too, I'll do at least an rc snapshot. Dario worked quite hard to improve the scheduler so it doesn't choke on certain bad behaviours of a certain well-known client…
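The fe.ng packetizer itself isn't shown here. As a rough sketch of what an h264 RTP packetizer has to do with NAL units bigger than a packet, this is the FU-A fragmentation from RFC 3984. Each fragment carries an FU indicator that copies the F/NRI bits and uses type 28, and an FU header with the S/E bits and the original NAL type. The original header byte itself is not repeated. The function name, the 1500-byte buffers and the return conventions are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FU_A_TYPE 28   /* RFC 3984 NAL unit type for FU-A */
#define MAX_PKT 1500   /* assumed per-fragment buffer size */

/* Split one NAL unit (starting with its 1-byte header) into FU-A payloads of
 * at most max_payload bytes. Returns the number of fragments written,
 * 0 if the NAL unit fits as-is (send it as a single NAL unit packet),
 * or -1 on bad arguments or too few output slots. */
static int fu_a_fragment(const uint8_t *nal, size_t nal_len, size_t max_payload,
                         uint8_t out[][MAX_PKT], size_t out_len[], int max_frags)
{
    if (nal_len < 2 || max_payload <= 2 || max_payload > MAX_PKT)
        return -1;
    if (nal_len <= max_payload)
        return 0; /* S and E must not both be set in one FU, so don't fragment */

    const uint8_t header = nal[0];
    const uint8_t *data = nal + 1;            /* original header is not copied */
    size_t remaining = nal_len - 1;
    const size_t chunk_max = max_payload - 2; /* room for FU indicator + FU header */
    int n = 0;

    while (remaining > 0) {
        if (n == max_frags)
            return -1;
        size_t chunk = remaining < chunk_max ? remaining : chunk_max;
        uint8_t fu_header = header & 0x1F;     /* original NAL unit type */
        if (n == 0)
            fu_header |= 0x80;                 /* S bit: first fragment */
        if (chunk == remaining)
            fu_header |= 0x40;                 /* E bit: last fragment */
        out[n][0] = (header & 0xE0) | FU_A_TYPE; /* keep F and NRI bits */
        out[n][1] = fu_header;
        memcpy(&out[n][2], data, chunk);
        out_len[n] = chunk + 2;
        data += chunk;
        remaining -= chunk;
        n++;
    }
    return n;
}
```

With a 1400-byte payload budget, a 3000-byte IDR slice would become three fragments of 1400, 1400 and 205 bytes.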

Now for the big news: I finally got a PS3 =) It's a Japanese model that Sony graciously lent me for a while ^^ Sadly the label on it says 100V~ 3.8A 50/60Hz, which means I have to get a voltage converter… Luckily Geert pointed me to a nice German shop with good prices, http://www.thiecom.de. I corrected a product mismatch in my order at the last minute. I hope the order change email got through, otherwise I'll get a 110-to-220 converter, which is exactly the opposite of what I need… I hope the order arrives next week, since I want to check the new livecd myself and maybe complete the step-by-step docs.

I finally completed the fbcompose altivectorization in cairo (check the mailing list for the patch), and hopefully there is now enough to start benchmarking it…
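The patch isn't reproduced here. As a hedged reference for what such a compose path computes, this is a scalar Porter-Duff OVER on premultiplied 32-bit ARGB, the most common compositing operation: `dst = src + dst * (255 - src_alpha) / 255` per channel, with the usual exact rounded divide-by-255 trick. The function names are hypothetical and this is not the cairo/pixman code. A vectorized version would process 4 such pixels per 128-bit AltiVec register, and a scalar loop like this is what its output can be checked against.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounded a * b / 255 for 8-bit values, exact for all inputs. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)(((t >> 8) + t) >> 8);
}

/* Porter-Duff OVER on one premultiplied ARGB32 pixel:
 * every channel becomes src + dst * (1 - src_alpha). */
static uint32_t over_pixel(uint32_t src, uint32_t dst)
{
    uint8_t ia = (uint8_t)(255 - (src >> 24)); /* inverse source alpha */
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xFF;
        uint32_t d = mul_un8((uint8_t)(dst >> shift), ia);
        uint32_t sum = s + d;
        if (sum > 255)
            sum = 255; /* saturate, like a saturating byte add */
        out |= sum << shift;
    }
    return out;
}

/* Scalar span loop: composite width source pixels over the destination. */
static void combine_over(uint32_t *dst, const uint32_t *src, size_t width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = over_pixel(src[i], dst[i]);
}
```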

Now the sad news: my Alubook's hard drive died. I'm trying to recover as much as possible and then send it in for repair (this time I have full AppleCare and I'm going to use it fully ^^), so I won't be as available as before for roughly 2-3 weeks, hopefully.

I guess that’s more or less all.

YAGU – Yet another global update

– PS3/Cell: I finally fixed binutils so it builds for spu-elf. I'm about to unbreak gcc properly, since someone decided NOTE_INSN_EXPECTED_VALUE wouldn't be of any use, while spu.md uses it for the cmp instructions… (needless to say, my workaround isn't working that well…). Tomorrow, hopefully, I'll update the patchset either with a revert of that patch or with a proper solution (hard, since I'm not that proficient in gcc internals =/). I'm afraid of glibc…

– Fenice/lscube: trac + git is a no-go. Trac itself, or setuptools on the Fedora server that is streaming, ranges from bad to idiotic (or maybe I'm just not proficient enough). Add that gitweb seems unavailable for Fedora and you get an interesting picture. Importing is easy and working with it almost is (once everybody figures out the commands), so there is some uncertainty about a full transition.

– Personal life: you can use anything that happens to you to improve; still, I'm afraid I won't develop the ability to teleport and/or timeshift to improve the situation. Anyway, I'm feeling better.

– University: I'm forced to do something in Mono for an exam, and obviously I have about 30 hours to learn enough Gtk# to do it; luckily Glade is always Glade…

One thing there

After making some sense of memcpy and h264 (OK, my sample was short enough to make optimizations that only apply at codec init look relevant, so they were meaningless), I finally got something in that seems relevant enough and took very few lines: I enabled prefetch.

It is pretty much a single asm line, and in certain cases it shaved 10% off the overall decoding time. I had tried the AltiVec prefetch before, and since it didn't show great results I just removed it; 2 days ago I implemented it with the generic instruction and the result was pleasant enough.
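The committed asm isn't reproduced here. This sketch only shows the idea: before motion compensation reads a reference block, touch its rows so the cache fetch overlaps with useful work. It uses GCC's `__builtin_prefetch`, which is only a hint. It never faults and may compile to nothing, and on PowerPC GCC typically emits the generic `dcbt` cache-touch instruction for it. Both function names and the "prefetch the next block while copying this one" scheme are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hint the cache about h rows of a block; purely advisory. */
static void prefetch_rows(const uint8_t *mem, ptrdiff_t stride, int h)
{
    for (int y = 0; y < h; y++)
        __builtin_prefetch(mem + y * stride, 0, 3); /* read access, keep cached */
}

/* Hypothetical motion-compensation style copy: hint the rows of the next
 * reference block, then copy the current one while those loads are in flight. */
static void copy_block(uint8_t *dst, ptrdiff_t dst_stride,
                       const uint8_t *src, ptrdiff_t src_stride,
                       const uint8_t *next_src, int w, int h)
{
    if (next_src)
        prefetch_rows(next_src, src_stride, h);
    for (int y = 0; y < h; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, (size_t)w);
}
```

Because the prefetch doesn't change results, only timing, its effect has to be measured with real decoding benchmarks. That is why results from other machines matter.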

If you happen to have a non-G4 system, please try benchmarking mpegvideo and h264 decoding for me and report the results; the commit revision is 6669.

I'll try to provide a snapshot for Gentoo this weekend.

ffmpeg, what’s missing?

OK, the title is misleading on purpose. As you can see from the previous post, I got some requests about ffmpeg+ppc (POWER, Cell, plain ppc). In the case of h264 I'm afraid all the useful bits are already vectorized, and the little that's left would be useful but isn't really top priority (obviously I'll try to be on par with the i386 optimizations, so the weight and loop filter functions will get their AltiVec counterparts sooner or later). Other codecs are missing a lot vectorization-wise: say, the vp{3,5,6} family, which many would like or need because of Flash-embedded videos. Also, some quick asm bits could be quite useful for our lil' embedded ppcs, the same way they are already useful and implemented for ARM.
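For the weight functions mentioned above, any AltiVec version has to match a scalar reference bit for bit. This is a hedged sketch of that reference for H.264 explicit weighted prediction with a single reference list and 8-bit samples, following the spec formula. With logWD >= 1 it computes Clip1(((pred * w + 2^(logWD-1)) >> logWD) + o), and with logWD == 0 it computes Clip1(pred * w + o). The function names are hypothetical, and it is not ffmpeg's code.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Explicit weighted prediction, one reference list, 8-bit samples, in place.
 * The weight may be negative, so this relies on >> of a negative int being
 * an arithmetic shift (true for GCC and Clang). */
static void weight_block(uint8_t *block, ptrdiff_t stride, int width, int height,
                         int log2_denom, int weight, int offset)
{
    const int round = log2_denom ? 1 << (log2_denom - 1) : 0;
    for (int y = 0; y < height; y++) {
        uint8_t *row = block + y * stride;
        for (int x = 0; x < width; x++)
            row[x] = clip_uint8(((row[x] * weight + round) >> log2_denom) + offset);
    }
}
```

With `log2_denom = 5` and `weight = 32` the block is unchanged. With `weight = 16` and `offset = 10`, the samples 0, 100, 200, 255 become 10, 60, 110, 138.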

My plan for next week is to keep reordering code and moving it into arch-specific dirs so it can be implemented in a more agile way (see what I did for the mathops, or what I'm about to try for the bitstream read/write functions). Hopefully I'll complete and commit some AltiVec optimizations like the MDCT (though I should check whether AltiVec makes a difference there or not), the VP IDCT variant, or the latest h264 bits.

I'll be unavailable over the weekend, see you Monday =)

oprofiling ffh264

Recently I got some inquiries about h264 and AltiVec: in simple decode-time tests, some users found the results disappointing.

I ran my own test, and on my 1.67 GHz G4 I got about double the speed he saw on his 2.4 GHz G5.

time nice --20 ./ffmpeg -i ~ryan/bluesky_HD_CAVLC_JM93_217f.264 -f rawvideo - > /dev/null
real 0m47.685s
user 0m44.304s
sys 0m3.220s

cat /proc/cpuinfo
processor : 0
cpu : ppc970, altivec supported
clock : 2400.000000MHz
revision : 4.0 (pvr 0070 0400)

time nice --20 ./ffmpeg -i /tmp/bluesky_HD_CAVLC_JM93_217f.264 -f rawvideo - > /dev/null
real 0m25.877s
user 0m23.768s
sys 0m1.904s

cat /proc/cpuinfo
processor : 0
cpu : 7447A, altivec supported
clock : 1666.666000MHz
revision : 0.5 (pvr 8003 0105)

The ffmpeg code is the same; I didn't use anything but the stock cflags, and neither did he.
I was expecting quite a different result, so it's time to hunt down the slow gear!

I used oprofile:

I just started it right before the ffmpeg call and stopped it right after, then asked opreport to compute some statistics about symbols.

An excerpt:

CPU: PowerPC G4, speed 1666.67 MHz (estimated)
Counted CYCLES events (Cycles) with a unit mask of 0x00 (No unit mask) count 100000
samples % image name symbol name
60355 23.2602 libc-2.4.so _wordcopy_fwd_aligned
13572 5.2305 ffmpeg_g put_h264_chroma_mc8_altivec
13417 5.1708 ffmpeg_g filter_mb
11379 4.3853 ffmpeg_g put_h264_qpel16_h_lowpass_altivec
9700 3.7383 ffmpeg_g fill_caches
9332 3.5965 ffmpeg_g hl_decode_mb
8201 3.1606 vmlinux __flush_dcache_icache

The top entry, glibc's _wordcopy_fwd_aligned copy routine, accounts for over 23% of the samples. Looks like I'll have to replace something… or start thinking about an optimized glibc…
(mine is built targeting my CPU and is pretty recent; I wonder whether the G5 is running an older or generically built glibc…)

libnemesi, libnms, whatever…

OK, the name isn't that stable, and I'm probably going to shake up the API a bit more soon.

Anyway, the RTP/RTSP client library from lscube is finally getting into shape, and I've now moved from simple examples to something more useful:

– audacious now supports mp3 over RTSP using the libnemesi simple/toplevel API

– MPlayer is about to get a demuxer, again using the simple API (or at least I'm trying)

– I'm pushing Diego to get Xine nemesified too.

So, last call before the API freeze and the first release: start playing with it now so you can get the changes you need in before the next major version.