Some random updates

First of all, I’m finally snapshotting a newer ffmpeg, and I’ll need some help to get it to play nice with all the other applications. The new ffmpeg has lots of improvements, but it changes its API slightly, so every application has to update accordingly. Some time has passed, so I hope upstreams have caught up with the change.

Once it is unmasked I’ll hopefully put the next release of feng in portage. Currently I’m studying lighttpd internals in order to:

  • Have feng use the same configuration syntax as lighttpd
  • Improve its behavior as a server

So far I have started importing the lighttpd datatypes and its lemon-based parser directly into a separate branch, and reshaping feng a bit to make it more rational. First thing learnt from lighttpd: keep everything in instance variables.
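To make that lesson a bit more concrete, here is a minimal C sketch of what "everything in instance variables" can look like: the state lives in a struct that is handed to every function, instead of in file-level globals. All the names (feng_server, feng_server_init, feng_server_client_connected) are made up for illustration and are not taken from the feng or lighttpd sources.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-instance state: nothing here is a global, so two
 * servers with different settings can coexist in one process. */
typedef struct feng_server {
    char document_root[256];
    unsigned short port;
    unsigned int active_clients;
} feng_server;

/* Fill in an instance; returns 0 on success, -1 if the root does not fit. */
static int feng_server_init(feng_server *srv, const char *root,
                            unsigned short port)
{
    assert(srv != NULL && root != NULL);
    if (strlen(root) >= sizeof(srv->document_root))
        return -1;
    strcpy(srv->document_root, root);
    srv->port = port;
    srv->active_clients = 0;
    return 0;
}

/* Every operation takes the instance explicitly instead of touching globals. */
static unsigned int feng_server_client_connected(feng_server *srv)
{
    assert(srv != NULL);
    return ++srv->active_clients;
}
```

The nice side effect is that testing gets easier: each test can build its own throwaway instance without resetting any shared state.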

In other news, the fan of my alubook broke (and the tibook is in the same sorry shape). If you know where to find replacement parts for it please tell me (bonus if they aren’t that pricey).

Random update

I have been busy with my usual load of random stuff, most of it not really Gentoo related, some a bit more.

Let’s start with the nicer ones: Marco spent lots of time and it finally paid off, ffdirac now supports I-frames just fine, which is quite an important step! As mentor I didn’t have to do much besides watching the code evolve and suggesting a course of action. In other news, a new dirac spec was released just today, and some of the changes are probably due to Marco’s work =)
Today we tried some hackery to get git-svn to play nice with the braindamage we have on the ffmpeg soc svn. Sadly my side works great and his doesn’t (fetching from svn and pulling into an ffmpeg.git branch works, pushing back to svn does not).

About the dirac project, I must say that they started with the right frame of mind from day 0. I couldn’t find a group more open to discussion and suggestions, no matter if they were things like “It’s wrong to implement dirac in C++, nobody would use it” or “the latex pdf output as you made it is unreadable”. I hope to eventually have the time to get texlive working, or to find something that converts the tex files to docbook, and provide a better pdf for them; really, I cannot stand reading it for more than 5 minutes… Now I hope this summer of code effort will lead to a better dirac overall (and that eventually the BBC will use it for streaming their fine content. Oh, did I mention that I have a student at my university who should work on making dirac-rtp a reality? Check LScube in the next month).

To sum up, I’m quite happy with this summer of code experience and I thank Marco again for being a great person to work with.

While we are at it, some more information about ffmpeg related efforts: I finally hacked a bit on roundup again, fixing or working around some problems with the email integration. If you happen to have some problems with ffmpeg please give it a try.

Besides that, my work at LScube is still going on, sucking up lots of my time… Lately I tried to add more packetizers to feng, but without much success: it looks like my aac implementation is a *bit* wrong. Usually looking at it again after a while helps me fix the issue (as it did for h264), so I hope to have it (and many more) completed for the next release. On the client side, libnemesi is still waiting for more depacketizers while Alessandro is cleaning up the network stacks, making it less quirky.

Now I could speak of Gentoo related stuff: I’m trying to fix some of the programs still using the img_* interface. Since it is an annoying task I waited a bit, hoping upstreams would adapt… No reaction so far, so I’m starting with something simple like blender and will then hopefully move on to the other ones. What sucks about the img -> sws move is that sws is less commented, has quite ugly but performant code, and is a pain to hack on. I started to clean it up but then got sidetracked, so there are still some patches waiting for completion…

I guess this post is long enough; I’ll probably add another update tomorrow.

Misc updates again

I spent much of my time trying to get the whole LScube project more alive, so far it’s just a slow start:

I moved the development to git ( https://live.polito.it/gitweb ) and now I’m trying to update the website to a newer drupal, with more documentation. Since the forums are just a spam magnet I guess I’ll nuke them; if you want to contact us just use irc or email =P

I put the efika in use to stress test the streaming server, you can watch

rtsp://130.192.86.166/tc.mov

or

rtsp://130.192.86.166/ed.mov

(both streams are h264+mp3, not many clients could handle that… Yet)

Hopefully a Feng release will appear soon.

That’s for the streaming stuff.

Now, I have a working ps3 and finally managed to install and configure it. I already found something itchy: git’s ppcsha1+64ul == KaBOOM. I hope it’s just due to my testing with bleeding edge compilers, but I’m afraid not. So far I’m quite impressed by the ps3, it’s just a pity I’m slow in doing something nice with it…

More will follow

lu

something new, well not so much

Just a quick update:

– got the ps3 running and, well, I must say that it’s quite cool
– got swamped in many other things, so I’m slowly configuring my network to get the ps3 on it; in the process I managed to brick my first access point (a belkin that I’ll now have to hack a bit to recover, if possible…)
– the lscube tasks are proceeding nicely, we finally ironed out some bugs due to mp3 and the compositing layer. For the ones not following: I’m working on a new streaming server called Feng. Currently it streams h264 and mp3 and lets you either just point a url at a container and automagically get an sdp or, more interestingly, define a special editlist: you provide a simple text file with the files and their start and stop times and get aggregated streams on the fly. Pretty interesting if we manage to complete it and then make it usable =)
– I still have to fix B-frame support in h264 and then move on to improving those framers, or possibly implement vorbis and theora. The gst crew beat me to it but I’d like to be at least a close second ^^;
– I hope to clean up the roundup setup and the site restyle for lscube soonish, once the previous task is addressed, since I’d like to get more people involved and the current framework still has some rough edges…

– I’ll probably start hacking on the bfin because of a course I’m attending. I cannot say I really like the arch since it is a bit irregular, but it’s still much nicer than x86 (expect ffmpeg patches for it soon ^^)

– last but not least I finally have my laptop back!
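For the curious, here is a rough C sketch of the editlist idea from the Feng item above. The real file format isn’t described here, so everything in it is an assumption made up for illustration: the one-entry-per-line "path start stop" layout (times in seconds), and the names edit_entry, edit_entry_parse and editlist_duration.

```c
#include <assert.h>
#include <stdio.h>

/* One entry of a HYPOTHETICAL editlist: play `path` from `start` to `stop`
 * (in seconds). The actual Feng format may look nothing like this. */
typedef struct edit_entry {
    char path[256];
    double start;
    double stop;
} edit_entry;

/* Parse a "path start stop" line; returns 0 on success, -1 if malformed. */
static int edit_entry_parse(const char *line, edit_entry *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, "%255s %lf %lf", out->path, &out->start, &out->stop) != 3)
        return -1;
    if (out->start < 0.0 || out->stop <= out->start)
        return -1;
    return 0;
}

/* Total duration of the aggregated stream built from n entries. */
static double editlist_duration(const edit_entry *list, int n)
{
    double total = 0.0;

    assert(list != NULL || n == 0);
    for (int i = 0; i < n; i++)
        total += list[i].stop - list[i].start;
    return total;
}
```

A two-line file such as "intro.mov 0 10.5" followed by "main.mov 30 90" would then describe a 70.5 second aggregated stream.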

That said, I guess you now know why I’m not very responsive on bugs (I promise I’ll at least try the blender ebuild and provide updates to the ffmpeg and mplayer ones during the weekend) and why I’m less than lively…

PS: Cocoa programming isn’t that nice…

One thing there

After getting some sense about memcpy and h264 (ok, my sample was short enough to make relevant some optimizations that only apply at codec init, and are thus meaningless) I finally got something in that seems relevant enough and took quite few lines: I enabled prefetch.

It is pretty much a single asm line, and in certain cases it shaved 10% off the overall decoding time. I had tried the altivec prefetch before and it didn’t show a great result, so I just removed it; 2 days ago I implemented it with the generic instruction and the result was pleasant enough.
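To give an idea of the technique (not the actual commit, which isn’t shown here), this is a small C sketch that hints the cache about the next row of a block while the current row is being processed. It uses GCC/Clang’s __builtin_prefetch as a portable stand-in for the asm line; on PowerPC compilers typically turn it into a dcbt cache touch. The function name block_sum_prefetch and the row-summing workload are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum a w x h block of pixels stored with the given stride, prefetching
 * the next row while the current one is being read.
 * __builtin_prefetch is only a hint: it never changes the result. */
static uint32_t block_sum_prefetch(const uint8_t *src, ptrdiff_t stride,
                                   int w, int h)
{
    uint32_t sum = 0;

    assert(src != NULL && w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        if (y + 1 < h)
            __builtin_prefetch(src + stride, 0 /* read */, 3 /* keep cached */);
        for (int x = 0; x < w; x++)
            sum += src[x];
        src += stride;
    }
    return sum;
}
```

Whether this pays off depends a lot on the cpu and on the access pattern, which is exactly why benchmarks on other machines are useful.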

If you happen to have non-G4 systems please try to benchmark mpegvideo and h264 decoding for me and report the results; the commit revision is 6669.

Hopefully I’ll provide a snapshot for Gentoo this weekend.

ffmpeg, what’s missing?

Ok, the title is misleading on purpose. As you can see from the previous post I got some requests about ffmpeg+ppc (power, cell, plain ppc). In the case of h264 I’m afraid all the most useful bits are already vectorized, and the little that is left will be useful but isn’t really top priority (obviously I’ll try to be on par with the i386 optimizations, so the weight and loop filter funcs will get their altivec counterparts sooner or later). Other codecs are missing a lot vectorization-wise, say the vp{3,5,6} family that many could like/need because of flash embedded videos, and some quick asm bits could be quite useful for our lil embedded ppcs, the same way they are already implemented and useful for arm.

My plan for the next week is to keep reordering code and moving it back into arch specific dirs so it can be implemented in a more agile way (see what I did for the mathops or what I’m about to try for the bitstream read/write functions). Hopefully I’ll complete and commit some altivec optimizations like the mdct (even if I should check whether altivec makes a difference there or not), the vp idct variant or the latest h264 bits.

I’ll be unavailable for the week end, see you monday =)

oprofiling ffh264

Recently I got some inquiries about h264 and altivec: a user found the decode time he measured disappointing.

I ran the same test, and on my 1.6 GHz G4 I got about double the speed he experienced on his 2.4 GHz G5.

time nice --20 ./ffmpeg -i ~ryan/bluesky_HD_CAVLC_JM93_217f.264 -f rawvideo - > /dev/null
real 0m47.685s
user 0m44.304s
sys 0m3.220s

cat /proc/cpuinfo
processor : 0
cpu : ppc970, altivec supported
clock : 2400.000000MHz
revision : 4.0 (pvr 0070 0400)

time nice --20 ./ffmpeg -i /tmp/bluesky_HD_CAVLC_JM93_217f.264 -f rawvideo - > /dev/null
real 0m25.877s
user 0m23.768s
sys 0m1.904s

cat /proc/cpuinfo
processor : 0
cpu : 7447A, altivec supported
clock : 1666.666000MHz
revision : 0.5 (pvr 8003 0105)

The ffmpeg code is the same; I didn’t use anything but the stock cflags, and neither did he.
I was expecting quite a different result, time to hunt down the slow gear!
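To put the two timings in perspective, here is a tiny C sketch that compares them both in wall-clock terms and in clock cycles burnt. The helper names (wall_speedup, mega_cycles) are made up for illustration, and the cycle figure is just seconds times the MHz reported by /proc/cpuinfo, so take it as a rough estimate.

```c
#include <assert.h>

/* How many times faster the second run was in wall-clock terms. */
static double wall_speedup(double slow_seconds, double fast_seconds)
{
    assert(slow_seconds > 0.0 && fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}

/* Millions of clock cycles burnt during a run: seconds * MHz. */
static double mega_cycles(double seconds, double clock_mhz)
{
    assert(seconds >= 0.0 && clock_mhz > 0.0);
    return seconds * clock_mhz;
}
```

With the "real" times above, wall_speedup(47.685, 25.877) comes out at roughly 1.84, and the G5 run burns roughly 2.65 times as many cycles as the G4 one, which makes the gap look even worse than the plain timings suggest.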

I used oprofile.

I just started it before the ffmpeg call and stopped it right after, then asked opreport to compute some statistics about symbols.

An excerpt:

CPU: PowerPC G4, speed 1666.67 MHz (estimated)
Counted CYCLES events (Cycles) with a unit mask of 0x00 (No unit mask) count 100000
samples % image name symbol name
60355 23.2602 libc-2.4.so _wordcopy_fwd_aligned
13572 5.2305 ffmpeg_g put_h264_chroma_mc8_altivec
13417 5.1708 ffmpeg_g filter_mb
11379 4.3853 ffmpeg_g put_h264_qpel16_h_lowpass_altivec
9700 3.7383 ffmpeg_g fill_caches
9332 3.5965 ffmpeg_g hl_decode_mb
8201 3.1606 vmlinux __flush_dcache_icache

Looks like I’ll have to replace something… or start thinking about an optimized glibc…
(mine is built targeting my cpu and is pretty recent, I wonder if the G5 is running an older or generically built glibc…)