ffmpeg, what’s missing?

Ok, the title is misleading on purpose, as you can see from the previous post I got some requests about ffmpeg+ppc (power, cell, plain ppc), in the case of h264 I’m afraid all the useful bits are already vectorized and the little left around will be useful but isn’t really top priority (obviously I’ll try to be on par with i386 optimizations, so weight and loop filter funcs will have their respective in altivec sooner or later). Other codecs have lots missing vectorization wise, say vp{3,5,6} family that many could like/need because of flash embedded videos, or some quick asm bits could be quite useful for our lil embedded ppcs the same way they are already useful and implemented for arm.

My plan for the next week is keep reordering code and put it back in arch specific dirs so it could be implemented in a more agile way (see what I did for the mathops or what I’m about to try for the bitstream read/write functions), hopefully I’ll complete and commit some altivec optimizations like the mdct (even if I should check if in altivec makes the difference or not), the vp idct variant or the h264 latest bits.

I’ll be unavailable for the week end, see you monday =)

oprofiling ffh264

Recently I got some inquiries about h264 and altivec. just testing decode time was disappointing to some user.

I did my test and on my g4 1.6 I got about the double ofthe speed he experienced on his g5 2.4.

time nice –20 ./ffmpeg -i ~ryan/bluesky_HD_CAVLC_JM93_217f.264 -f rawvideo – > /dev/null
real 0m47.685s
user 0m44.304s
sys 0m3.220s

cat /proc/cpuinfo
processor : 0
cpu : ppc970, altivec supported
clock : 2400.000000MHz
revision : 4.0 (pvr 0070 0400)

time nice –20 ./ffmpeg -i /tmp/bluesky_HD_CAVLC_JM93_217f.264 -f
rawvideo – > /dev/null
real 0m25.877s
user 0m23.768s
sys 0m1.904s

cat /proc/cpuinfo
processor : 0
cpu : 7447A, altivec supported
clock : 1666.666000MHz
revision : 0.5 (pvr 8003 0105)

The ffmpeg code is the same, I hadn’t use anything but the stock cflags, same for him.
I was expecting quite a different result, time hunt the slow gear!

I used oprofile

just started and stopped it befor the ffmpeg call, and the asked opreport to compute some statistics about symbols.

an excerpt

CPU: PowerPC G4, speed 1666.67 MHz (estimated)
Counted CYCLES events (Cycles) with a unit mask of 0x00 (No unit mask) count 100000
samples % image name symbol name
60355 23.2602 libc-2.4.so _wordcopy_fwd_aligned
13572 5.2305 ffmpeg_g put_h264_chroma_mc8_altivec
13417 5.1708 ffmpeg_g filter_mb
11379 4.3853 ffmpeg_g put_h264_qpel16_h_lowpass_altivec
9700 3.7383 ffmpeg_g fill_caches
9332 3.5965 ffmpeg_g hl_decode_mb
8201 3.1606 vmlinux __flush_dcache_icache

Looks like I’ll have to replace something… or start thinking about optimized glibc…
(mine is built targeting my cpu and is pretty recent, I wonder if the G5 isn’t running on an older or generic built glibc…)

libnemesi,libnms, whatever…

Ok, the name isn’t that stable and probably I’m going to shake the api a bit more soon.

Anyway, the rtp/rtsp client library from lscube is eventually getting in shape and now I moved from simple example to something more useful:

audacious now has support for mp3 over rtsp using libnemesi simple/toplevel API

MPlayer is about to get a demuxer using again the simple API (or at least I’m trying to)

– I’m pushing Diego in order to get also Xine nemisified.

So, last calls before the API freeze and the first release. Start playing with it now so you can get the changes you need before the next major version.

My father is about to show some of his best pictures, I doubt many of you outside Italy would be interested that much, still it is important enough for me to tell you ^^.

“Il ritrovamento di una raccolta di negativi scattati al Balon tra fine
anni

CMake… why?

I just did some comparisons about cmake and something better like… autotools?

Step 0:

Some numbers just for fun:

nemesi lu_zero # qlop -tH cmake
cmake: 13 minutes, 35 seconds for 1 merges
nemesi lu_zero # qlop -tH make
make: 1 minute, 0 seconds for 2 merges
nemesi lu_zero # qlop -tH bash
bash: 2 minutes, 32 seconds for 3 merges
nemesi lu_zero # qlop -tH perl
perl: 7 minutes, 52 seconds for 4 merges
nemesi lu_zero # qlop -tH m4
m4: 32 seconds for 1 merges
nemesi lu_zero # qlop -tH automake
automake: 1 minute, 8 seconds for 7 merges
nemesi lu_zero # qlop -tH autoconf
autoconf: 30 seconds for 4 merges

Ok cmake is a ugly hog at least while building.

Step 1:
What you want from your build system:

– take as little resources as possible
– have it as much portable as possible
– have as little as possible to write
– have a good way to express conditionals
– let you have your program correctly built on a large number of arches.
– automagically spot dependencies
– let you use tools like distcc and ccache in an effective way

Step 2:

Uglyness scale before cmake:

imake, scons: bad ideas with a bad implementation, everybody knows that, somebody is workarounding scons a lot, I hope for a fork.

qmake: not sure which part is bad, nobody likes to use it that much for a reason or another.

plain make: simple and to the point, it may not scale that well.

autotools: most people misuse them, many despise certain choices. The lack of good documentation and the version incompatibilities made them in a bad light, things are getting better little by little. The most used system despite the ugly Makefiles produced. I’d use them with their reference at hand.

kbuild: it is wonderful, works really well once you lay it out, it is even simple! Problem: you don’t have a stand alone template to fill so it is much an hacking over existing deployments.

Step 3:

put cmake in the scale:

1- takes forever to build
2- is compact
3- uses the same approach as aclocal m4 files (call them Modules/*.cmake) but has a risible number of them.
4- supports less platforms than autotools
5- the syntax is a brain damage between imake and autotools
6- Hype points because kde4 could use it

Step 4:

When it could have sense:

1- your project is targeting what cmake supports already
2- you like its brain damages more than others
3- you misused autotools and now you are trying to misuse another build system (hopefully it won’t make you misuse it that much).
4- you absolutely love the ability to have project files for some commonly used proprietary ides out of the cmake native files.
5- you want to gain some hype points to your project.

Summary:

cmake is a wonderful technology that should take the way of the dodo and never touch your hard disk, gives nothing more than a lobotomized autotools (and that is good since you can commit less mistakes) has the good mix of hype points and laziness compliant features.

In short cmake developers feel free to ignore me, this is a rant, I hadn’t wrote a build system that will be used by a large project and probably I’m not so entitled to write such large stain of venom.

You obviously can take it seriously and prove me wrong with a next version with 1/2 the build time, less overhead and a nicer syntax.

UPDATE:

flameeyes told me that cmake doesn’t have a way to set cflags/ldflags/asflags on a per target basis. So it is a wonderful regression to pile up to the rest…

UPDATE2:

http://dev.gentoo.org/~lu_zero/development/cdrkit-1.0pre3.tar.bz2
I played a bit with autotools to see how bad they are and/or how good cmake is…
I started from scratch for the first time using autotools…
There are some minor changes but nothing interesting.