Gentoo on Integricloud

Integricloud gave me access to their infrastructure to track some issues on ppc64 and ppc64le.

Since some of the issues are related to the compilers, I obviously installed Gentoo on it, and in the process I started to fix some issues with catalyst to get working install media, but that’s for another blogpost.

Today I’m just giving a walk-through on how to get a ppc64le (and ppc64 soon) VM up and running.

Preparation

Read this and get your install media available to your instance.

Install Media

I’m using the Gentoo installcd I’m currently refining.

Booting

You have to append console=hvc0 to your boot command; the boot process might figure it out for you on newer install media (I still have to send patches to update livecd-tools).

Network configuration

You have to set up the network manually.
You can use ifconfig and route or ip as you prefer; refer to your instance setup for the parameters.

With net-tools:

ifconfig enp0s0 ${ip}/16
route add -net default gw ${gw}
echo "nameserver 8.8.8.8" > /etc/resolv.conf

Or with iproute2:

ip a add ${ip}/16 dev enp0s0
ip l set enp0s0 up
ip r add default via ${gw}
echo "nameserver 8.8.8.8" > /etc/resolv.conf

Disk Setup

OpenFirmware seems to like GPT much better:

parted /dev/sda mklabel gpt

You may use fdisk to create:
– a PowerPC PReP boot partition of 8M
– a root partition with the remaining space

Device     Start      End  Sectors Size Type
/dev/sda1   2048    18431    16384   8M PowerPC PReP boot
/dev/sda2  18432 33554654 33536223  16G Linux filesystem
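If you prefer parted over fdisk, a sketch that should produce the same layout (the offsets are illustrative):

parted -s /dev/sda mkpart prep 1MiB 9MiB
parted -s /dev/sda set 1 prep on
parted -s /dev/sda mkpart root 9MiB 100%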

I’m using btrfs and zstd-compressing /usr/portage and /usr/src/.

mkfs.btrfs /dev/sda2
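To compress just those two directories, one way is to set the btrfs compression property once they exist (a sketch; run it after unpacking the stage3, and note it only affects newly written files):

btrfs property set /mnt/gentoo/usr/portage compression zstd
btrfs property set /mnt/gentoo/usr/src compression zstd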

Initial setup

It is pretty much the usual.

mount /dev/sda2 /mnt/gentoo
cd /mnt/gentoo
wget https://dev.gentoo.org/~mattst88/ppc-stages/stage3-ppc64le-20180810.tar.xz
tar -xpf stage3-ppc64le-20180810.tar.xz
mount -o bind /dev dev
mount -t devpts devpts dev/pts
mount -t proc proc proc
mount -t sysfs sys sys
cp /etc/resolv.conf etc
chroot .

You just have to emerge grub and gentoo-sources; I diverge from the defconfig by making btrfs built in.

My /etc/portage/make.conf:

CFLAGS="-O3 -mcpu=power9 -pipe"
# WARNING: Changing your CHOST is not something that should be done lightly.
# Please consult https://wiki.gentoo.org/wiki/Changing_the_CHOST_variable
# before changing.
CHOST="powerpc64le-unknown-linux-gnu"

# NOTE: This stage was built with the bindist Use flag enabled
PORTDIR="/usr/portage"
DISTDIR="/usr/portage/distfiles"
PKGDIR="/usr/portage/packages"

USE="ibm altivec vsx"

# This sets the language of build output to English.
# Please keep this setting intact when reporting bugs.
LC_MESSAGES=C
ACCEPT_KEYWORDS=~ppc64

MAKEOPTS="-j4 -l6"
EMERGE_DEFAULT_OPTS="--jobs 10 --load-average 6 "

The minimal set of packages I need before booting:

emerge grub gentoo-sources vim btrfs-progs openssh

NOTE: You want to re-emerge openssh and make sure bindist is not in your USE.

Kernel & Bootloader

cd /usr/src/linux
make defconfig
make menuconfig # I want btrfs builtin so I can avoid an initrd
make -j 10 all && make install && make modules_install
grub-install /dev/sda1
grub-mkconfig -o /boot/grub/grub.cfg

NOTE: make sure you pass /dev/sda1, otherwise grub will happily assume OpenFirmware knows about btrfs and just point it to your directory.
Unfortunately, that’s not the case.

Networking

I’m using netifrc and I’m sticking to the eth0 naming convention.

touch /etc/udev/rules.d/80-net-name-slot.rules
ln -sf /etc/init.d/net.{lo,eth0}
echo -e "config_eth0=\"${ip}/16\"\nroutes_eth0=\"default via ${gw}\"\ndns_servers_eth0=\"8.8.8.8\"" > /etc/conf.d/net
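For reference, the resulting /etc/conf.d/net should read (with your actual address and gateway in place of the placeholders):

config_eth0="${ip}/16"
routes_eth0="default via ${gw}"
dns_servers_eth0="8.8.8.8"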

Password and SSH

Even if the mticlient is quite nice, you’d rather use ssh as much as you can.

passwd 
rc-update add sshd default

Finishing touches

Right now sysvinit does not add the hvc0 console as it should, due to a profile quirk; for now check /etc/inittab and, if it is missing, add:

echo 'hvc0:2345:respawn:/sbin/agetty -L 9600 hvc0' >> /etc/inittab

Add your user, add your ssh key, and you are ready to use your new system!

lxc, ipv6 and iproute2

A while ago I got a soyoustart system, since it comes with an option to install Gentoo out of the box.

The machine comes with a single ipv4 address and a /64 of ipv6 addresses.

LXC

I want to use the box to host some of my flask applications (plaid mainly), keep some continuous integration instances for libav and run some other experiments with compilers and libraries (such as musl, cparser and others).

Since Diego was telling me about lxc, I picked it. It is simple, requires little effort and in Gentoo we have at least some documentation.

Setting up

I followed the documentation provided and it worked quite well up to a point. The btrfs integration works as explained, creating new Gentoo instances just worked, setting up the network… required some effort.

Network woes

I have just a single ipv4 address and plenty of ipv6 ones, so why not leverage them? I decided to partition my /64 and use some of it, configured the bridge to take ::::1::1 and set up the container configuration like this:

lxc.network.type = veth
lxc.network.link = br0
lxc.network.flags = up
lxc.network.ipv4 = 192.168.1.4/16
lxc.network.ipv4.gateway = auto
lxc.network.ipv6 = ::::1::4/80
lxc.network.ipv6.gateway = auto
lxc.network.hwaddr = 02:00:ee:cb:8a:04

But the route to my container wasn’t advertised.

Having no idea why, I just kept poking around sysctl and iproute2 until I got it working with:

In sysctl.conf:

net.ipv6.conf.all.forwarding = 1
net.ipv6.conf.eth0.proxy_ndp = 1

and in my container runner script:

ip -6 neigh add proxy ::::1::4 dev eth0

I know at least a few other people hit the same problem, hence this mini-post.

Code and Conduct

This is a short list of checklists and a few ramblings in the wake of Fosdem’s Code of Conduct discussions, and of the not exactly welcoming statements about how to perceive a Code of Conduct such as this one.

Code of Conduct and OpenSource projects

A Code of Conduct is generally considered a means to get rid of problematic people (and thus avoid toxic situations). I prefer to consider it a means to welcome people and provide good guidelines to newcomers.

Communities without a code of conduct tend to reject the idea of having one, thinking that it is only needed to solve the above-mentioned issue and that adding more bureaucracy would actually just give more leeway to machiavellian ploys.

Sadly, no matter how good the environment is, it takes just a few poisonous people to end up in an unbearable situation, and in a few selected cases just one is enough.

If you consider the CoC a shackle or a stick to beat “bad guys” with, so that you do not need it until you see a bad guy, that is naive and utterly wrong: you will end up writing something that excludes people due to a quite understandable, but wrong, knee-jerk reaction.

A Code of Conduct should do exactly the opposite: it should embrace people and make it easier to join and fit in. It should be the social equivalent of the developer handbook or the coding style guidelines.

Just as everybody can make a little effort to send code with spaces between operators, everybody can make an effort not to use colorful language. Likewise, just as people are happier to contribute when the codebase they are hacking on is readable, they are more confident about joining the community when the environment is pleasant.

Making a useful Code of Conduct

The Code of Conduct should be a guideline for people that have no idea what the expected behavior is.
It should be written thinking about how to help people get along, not about how to punish those who do not.

  • It should be short. It is pointless to enumerate ALL the possible ways to make people uncomfortable; you are bound to miss a few.
  • It should be understanding and inclusive. Always assume cultural biases and not ill will.
  • It should be enforced. It gets quite depressing when you have a 100+ line code of conduct but then nobody cares about it and nobody really enforces it. And I’m not talking about having specifically designated people to enforce it: your WHOLE community should agree on what acceptable behavior is and act accordingly on breaches.

People joining the community should consider the Code of Conduct first as a request (and not a demand) to make an effort to get along with the others.

Pitfalls

Since I saw quite a few long and convoluted walls of text being suggested as THE CODE OF CONDUCT everybody MUST ABIDE BY, here are some suggestions on what NOT to do.

  • It should not be a political statement: that is a strong cultural bias that would make potential contributors just stay away. No matter how good and great you think your ideas are, they are unrelated to a project that should gather all the people that enjoy writing code in their spare time. The Open Source movement is already an ideology in itself; overloading it with more is just a recipe for disaster.
  • Do not try to make a long list of definitions: you just dilute the content and give even more ammo to lawyer-type arguers.
  • Do not think too much about draconian punishments: this is a community on the internet, even nowadays nobody really knows whether you are actually a dog or not, and you cannot really enforce anything if the other party really wants to be a pest.

Good examples

Some CoCs I consider good are obviously the ones used in the communities I belong to, Gentoo and Libav; they are really short and to the point.

Enforcing

As I said before, no matter how well written a code of conduct is, the only way to really make it useful is if the community as a whole helps new (and not so new) people to get along.

The rule of thumb “if anybody feels uncomfortable in a non-technical discussion, once they say they are, drop it immediately” is OK as long as:

  • The person who is uncomfortable speaks up. If you are shy you might ask somebody else to speak up for you, but do not stay quiet when it happens and then file a complaint much later; that is NOT OK.
  • The rule is not abused to derail technical discussions. See my post about reviews to at least avoid this pitfall.
  • People agree to drop at least some of their cultural biases; otherwise it would end up being like walking on eggshells all the time.

Letting situations go unchecked is probably the main issue: newcomers can think it is OK to behave in a certain way if others behave that way and nobody stops them. Again, not just designated enforcers of some kind, but everybody should behave and clearly tell those misbehaving that they are being problematic.

Gentoo is a big community, so having a swift reaction gets problematic: lots of people prefer not to speak up when something happens, so people unwillingly causing the problem are not made aware of it immediately.

Libav is a much smaller community, and in general nobody has qualms about saying “please stop” (that is also partially due to how the community evolved).

Hopefully this post will help avoid some mistakes and help people get along better.

Reviews

This was spurred by some events happening in Gentoo: with the move to git we eventually have more reviews, and obviously comments on patches can be acceptable (and accepted) depending on a number of factors.

This short post is about communicating effectively.

When reviewing patches

No point in pepper coating

Do not disparage code or, even worse, people. There is no point in being insulting; you just add noise to the signal:

You are a moron! This shit has no place here, do not do something this stupid again.

This is not OK: most people will focus on the insult and the technical argument will be totally lost.

Keep in mind that you want people doing stuff for the project, not running away crying.

No point in sugar coating

Do not downplay stupid mistakes that would crash your application (or wipe an operating system) because you think it would hurt the feelings of the contributor.

    rm -fR /usr /local/foo

It is as silly a mistake as you like, but the impact is HUGE.

This is a tiny mistake, you should not do that again.

No, it isn’t tiny, it is quite a problem.

Mistakes happen, the review is there to avoid them hitting people, but a modicum of care is needed:
wasting other people’s time is still bad.

Point the mistake directly by quoting the line

And use at most 2-3 lines to explain why it is a problem.
If you can’t, it is better to fix that part yourself or to move the discussion to a more direct medium, e.g. IRC.

Be specific

This kind of change is not portable, obscures the code and does not fix the overflow issue at hand:
the expression as a whole could still overflow.

Hopefully even the busiest person, juggling 5 different tasks, will get it.

Be direct

Do not suggest the use of those non-portable functions again anyway.

No room for interpretation, do not do that.

Avoid clashes

If you and another reviewer disagree, move the discussion to another medium; there is NO point in spamming
the review system with countless comments.

When receiving reviews (or waiting for them)

Everybody makes mistakes

YOU included: if the reviewer (or more than one) tells you that your changes are not right, there are good odds you are wrong.

Conversely, the reviewer can make mistakes. Usually it is better to move away from the review system and discuss over email or IRC.

Be nice

There is no point in being confrontational. If you think the reviewer is making a mistake, politely point it out.

If the reviewer is not nice, do not use the same tone to fit in. Even more if you do not like that kind of tone to begin with.

Wait before answering

Do not update your patch or write a reply as soon as you get a notification of a review, more changes might be needed and maybe other reviewers have additional opinions.

Be patient

If a patch is unanswered, ping it maybe once a week, possibly rebasing it if the world changed meanwhile.

Keep in mind that most of your interactions are with other people volunteering their free time and not getting anything out of it either; sometimes real life takes priority =)

Again on assert()

Since apparently there are still people not reading the fine man page.

If the macro NDEBUG was defined at the moment <assert.h> was last included, the macro assert() generates no code, and hence does nothing at all.
Otherwise, the macro assert() prints an error message to standard error and terminates the program by calling abort(3) if expression is false (i.e., compares equal to zero).
The purpose of this macro is to help the programmer find bugs in his program. The message “assertion failed in file foo.c, function do_bar(), line 1287” is of no help at all to a user.

I guess it is time to return to security and expand a bit on which are good practices and which are misguided ideas that should be eradicated, to reduce the number of Denial of Service issues waiting to happen.

Security issues

The term “security issue” covers a lot of different kinds of situations. Usually unhandled paths in the code lead to memory corruption, memory leaks, crashes and other less evident problems such as information leaks.

I’m focusing on crashes today; assume the others are usually more annoying or dangerous, which might be true or not depending on the scenario:

If you are watching a movie and a glitch in the bitstream makes the application leak some memory, you would not care at all as long as you can enjoy your movie. If the same glitch makes VLC close suddenly a second before you get to see who is the mastermind behind a really twisted plot… I guess you’ll scream at whoever thought it was a good idea to crash there.

If a glitch might let an attacker run arbitrary code while you are watching your movie, you’d probably prefer to have your player just crash instead.

It is a false dichotomy, since what you want is to have the glitch handled properly and keep watching the rest of the movie without VLC crashing and without being left with no meaningful information about what happened.

Errors must be handled; trading a crash for something else you consider worse is just being naive.

What is assert exactly?

assert is a debugging facility mandated by POSIX, C89 and C99; it is a macro that more or less looks like this:

#define assert(condition)                              \
    if (condition) {                                   \
        do_nothing();                                  \
    } else {                                           \
       fprintf(stderr, "%d %s", __LINE__, __func__);   \
       abort();                                        \
    }

If the condition does not hold, crash. Here is the real-life version from musl:

#define assert(x) ((void)((x) || (__assert_fail(#x, __FILE__, __LINE__, __func__),0)))

How to use it

Assert should be used to verify assumptions. While developing they help you to verify whether your
assumptions meet reality. If not, they tell you that you should investigate because something is
clearly wrong. They are not intended to be used in release builds.
– some wise Federico while talking about asserts in another language
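In practice that means a release build compiles the asserts out, for example (the file name is illustrative):

cc -O2 -DNDEBUG -c foo.c   # assert() expands to nothing here
cc -O2 -c foo.c            # asserts are active in this build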

Usually when you write some code you might do something like this to make sure you aren’t doing anything wrong; you start with

int my_function_doing_difficult_computations(Structure *s)
{
   int a, b, c, idx;

   a = some_computation(s);
   ....
   b = other_operations(a, s);
   ....
   c = some_input(s, b);
   ...
   idx = some_operation(a, b, c);

   return some_lut[idx];
}

Where idx is a signed integer, and a, b and c have ranges that might or might not depend on some external input.

You do not want idx to end up outside the range of the lookup table array some_lut, and you are not so sure it can’t. How do you check that you aren’t getting outside the array?

When you write code you usually improve a prototype iteratively; you can add tests to make sure every function is returning values within the expected range, and you can use assert() as a poor man’s C version of proper unit testing.

If some function depends on values outside your control (e.g. an input file), you usually validate them and cleanly error out there. Leaving external inputs unaccounted for or, even worse, putting an assert() there is really bad.

Unit testing and assert()

We want to make sure our function works fine, so let’s write a really tiny test.

void test_some_computation(void)
{
    Structure *s = NULL;
    int i = 0;
    while (input_generator(&s, i)) {
       int a = some_computation(s);
       assert(a > 0 && a < 10);
    }
}

It is compact and you can then run your test under gdb and poke around a bit. Quite good if you are refactoring the innards of some_computation() and you want to be sure you did not overlook some corner case.

Here assert() is quite nice since we can pack the testcase into a single line and get a simple report if something goes wrong. We could do better though, since assert does not tell us the value or how we ended up there.
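A common trick to at least embed a hint in the report is to and the condition with a string literal; the string is always non-zero, so the check is unchanged, but it shows up in the printed expression:

assert(a > 0 && a < 10 && "some_computation() out of range");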

You might not be that thorough and just decide to put the same assert inside your function and check there, assuming you cover all the input space properly using regression tests.

To crash or not to crash

The people that consider crashing at runtime OK (remember the sad user that cannot watch his wonderful movie till the end?) suggest leaving the assert enabled at runtime.

If you consider the example above: would it be better to crash than to read a random value from memory? Again, this is a false dichotomy!

You can expect failures, e.g. broken bitstreams, and you want to just check for them and return a proper failure message.

In our case the return value of some_input() should be checked for failures, and the failure forwarded further up to the library user, who will then decide what to do.

Now the access to the lookup table remains. If you didn’t check the other functions sufficiently you might get a bogus index, and with a bogus index you will read from random memory (crashing or not, depending on whether that address is mapped into the program). Do you want to have an assert() there? Or would you rather add another normal check with a normal failure path?
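For the record, a normal check there could be a sketch like this (the error value is illustrative; use whatever your API returns on failure):

if (idx < 0 || idx >= (int)(sizeof(some_lut) / sizeof(some_lut[0])))
    return -1; /* illustrative error value */

return some_lut[idx];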

The correct answer is to test your code enough that you do not need to add yet another check; in fact, if the problem arises, it is wrong to add a check there or, even worse, an assert(). You should just go up the execution path and fix the problem where it is: a non-validated input, a wrong “optimization” or something sillier.

There is open debate on whether having assert() enabled is good or bad practice when talking about defensive design. In C, in my opinion, it is a complete misuse: if you want to litter your release code with tons of branches, you can also spend the time to implement something better and make sure to clean up correctly. Calling abort() possibly leaves your input and output in a severely inconsistent state.

How to use it the wrong way

I want to trade a crash anytime the alternative is memory corruption
– some misguided guy

Assume you have something like this:

int size = some_computation(s);
uint8_t *p;
uint8_t *buf = p = malloc(size);

while (some_related_computations(s)) {
   do_stuff(s, p);
   p += 4;
}

assert(p - buf == size);

If some_computation() and some_related_computations() do not agree, you might write past the allocated buffer! The naive person above starts talking about how the memory gets corrupted by do_stuff() and how horrible things (e.g. foreign code execution) could happen without the assert(), and how even calling return at that point is terrible and would lead to horrible, horrible things.

Ok, NO. Stop NOW. Go up and look at how assert is implemented. If you check at that point that something went wrong, you already have the corruption. No matter what you do, somebody could exploit it, depending on how naive or unlucky you have been.

Remember: assert() does I/O, allocates memory, raises a signal and calls functions. All the things you would rather not do when your memory is corrupted are exactly what assert() does.

You can be less naive.

int size = some_computation(s);
uint8_t *p;
uint8_t *buf = p = malloc(size);

while (some_related_computations(s) && size >= 4) {
   do_stuff(s, p);
   p    += 4;
   size -= 4;
}
assert(size == 0);

But then, instead of the assert, you can just add:

if (size != 0) {
    msg("Something went really wrong!");
    log("The state is %p", s->some_state);
    cleanup(s);
    goto fail;
}

This way, when the “impossible” happens, the user gets a proper notification, you can recover cleanly and no memory corruption ever happens.

Better than assert

Albeit easy to use and portable, assert() does not provide that much information; there are plenty of tools that can be leveraged to get better reporting.
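As a minimal sketch of the direction, assuming nothing beyond standard C: a check macro that reports the failing expression and location, then jumps to a cleanup label instead of calling abort() (the names are illustrative):

#include <stdio.h>

/* Report the failing expression and location, then jump to a
 * cleanup label instead of aborting. */
#define CHECK(cond, label) do {                               \
    if (!(cond)) {                                            \
        fprintf(stderr, "%s:%d: check '%s' failed in %s()\n", \
                __FILE__, __LINE__, #cond, __func__);         \
        goto label;                                           \
    }                                                         \
} while (0)

/* usage: CHECK(size == 0, fail); */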

In Closing

assert() is a really nice debugging tool and it helps a lot to make sure some state remains invariant while refactoring.

Leaving asserts in release code, on the other hand, is quite wrong: it does not give you any additional safety. Please do not buy the fairy tale that assert() saves you from the scary memory corruption issues; it does NOT.

Document your project!

After discussing how to track your bugs and your contributions, let’s see what we have for documentation.

Pain and documentation

A healthy Open Source project mainly needs contributors, and contributors are usually your users. You get users if the project is known and useful (and if you do not have parasitic entities siphoning your work by abusing git-merge; best of luck to io.js and markdown-it in avoiding this experience, switching names is enough of a pain without it).

In order to gain mindshare, the best thing is making what you do easier to use, and that requires documenting what you did! The process is usually boring and time-consuming, and every time you change something you have to make sure the documentation still matches reality.

In the opensource community we have multiple options for the kind of documentation we produce and how to produce it.

Wiki

When you need to keep some structure but want an easy way to edit it, a wiki can be a good choice and it can lead to nice results. The information present is usually correct and, if enough people keep editing, it stays up to date.

Pros:

  • The wiki is quick to edit and you can have people contribute by just using a browser.
  • The documentation is easily indexed by search engines.
  • It can be restricted to a number of trusted people.
Cons:

  • The information is detached from the actual code and can get out of sync easily.
  • Even if it is kept up to date, what applies to the current release is not necessarily what your poor user has.
  • Usually keeping versioned content is not that simple.

Forum

Even if they are usually noisy, forums are a good source of information plenty of the time.
Personally I try to move the interesting bits to a wiki page when I find something that is not completely transient.

Pros:

  • Usually everything requires less developer interaction.
  • Users can share solutions to their problems effectively.
Cons:

  • The information can get stale even quicker than what you have in the wiki.
  • Since it is mainly user-generated, the solutions proposed might be suboptimal.
  • Being highly interactive, it requires more dedicated people to take care of unruly users.

Manuals

There are lots of good toolchains for writing full manuals such as the ones we have in Gentoo.

The old-style xml/docbook tends to have a really steep learning curve, not to mention even more quirky and wonderful monsters such as LaTeX (and the lesser texinfo). ReStructuredText, asciidoc and some flavours of markdown seem better tools for the task if you need speed and want to get contributors up to speed quickly.

Pros:

  • A proper manual can be easily pinned to a specific release.
  • It can be versioned using git.
  • Some people still like something they can print that has a proper index.
Cons:

  • With the old tools it is a pain to get started.
  • The learning curve can still be unbearable for most contributors.
  • It requires some additional dedication to keep it up to date.

What to use and why

For small projects the manual is usually the README; once the project grows, a wiki is usually the best place to collect notes from multiple people. If you are good at it, a manual is a boon for all your users.

Tools for documentation-in-code such as doxygen or docurium can help a lot if your project has a single codebase.
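A quick taste of the documentation-in-code style, in doxygen-flavoured C (the names are illustrative):

/**
 * Compute the index into the lookup table.
 *
 * @param s the parsed input state
 * @return a non-negative index on success, a negative error code otherwise
 */
int some_operation(Structure *s);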

If you need to unify a LOT of different information, as we have in Gentoo, the problems usually get much more annoying: you have content written in multiple markups, living in multiple places, and moving it from one place to another usually requires a serious editing effort (like moving from our guidexml to the current semantic wiki).

Markup suggestion

Markdown/CommonMark/Kramdown

I do like CommonMark a lot and I even started to port and extend it to be used in docutils, since I find ReStructuredText too confusing for normal users. Its best quality is the natural flow; its most annoying defect is that there are too many parser discrepancies and sometimes implementations disagree. Still, it is better to have many good implementations than a single one subpar in everything (hi texinfo, I hate your toolchain).

Asciidoc

The markup is quite nice (up to a point) and the toolchain is sort of nice, even if it feels like a Rube Goldberg machine. To my knowledge there is a single implementation of it, and that makes me MUCH more wary of using it in new projects.

ReStructuredText

The markup is not as intuitive as Asciidoc, thus quite far from Markdown’s immediate-use feeling, but it has a great toolchain (if you like python) and it can be extended to produce lots of different well-formatted documents.
It comes with loads of markup features that Markdown core lacks: an include directive, table of contents generation, pluggable generic block and span directives, and 3 different flavours of tables.
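For instance, a tiny ReStructuredText document exercising a couple of those features (the included file name is illustrative):

.. contents:: Table of Contents

.. include:: intro.rst

=======  =======
op       result
=======  =======
1 + 1    2
=======  =======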

All in all, good if you can come to terms with its complexity.

What’s next

Hopefully during this year among my many smaller and bigger projects, I’ll find time to put together something nice for documentation as well.

Track your issues

Issue Trackers

If you are not aware of a problem, you cannot fix it.

Having full awareness of the issues and managing them is the key to success for any kind of project (not just software).

For an open-source project it is essential that the issue tracker focuses on at least 3 areas:

  • Ease of use: you get reports mainly from casual users; they must spend the least amount of time understanding the tool and providing the information.
  • Loudness: it must make problems easy to spot.
  • Data mining: it should provide tools to query details, aggregate bugs and manipulate them.

What’s available

So far I have tried many issue trackers in different projects; sadly almost none fit the bill, and they are usually the opposite: limited, cumbersome, hard to configure and horrible to use, either to file bugs or to actually manage them.

Bugzilla

It is by far the least bad: it has plugins to provide near-instant access thanks to Mozilla Persona, a rich rpc system that could be leveraged for irc notifiers or side-site statistics, and importing/exporting data is almost there. As we know in Gentoo, it requires some deep manipulation, and if there is nobody around to do that you can get fallout like this, when a single stubborn (and probably distracted) developer (vapier) manages to spoil the result of the goodwill of another and makes the Project overall more frail.

Mantis

It is still too rich in confusing options, but its default splash views are a boon if you are wondering what the status of your project is. Sadly, no openid/persona/single-sign-on integration.

Redmine/Trac

Usually not good enough on the reporting side and, even if they are much simpler than Bugzilla, still not good for the untrained user. They integrate with the source repository view and knowledge base (aka wiki), so they can be a good starting point for small organizations.

Github/GitLab/Gogs

They have a more encompassing approach than redmine and trac; their issue tracker component is too simple in some cases (with Github not even having support for attachments and gogs not really managing tags yet) or a little too rough (no bug dependencies). But, with its immediate UI and the label-oriented approach, it is already pretty good for a great deal of projects. Sadly not Libav: we do need proper attachments.

RT

Request Tracker is overwhelming. No other words. Do not use it if you do not need to. It is too complex to configure on the admin side and too annoying to use on the developer side. For users the interface is usually a mailbox, so you can’t go wrong. Perfect if you have to manage a huge number of paying customers and want detailed billing and other extremely advanced features.

Brimir

The new kid on the block: it is quite simple, way too simple. Its mail rendering does not make it really great, but it is pretty much a nice concept waiting to bloom. (Will it?)

Suggestion welcome

Do you know any better opensource issue tracker? Please comment below =)

Tracking patches

You need good tools to do a good job.

Even the best tool in the hands of a novice is a club.

I’m quite fond of improving the tools I use. That’s why I started getting involved in Gentoo, Libav, VLC and plenty of other projects.

I already discussed lldb and asan/valgrind; now my current focus is on patch trackers, in part due to the current effort to improve the libav one.

Contributors

Before talking about patches and their tracking, I’d like to digress a little on who produces them. The mythical Contributor: without contributions an opensource project would not exist.

You might have recurring contributions and unique/seldom contributions. Both are quite important.
In general you should try to turn seldom contributors into recurring contributors.

A recurring contributor can accept spending some additional time to set up the environment to actually provide their contribution back to the community; a sporadic contributor could easily be put off if the effort required to send a patch is larger than writing the patch itself.

The project maintainers should make the life of contributors as simple as possible.

Patches and Revision Control

Lately most opensource projects saw the light and started to use decentralized source revision control systems, and thanks to github and many others the concept of issuing pull requests is becoming part of our culture; with it hopefully comes a wider acceptance of the fact that code should be reviewed before it is merged.

Pull Request

In a decentralized development scenario, new code is usually developed in topic branches, routinely rebased against the master until the set is ready, and then the set of changes (called series or patchset) is reviewed and, after some rounds of fixes, eventually merged. Thanks to bitbucket we now have forking, spooning and knifing as part of the jargon.

The review (and merge) step, quite properly, is called knifing (or stabbing): you have to dice, slice and polish the code before merging it.

Reviewing code

During a review, bugs are usually spotted and ways to improve the code are suggested. Patches might be split or merged together, and the series reworked and improved a lot.

The process is usually time consuming, even more for an organization made of volunteers: writing code is fun, addressing spotted issues not so much, and reviewing someone else’s code even less.

Sadly it is a necessary annoyance, since otherwise the errors (and horrors) that would slip through would be much bigger and probably much more numerous. If you do not care about code quality and what you are writing is not used by other people, you can probably ignore that; you should not if you feel somehow concerned that what you wrote might turn some people’s lives into a sea of pain. (On the other hand, some gratitude for such a daunting effort is usually welcome.)

Pull request management

The old-fashioned way to issue a pull request is either to poke somebody, telling them that your branch is ready for merge, or to just make a set of patches and mail them to whoever is in charge of integrating code into the main branch.

git provides a nifty tool to do that called git send-email, and it is quite common to send sets of patches (usually called series) to a mailing list. You get feedback by email and you can update the set using the --in-reply-to option and the message id.
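For example, sending an updated revision of the last three commits threaded under the original posting might look like this (the address and message id are placeholders):

git send-email --to=project-devel@example.org \
    --in-reply-to='<20150101120000.GA1234@example.org>' \
    -v2 HEAD~3..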

Platforms such as github and similar are more web-centric and require you to use the web interface to issue and review the request. No additional tools are required besides your git and a browser.

gerrit and reviewboard provide custom scripts to set up ephemeral branches in some staging area; the review process then requires a browser again. Every commit gets some tool-specific metadata to ease tracking changes across series revisions. This approach is the most setup-intensive.

Pros and cons

Mailing list approach

Testing patches from the mailing list is quite simple thanks to git am. And if the in-reply-to field is used properly, updates appear sorted in a good way.

This method is the simplest for people used to having the email client always open alongside a console (if they are using a well-configured emacs or vim they literally do not move away from the editor).

On the other hand, people using a webmail or a basic email client might find the approach more cumbersome than a web-based one.

If your only method to track contributions is a mailing list, it gets quite easy to lose track of the status of a set. Patches can be neglected, and even those who wrote them might forget about them for a long time.

Patchwork approach

Patchwork tracks which patches hit a mailing list and tries to figure out automatically whether they eventually got merged.

It is quite basic: it provides a web interface to check the status and a means to update the patch status. The review must happen on the mailing list, and there is no concept of series.

As basic as it is, it works as a reminder of pending patches, but it tends to get cluttered easily and keeping it clean requires some effort.

Github approach

The web interface makes it much easier to spot what is pending and what its status is; people used to having everything in the browser (chrome and mozilla can be made to work as a decent IDE lately) might like it much better.

Reviewing small series or single patches is usually nicer, but the current UIs do not scale to larger (5+) patchsets.

People not living in a browser find it quite annoying to switch context, and it takes additional effort to contribute since you have to register on a website; the process of issuing a patch requires many additional steps, while the email approach just requires typing git send-email -1.

Gerrit approach

The gerrit interfaces tend to be richer than the Github counterparts. That can be good or bad since they aren’t as immediate and tend to overwhelm new contributors.

You need to make an additional effort to setup your environment since you need some custom script.

The series are tracked with additional precision, but for all practical usage it is the same as github, with an additional burden for the contributor.

Introducing plaid

Plaid is my attempt to tackle the problem. It is currently unfinished and in dire need of more hands working on it.

Its basic concept is to be as non-intrusive as possible, retaining all the pros of the simple git+email workflow, as patchwork does.

It already provides additional features, such as the ability to manage series of patches and to track updates to them. It sports a view breaking out which series require a review and which have been pending for a long time waiting for an update.

What’s pending is the ability to review directly in the browser, to send the review email from the web to the mailing list, and some more.

I might complete it within the year or by next spring; if you like Flask or python, contributions are warmly welcome!

PowerPC is back (and little endian)

Yesterday I fixed my first PowerPC issue in ages; it is an endianness issue, and (funnily enough) it is on the little endian flavour of the architecture.

PowerPC

I have some ties with this architecture: my interest in it (and in Altivec/VMX in particular) is what made me start contributing to MPlayer while fixing issues on Gentoo, and from there hack on the FFmpeg of the time, meet the VLC people, decide to part ways with Michael Niedermayer and, with the other main contributors of FFmpeg, create Libav. Quite a long way back in time.

Big endian, Little Endian

It is a bit surprising that IBM decided to use little endian (since big endian is MUCH nicer for I/O processing such as networking) but they might have their reasons.

PowerPC has traditionally been bi-endian, with the ability to switch on the fly between the two (this made having foreign-endian simulators slightly less annoying to manage), but the main endianness had always been big.

This brings us to a quite interesting problem: some, if not most, of the PowerPC code had been written thinking in big endian. Luckily, since most of the code written was using C intrinsics (bless whoever made the Altivec intrinsics not as terrible as the other ones around), it won’t be that hard to recycle most of it.

More will follow.

Libav 10.1 released (in Gentoo)

I just committed the ebuild in Portage and noticed that Homebrew already updated its Formula.

I spent some time in Berlin at LinuxTAG manning the VideoLAN booth and feeding people with VLC and Libav chocolate (many thanks to Borgodoro for providing me with their fine goods).

During the weekend we held the VideoLAN association meeting in the SoundCloud office, thanks a lot again for the wonderful venue.

Unmask in Gentoo

I’m slowly getting a tinderbox up with the help of Flameeyes, so we can make sure nothing unexpected happens. The new refcounted API for Frames and Packets makes updating from 9 quite compelling, even if we’ll keep updating both release branches for the next year and a half.

You can help

There are a number of packages that depend on the old 0.8 ffmpeg application, and they won’t really work with the current avconv nor with the ffmpeg provided by recent versions, since the new option parsing code written by Anton ended up there as well. Most of those applications are either fully orphaned or have patches to work with avconv, because the Debian and Ubuntu developers took care of it. Nikoli provided me with a list.

HWAccel 1.2

This weekend I eventually merged hwaccel1.2, and I hope to get the AVResample updates finalized within the week.

There had been some discussion about backporting the latter to release 10, since it simplifies porting to the new resampling library a bit.

Rotation reporting API

Vittorio had been working for a while on a means to export the rotation matrix from MOV and SEI NALs. The devil is usually in the details, and I guess it has been a hell of a lot of fun for him. Even more since he switched continents meanwhile.

After the Linus rant about not having automagic support for rotation, we had some decent pressure applied to get it out so people will be able to enjoy it. Hopefully it will appear by the weekend as well.