Security fun – what’s security?

Since I eventually got access to a batch of broken samples from Google, I have spent the past months volunteering my time to fix the issues they uncovered in Libav (the whole set is over 3000 samples). You have probably noticed from the number of releases.

Pretty much any kind of bug can be considered a “security” issue:

A segfault is a security issue.

A read from or write to unallocated memory is a security issue.

A triggered assert IS a security issue, not a way to fix one.

A memory leak is a security issue and in most cases the worst kind.

Your security concern is not the same as mine!

Libav has a large attack surface, since it has decoders for every kind of multimedia format. It is a library used in many different situations, so what is a security concern for somebody is a nuisance for somebody else.

If VLC breaks on you while decoding some incomplete movie you got from BitTorrent because a 0 or a 1 got misinterpreted, that is not such an issue. If your transcoding pipeline stalls because the same movie was uploaded to YouTube, somebody might be screaming at the idiot who forgot to bounds-check that array deep in the code.

If some buffer overflow could lead to code execution, most of the people using avconv for mass transcoding won’t care that much: the process is fully sandboxed and they expect it. The people making players are mostly afraid of a buffer overflow being exploitable, since their users would feel the pain.

So for us, the Libav developers, no bug is more or less important than another. We have to fix all of them, and possibly fix them correctly the first time (if you move from a buffer overflow to an assert, you just trade a possible code execution for a denial of service). That takes time and resources.
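As a minimal sketch of the buffer-overflow-to-assert trade, assume a hypothetical parser that reads a frame count from untrusted input. The names parse_frame_count, MAX_FRAMES and ERR_INVALID_DATA are invented for illustration and are not Libav API. Instead of asserting on the value, which would abort the whole process on a crafted file, the function rejects the input and lets the caller decide what to do:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not part of Libav. */
#define MAX_FRAMES       1024
#define ERR_INVALID_DATA (-1)

/*
 * Read a 16-bit big-endian frame count from untrusted input.
 * Returns the count on success, ERR_INVALID_DATA on malformed input.
 */
static int parse_frame_count(const uint8_t *buf, size_t size)
{
    int count;

    if (!buf || size < 2)
        return ERR_INVALID_DATA;    /* truncated header */

    count = (buf[0] << 8) | buf[1];

    /* An assert(count <= MAX_FRAMES) here would turn a bad file
     * into an abort, i.e. a denial of service. */
    if (count == 0 || count > MAX_FRAMES)
        return ERR_INVALID_DATA;

    return count;
}
```

The caller sees a negative value for a broken file and can skip it, log it or stop, which is a choice an assert never gives it.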

The source of all pain

Most of the bugs are naive assumptions and oversights piling up over the years. The most common are the following:

Off by one: You loop over something and you read one element too many.
Corner cases: What happens when your frame has dimension 0? What if it is as large as the maximum representable value?
Faulty assumption: If you think that malloc cannot fail, think again. If you think realloc won’t ever return NULL, so you can forget about the old pointer and just overwrite it, please DO think again. It can happen, even on Linux.
Sloppy coding practices: Some bad practices tend to stick, and bad patterns such as not forwarding return values will lead to problems later, usually making the process of tracking an issue back to its root cause HARD.
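The corner-case, realloc and return-value points above can be sketched together in a few lines of C. All function names here (frame_buffer_size, grow_buffer, resize_frame) and the fixed 4 bytes per pixel are assumptions made up for the example, not Libav code:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helpers for illustration; not part of Libav. */

/*
 * Compute width * height * bytes_per_pixel without overflowing size_t.
 * Returns 0 and stores the result in *out, or -EINVAL for zero-sized
 * or oversized frames.
 */
static int frame_buffer_size(size_t width, size_t height,
                             size_t bytes_per_pixel, size_t *out)
{
    if (!width || !height || !bytes_per_pixel)
        return -EINVAL;                 /* dimension 0 */
    if (width > SIZE_MAX / height)
        return -EINVAL;                 /* width * height overflows */
    if (width * height > SIZE_MAX / bytes_per_pixel)
        return -EINVAL;                 /* total size overflows */

    *out = width * height * bytes_per_pixel;
    return 0;
}

/*
 * Grow *buf to new_size bytes. On failure the old buffer is left
 * untouched and still owned by the caller, so nothing leaks.
 */
static int grow_buffer(uint8_t **buf, size_t new_size)
{
    uint8_t *tmp;

    if (!new_size)
        return -EINVAL;

    tmp = realloc(*buf, new_size);
    if (!tmp)
        return -ENOMEM;                 /* *buf is still valid here */

    *buf = tmp;
    return 0;
}

/* Forward every error to the caller instead of dropping it. */
static int resize_frame(uint8_t **buf, size_t width, size_t height)
{
    size_t size;
    int ret = frame_buffer_size(width, height, 4, &size);

    if (ret < 0)
        return ret;
    return grow_buffer(buf, size);
}
```

Writing the result of realloc straight into *buf would lose the only pointer to the old block when the call fails; going through tmp keeps it reachable so the caller can still free it.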

Even if you are writing something non-critical, such as a fire-and-forget command-line app, you should be a little careful. If you plan to write something more involved, such as a library that could be used in MANY ways by LOTS of people, you MUST be careful.

Tools of the trade

Tracking bugs is usually annoying and time-consuming. If they are crashes, they are at least apparent; memory leaks and faulty reads/writes may not trigger an apparent crash, making the whole thing more daunting. Luckily there are good tools to help you.

Valgrind

The whole toolset is really valuable; massif and memcheck are the best for figuring out where the memory went and who is at fault.

AddressSanitizer

ASan is a boon since it is much faster than memcheck, but also a pain since you have to instrument your code by using a certain compiler (clang, or gcc 4.8 and later) and certain flags (-fsanitize=address). You can leverage it in gdb so you can inspect memory while debugging. That has been a huge timesaver most of the time. In theory you can also do that with memcheck by adding some lines of code; I’ll probably provide snippets later.
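To give an idea of what the instrumentation buys you, here is a small sketch of the classic off-by-one read next to its fix. The function names and the example.c file name in the comment are placeholders picked for this example:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Build with, for example:
 *   clang -g -fsanitize=address -fno-omit-frame-pointer example.c
 * (example.c is a placeholder file name.)
 */

/* Off by one: the loop condition lets i reach n, so buf[n] is read. */
static unsigned sum_buggy(const uint8_t *buf, size_t n)
{
    unsigned sum = 0;

    for (size_t i = 0; i <= n; i++)
        sum += buf[i];
    return sum;
}

/* Fixed: i stops at n - 1, the last valid index. */
static unsigned sum_fixed(const uint8_t *buf, size_t n)
{
    unsigned sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}
```

Calling sum_buggy on a heap buffer of exactly n bytes makes an ASan build stop with a heap-buffer-overflow report pointing at the offending read. Without instrumentation the same call would usually just return a slightly wrong sum and go unnoticed.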

drmemory

If your problem is on a platform other than Linux or Mac, you cannot use ASan or Valgrind; the new and upcoming tool to save you is drmemory. It is the youngest of the set and you can see how green it is by the lack of best practices: no source releases, a naive build system and a bad version control system. If you try to build it, it is better to use the latest svn and hope.

Yet if you have to figure out what’s wrong on Windows, it is already a huge boon. People with time and will could try to help them fix their build system and convince them to move to git.

Automation

Never, ever, ever start hunting this kind of bug without automating as much as possible. So far I have written a considerable number of lines of bash to automatically triage and check the samples, get the code to build in at least 2-3 flavours (clang and gcc with asan, vanilla gcc for valgrind) and eventually generate additional fate targets, so I can run make fate-sec -C .gcc-asan and see if something that was fixed broke while we weren’t looking.

In closing

I still have 200 samples to fix and hopefully I’ll rally more people to help. If you aren’t running routine tests and making sure your projects are at least valgrind clean (the easiest check to do), you should.

If you are writing code that is a little more critical, you had better use all the tools I briefly described and fix what you overlooked.