Building crates so they look like C(ABI) Libraries

I presented cargo-c at the rustlab 2019, here is a longer followup of this.

Mixing Rust and C

One of the best selling point for rust is being highly interoperable with the C-ABI, in addition to safety, speed and its amazing community.

This comes really handy when you have well optimized hand-crafted asm kernels you’d like to use as-they-are:

  • They are small and with a clear interface, usually strict boundaries on what they read/write by their own nature.
  • You’d basically rewrite them as they are using some inline assembly for dubious gains.
  • Both cc-rs and nasm-rs make the process of building and linking relatively painless.

Also, if you plan to integrate in a foreign language project some rust component, it is quite straightforward to link the staticlib produced by cargo in your main project.

If you have a pure-rust crate and you want to export it to the world as if it were a normal C (shared/dynamic) library, it gets quite gory.

Well behaved C-API Library structure

Usually when you want to use a C-library in your own project you should expect it to provide the following:

  • A header file, telling the compiler which symbols it should expect
  • A static library
  • A dynamic library
  • A pkg-config file giving you direction on where to find the header and what you need to pass to the linker to correctly link the library, being it static or dynamic

Header file

In C you usually keep a list of function prototypes and type definitions in a separate file and then embed it in your source file to let the compiler know what to expect.

Since you rely on a quite simple preprocessor to do that you have to be careful about adding guards so the file does not get included more than once and, in order to avoid clashes you install it in a subdirectory of your include dir.

Since the location of the header could be not part of the default search path, you store this information in pkg-config usually.

Static Libraries

Static libraries are quite simple in concept (and execution):

  • they are an archive of object code files.
  • the linker simply reads them as it would read just produced .os and link everything together.

There is a pitfall though:

  • In some platforms even if you want to make a fully static binary you end up dynamically linking some system library for a number of reasons.

    The worst offenders are the pthread libraries and in some cases the compiler builtins (e.g. libgcc_s)

  • The information on what they are is usually not known

rustc comes to the rescue with --print native-static-libs, it isn’t the best example of integration since it’s a string produced on stderr and it behaves as a side-effect of the actual building, but it is still a good step in the right direction.

pkg-config is the de-facto standard way to preserve the information and have the build systems know about it (I guess you are seeing a pattern now).

Dynamic Libraries

A shared or dynamic library is a specially crafted lump of executable code that gets linked to the binary as it is being executed.
The advantages compared to statically linking everything are mainly two:

  • Sparing disk space: since without link-time pruning you end up carrying multiple copies of the same library with every binary using it.
  • Safer and simpler updates: If you need to update say, openssl, you do that once compared to updating the 100+ consumers of it existing in your system.

There is some inherent complexity and constraints in order to get this feature right, the most problematic one is ABI stability:

  • The dynamic linker needs to find the symbols the binary expects and have them with the correct size
  • If you change the in-memory layout of a struct or how the function names are represented you should make so the linker is aware.

Usually that means that depending on your platform you have some versioning information you should provide when you are preparing your library. This can be as simple as telling the compile-time linker to embed the version information (e.g. Mach-O dylib or ELF) in the library or as complex as crafting a version script.

Compared to crafting a staticlib it there are more moving parts and platform-specific knowledge.

Sadly in this case rustc does not provide any help for now: even if the C-ABI is stable and set in stone, the rust mangling strategy is not finalized yet, and it is a large part of being ABI stable, so the work on fully supporting dynamic libraries is yet to be completed.

Dynamic libraries in most platforms have a mean to store which other dynamic libraries they reliy on and which are the paths in which to look for. When the information is incomplete, or you are storing the library in a non-standard path, pkg-config comes to the rescue again, helpfully storing the information for you.

Pkg-config

It is your single point of truth as long your build system supports it and the libraries you want to use craft it properly.
It simplifies a lot your life if you want to keep around multiple versions of a library or you are doing non-system packaging (e.g.: Homebrew or Gentoo Prefix).
Beside the search path, link line and dependency information I mentioned above, it also stores the library version and inter-library compatibility relationships.
If you are publishing a C-library and you aren’t providing a .pc file, please consider doing it.

Producing a C-compatible library out of a crate

I explained what we are expected to produce, now let see what we can do on the rust side:

  • We need to export C-ABI-compatible symbols, that means we have to:
  • Decorate the data types we want to export with #[repr(C)]
  • Decorate the functions with #[no_mangle] and prefix them with export "C"
  • Tell rustc the crate type is both staticlib and cdylib
  • Pass rustc the platform-correct link line so the library produced has the right information inside.
    > NOTE: In some platforms beside the version information also the install path must be encoded in the library.
  • Generate the header file so that the C compiler knows about them.
  • Produce a pkg-config file with the correct information

    NOTE: It requires knowing where the whole lot will be eventually installed.

cargo does not support installing libraries at all (since for now rust dynamic libraries should not be used at all) so we are a bit on our own.

For rav1e I did that the hard way and then I came up an easy way for you to use (and that I used for doing the same again with lewton spending about 1/2 day instead of several ~~weeks~~months).

The hard way

As seen in crav1e, you can explore the history there.

It isn’t the fully hard way since before cargo-c there was already nice tools to avoid some time consuming tasks: cbindgen.
In a terse summary what I had to do was:

  • Come up with an external build system since cargo itself cannot install anything nor have direct knowledge of the install path information. I used Make since it is simple and sufficiently widespread, anything richer would probably get in the way and be more time consuming to set up.
  • Figure out how to extract the information provided in Cargo.toml so I have it at Makefile level. I gave up and duplicated it since parsing toml or json is pointlessly complicated for a prototype.
  • Write down the platform-specific logic on how to build (and install) the libraries. It ended up living in the build.rs and the Makefile. Thanks again to Derek for taking care of the Windows-specific details.
  • Use cbindgen to generate the C header (And in the process smooth some of its rough edges
  • Since we already have a build system add more targets for testing and continuous integration purpose.

If you do not want to use cargo-c I spun away the cdylib-link line logic in a stand alone crate so you can use it in your build.rs.

The easier way

Using a Makefile and a separate crate with a customized build.rs works fine and keeps the developers that care just about writing in rust fully shielded from the gory details and contraptions presented above.

But it comes with some additional churn:

  • Keeping the API in sync
  • Duplicate the release work
  • Have the users confused on where to report the issues or where to find the actual sources. (The users tend to miss the information presented in the obvious places such as the README way too often)

So to try to minimize it I came up with a cargo applet that provides two subcommands:

  • cbuild to build the libraries, the .pc file and header.
  • cinstall to install the whole lot, if already built or to build and then install it.

They are two subcommands since it is quite common to build as user and then install as root. If you are using rustup and root does not have cargo you can get away with using --destdir and then sudo install or craft your local package if your distribution provides a mean to do that.

All I mentioned in the hard way happens under the hood and, beside bugs in the current implementation, you should be completely oblivious of the details.

Using cargo-c

As seen in lewton and rav1e.

  • Create a capi.rs with the C-API you want to expose and use #[cfg(cargo_c)] to hide it when you build a normal rust library.
  • Make sure you have a lib target and if you are using a workspace the first member is the crate you want to export, that means that you might have to add a "." member at the start of the list.
  • Remember to add a cbindgen.toml and fill it with at least the include guard and probably you want to set the language to C (it defaults to C++)
  • Once you are happy with the result update your documentation to tell the user to install cargo-c and do cargo cinstall --prefix=/usr --destdir=/tmp/some-place or something along those lines.

Coming next

cargo-c is a young project and far from being complete even if it is functional for my needs.

Help in improving it is welcome, there are plenty of rough edges and bugs to find and squash.

Thanks

Thanks to est31 and sdroege for the in-depth review in #rust-av and kodabb for the last minute edits.

Using Wireguard

wireguard is a modern, secure and fast vpn tunnel that is extremely simple to setup and works already nearly everywhere.

Since I spent a little bet to play with it because this looked quite interesting, I thought of writing a small tutorial.

I normally use Gentoo (and macos) so this guide is about Gentoo.

General concepts

Wireguard sets up peers identified by an public key and manages a virtual network interface and the routing across them (optionally).

The server is just a peer that knows about loots of peers while a client knows how to directly reach the server and that’s it.

Setting up in Gentoo

Wireguard on Linux is implemented as a kernel module.

So in general you have to build the module and the userspace tools (wg).
If you want to have some advanced feature make sure that your kernel has the following settings:

IP_ADVANCED_ROUTER
IP_MULTIPLE_TABLES
NETFILTER_XT_MARK

After that using emerge will get you all you need:

$ emerge wireguard

Tools

The default distribution of tools come with the wg command and an helper script called wg-quick that makes easier to bring up and down the virtual network interface.

wg help
Usage: wg <cmd> [<args>]

Available subcommands:
  show: Shows the current configuration and device information
  showconf: Shows the current configuration of a given WireGuard interface, for use with `setconf'
  set: Change the current configuration, add peers, remove peers, or change peers
  setconf: Applies a configuration file to a WireGuard interface
  addconf: Appends a configuration file to a WireGuard interface
  genkey: Generates a new private key and writes it to stdout
  genpsk: Generates a new preshared key and writes it to stdout
  pubkey: Reads a private key from stdin and writes a public key to stdout
You may pass `--help' to any of these subcommands to view usage.
Usage: wg-quick [ up | down | save | strip ] [ CONFIG_FILE | INTERFACE ]

  CONFIG_FILE is a configuration file, whose filename is the interface name
  followed by `.conf'. Otherwise, INTERFACE is an interface name, with
  configuration found at /etc/wireguard/INTERFACE.conf. It is to be readable
  by wg(8)'s `setconf' sub-command, with the exception of the following additions
  to the [Interface] section, which are handled by wg-quick:

  - Address: may be specified one or more times and contains one or more
    IP addresses (with an optional CIDR mask) to be set for the interface.
  - DNS: an optional DNS server to use while the device is up.
  - MTU: an optional MTU for the interface; if unspecified, auto-calculated.
  - Table: an optional routing table to which routes will be added; if
    unspecified or `auto', the default table is used. If `off', no routes
    are added.
  - PreUp, PostUp, PreDown, PostDown: script snippets which will be executed
    by bash(1) at the corresponding phases of the link, most commonly used
    to configure DNS. The string `%i' is expanded to INTERFACE.
  - SaveConfig: if set to `true', the configuration is saved from the current
    state of the interface upon shutdown.

See wg-quick(8) for more info and examples.

Creating a configuration

Wireguard is quite straightforward, you can either prepare a configuration with your favourite text editor or generate one by setting by hand the virtual network device and then saving the result wg showconf presents.

A configuration file then can be augmented with wg-quick-specific options (such as Address) or just passed to wg setconf while the other networking details are managed by your usual tools (e.g. ip).

Create your keys

The first step is to create the public-private key pair that identifies your peer.

  • wg genkey generates a private key for you.
  • You feed it to wg pubkey to have your public key.

In a single line:

$ wg genkey | tee privkey | wg pubkey > pubkey

Prepare a configuration file

Both wg-quick and wg setconf use an ini-like configuration file.

If you put it in /etc/wireguard/${ifname}.conf then wg-quick would just need the interface name and would look it up for you.

The minimum configuration needs an [Interface] and a [Peer] set.
You may add additional peers later.
A server would specify its ListenPort and identify the peers by their PublicKey.

[Interface]
Address = 192.168.2.1/24
ListenPort = 51820
PrivateKey = <key>

[Peer]
PublicKey = <key>
AllowedIPs = 192.168.2.2/32

A client would have a peer with an EndPoint defined and optionally not specify the ListenPort in its interface description.

[Interface]
PrivateKey = <key>
Address = 192.168.2.2/24

[Peer]
PublicKey = <key>
AllowedIPs = 192.168.2.0/24
Endpoint = <ip>:<port>

The AllowedIPs mask let you specify how much you want to route over the vpn.
By setting 0.0.0.0/0 you tell you want to route ALL the traffic through it.

NOTE: Address is a wg-quick-specific option.

Using a configuration

wg-quick is really simple to use, assuming you have created /etc/wireguard/wg0.conf:

$ wg-quick up wg0
$ wg-quick down wg0

If you are using netifrc from version 0.6.1 wireguard is supported and you can have a configuration such as:

config_wg0="192.168.2.4/24"
wireguard_wg0="/etc/wireguard/wg0.conf"

With the wg0.conf file like the above but stripped of the wg-quick-specific options.

Summing up

Wireguard is a breeze to set up compared to nearly all the other vpn solutions.

Non-linux systems can currently use a go implementation and in the future a rust implementation (help welcome).

Android and macos have already some pretty front-ends that make the setup easy even on those platforms.

I hope you enjoyed it πŸ™‚

Using rav1e – from your code

AV1, Rav1e, Crav1e, an intro

(this article is also available on my dev.to profile, I might use it more often since wordpress is pretty horrible at managing markdown.)

AV1 is a modern video codec brought to you by an alliance of many different bigger and smaller players in the multimedia field.
I’m part of the VideoLan organization and I spent quite a bit of time on this codec lately.


rav1e: The safest and fastest AV1 encoder, built by many volunteers and Mozilla/Xiph developers.
It is written in rust and strives to provide good speed, quality and stay maintainable.


crav1e: A companion crate, written by yours truly, that provides a C-API, so the encoder can be used by C libraries and programs.


This article will just give a quick overview of the API available right now and it is mainly to help people start using it and hopefully report issues and problem.

Rav1e API

The current API is built around the following 4 structs and 1 enum:

  • struct Frame: The raw pixel data
  • struct Packet: The encoded bitstream
  • struct Config: The encoder configuration
  • struct Context: The encoder state

  • enum EncoderStatus: Fatal and non-fatal condition returned by the Contextmethods.

Config

The Config struct currently is simply constructed.

    struct Config {
        enc: EncoderConfig,
        threads: usize,
    }

The EncoderConfig stores all the settings that have an impact to the actual bitstream while settings such as threads are kept outside.

    let mut enc = EncoderConfig::with_speed_preset(speed);
    enc.width = w;
    enc.height = h;
    enc.bit_depth = 8;
    let cfg = Config { enc, threads: 0 };

NOTE: Some of the fields above may be shuffled around until the API is marked as stable.

Methods

new_context
    let cfg = Config { enc, threads: 0 };
    let ctx: Context<u8> = cfg.new_context();

It produces a new encoding context. Where bit_depth is 8, it is possible to use an optimized u8 codepath, otherwise u16 must be used.

Context

It is produced by Config::new_context, its implementation details are hidden.

Methods

The Context can be grouped into essential, optional and convenience.

    // Essential API
    pub fn send_frame<F>(&mut self, frame: F) -> Result<(), EncoderStatus>
      where F: Into<Option<Arc<Frame<T>>>>, T: Pixel;
    pub fn receive_packet(&mut self) -> Result<Packet<T>, EncoderStatus>;

The encoder works by processing each Frame fed through send_frame and producing each Packet that can be retrieved by receive_packet.

    // Optional API
    pub fn container_sequence_header(&mut self) -> Vec<u8>;
    pub fn get_first_pass_data(&self) -> &FirstPassData;

Depending on the container format, the AV1 Sequence Header could be stored in the extradata. container_sequence_header produces the data pre-formatted to be simply stored in mkv or mp4.

rav1e supports multi-pass encoding and the encoding data from the first pass can be retrieved by calling get_first_pass_data.

    // Convenience shortcuts
    pub fn new_frame(&self) -> Arc<Frame<T>>;
    pub fn set_limit(&mut self, limit: u64);
    pub fn flush(&mut self) {
  • new_frame() produces a frame according to the dimension and pixel format information in the Context.
  • flush() is functionally equivalent to call send_frame(None).
  • set_limit()is functionally equivalent to call flush()once limit frames are sent to the encoder.

Workflow

The workflow is the following:

  1. Setup:
    • Prepare a Config
    • Call new_context from the Config to produce a Context
  2. Encode loop:
    • Pull each Packet using receive_packet.
    • If receive_packet returns EncoderStatus::NeedMoreData
      • Feed each Frame to the Context using send_frame
  3. Complete the encoding
    • Issue a flush() to encode each pending Frame in a final Packet.
    • Call receive_packet until EncoderStatus::LimitReached is returned.

Crav1e API

The crav1e API provides the same structures and features beside few key differences:

  • The Frame, Config, and Context structs are opaque.
typedef struct RaConfig RaConfig;
typedef struct RaContext RaContext;
typedef struct RaFrame RaFrame;
  • The Packet struct exposed is much simpler than the rav1e original.
typedef struct {
    const uint8_t *data;
    size_t len;
    uint64_t number;
    RaFrameType frame_type;
} RaPacket;
  • The EncoderStatus includes a Success condition.
typedef enum {
    RA_ENCODER_STATUS_SUCCESS = 0,
    RA_ENCODER_STATUS_NEED_MORE_DATA,
    RA_ENCODER_STATUS_ENOUGH_DATA,
    RA_ENCODER_STATUS_LIMIT_REACHED,
    RA_ENCODER_STATUS_FAILURE = -1,
} RaEncoderStatus;

RaConfig

Since the configuration is opaque there are a number of functions to assemble it:

  • rav1e_config_default allocates a default configuration.
  • rav1e_config_parse and rav1e_config_parse_int set a specific value for a specific field selected by a key string.
  • rav1e_config_set_${field} are specialized setters for complex information such as the color description.
RaConfig *rav1e_config_default(void);

/**
 * Set a configuration parameter using its key and value as string.
 * Available keys and values
 * - "quantizer": 0-255, default 100
 * - "speed": 0-10, default 3
 * - "tune": "psnr"-"psychovisual", default "psnr"
 * Return a negative value on error or 0.
 */
int rav1e_config_parse(RaConfig *cfg, const char *key, const char *value);

/**
 * Set a configuration parameter using its key and value as integer.
 * Available keys and values are the same as rav1e_config_parse()
 * Return a negative value on error or 0.
 */
int rav1e_config_parse_int(RaConfig *cfg, const char *key, int value);

/**
 * Set color properties of the stream.
 * Supported values are defined by the enum types
 * RaMatrixCoefficients, RaColorPrimaries, and RaTransferCharacteristics
 * respectively.
 * Return a negative value on error or 0.
 */
int rav1e_config_set_color_description(RaConfig *cfg,
                                       RaMatrixCoefficients matrix,
                                       RaColorPrimaries primaries,
                                       RaTransferCharacteristics transfer);

/**
 * Set the content light level information for HDR10 streams.
 * Return a negative value on error or 0.
 */
int rav1e_config_set_content_light(RaConfig *cfg,
                                   uint16_t max_content_light_level,
                                   uint16_t max_frame_average_light_level);

/**
 * Set the mastering display information for HDR10 streams.
 * primaries and white_point arguments are RaPoint, containing 0.16 fixed point values.
 * max_luminance is a 24.8 fixed point value.
 * min_luminance is a 18.14 fixed point value.
 * Returns a negative value on error or 0.
 */
int rav1e_config_set_mastering_display(RaConfig *cfg,
                                       RaPoint primaries[3],
                                       RaPoint white_point,
                                       uint32_t max_luminance,
                                       uint32_t min_luminance);

void rav1e_config_unref(RaConfig *cfg);

The bare minimum setup code is the following:

    int ret = -1;
    RaConfig *rac = rav1e_config_default();
    if (!rac) {
        printf("Unable to initialize\n");
        goto clean;
    }

    ret = rav1e_config_parse_int(rac, "width", 64);
    if (ret < 0) {
        printf("Unable to configure width\n");
        goto clean;
    }

    ret = rav1e_config_parse_int(rac, "height", 96);
    if (ret < 0) {
        printf("Unable to configure height\n");
        goto clean;
    }

    ret = rav1e_config_parse_int(rac, "speed", 9);
    if (ret < 0) {
        printf("Unable to configure speed\n");
        goto clean;
    }

RaContext

As per the rav1e API, the context structure is produced from a configuration and the same send-receive model is used.
The convenience methods aren’t exposed and the frame allocation function is actually essential.

// Essential API
RaContext *rav1e_context_new(const RaConfig *cfg);
void rav1e_context_unref(RaContext *ctx);

RaEncoderStatus rav1e_send_frame(RaContext *ctx, const RaFrame *frame);
RaEncoderStatus rav1e_receive_packet(RaContext *ctx, RaPacket **pkt);
// Optional API
uint8_t *rav1e_container_sequence_header(RaContext *ctx, size_t *buf_size);
void rav1e_container_sequence_header_unref(uint8_t *sequence);

RaFrame

Since the frame structure is opaque in C, we have the following functions to create, fill and dispose of the frames.

RaFrame *rav1e_frame_new(const RaContext *ctx);
void rav1e_frame_unref(RaFrame *frame);

/**
 * Fill a frame plane
 * Currently the frame contains 3 planes, the first is luminance followed by
 * chrominance.
 * The data is copied and this function has to be called for each plane.
 * frame: A frame provided by rav1e_frame_new()
 * plane: The index of the plane starting from 0
 * data: The data to be copied
 * data_len: Lenght of the buffer
 * stride: Plane line in bytes, including padding
 * bytewidth: Number of bytes per component, either 1 or 2
 */
void rav1e_frame_fill_plane(RaFrame *frame,
                            int plane,
                            const uint8_t *data,
                            size_t data_len,
                            ptrdiff_t stride,
                            int bytewidth);

RaEncoderStatus

The encoder status enum is returned by the rav1e_send_frame and rav1e_receive_packet and it is possible already to arbitrarily query the context for its status.

RaEncoderStatus rav1e_last_status(const RaContext *ctx);

To simulate the rust Debug functionality a to_str function is provided.

char *rav1e_status_to_str(RaEncoderStatus status);

Workflow

The C API workflow is similar to the Rust one, albeit a little more verbose due to the error checking.

    RaContext *rax = rav1e_context_new(rac);
    if (!rax) {
        printf("Unable to allocate a new context\n");
        goto clean;
    }
    RaFrame *f = rav1e_frame_new(rax);
    if (!f) {
        printf("Unable to allocate a new frame\n");
        goto clean;
    }
while (keep_going(i)){
     RaPacket *p;
     ret = rav1e_receive_packet(rax, &p);
     if (ret < 0) {
         printf("Unable to receive packet %d\n", i);
         goto clean;
     } else if (ret == RA_ENCODER_STATUS_SUCCESS) {
         printf("Packet %"PRIu64"\n", p->number);
         do_something_with(p);
         rav1e_packet_unref(p);
         i++;
     } else if (ret == RA_ENCODER_STATUS_NEED_MORE_DATA) {
         RaFrame *f = get_frame_by_some_mean(rax);
         ret = rav1e_send_frame(rax, f);
         if (ret < 0) {
            printf("Unable to send frame %d\n", i);
            goto clean;
        } else if (ret > 0) {
        // Cannot happen in normal conditions
            printf("Unable to append frame %d to the internal queue\n", i);
            abort();
        }
     } else if (ret == RA_ENCODER_STATUS_LIMIT_REACHED) {
         printf("Limit reached\n");
         break;
     }
}

In closing

This article was mainly a good excuse to try dev.to and see write down some notes and clarify my ideas on what had been done API-wise so far and what I should change and improve.

If you managed to read till here, your feedback is really welcome, please feel free to comment, try the software and open issues for crav1e and rav1e.

Coming next

  • Working crav1e got me to see what’s good and what is lacking in the c-interoperability story of rust, now that this landed I can start crafting and publishing better tools for it and maybe I’ll talk more about it here.
  • Soon rav1e will get more threading-oriented features, some benchmarking experiments will happen soon.

Thanks

  • Special thanks to Derek and Vittorio spent lots of time integrating crav1e in larger software and gave precious feedback in what was missing and broken in the initial iterations.
  • Thanks to David for the review and editorial work.
  • Also thanks to Matteo for introducing me to dev.to.

rav1e and crav1e – A fast and safe AV1 encoder – Some HowTo

Over the year I contributed to an AV1 encoder written in rust.

Here a small tutorial about what is available right now, there is still lots to do, but I think we could enjoy more user-feedback (and possibly also some help).

Setting up

Install the rust toolchain

If you do not have rust installed, it is quite simple to get a full environment using rustup

$ curl https://sh.rustup.rs -sSf | sh
# Answer the questions asked and make sure you source the `.profile` file created.
$ source ~/.profile

Install cmake, perl and nasm

rav1e uses libaom for testing and and on x86/x86_64 some components have SIMD variants written directly using nasm.

You may follow the instructions, or just install:
nasm (version 2.13 or better)
perl (any recent perl5)
cmake (any recent version)

Once you have those dependencies in you are set.

Building rav1e

We use cargo, so the process is straightforward:

## Pull in the customized libaom if you want to run all the tests
$ git submodule update --init

## Build everything
$ cargo build --release

## Test to make sure everything works as intended
$ cargo test --features decode_test --release

## Install rav1e
$ cargo install

Using rav1e

Right now rav1e has a quite simple interface:

rav1e 0.1.0
AV1 video encoder

USAGE:
    rav1e [OPTIONS]  --output 

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -I, --keyint     Keyframe interval [default: 30]
    -l, --limit                  Maximum number of frames to encode [default: 0]
        --low_latency      low latency mode. true or false [default: true]
    -o, --output                Compressed AV1 in IVF video output
        --quantizer                 Quantizer (0-255) [default: 100]
    -r 
    -s, --speed                  Speed level (0(slow)-10(fast)) [default: 3]
        --tune                    Quality tuning (Will enforce partition sizes >= 8x8) [default: psnr]  [possible
                                        values: Psnr, Psychovisual]

ARGS:
        Uncompressed YUV4MPEG2 video input

It accepts y4m raw source and produces ivf files.

You can configure the encoder by setting the speed and quantizer levels.

The low_latency flag can be turned off to run some additional analysis over a set of frames and have additional quality gains.

Crav1e

While ave and gst-rs will use the rav1e crate directly, there are a number of software such as handbrake or vlc that would be much happier to consume a C API.

Thanks to the staticlib target and cbindgen is quite easy to produce a C-ABI library and its matching header.

Setup

crav1e is built using cargo, so nothing special is needed right now beside nasm if you are building it on x86/x86_64.

Build the library

This step is completely straightforward, you can build it as release:

$ cargo build --release

or as debug

$ cargo build

It will produce a target/release/librav1e.a or a target/debug/librav1e.a.
The C header will be in include/rav1e.h.

Try the example code

I provided a quite minimal sample case.

cc -Wall c-examples/simple_encoding.c -Ltarget/release/ -lrav1e -Iinclude/ -o c-examples/simple_encoding
./c-examples/simple_encoding

If it builds and runs correctly you are set.

Manually copy the .a and the .h

Currently cargo install does not work for our purposes, but it will change in the future.

$ cp target/release/librav1e.a /usr/local/lib
$ cp include/rav1e.h /usr/local/include/

Missing pieces

Right now crav1e works well enough but there are few shortcomings I’m trying to address.

Shared library support

The cdylib target does exist and produce a nearly usable library but there are some issues with soname support. I’m trying to address them with upstream, but it might take some time.

Meanwhile some people suggest to use patchelf or similar tools to fix the library after the fact.

Install target

cargo is generally awesome, but sadly its support for installing arbitrary files to arbitrary paths is limited, luckily there are people proposing solutions.

pkg-config file generation

I consider a library not proper if a .pc file is not provided with it.

Right now there are means to extract the information need to build a pkg-config file, but there isn’t a simple way to do it.

$ cargo rustc -- --print native-static-libs

Provides what is needed for Libs.private, ideally it should be created as part of the install step since you need to know the prefix, libdir and includedir paths.

Coming next

Probably the next blog post will be about my efforts to make cargo able to produce proper cdylib or something quite different.

PS: If somebody feels to help me with matroska in AV1 would be great πŸ™‚

Contributing to libvpx

Recently I started to write the PowerPC/VSX support for libvpx, Alexandra will help as well.

Every open source project has its own rules, I found the choices taken in Libvpx interesting enough to write them down (and possibly help newcomers with some shortcuts).

Overview

Coding style

The coding style is strongly enforced, the CI system will bounce your code if it doesn’t adhere to the style.

This constraint is enforced through a clang-format ruleset.

If you are using vim, this makes your life easier, otherwise the git integration comes handy.

Otherwise:

# clang-format -i what/I/m/working/on.c

Works no matter what.

Testing

New code should have its testcase, if it isn’t already covered.

Libvpx uses gtest and it has a quite decent test coverage. A full run of the tests can take a large chunk of time, if you are working on specific code (e.g. dsp functions), is easy to run only the tests you care about like this:

# ./test_libvpx --gtest_filter="*pattern*with*globs"

The current system does not double as benchmarking tool, so you are quite on your own if you are trying to speed up some parts.

Adding brand new tests more annoying than it should since gtest is quite bloated, updating a test to cover a variant is quite painless though.

Submitting patches

Libvpx uses gerrit and jenkins in a setup that makes them almost painless and has a detailed guide on how to register and fill in some forms to make the Google lawyers happy.

Gerrit and Jenkins defaults are quite clunky, so they Libvpx maintainer definitely invested some time to get them in a better shape.

Once you registered and set the hook to tag your commits sending a set boils down to:

# git push https://chromium-review.googlesource.com/webm/libvpx HEAD:refs/for/master

Interaction

Comments and reports end up in your mailbox, spamming it a lot (expect about 5-6 emails per commit). You have to use the web interface to have decent interaction and luckily PolyGerrit isn’t terrible at all (just make sure your replies gets sent since it has the tendency of keeping them in draft mode).

TL;DR

  • read this
  • install clang-format, including the git integration
  • be ready to make changes in test/*.cc and cope with gtest verboseness.
  • be ready to receive tons of email and use your web browser

broken-endian

You wrote your code, you wrote the tests and everything seems working.

Then you got somebody running your code on a big-endian machine and reports that EVERYTHING is broken.

Usually most of the data is serialized to disk or wire as big-endian, most of cpu usually do the computation in little-endian (with MIPS and PowerPC as rare exception). If you assume the relationship between the data on-wire and data in the cpu registers is always the same you are bound to have problems (and it gets even worse if you decide to write the data down as little-endian to disk because swapping from cpu to disk feels slow, you are doing it wrong).

Checklist

The problem is mainly while reading or writing:

  • Sometimes feels simpler to copy over some packed structure using the equivalent of read(fd, &amp;my_struct, sizeof(struct)). if the struct contains anything different from byte-sized variables it won’t work, so is safe to say it won’t work at all. Gets even worse if you forgot to mark the structure as packed.
  • Writing has the same issue, never try to directly write a structure or even 16bit integers w/out making sure you get the expected endianess right.

Mini-post written to recall what not to do (more examples later).

Rethinking AVFormat – part 1

Container formats should be just a boring application of serialization of multiple arrays of tuples timestamp-binary blob.

Instead there are tons of implementation details and there are fun
and exceedingly annoying means to lose your sanity.

This post is yet another post about APIs you can see other here and here.

Current Status

In Libav we have libavformat taking care of general I/O, Muxing, Demuxing.

This blog post will not cover the additional grouping given by Programs, Chapters and such to not make the whole article huge and just focus on the basics.

I/O

The AVIO abstraction provides a mean to uniformly access content stored in files, available as remote streams (e.g. served through http or rtmp) or through custom implementations.

This part of the API is rightly coupled with the Muxer and Demuxer implementation.

It uses the common Context pattern you can find across the rest of Libav with some of twists:

  • The protocol handler can be guessed using the url provided, e.g. file:///tmp/foo.
  • The functions that allocate a context take an extra parameter than the usual options AVDictionary in the form of a callback function.
  • You can create your own custom protocol easily.
int avio_open2(AVIOContext **s, const char *url, int flags, const AVIOInterruptCB *int_cb, AVDictionary **options)

AVIOContext *avio_alloc_context(unsigned char *buffer, int buffer_size, int write_flag, void *opaque,
                                int(*read_packet)(void *opaque, uint8_t *buf, int buf_size),
                                int(*write_packet)(void *opaque, uint8_t *buf, int buf_size),
                                int64_t(*seek)(void *opaque, int64_t offset, int whence))
int avio_closep(AVIOContext **s);

The api tries to mimic the C stdio plus lots of API sugar.

# core functions
int avio_read(AVIOContext *s, unsigned char *buf, int size);
void avio_write(AVIOContext *s, const unsigned char *buf, int size);
int64_t avio_seek(AVIOContext *s, int64_t offset, int whence);


# simple integer readers
int          avio_r8  (AVIOContext *s);
uint64_t     avio_rb64(AVIOContext *s);
uint64_t     avio_rl64(AVIOContext *s);
unsigned int avio_rb16(AVIOContext *s);
unsigned int avio_rb24(AVIOContext *s);
unsigned int avio_rb32(AVIOContext *s);
unsigned int avio_rl16(AVIOContext *s);
unsigned int avio_rl24(AVIOContext *s);
unsigned int avio_rl32(AVIOContext *s);

# simple integer writers
void avio_w8(AVIOContext *s, int b);
void avio_wb16(AVIOContext *s, unsigned int val);
void avio_wb24(AVIOContext *s, unsigned int val);
void avio_wb32(AVIOContext *s, unsigned int val);
void avio_wb64(AVIOContext *s, uint64_t val);
void avio_wl16(AVIOContext *s, unsigned int val);
void avio_wl24(AVIOContext *s, unsigned int val);
void avio_wl32(AVIOContext *s, unsigned int val);
void avio_wl64(AVIOContext *s, uint64_t val);


# utf8 and utf16 strings
int avio_get_str(AVIOContext *pb, int maxlen, char *buf, int buflen);

int avio_get_str16le(AVIOContext *pb, int maxlen, char *buf, int buflen);
int avio_get_str16be(AVIOContext *pb, int maxlen, char *buf, int buflen);

int avio_put_str(AVIOContext *s, const char *str);

int avio_put_str16le(AVIOContext *s, const char *str);

... (and more) ...

Buffering

All the function use an intermediate buffer to back reads and writes, the buffer can be explicitly flushed or it gets flushed automatically once the request would end outside it.

void avio_flush(AVIOContext *s);

A special kind of AVIOContext is a dynamic write buffer, it extends on demand and can be used to build complex recourring patterns once and write them as many time as needed.

int avio_open_dyn_buf(AVIOContext **s);

int avio_close_dyn_buf(AVIOContext *s, uint8_t **pbuffer);

Error handling

An I/O layer has to take in account the fact the resource being read or written could be abruptly disappear or suddenly slow down. This is valid for both local and remote resources.

The internal buffer allocation might fail.

A seek too far could lead to the end of file.

AVIO approach to errors is quite simplicistic:
– A write can silently fail.
– A failing read just returns 0-ed buffer or value.
– All the functions set the error field or the eof_reached field.

Is up to the user to decide when to check for I/O problems or leverage the AVIOInterruptCB to implement timeouts or other mean to interrupt a read or a write that otherwise would just quietly block till it is completed.

Demuxing (and Probing)

The AVFormat part taking care of input streams can be split in three: Probing the data to guess the right demuxer, the actual Demuxing and optionally parse the demuxed data and fit it in packets containing the information needed by the decoder to decode a frame of video or a matching amount of audio samples, later I call it frame-worth amount of data and I call this process chopping amorphous data streams. It is colorful as expression but represents quite well the endeavor.

Probing

The Probe functions take an arbitrary big chunk of data (stored in a AVProbeData struct) and figure out which demuxer should be able to actually parse it correctly.

As a rule of thumb probes need to be fast since all of them have to be run over the data at least once and possibly multiple times since if the result is not really conclusive increasing the data and trying again is an option.

AVInputFormat *av_probe_input_format2(AVProbeData *pd, int is_opened,
                                      int *score_max);

An helper function to probe from an AVIOContext and get the possible input format is provided.

int av_probe_input_buffer(AVIOContext *pb, AVInputFormat **fmt,
                          const char *filename, void *logctx,
                          unsigned int offset, unsigned int max_probe_size);

It used internally by avformat_open_input to automatically figure out the demuxer to use and it might look a little confusing.

Demuxing

Once that the input format is either guessed or selected the actual muxing conceptually is just providing AVPackets
as they are parsed. You might want to reposition within the stream at random times (the infamous seeking opening yet another can of worms).

int avformat_open_input(AVFormatContext **ps, const char *filename,
                        AVInputFormat *fmt, AVDictionary **options);

int av_read_frame(AVFormatContext *s, AVPacket *pkt);

void avformat_close_input(AVFormatContext **ps);
Figuring out the data inside the format

Some container formats keep the information regarding their contents in a global header at the start of the file, other, that could have new data streams appearing at random times, do not.

Since there is no easy mean to figure out which kind of data they are storing, the only safe way to figure out is to try to decode some packets in order to know which kind of data is available, avformat_find_stream_info.

int avformat_find_stream_info(AVFormatContext *ic, AVDictionary **options);

The apparently simple function does a lot of work behind the scenes: it demuxes and decodes a settable number of packets before giving up and keeps all of them in an internal queue so that they will be available for demuxing even if the input stream is not seekable.

Getting the data outside

Containers such as MPEG PS mux data in small fixed-sized chunks
while usually muxers and decoders expect to receive AVPackets containing enough data to produce a frame.

Specific parsers can be inserted automatically to take amorphous stream of demuxed data and chop out of it AVPackets containing the expected amount of data.

This happens usually automatically so the user does not have to care about it as long as the codec parser is present.

Timestamps

The multimedia data is expected to carry a timestamp to present at the same time video frames and audio frames (and subtitles).

Some containers do provide directly such timestamps, other do not, requiring some amount of guesswork by some heuristics that might or might not work depending on the codec at hand.

For example, if the container is supposed to not allow variable frame rate, the implicit time stamp for video can be deduced from the frame number. This might not work as expected if the codec uses B-frames and requires some form
of reordering.

This part in Libav is sort of hidden and often causing a number of problems.

Seeking

Seeking is quite a different and large can of worms.

Ideally seeking just sets the AVIOContext to a certain position and the demuxer keeps working from there.

int av_seek_frame(AVFormatContext *s, int stream_index,
                  int64_t timestamp, int flags);

Depending on the container format and the codec picking the correct byte offset from the user provided timestamp can be incredibly simple or really complex, with various degrees of precision.

Some format provide an precise index so a plain lookup is enough, a dichotomic search looking for the closest I-frame is the common case and in the worst situation a linear search might be required.

In some cases auxiliary indexes are built to speed up seeking within previously parsed areas.

Seeking is not fun at the demuxer level and gets even worse at the codec level if the data provided is not the one expected.

Muxing

Muxing is sort of simpler than demuxing. The output format is always known and the data always come in AVPackets matching a frame-worth of raw data and possibly sporting correct timestamps.

API-wise it expects an AVFormatContext with the oformat set to the correct AVOutputFormat and the pb
set with an allocated AVIOContext and populated AVStreams.

Once the AVFormatContext is configured is possible to write the packets. First the global header should be written, then as many packets as needed are muxed, interleaving audio and video so that demuxing and seeking work correctly.

int avformat_write_header(AVFormatContext *s, AVDictionary **options);

int av_interleaved_write_frame(AVFormatContext *s, AVPacket *pkt);

int av_write_trailer(AVFormatContext *s);

Bitstream filtering

Some codecs have multiple possible representation, e.g. H264 has the AVCC bitstream format and the Annex B bitstream format. Come containers support both, other expect only one or the other. Currently the correct converter from a bitstream to another must be inserted manually.

Packet interleaving

Certain container formats have quite peculiar muxing rules. This is normally hidden from the user, in certain cases being able to override it is a boon.

Shortcomings summary

In the next post I will explain how I would improve the situation, today post is mainly to introduce the structure of AVFormat and start explaining what should be fixed. Here a short list of what I’d like to fix sooner than later.

Non-uniform API

  • There is quite a mixture of av_ and avformat_ namespaces.
  • The muxing and demuxing APIs are sufficiently confusing (and surely I should complete my avformat_open_output to reduce the boilerplate)

Abstractions Leaking the wrong way

  • The demuxing side automagically inserts parsers to chop data streams in a frame-worth amount of data while the muxing side would just fail if the bitstream provided is not matching the one required by the container format.
  • There is quite of hidden magic happening in avformat_find_stream_info and just recently we added options to at least flush the buffer it keeps to probe for codecs. Having a better function and a better mean to control this kind of internal buffer would be surely appreciated by the user that need to keep the latency low.
  • There is no good mean to be notified if the number of streams change (new streams found or old streams disappearing).

Bad implementations

  • The old muxers sometimes do not even use the now-available internals (e.g. the interleaver helpers) but implement internally queues and logic that should be now common and shared across all the muxers.
  • While AVCodec has (now) quite an uniform mean to slice bytes and bits, avformat is not leveraging it beside few places.

PS: Kostya prefers to provide both amorphous stream and chopped packets. It makes sense since you might have some codec you cannot parse but you can sort of safely remux if the container is the same.
For the common case I’d rather suggest to use a set of functions that always insert parsers when they can both demuxing and muxing and provide another set of functions to get arbitrary lumps of stream as provided by the container format.

Bridging Markdown to sphinx

One of my annoying itch is documentation.

I like a lot sphinx as toolchain but the underlying rst has a quite steep learning curve and it is outright ugly to write in many common situation.

I like a lot kramdown as syntax but sadly it is ruby-only and overall the Markdown implementation for python usually have a good number of shortcomings, including the quite annoying part of not having a full AST for the extensions, making quite a pain to proper translations (e.g. moin-2 markdown can’t use the extension supported by the original since they get mangled badly during the process of node matching)…

Enters CommonMark

CommonMark is a cooperative effort to build actually a proper specification of the ubiquitous markdown syntax. The implementations usually provide a full AST and the python one (derived from the javascript one) is quite easy to understand, fast enough and easy to extend.

I know that somebody else already tried to bridge docutils and markdown, sadly parsley is a tad slow for the purpose. I gutted away the original markdown parser and wired in commonmark-py the result is a decently fast implementation that maps most of the core syntax to the docutils AST and thus makes possible to write in markdown and get it converted using the docutils output formats.

What’s left

ReStructuredText is much richer than CommonMark core, at least I should complete my work to support attributes so the manpage output would work mostly as intended.

The directive system is quite different from the one currently discussed and that will cause a good deal of headache to map the sphinx extension to document the function parameters.

As usual help welcome!

PowerPC is back (and little endian)

Yesterday I fixed a PowerPC issue since ages, it is an endianess issue, and it is (funny enough) on the little endian flavour of it.

PowerPC

I have some ties with this architecture since my interest on the architecture (and Altivec/VMX in particular) is what made me start contributing to MPlayer while fixing issue on Gentoo and from there hack on the FFmpeg of the time, meet the VLC people, decide to part ways with Michael Niedermayer and with the other main contributors of FFmpeg create Libav. Quite a loong way back in the time.

Big endian, Little Endian

It is a bit surprising that IBM decided to use little endian (since big endian is MUCH nicer for I/O processing such as networking) but they might have their reasons.

PowerPC traditionally always had been both-endian with the ability to switch on the fly between the two (this made having foreign-endian simulators lightly less annoying to manage), but the main endianess had always been big.

This brings us to a quite interesting problem: Some if not most of the PowerPC code had been written thinking in big-endian. Luckily since most of the code wrote was using C intrinsics (Bless to whoever made the Altivec intrinsics not as terrible as the other ones around) it won’t be that hard to recycle most of the code.

More will follow.

lldb: how to botch the user interface

Recently I had to spend some time developing on MacOSX. Gentoo-Prefix sadly is getting less and less useful till we don’t make clang a first class citizen (People proposing a GSoC for it are welcome!) so I’m forced to use what’s provided by Xcode. All in all I do like a lot most of the new toolchain: clang instead of an ancient gcc-4.2, a brand new ld replacing a stale binutils. Just lldb is not good.

Clang

clang is wonderful for developing, it is arguably fast at building and the generated code isn’t that bad, beside when you are using asan and it miscompiles… (reported to the asan developers, they will have a look, gcc-asan works as expected.

The warning reporting is probably one of the feature I do miss in other compilers and that’s why I added it to cparser and I’m looking forward to move to gcc-4.9.

All in all clang developers increased the usability of the compiler and made the other projects improve as well, competition in opensource does work.

ld

The linker is again different from the usual binutils, normally you do not notice it but with the new xcode you have to face it since some projects will have problems finding symbols. Again the reporting is quite good, not stellar as clang’s but when the missing symbols are C++ it does a better job than stock binutils in telling you what’s missing from where.

lldb

The new debugger probably isn’t really ready for the prime time. gdb gets its share of complaints about some of its quirks (the macros system is quite minimal and the python interface is good, but not documented as it should), but it is really effective and fast to use.

lldb is not. Almost every command that in gdb is a single statement, and can be shortened to a single letter, in lldb it is two statements , usually with a compulsory option.

Setting breakpoints, watchers, moving through frames; everything gets more cumbersome to use.

The reporting is a little more confusing and the error messages can be misleading. And since you might use the tool while under pressure (e.g. there is a last second bug found before a main release), you want to be as quick as possible.

While debugging some VDA hwaccel improvements for libav I got to spend quite a bit of time tracking why a pointer gets nulled.

The watchpoint I set to figure out triggered at random times in the innards of the osx memory management and I couldn’t actually see when or how that happens.

I ended up writing a dummy hwaccel accessing the same fields on linux, run it through gdb and discover the actual problem in … 10minutes, code and reboots included.

I do hope we’ll see a better interface for lldb and further improvements on gdb (and hopefully combinations such as clang + gdb and gcc + lldb will work better).