Building crates so they look like C(ABI) Libraries

I presented cargo-c at RustLab 2019; here is a longer follow-up.

Mixing Rust and C

One of the best selling points of rust, in addition to safety, speed and its amazing community, is being highly interoperable with the C ABI.

This comes in really handy when you have well-optimized, hand-crafted asm kernels you’d like to use as they are:

  • They are small and have a clear interface, usually with strict boundaries on what they read and write, by their very nature.
  • Otherwise you’d basically rewrite them as they are using some inline assembly, for dubious gains.
  • Both cc-rs and nasm-rs make the process of building and linking relatively painless.
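
From the rust side such a kernel is just a foreign symbol declared in an extern block. Here is a minimal sketch, using the C library's strlen as a stand-in for a hand-written kernel (a real one would be compiled and linked from build.rs by cc-rs or nasm-rs, which is not shown). The wrapper name c_strlen is made up for illustration. The block is spelled unsafe extern, which needs Rust 1.82 or later; older toolchains take a plain extern "C" block instead.

```rust
use std::ffi::{c_char, CString};

// Declaration of the foreign symbol. `strlen` stands in for an asm kernel:
// it has a small, clear interface and only reads the buffer it is given.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper (hypothetical name) that keeps the `unsafe` call in one place.
pub fn c_strlen(text: &str) -> usize {
    // CString appends the NUL terminator the C side relies on.
    let owned = CString::new(text).expect("input must not contain NUL bytes");
    // SAFETY: `owned` is a valid NUL-terminated string that outlives the call.
    unsafe { strlen(owned.as_ptr()) }
}
```

Keeping the unsafe call behind a small safe function means the rest of the crate never has to care that the implementation lives outside rust.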

Also, if you plan to integrate some rust component into a project written in another language, it is quite straightforward to link the staticlib produced by cargo into your main project.

If you have a pure-rust crate and you want to export it to the world as if it were a normal C (shared/dynamic) library, it gets quite gory.

Well behaved C-API Library structure

Usually when you want to use a C-library in your own project you should expect it to provide the following:

  • A header file, telling the compiler which symbols it should expect
  • A static library
  • A dynamic library
  • A pkg-config file giving you directions on where to find the header and what you need to pass to the linker to correctly link the library, be it static or dynamic

Header file

In C you usually keep a list of function prototypes and type definitions in a separate file and then embed it in your source file to let the compiler know what to expect.

Since you rely on a quite simple preprocessor to do that, you have to be careful to add guards so the file does not get included more than once and, in order to avoid name clashes, you install it in a subdirectory of your include dir.

Since the location of the header might not be part of the default search path, you usually store this information in pkg-config.
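
To make the guard idea concrete, here is a tiny sketch of what a header generator has to emit around the declarations. The helper render_header and the FOO_H / foo_add names are hypothetical; this is not how cbindgen is implemented, only the shape of the output.

```rust
/// Wrap C declarations in an include guard (hypothetical helper).
pub fn render_header(guard: &str, declarations: &str) -> String {
    format!(
        "#ifndef {guard}\n#define {guard}\n\n{decls}\n\n#endif /* {guard} */\n",
        guard = guard,
        decls = declarations.trim_end()
    )
}
```

Calling render_header("FOO_H", "int foo_add(int a, int b);") gives a file that can be included any number of times while the preprocessor only sees the prototype once.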

Static Libraries

Static libraries are quite simple in concept (and execution):

  • they are an archive of object code files.
  • the linker simply reads them as it would read freshly produced .o files and links everything together.

There is a pitfall though:

  • On some platforms, even if you want to make a fully static binary, you end up dynamically linking some system libraries for a number of reasons.

    The worst offenders are the pthread libraries and in some cases the compiler builtins (e.g. libgcc_s).

  • Which libraries those are is usually not recorded anywhere.

rustc comes to the rescue with --print native-static-libs. It isn’t the best example of integration, since the output is a string printed on stderr as a side effect of the actual build, but it is still a good step in the right direction.
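
Since the information only shows up as text on stderr, a tool has to capture and parse it. Here is a minimal sketch, assuming the note looks like "note: native-static-libs: -lfoo -lbar" (which is what the compilers I have seen print; the exact wording is not guaranteed to stay the same). The function name is made up.

```rust
/// Extract the libraries listed in rustc's `native-static-libs` note.
/// Returns None when the note is not present in the captured stderr.
pub fn parse_native_static_libs(stderr: &str) -> Option<Vec<String>> {
    const MARKER: &str = "native-static-libs:";
    stderr.lines().find_map(|line| {
        // Everything after the marker is a whitespace-separated link line.
        let (_, libs) = line.split_once(MARKER)?;
        Some(libs.split_whitespace().map(str::to_owned).collect())
    })
}
```

The resulting list is what you would want to end up in the private link line of the pkg-config file.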

pkg-config is the de-facto standard way to preserve the information and have the build systems know about it (I guess you are seeing a pattern now).

Dynamic Libraries

A shared or dynamic library is a specially crafted lump of executable code that gets linked to the binary as it is being executed.
The advantages compared to statically linking everything are mainly two:

  • Saving disk space: without link-time pruning you end up carrying a copy of the same library inside every binary that uses it.
  • Safer and simpler updates: if you need to update, say, openssl, you do that once instead of updating the 100+ consumers of it on your system.

There are some inherent complexity and constraints involved in getting this feature right; the most problematic one is ABI stability:

  • The dynamic linker needs to find the symbols the binary expects, and they must have the expected size.
  • If you change the in-memory layout of a struct or how the function names are represented, you have to make sure the linker is aware of it.
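
A small sketch of why a layout change is an ABI break: the two structs below are hypothetical versions of the same exported type, and a binary compiled against the first one would allocate too little memory for the second.

```rust
use std::mem::size_of;

/// Layout shipped in version 1 of a hypothetical library.
#[repr(C)]
pub struct ConfigV1 {
    pub width: u32,
    pub height: u32,
}

/// The same struct after a field was added in a later release.
#[repr(C)]
pub struct ConfigV2 {
    pub width: u32,
    pub height: u32,
    pub threads: u32,
}

/// True when callers built against V1 would pass a buffer of the wrong size.
pub fn layout_changed() -> bool {
    size_of::<ConfigV1>() != size_of::<ConfigV2>()
}
```

Nothing in the symbol name tells the dynamic linker about this, which is why the version information has to be stored somewhere else.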

Usually that means that depending on your platform you have some versioning information you should provide when you are preparing your library. This can be as simple as telling the compile-time linker to embed the version information (e.g. Mach-O dylib or ELF) in the library or as complex as crafting a version script.
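
For the simple case, embedding the version boils down to a few linker arguments that differ per platform. The sketch below uses the common GNU ld and Apple ld spellings; it is my illustration under those assumptions, not a copy of what any specific tool passes, and the function name and parameters are made up.

```rust
/// Linker arguments that embed version information in a shared library.
/// `os` follows the values of CARGO_CFG_TARGET_OS ("linux", "macos", ...).
pub fn version_link_args(
    os: &str,
    name: &str,
    version: (u32, u32, u32),
    libdir: &str,
) -> Vec<String> {
    let (major, minor, patch) = version;
    match os {
        // Mach-O: the install name carries the path, plus two version fields.
        "macos" | "ios" => vec![
            format!("-Wl,-install_name,{libdir}/lib{name}.{major}.dylib"),
            format!("-Wl,-current_version,{major}.{minor}.{patch}"),
            format!("-Wl,-compatibility_version,{major}"),
        ],
        // ELF: the soname records the major version.
        "linux" | "android" | "freebsd" | "netbsd" | "openbsd" | "dragonfly" => {
            vec![format!("-Wl,-soname,lib{name}.so.{major}")]
        }
        // Windows and anything else are not covered by this sketch.
        _ => Vec::new(),
    }
}
```

Note that the Mach-O branch needs the final library directory, which is one reason the install path has to be known at build time.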

Compared to crafting a staticlib, there are more moving parts and more platform-specific knowledge is needed.

Sadly in this case rustc does not provide any help for now: even if the C-ABI is stable and set in stone, the rust mangling strategy is not finalized yet, and it is a large part of being ABI stable, so the work on fully supporting dynamic libraries is yet to be completed.

Dynamic libraries on most platforms have a means to store which other dynamic libraries they rely on and which paths to search for them. When the information is incomplete, or you are storing the library in a non-standard path, pkg-config comes to the rescue again, helpfully storing the information for you.

Pkg-config

It is your single point of truth, as long as your build system supports it and the libraries you want to use craft it properly.
It simplifies your life a lot if you want to keep multiple versions of a library around or you are doing non-system packaging (e.g. Homebrew or Gentoo Prefix).
Besides the search path, link line and dependency information I mentioned above, it also stores the library version and inter-library compatibility relationships.
If you are publishing a C-library and you aren’t providing a .pc file, please consider doing it.
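
If you have never written one, a .pc file is just a handful of variables and keywords. Here is a minimal sketch that renders one from rust; the struct, its fields and the library name used in the example are hypothetical, while the keywords (Name, Description, Version, Libs, Libs.private, Cflags) are the standard pkg-config ones.

```rust
/// Fields of a hypothetical library description.
pub struct PkgConfig<'a> {
    pub name: &'a str,
    pub description: &'a str,
    pub version: &'a str,
    pub prefix: &'a str,
    /// Extra libraries needed only for static linking.
    pub libs_private: &'a str,
}

impl PkgConfig<'_> {
    /// Render the .pc file; `${...}` is expanded later by pkg-config itself.
    pub fn render(&self) -> String {
        let mut out = String::new();
        out.push_str(&format!("prefix={}\n", self.prefix));
        out.push_str("exec_prefix=${prefix}\n");
        out.push_str("libdir=${exec_prefix}/lib\n");
        out.push_str("includedir=${prefix}/include\n\n");
        out.push_str(&format!("Name: {}\n", self.name));
        out.push_str(&format!("Description: {}\n", self.description));
        out.push_str(&format!("Version: {}\n", self.version));
        out.push_str(&format!("Libs: -L${{libdir}} -l{}\n", self.name));
        out.push_str(&format!("Libs.private: {}\n", self.libs_private));
        // The header lives in a per-library subdirectory to avoid clashes.
        out.push_str(&format!("Cflags: -I${{includedir}}/{}\n", self.name));
        out
    }
}
```

Libs.private is where the system libraries needed by the static library belong, so a consumer asking for a static link gets the complete line.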

Producing a C-compatible library out of a crate

I explained what we are expected to produce; now let’s see what we can do on the rust side:

  • We need to export C-ABI-compatible symbols, which means we have to:
    • Decorate the data types we want to export with #[repr(C)]
    • Decorate the functions with #[no_mangle] and declare them as extern "C"
  • Tell rustc the crate type is both staticlib and cdylib
  • Pass rustc the platform-correct link line so the library produced has the right information inside.

    NOTE: On some platforms, besides the version information, the install path must also be encoded in the library.

  • Generate the header file so that the C compiler knows about the exported symbols.
  • Produce a pkg-config file with the correct information.

    NOTE: It requires knowing where the whole lot will eventually be installed.
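
The first items of the list look like this in practice. This is a minimal sketch with made-up names (FooPoint, foo_point_add, foo_point_norm1); the crate type itself is set in Cargo.toml with crate-type = ["staticlib", "cdylib"]. The attribute is written #[unsafe(no_mangle)], the spelling accepted from Rust 1.82 on and required by the 2024 edition; on older compilers plain #[no_mangle] is the form to use.

```rust
/// Plain data shared with C: `#[repr(C)]` fixes the field order and padding.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FooPoint {
    pub x: i32,
    pub y: i32,
}

/// Exported with an unmangled name and the C calling convention.
#[unsafe(no_mangle)]
pub extern "C" fn foo_point_add(a: FooPoint, b: FooPoint) -> FooPoint {
    FooPoint {
        x: a.x.wrapping_add(b.x),
        y: a.y.wrapping_add(b.y),
    }
}

/// Pointer-taking variant: the caller must pass null or a valid FooPoint.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn foo_point_norm1(p: *const FooPoint) -> i64 {
    // A null pointer from the C side is reported as -1 instead of crashing.
    match unsafe { p.as_ref() } {
        Some(p) => i64::from(p.x).abs() + i64::from(p.y).abs(),
        None => -1,
    }
}
```

The matching C prototypes, e.g. FooPoint foo_point_add(FooPoint a, FooPoint b);, are what the generated header has to contain.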

cargo does not support installing libraries at all (since for now rust dynamic libraries should not be used at all) so we are a bit on our own.

For rav1e I did that the hard way, and then I came up with an easy way for you to use (which I used to do the same again for lewton, spending about half a day instead of several months).

The hard way

As seen in crav1e; you can explore the history there.

It isn’t the fully hard way, since before cargo-c there were already nice tools to avoid some time-consuming tasks, such as cbindgen.
In a terse summary what I had to do was:

  • Come up with an external build system since cargo itself cannot install anything nor have direct knowledge of the install path information. I used Make since it is simple and sufficiently widespread, anything richer would probably get in the way and be more time consuming to set up.
  • Figure out how to extract the information provided in Cargo.toml so I have it at Makefile level. I gave up and duplicated it since parsing toml or json is pointlessly complicated for a prototype.
  • Write down the platform-specific logic on how to build (and install) the libraries. It ended up living in the build.rs and the Makefile. Thanks again to Derek for taking care of the Windows-specific details.
  • Use cbindgen to generate the C header (and in the process smooth some of its rough edges).
  • Since we already have a build system, add more targets for testing and continuous integration purposes.

If you do not want to use cargo-c, I spun off the cdylib link-line logic into a standalone crate so you can use it in your build.rs.
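
What a build.rs ultimately has to do is print cargo directives, one per linker argument. The sketch below is not the API of that standalone crate; it only shows the cargo:rustc-cdylib-link-arg directive a helper like that ends up emitting, with a made-up function name.

```rust
use std::io::{self, Write};

/// Write one `cargo:rustc-cdylib-link-arg=` directive per linker argument.
/// In a real build.rs, `out` would be the standard output read by cargo.
pub fn emit_cdylib_link_args<W: Write>(out: &mut W, args: &[&str]) -> io::Result<()> {
    for arg in args {
        writeln!(out, "cargo:rustc-cdylib-link-arg={}", arg)?;
    }
    Ok(())
}
```

From the main function of build.rs you would call it with &mut io::stdout() and the arguments computed for the target platform.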

The easier way

Using a Makefile and a separate crate with a customized build.rs works fine and keeps the developers that care just about writing in rust fully shielded from the gory details and contraptions presented above.

But it comes with some additional churn:

  • Keeping the API in sync
  • Duplicating the release work
  • Having the users confused about where to report issues or where to find the actual sources. (Users miss the information presented in obvious places such as the README way too often.)

So to try to minimize it I came up with a cargo applet that provides two subcommands:

  • cbuild to build the libraries, the .pc file and header.
  • cinstall to install the whole lot if already built, or to build and then install it.

They are two subcommands since it is quite common to build as a user and then install as root. If you are using rustup and root does not have cargo, you can get away with using --destdir and then sudo install, or craft your local package if your distribution provides a means to do that.
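
A sketch of how a staging directory combines with the prefix, assuming --destdir follows the usual DESTDIR convention (the prefix is what gets recorded in the .pc file and the library, the destdir is only prepended when copying files). The function name is made up and the example paths are Unix-style.

```rust
use std::path::{Path, PathBuf};

/// Where a file lands on disk when installing into a staging directory.
pub fn staged_path(destdir: &str, prefix: &str, relative: &str) -> PathBuf {
    let prefix = Path::new(prefix);
    // `join` with an absolute path would discard `destdir`,
    // so drop the leading separator first.
    let prefix = prefix.strip_prefix("/").unwrap_or(prefix);
    Path::new(destdir).join(prefix).join(relative)
}
```

With destdir /tmp/some-place and prefix /usr, a library meant for /usr/lib is written under /tmp/some-place/usr/lib, ready to be copied as root or packaged.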

Everything I mentioned in the hard way happens under the hood and, barring bugs in the current implementation, you should be able to stay completely oblivious of the details.

Using cargo-c

As seen in lewton and rav1e.

  • Create a capi.rs with the C-API you want to expose and use #[cfg(cargo_c)] to hide it when you build a normal rust library.
  • Make sure you have a lib target and, if you are using a workspace, that the first member is the crate you want to export. That means you might have to add a "." member at the start of the list.
  • Remember to add a cbindgen.toml and fill it with at least the include guard; you probably also want to set the language to C (it defaults to C++).
  • Once you are happy with the result, update your documentation to tell the user to install cargo-c and run cargo cinstall --prefix=/usr --destdir=/tmp/some-place or something along those lines.
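
To give an idea of what ends up in capi.rs, here is a minimal sketch of an opaque-handle API around a made-up Counter type; none of these names come from lewton or rav1e. The module is written inline and without the #[cfg(cargo_c)] attribute so the sketch compiles on its own, and #[unsafe(no_mangle)] is the Rust 1.82+ spelling of #[no_mangle].

```rust
/// The ordinary rust API of a hypothetical crate.
pub struct Counter {
    value: u64,
}

impl Counter {
    pub fn new() -> Self {
        Counter { value: 0 }
    }

    pub fn bump(&mut self) -> u64 {
        self.value += 1;
        self.value
    }
}

// In lib.rs the module below would live in capi.rs and be declared as
//     #[cfg(cargo_c)]
//     mod capi;
pub mod capi {
    use super::Counter;

    /// Hand an owned Counter to C as an opaque pointer.
    #[unsafe(no_mangle)]
    pub extern "C" fn counter_new() -> *mut Counter {
        Box::into_raw(Box::new(Counter::new()))
    }

    /// Returns 0 when passed a null pointer.
    #[unsafe(no_mangle)]
    pub unsafe extern "C" fn counter_bump(c: *mut Counter) -> u64 {
        match unsafe { c.as_mut() } {
            Some(c) => c.bump(),
            None => 0,
        }
    }

    /// Take ownership back and drop it; a null pointer is ignored.
    #[unsafe(no_mangle)]
    pub unsafe extern "C" fn counter_free(c: *mut Counter) {
        if !c.is_null() {
            drop(unsafe { Box::from_raw(c) });
        }
    }
}
```

The C side only ever sees a pointer to an incomplete type, so the rust struct can change freely without touching the exported ABI.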

Coming next

cargo-c is a young project and far from being complete even if it is functional for my needs.

Help in improving it is welcome: there are plenty of rough edges and bugs to find and squash.

Thanks

Thanks to est31 and sdroege for the in-depth review in #rust-av and kodabb for the last minute edits.
