Announcing Better Support for Fuzzing with Structured Inputs in Rust

Today, on behalf of the Rust Fuzzing Authority, I’d like to announce new releases of the arbitrary, libfuzzer-sys, and cargo fuzz crates. Collectively, these releases better support writing fuzz targets that take well-formed instances of custom input types. This enables us to combine powerful, coverage-guided fuzzers with smart test case generation.

Install or upgrade cargo fuzz with:

cargo install --force cargo-fuzz

To upgrade your fuzz targets, bump your libfuzzer-sys dependency to 0.2.0 on crates.io. That should be all that’s needed for most cases. However, if you were already using Arbitrary inputs for your fuzz target, some changes will be required. See the upgrading fuzz targets section below for more details.

Fuzzing with Well-Formed, Custom Inputs

Imagine we are testing an ELF object file parser. In this scenario, it makes sense to fuzz the parser by feeding it raw byte buffers as input. Parsers for binary formats take raw byte buffers as input; there’s no impedance mismatch here. Additionally, coverage-guided fuzzers like libFuzzer naturally generate byte buffers as inputs for fuzz targets, so getting fuzzing up and running is easy.

Now instead imagine we are testing a color conversion library: converting between RGB colors to HSL colors and back. Our color conversion functions don’t take raw byte buffers as inputs, they take Rgb or Hsl structures. And these structures are defined locally by our library; libFuzzer doesn’t have any knowledge of them and can’t generate instances of them on its own.

We could write a color string parser, parse colors from the fuzzer-provided, raw input bytes, and then pass the parsed results into our color conversion functions. But now the fuzzer is going to spend a bunch of time exploring and testing our color string parser, before it can get “deeper” into the fuzz target, where our color conversions are. Small changes to the input can result in invalid color strings, that bounce off the parser. Ultimately, our testing is less efficient than we want, because our real goal is to exercise the conversions but we’re doing all this other stuff. On top of that, there’s this hump we have to get over to start fuzzing, since we have to write a bunch of parsing code if we didn’t already have it.

This is where the Arbitrary trait comes in. Arbitrary lets us create structured inputs from raw byte buffers with as thin a veneer as possible. As best it can, it preserves the property that small changes to the input bytes lead to small changes in the corresponding Arbitrary instance constructed from those input bytes. This helps coverage-guided fuzzers like libFuzzer efficiently explore the input space. The new 0.3.0 release of arbitrary contains an overhaul of the trait’s design to further these goals.

Implementing Arbitrary for custom types is easy: 99% of the time, all we need to do is automatically derive it. So let’s do that for Rgb in our color conversion library:

use arbitrary::Arbitrary;

#[derive(Arbitrary, Debug, PartialEq, Eq)]
pub struct Rgb {
    r: u8,
    g: u8,
    b: u8,
}

The libfuzzer-sys crate lets us define fuzz targets where the input type is anything that implements Arbitrary:

// The fuzz target takes a well-formed `Rgb` as input!
libfuzzer_sys::fuzz_target!(|rgb: Rgb| {
    let hsl = rgb.to_hsl();
    let rgb2 = hsl.to_rgb();

    // RGB -> HSL -> RGB is the identity function. This
    // property should hold true for all RGB colors!
    assert_eq!(rgb, rgb2);
});

Now we have a fuzz target that works with our custom input type directly, is the thinnest abstraction possible over the raw, fuzzer-provided input bytes, and we didn’t need to write any custom parsing logic ourselves!

This isn’t limited to simple structures like Rgb. The arbitrary crate provides Arbitrary implementations for everything in std that you’d expect it to: bool, u32, f64, String, Vec, HashMap, PathBuf, etc… Additionally, the custom derive works with all kinds of structs and enums, as long as each sub-field implements Arbitrary.

For more details check out the new Structure-Aware Fuzzing section of The Rust Fuzzing Book and the arbitrary crate. The book has another neat example, where we’re testing a custom allocator implementation and the fuzz target takes a sequence of malloc, realloc, and free commands.

Bonus: Improved UX in `cargo fuzz`

When cargo fuzz finds a failing input, it will display the Debug formatting of the failing input (particularly nice with custom Arbitrary inputs) and suggest common next tasks, like reproducing the failure or running test case minimization.

For example, if we write a fuzz target that panics when r < g < b for a given Rgb instance, libFuzzer will very quickly find an input that triggers the failure, and then cargo fuzz will give us this friendly output:

Failing input:

    fuzz/artifacts/my_fuzz_target/crash-7bb2b62488fd8fc49937ebeed3016987d6e4a554

Output of `std::fmt::Debug`:

    Rgb {
        r: 30,
        g: 40,
        b: 110,
    }

Reproduce with:

    cargo fuzz run my_fuzz_target fuzz/artifacts/my_fuzz_target/crash-7bb2b62488fd8fc49937ebeed3016987d6e4a554

Minimize test case with:

    cargo fuzz tmin my_fuzz_target fuzz/artifacts/my_fuzz_target/crash-7bb2b62488fd8fc49937ebeed3016987d6e4a554

Upgrading Fuzz Targets

First, make sure you’ve upgraded cargo fuzz:

cargo install --force cargo-fuzz

Next, upgrade your libfuzzer-sys from a git dependency to the 0.2.0 version on crates.io in your Cargo.toml:

 [dependencies]
-libfuzzer-sys = { git = "https://github.com/rust-fuzz/libfuzzer-sys.git" }
+libfuzzer-sys = "0.2.0"

If your existing fuzz targets were not using custom Arbitrary inputs, and were taking &[u8] slices of raw bytes, then you’re done!

If you’re implementing Arbitrary for your own custom input types, you’ll need to bump your dependency on Arbitrary to version 0.3. We recommend that, unless you have any specialized logic in your Arbitrary implementation, that you use the custom derive.

Enable the custom derive by requiring the "derive" feature:

[dependencies]
arbitrary = { version = "0.3", features = ["derive"] }

And then derive Arbitrary automatically:

use arbitrary::Arbitrary;

#[derive(Arbitrary, Debug)]
struct MyStruct {
    // ...
}

Finally, if you do you specialized logic in your Arbitrary implementation, and can’t use the custom derive, your implementations will change something like this:

 use arbitrary::{Arbitrary, Unstructured};

 impl Arbitrary for MyType {
-    fn arbitrary<U>(u: &mut U) -> Result<Self, U::Error>
-    where
-        U: Unstructured + ?Sized,
-    {
+    fn arbitrary(
+        u: &mut Unstructured
+    ) -> arbitrary::Result<Self> {
         // ...
     }
 }

The trait has been simplified a little bit, and Unstructured is a concrete type now, rather than a trait you need to parameterize over. Check out the CHANGELOG for more details.

`CHANGELOG`s

Thank You! 💖

Thanks to everyone who contributed to these releases!

Alex Rebert
koushiro
Manish Goregaokar
Nick Fitzgerald
Simonas Kazlauskas

And a special shout out to Manish for fielding so many pull request reviews!

FAQ

What is fuzzing?

Fuzzing is a software testing technique used to find security, stability, and correctness issues by feeding pseudo-random data as input to the software.

Learn more about fuzzing Rust code in The Rust Fuzzing Book!

How is all this different from `quickcheck` and `proptest`?

If you’re familiar with quickcheck or proptest and their own versions of the Arbitrary trait, you might be wondering what the difference is between what’s presented here and those tools.

The primary goal of what’s been presented here is to have a super-thin, efficient layer on top of coverage-guided, mutation-based fuzzers like libFuzzer. That means the paradigm is a little different from quickcheck and proptest. For example, arbitrary::Arbitrary doesn’t take a random number generator like quickcheck::Arbitrary does. Instead it takes an Unstructured, which is a helpful wrapper around a raw byte buffer provided by the fuzzer. It’s similar to libFuzzer’s FuzzedDataProvider. The goal is to preserve, as much as possible, the actual input given to us by the fuzzer, and make sure that small changes in the raw input lead to small changes in the value constructed via arbitrary::Arbitrary. Similarly, we don’t want different, customizable test case generation strategies like proptest supports, because we leverage the fuzzer’s insight into code coverage to efficiently explore the input space. We don’t want to get in the way of that, and step on the fuzzer’s toes.

This isn’t to say that quickcheck and proptest are without value! On the contrary, if you are not using a coverage-guided fuzzer like libFuzzer or AFL — perhaps because you’re working on an unsupported platform — and are instead using a purely-random, generation-based fuzzing setup, then both quickcheck and proptest are fantastic options to consider.