|  | # Improving your fuzzer | 
|  |  | 
|  | When you first begin fuzzing a new target, the fuzzer may crash very quickly. Fuzzing typically | 
|  | produces an initial spike of defects found followed by a long tail. The frequency of defects being | 
|  | found can drop for several reasons, including: | 
|  | * There are fewer defects in the code being tested. | 
|  | * There are sections of the code that aren't being tested by the fuzzer. | 
|  |  | 
|  | To distinguish between these, you need to be able to assess and improve your fuzzer. | 
|  |  | 
|  | ## Improve code coverage | 
|  |  | 
|  | The first step in improving a fuzzer is to understand how well it is performing currently. An | 
|  | obvious key metric for coverage-guided fuzzers is code coverage. | 
|  |  | 
|  | ### Measure code coverage | 
|  |  | 
|  | The prerequisite for improving a fuzzers code coverage is knowing the fuzzer's code coverage. You | 
|  | can collect this information using `fx fuzz`. | 
|  |  | 
|  | For example: | 
|  |  | 
|  | <pre class="devsite-terminal"> | 
|  | fx fuzz analyze <var>package</var>/<var>fuzzer</var> | 
|  | </pre> | 
|  |  | 
|  | This will run the fuzzer for 60 seconds and report the code coverage of the corpus on the device. | 
|  | If you specify a `--staging` option, files in that directory will first be added to the corpus. | 
|  |  | 
|  | If notice gaps in the code coverage, you can add individual inputs to a seed corpus: | 
|  | 1. Add a directory to the source tree near your fuzzer. | 
|  | 1. Add one or more files to this directory, each containing the raw bytes of a test input that | 
|  | causes the fuzzer to reach previously uncovered code. | 
|  | 1. Specify this directory in the [fuzzer GN template](build-a-fuzzer.md#fuzzer) as the seed corpus. | 
|  |  | 
|  | For example: | 
|  |  | 
|  | <pre> | 
|  | <code class="devsite-terminal">cd $FUCHSIA_DIR</code> | 
|  | <code class="devsite-terminal">mkdir <var>path-to-library</var>/my-fuzzer-corpus</code> | 
|  | <code class="devsite-terminal">cp <var>handcrafted-input</var> <var>path-to-library</var>/my-fuzzer-corpus</code> | 
|  | </pre> | 
|  |  | 
|  | And in `//path/to/library/BUILD.gn`: | 
|  |  | 
|  | ``` | 
|  | cpp_fuzzer("my_fuzzer"){ | 
|  | sources = [ "my-fuzzer.cc" ] | 
|  | deps = [ ":my-library" ] | 
|  | corpus = "my-fuzzer-corpus" | 
|  | } | 
|  | ``` | 
|  |  | 
|  | ## Make code friendlier to fuzzing | 
|  |  | 
|  | Generally, libFuzzer is fairly effective at finding inputs that explore new conditional branches | 
|  | when the decision is based on bytes of the input. For example, it can use instrumentation on | 
|  | comparison instructions, such as CMP, to determine what value is needed to match a check on some | 
|  | portion of the input. | 
|  |  | 
|  | But this approach can fail when the fuzzer encounters "fuzzer-hostile" conditions. These include: | 
|  |  | 
|  | * {C/C++} | 
|  | * Conditions that use data from external sources. For example: | 
|  |  | 
|  | ```cpp | 
|  | zx_cprng_draw(&val, sizeof(val)); | 
|  | if (val == 0) { ... } | 
|  | ``` | 
|  |  | 
|  | * Conditions checking values that are possible to construct, but hard to guess. For example: | 
|  |  | 
|  | ```cpp | 
|  | uint32_t actual = header.checksum; | 
|  | header.checksum = 0; | 
|  | uint32_t expected =  crc32(0, reinterpret_cast<const uint8_t*>(&header), sizeof(header)); | 
|  | if (actual == expected) { ... } | 
|  | ``` | 
|  |  | 
|  | * Conditions that check the results of [one-way functions][one-way-function]. | 
|  |  | 
|  | ```cpp | 
|  | int result = ECDSA_verify(0, data, data_len, signature, signature_len, ec_key); | 
|  | if (result == 0) { ... } | 
|  | ``` | 
|  |  | 
|  | * {Rust} | 
|  | * Conditions that use data from external sources. For example: | 
|  |  | 
|  | ```rust | 
|  | let mut randbuf = [0; 8]; | 
|  | zx::cprng_draw(&mut randbuf)?; | 
|  | let val = u64::from_le_bytes(randbuf); | 
|  | if val == 0 { ... } | 
|  | ``` | 
|  |  | 
|  | * Conditions checking values that are possible to construct, but hard to guess. For example: | 
|  |  | 
|  | ```rust | 
|  | let mut c = Checksum::new(); | 
|  | c.add_bytes(&buf); | 
|  | c.checksum() | 
|  | if c == expected { ... } | 
|  | ``` | 
|  |  | 
|  | * Conditions that check the results of [one-way functions][one-way-function]. | 
|  |  | 
|  | ```rust | 
|  | let digest = H::hash(message); | 
|  | if boringssl::ecdsa_verify(digest.as_ref(), self.bytes(), &key.inner.key) { ... } | 
|  | ``` | 
|  |  | 
|  | * {Go} | 
|  | * Conditions that use data from external sources. For example: | 
|  | ```golang | 
|  | if rand.Intn(100) == 0 { ... } | 
|  | ``` | 
|  |  | 
|  | * Conditions checking values that are possible to construct, but hard to guess. For example: | 
|  |  | 
|  | ```golang | 
|  | iCksum := ipv4.CalculateChecksum() | 
|  | if iCksum != want { ... } | 
|  | ``` | 
|  |  | 
|  | * Conditions that check the results of [one-way functions][one-way-function]. | 
|  |  | 
|  | ```golang | 
|  | ecdsaKey, ok := key.(*ecdsa.PublicKey) | 
|  | h := e.hash.New() | 
|  | h.Write(msg) | 
|  | if ecdsa.Verify(ecdsaKey, h.Sum(nil), ecdsaSignature.R, ecdsaSignature.S) { ... } | 
|  | ``` | 
|  |  | 
|  | As a code maintainer, you can use conditional compilation to add workarounds to these conditions in | 
|  | the code being tested. libFuzzer refers to this as using a [fuzzer-friendly build mode][friendly]. | 
|  |  | 
|  | * {C/C++} | 
|  |  | 
|  | Use the common build macro, `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION`. | 
|  |  | 
|  | For example: | 
|  |  | 
|  | ```cpp | 
|  | #ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION | 
|  | // Use hard-coded value when fuzzing. | 
|  | memset(&val, 0, sizeof(val)); | 
|  | #else | 
|  | zx_cprng_draw(&val, sizeof(val)); | 
|  | #endif | 
|  | ``` | 
|  |  | 
|  | In this example, we have set all the bytes of `val` to always be zero. Depending on the code, it | 
|  | may be more useful to the fuzzer if `val` is some other deterministic value, or possibly even | 
|  | directly depends on the fuzzer input. | 
|  |  | 
|  | * {Rust} | 
|  |  | 
|  | Use the `fuzz` [cfg attribute][cfg-attribute]. | 
|  |  | 
|  | For example: | 
|  |  | 
|  | ```rust | 
|  | #[cfg(not(fuzz))] | 
|  | fn is_valid(&self, key: &EcPubKey<C>, message: &[u8]) -> bool { | 
|  | let digest = H::hash(message); | 
|  | boringssl::ecdsa_verify(digest.as_ref(), self.bytes(), &key.inner.key) | 
|  | } | 
|  |  | 
|  | #[cfg(fuzz)] | 
|  | fn is_valid(&self, key: &EcPubKey<C>, message: &[u8]) -> bool { | 
|  | // Skip validation when fuzzing. | 
|  | return true; | 
|  | } | 
|  | ``` | 
|  |  | 
|  | * {Go} | 
|  |  | 
|  | Use the `fuzz` package. Since Go only performs [conditional compilation][go-build] at the file | 
|  | level, this package include two files that define an `const Enabled <bool>`. Which file is | 
|  | included, and therefore the value of `Enabled` is determined by whether the code is being built in | 
|  | a fuzzer variant or not. | 
|  |  | 
|  | For example: | 
|  |  | 
|  | ```golang | 
|  | import "fuzz" | 
|  |  | 
|  | func (b IPv4) CalculateChecksum() uint16 { | 
|  | if fuzz.Enabled { | 
|  | // Return hard-coded value when fuzzing. | 
|  | return uint16(0xffff) | 
|  | } | 
|  | return Checksum(b[:b.HeaderLength()], 0) | 
|  | } | 
|  | ``` | 
|  |  | 
|  | ### Add custom mutators | 
|  |  | 
|  | Note: Custom mutators are currently only supported in C/C++. You may be able to use them in other | 
|  | languages using [Rust's FFI][rust-ffi] or [Go's cgo][golang-cgo]. | 
|  |  | 
|  | In some case, the inputs being provided by the fuzzer may be transformed before being acted on. | 
|  | This can greatly reduce the fuzzer's ability to associate inputs with the behaviors they produce. | 
|  | For example, a library being fuzzed my first decompress its inputs before processing them. The most | 
|  | effective way to fuzz this library is to preform mutations on uncompressed inputs, then compress | 
|  | them before invoking the library. | 
|  |  | 
|  | One way to achieve this is by adding custom mutators. These are user-provided implementations of | 
|  | an optional function defined by LLVM's [fuzzing interface][fuzzer-interface]: | 
|  |  | 
|  | ```cpp | 
|  | extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size, | 
|  | size_t MaxSize, unsigned int Seed); | 
|  | ``` | 
|  |  | 
|  | If provided, libFuzzer will call this function with a buffer `Data` of size `MaxSize` initially | 
|  | filled with `Size` bytes of a valid input from the corpus. This function can transform this data | 
|  | before calling another LLVM [fuzzing interface][fuzzer-interface] function: | 
|  |  | 
|  | ```cpp | 
|  | size_t LLVMFuzzerMutate(uint8_t *Data, size_t Size, size_t MaxSize); | 
|  | ``` | 
|  |  | 
|  | This function performs the actual mutation to create a new input. | 
|  |  | 
|  | In the example of the library that requires compressed inputs, a custom mutator could decompress the | 
|  | input from the corpus, call `LLVMFuzzerMutate` on it to create a new input, and compress the result. | 
|  |  | 
|  | Google's public fuzzing documentation has detailed examples for this case and others. See | 
|  | [Structure-Aware Fuzzing with libFuzzer][structure-aware-fuzzing]. | 
|  |  | 
|  | ### Provide a dictionary {#dictionary} | 
|  |  | 
|  | A dictionary is a set of tokens commonly found in an interface's valid inputs. When provided, a | 
|  | fuzzer will use a dictionary to construct inputs that are more likely to be valid and therefore | 
|  | provide deeper coverage. | 
|  |  | 
|  | If you know what sort of tokens your code expects, you can add them to a dictionary file, one per | 
|  | line. Then provide the file to the fuzzer using the [fuzzer GN template](build-a-fuzzer.md#fuzzer). | 
|  |  | 
|  | For example: | 
|  |  | 
|  | ``` | 
|  | cpp_fuzzer("my_fuzzer"){ | 
|  | sources = [ "my-fuzzer.cc" ] | 
|  | deps = [ ":my-library" ] | 
|  | dictionary = "relative/path/to/dictionary" | 
|  | } | 
|  | ``` | 
|  |  | 
|  | `fx fuzz analyze` will display a recommended dictionary based on its observations. | 
|  |  | 
|  | ## Improve fuzzer performance | 
|  |  | 
|  | Once a fuzzer is achieving good coverage, then another key metric is simply how many iterations it | 
|  | can perform for a given finite set of compute resources. Fuzzers and the code they test can have a | 
|  | considerable range of complexity, so there isn't a single number of iterations per second a fuzzer | 
|  | should perform or a single memory limit a fuzzer should stay below. But if everything else is the | 
|  | same, a faster, leaner fuzzer will have a greater likelihood of finding defects over a slower, more | 
|  | memory-intensive one. | 
|  |  | 
|  | ### Startup initialization | 
|  |  | 
|  | Often, code under test requires some amount of expensive set up before it can be tested. Performing | 
|  | this initialization on every iteration can drag down the fuzzer's speed. In these situations it can | 
|  | be better to lazily initialize a variable with a lifetime equal to that of the process. | 
|  |  | 
|  | For example: | 
|  |  | 
|  | ```cpp | 
|  | bool SetUp() { | 
|  | DoSomeExpensiveWork(); | 
|  | return true; | 
|  | } | 
|  |  | 
|  | extern "C" LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { | 
|  | static bool ready = SetUp(); | 
|  | ... | 
|  | } | 
|  | ``` | 
|  |  | 
|  | However, be _very_ careful about program state that is carried over from one iteration to the | 
|  | next. If a defect depends not only on the current test input, but also some subset of all previous | 
|  | inputs, it can become very difficult to reproduce without replaying the entire fuzzer run. | 
|  |  | 
|  | ### Pre-allocate storage | 
|  |  | 
|  | Some code operates on large amounts of memory, such as compression algorithms. A fuzzer that tries | 
|  | to allocate and free many megabytes of memory on each iteration will see degraded performance. An | 
|  | alternate approach is to use pre-allocate a maximally-sized buffer. | 
|  |  | 
|  | For example: | 
|  |  | 
|  | ```cpp | 
|  | static const size_t kMaxOutSize = 0x10000000; // 256 MB | 
|  |  | 
|  | extern "C" LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { | 
|  | static uint8_t out_buf[kMaxOutSize]; | 
|  | size_t max = Decompressor::GetMaxUncompressedSize(size); | 
|  | if (sizeof(out_buf) < max) { | 
|  | return 0; | 
|  | } | 
|  | Decompressor::Decompress(data, size, out_buf, max); | 
|  | return 0; | 
|  | } | 
|  | ``` | 
|  |  | 
|  | As written, this introduces a trade-off between performance and sanitizer accuracy. If the code | 
|  | above instead allocated a memory region of length `max`, [AddressSanitizer][asan] would be able to | 
|  | detect if `Decompress` overflowed by any amount. With a pre-allocated region, it may silently | 
|  | succeed. Fortunately, [AddressSanitizer][asan] provides a way to [manually poison][manual-poison] | 
|  | memory. | 
|  |  | 
|  | For example: | 
|  |  | 
|  | ```cpp | 
|  | #include <sanitizer/asan_interface.h> | 
|  |  | 
|  | static const size_t kMaxOutSize = 0x10000000; // 256 MB | 
|  |  | 
|  | extern "C" LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { | 
|  | static uint8_t out_buf[kMaxOutSize]; | 
|  | size_t max = Decompressor::GetMaxUncompressedSize(size); | 
|  | if (sizeof(out_buf) < max) { | 
|  | return 0; | 
|  | } | 
|  | ASAN_POISON_MEMORY_REGION(&data[max], sizeof(out_buf) - max); | 
|  | Decompressor::Decompress(data, size, out_buf, max); | 
|  | ASAN_UNPOISON_MEMORY_REGION(&data[max], sizeof(out_buf) - max); | 
|  | return 0; | 
|  | } | 
|  | ``` | 
|  |  | 
|  | [asan]: https://clang.llvm.org/docs/AddressSanitizer.html | 
|  | [fuzzer-interface]: https://github.com/llvm/llvm-project/blob/HEAD/compiler-rt/lib/fuzzer/FuzzerInterface.h | 
|  | [golang-cgo]: https://golang.org/cmd/cgo/ | 
|  | [manual-poison]: https://github.com/google/sanitizers/wiki/AddressSanitizerManualPoisoning | 
|  | [one-way-function]: https://en.wikipedia.org/wiki/One-way_function | 
|  | [rust-ffi]: https://doc.rust-lang.org/nomicon/ffi.html | 
|  | [structure-aware-fuzzing]: https://github.com/google/fuzzing/blob/HEAD/docs/structure-aware-fuzzing.md |