Advanced fuzzing unmasks elusive vulnerabilities

Back

Fuzzing Vulnerability Disclosure Best Practices

2023-10-16 • 9 minute read

Marc Heuse @marcheuse

Hacker

Introduction

Fuzz testing is a main component of modern software assurance, but some bugs remain elusive to fuzzing.

Google’s libwebp, integral to Chrome and Safari, recently received attention for a severe security flaw that was found to be exploited in the wild. Notably, the bug was not flagged by Google’s expansive fuzzing setup, including OSS-Fuzz.

As Ben Hawkes from Isosceles details, the bug resides in WebP’s lossless compression method, VP8L. Memory allocations based on pre-calculated fixed buffer sizes resulted in a heap buffer overflow.

Fuzzers are usually able to uncover such complex memory corruptions given enough time and solvable path constraints. However, this specific bugs escaped previous fuzzing attempts.

The question the fuzzing community and we were asking – is it possible to find this specific vulnerability with fuzzing? And if so, why was it not found in Google‘s OSS-Fuzz initiative? This article attempts to answer these questions and also tries to give guidance to better fuzzing campaigns.

:bulb: Note In the first iteration of this article it was shown that the fuzzing setup found the libwebp vulnerability; however, this was due to an error where an already partial corrupted input file was accidentally added as a seed during the campaign.

Step 1: Fuzzing Prep

As the lead developer behind AFL++, I wanted to understand why certain bugs manage to escape automated tools and how we can refine the tools for better bug coverage.

If you are new to fuzzing, you might want to read up on AFL++ first:

Installing AFL++ is straightforward. See the installstion instructions if you have not installed a full modern LLVM compiler and other necessary build tools.

git clone https://github.com/AFLplusplus/AFLplusplus
make -C AFLplusplus
sudo make -C AFLplusplus install

To test, we used the libwebp version just before the bug was fixed:

git clone https://chromium.googlesource.com/webm/libwebp
cd libwebp
git checkout 7ba44f80f3b94fc0138db159afea770ef06532a0

We chose not to rely on the pre-existing harnesses or constructing a new, and instead use the example tool that comes with the library because we knew that this can be used to trigger the bug. Hypothetically, these harnesses could have a narrow scope for fuzzing and be the reason the bug wasn’t found.

To optimize for speed, we moved the forkserver within the dwebp code, just before the input is parsed. Finally, we adjusted the compiler in makefile.unix to AFL++.

diff --git a/examples/dwebp.c b/examples/dwebp.c
index 652de6a6..d41df43d 100644
--- a/examples/dwebp.c
+++ b/examples/dwebp.c
@@ -312,6 +312,10 @@ int main(int argc, const char* argv[]) {
if (quiet) verbose = 0;
+#ifdef __AFL_HAVE_MANUAL_CONTROL
+  __AFL_INIT();
+#endif
+
  {
    VP8StatusCode status= VP8_STATUS_OK;
    size_t data_size = 0;
diff --git a/makefile.unix b/makefile.unix
index 857ea988..83d04a9c 100644
--- a/makefile.unix
+++ b/makefile.unix
@@ -113,7 +113,7 @@ else
  CFLAGS = -O3 -DNDEBUG
endif
CFLAGS += $(EXTRA_FLAGS)
-CC = gcc
+CC = afl-clang-fast
INSTALL = install
GROFF =/usr/bin/groff
COL = /usr/bin/col

Step 2: Focusing on the Vulnerable Code Path

We wanted AFL++ to focus solely on the code path that lead to this specific bug. For that, we used source code coverage compilation flags -fprofile-arcs -ftest-coverage -lgcov --coverage. Then, we ran the target examples/dwebp with the problematic bad.webp file:

examples/dwebp bad.webp -o /dev/null

Since a crash does not generate a coverage file, we prior also modified the source code to force the writing of one:

diff --git a/src/dec/vp8l_dec.c b/src/dec/vp8l_dec.c
index 45012162..836540fa 100644
--- a/src/dec/vp8l_dec.c
+++ b/src/dec/vp8l_dec.c
@@ -360,6 +360,8 @@ static int ReadHuffmanCode(intalphabet_size, VP8LDecoder* const dec,
  return size;
}
+void __gcov_dump(void);
+
static intReadHuffmanCodes(VP8LDecoder* const dec, int xsize, int ysize,
                            int color_cache_bits, int allow_recursion) {
  int i, j;
@@ -378,6 +380,8 @@ static int ReadHuffmanCodes
(VP8LDecoder*const dec, int xsize, int ysize,
  int* mapping = NULL;
  int ok = 0;
+  __gcov_dump();
+
  if (allow_recursion&& VP8LReadBits(br, 1)) {
    // use meta Huffmancodes.
    const int huffman_precision = VP8LReadBits(br, 3) + 2

After forcing the coverage file to be written, we used the utility gcov to parse and analyze the generated coverage file. We identified the function names that are involved in the crash, to instrument only those specific functions to guide the fuzzer towards the vulnerability more efficiently.

Doing this can significantly speed up the fuzzing process for the goal. While the fuzzer could eventually find the bug without this focus, directing it will save a considerable amount of time in this specific experiment.

We created a file called libwebp.list that contains the names of the functions to be instrumented, which are relevant to the crash path:

fun: CheckSizeArgumentsOverflow
fun: CheckSizeOverflow
fun: DefaultFeatures
fun: GetCPUInfo
fun: GetLE16
fun: GetLE32
fun: main
fun: PrintAnimationWarning
fun: ShiftBytes
fun: VP8InitIo
fun: VP8InitIoInternal
fun: VP8LCheckSignature
fun: VP8LDecodeHeader
fun: VP8LIsEndOfStream
fun: VP8LPrefetchBits
fun: VP8LReadBits
fun: WebPInitCustomIo
fun: WebPInitDecBuffer
fun: WebPInitDecBufferInternal
fun: WebPInitDecoderConfig
fun: WebPMalloc
fun: WebPResetDecParams
fun: WebPSafeCalloc
fun: WebPSafeMalloc
fun: x86CPUInfo

Step 3: Preparing the code for fuzzing

For a thorough fuzzing campaign we want fuzzing instances that have a different view on coverage. For this, we compile additional instances with Context Sensitive Coverage (CTX) and a N-Sequence based instrumentation with a step length of 8 (NGRAM-8).

Additionally we want fuzzing instances that help solving path constraints - for this we compile additional instances with CMPLOG and COMPCOV.

Finally we add an instance compiled for detecting any kind of memory corruption easily by enabling ASAN.

export AFL_LLVM_ALLOWLIST=/full/path/to/libwebp.list
for i in cmplog compcov ctx ngram asan; do
    cp -r libwebp libwebp-$i
done
make -C libwebp -f makefile.unix
make AFL_USE_ASAN=1 -C libwebp-asan -f makefile.unix
make AFL_LLVM_CMPLOG=1 -C libwebp-cmplog -f makefile.unix
make AFL_LLVM_LAF_ALL=1 -C libwebp-compcov -f makefile.unix
make AFL_LLVM_INSTRUMENT=CTX -C libwebp-ctx -f makefile.unix
make AFL_LLVM_INSTRUMENT=NGRAM-8 -C libwebp-ngram -f makefile.unix

In a real-world fuzzing campaign you would add a honggfuzz instance and as well as a libfuzzer instance with the "value-profile" and likely also a libafl and a concolic execution fuzzers like symcc and .TritonDSE. This is however omitted here not make this article overly complex.

Step 4: Launching the fuzzing campaign

Initially, we gathered a few seed files from libwebp and a few random files we downloaded, and placed them in a directory named seeds. To ensure the fuzzer had a diverse set of inputs, we ran the deduplication command:

afl-cmin -i seeds -o inputs
# [... output omitted ...]
ls -l inputs/
-rw-r--r-- 2 marc 1013  50408 Okt  6 14:15 file_example_WEBP_50kB.webp
-rw-r--r-- 2 marc 1013  71272 Okt  6 14:15 Huey_helicopter_USNS.webp
-rw-rw-r-- 9 marc 1013   4880 Okt  6 14:15 test.webp
-rw-rw-r-- 9 marc 1013 1321542 Okt  6 14:15 test_webp_wasm.webp
-rw-r--r-- 2 marc 1013  81150 Okt  6 14:15 foobar.webp

With five unique seed files, we were ready to start fuzzing. We set up 8 different screen sessions, each with different configurations, aiming for a diverse fuzzing setup.

# A simple MAIN instance
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-main -M main -- libwebp/examples/dwebp @@ -o /dev/null
# Our ASAN instance to help finding non-crashing memory corruptions
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-asan -S asan -- libwebp-asan/examples/dwebp @@ -o /dev/null
# Our path constraint solving instances
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-cmplog2 -S cmplog2 -l2at -c libwebp-cmplog/examples/dwebp -- libwebp/examples/dwebp @@ -o /dev/null
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-cmplog3 -S cmplog3 -l3a -c libwebp-cmplog/examples/dwebp -- libwebp/examples/dwebp @@ -o /dev/null
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-compcov -S compcov -- libwebp-compcov/examples/dwebp @@ -o /dev/null
# Alternative coverage gathering
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-ngram -S ngram -- libwebp-ngram/examples/dwebp @@ -o /dev/null
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-ctx -S ctx -- libwebp-ctx/examples/dwebp @@ -o /dev/null
# Various instances that fuzz a bit differently to add to the diversity
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-default -- libwebp/examples/dwebp @@ -o /dev/null
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-binexploit -S binexploit -a binary -P exploit -- libwebp/examples/dwebp @@ -o /dev/null
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-binexplore -S binexplore -a binary -P explore -- libwebp/examples/dwebp @@ -o /dev/null
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-ascexploit -S ascexploit -a ascii -P exploit -- libwebp/examples/dwebp @@ -o /dev/null
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-ascexplore -S ascexplore -a ascii -P explore -- libwebp/examples/dwebp @@ -o /dev/null
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-genericexploit -S genericexploit -P exploit -- libwebp/examples/dwebp @@ -o /dev/null
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-genericexplore -S genericexplore -P explore -- libwebp/examples/dwebp @@ -o /dev/null
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-rare -S rare -p rare -- libwebp/examples/dwebp @@ -o /dev/null
screen -dm /data/marc-webp/afl-env.sh afl-fuzz -i input -o output-Z -S Z -Z -- libwebp/examples/dwebp @@ -o /dev/null

Clearly, a profound understanding and experience with the AFL++ capabilities is a requisite. However, for those keen on delving deeper, this document offers thorough insights.

To standardize the environment for all fuzzing instances, we created an afl-env.sh script with default environment variables:‍

#!/bin/bash
export AFL_FAST_CAL=1 AFL_IMPORT_FIRST=1 AFL_DISABLE_TRIM=1 AFL_AUTORESUME=1
$*

Step 5: Profit?

After 3 days, we reviewed the results and found no crash. Only when an input is added that has already 2/3 of the required specially set-up tables the vulnerability is found, but then it still needs about 40 hours to trigger.

Analysing the exact table setup that is need for this vulnerability reveals why:

// Size of huffman_tables buffer = 654 + 630 + 630 + 630 + 410 = 2954 elements
// To overflow we "just" exceed this number!
static uint32_t code_lengths_counts[5] = {
    //  1  2  3  4  5  6  7  8  9  10   11  12 13 14 15
    {0, 1, 1, 0, 0, 0, 0, 0, 0, 3, 229, 41,  1, 1, 1, 2},   // size = 654
    {0, 1, 1, 0, 0, 0, 0, 0, 0, 7, 241,  1,  1, 1, 1, 2},   // size = 630
    {0, 1, 1, 0, 0, 0, 0, 0, 0, 7, 241,  1,  1, 1, 1, 2},   // size = 630
    {0, 1, 1, 0, 0, 0, 0, 0, 0, 7, 241,  1,  1, 1, 1, 2},   // size = 630
    {0, 1, 1, 1, 1, 1, 0, 0, 0, 11, 5,   1, 10, 4, 2, 2},   // size = 414
};

The fuzzer would need to find a very specific combination of bytes to create a table that just has the correct size to trigger this bug. Below the number there is no crash, above the number the invalid table is detected and processing is aborted.

If there would be a program flow specific to a crash, this is something a fuzzer is very good at finding. If there would be a specific value required that is common (e.g. 0xffffffff) then a fuzzer is very good at finding this as well.

However finding a long list of specific values where many of these are not 0, 1, or 255 is a challenge, the more of these values have to be found without any help available from path constraint solving or coverage, or dynamically learning values at runtime from the target is just not possible.

In this specific case the fuzzer would have needed to guess about 20 bytes with very specific values without any help, and that would take thousands of years (figure of speech, it would take longer) until it would randomly find such a setup.

Also other mechanisms like concolic solvers are unable to find such issues. Likely only formal verification methods to proof memory safety would likely showed this vulnerability, but this is not something that can be easily setup up and applied.

Lesson Learned

The technique of only instrumenting the the execution path of a vulnerability to see if fuzzing can find it - or why it fails is very valuable to get this information in a short amount of time. Without the optimization performed in Step 2, it would have taken around 1 core year to come to the same conclusion.

This libwebp vulnerability is an important reminder to understand that fuzzing is no silver bullet. Some bugs are impossible to find with fuzzing, others bugs need triggers that a fuzzer would not produce, or a large amount of data.

And this is not the first time high profile vulnerabilities were not found by fuzzers - and could not have been.

Sometimes it is because of harness limitations, like in the infamous heartbeet vulnerability.

Sometimes it is because the vulnerability type is not covered by a fuzzer, like the infamous Shellshock vulnerability

And somtimes it is because is just not able to trigger the bug condition like in this libwebp vulnerability or e.g. in a memory corruption in certificate parsing with punycode in OpenSSL.

Manual code inspection, static analysis tools, automated testing and hands-on security tests together are required to have a comprehensive approach to identify the vulnerabilities in software. Just relying on fuzzing is setting yourself up for failure.

Evaluating OSS-Fuzz's Discovery Gap

OSS-Fuzz had not discovered the vulnerability despite the presence of harnesses within the tests/fuzzer/subdirectory of the libwebp project. We found that two of the seven harnesses (simple_api_fuzzer and advanced_api_fuzzer) could indeed trigger the crash with bad.webp:

libwebp/tests/fuzzer/simple_api_fuzzer bad.webp
Reading 236 bytes from bad.webp
=================================================================
==13217==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x626000002f28 at pc 0x555555701767 bp0x7fffffffd1f0 sp 0x7fffffffd1e8
[…]
SUMMARY: AddressSanitizer: heap-buffer-overflow/data/marc-
webp/libwebp/src/utils/huffman_utils.c:59:18 in ReplicateValue

It simply had no chance to find the specific setup required to trigger the bug, so this is not OSS-Fuzz fault.

However we found a few issues with how OSS-Fuzz performs its fuzzing which prevents fuzz testing at full potential:

Static Instrumentation: Originally, I had implemented a feature for the AFL++ integration into OSS-Fuzz that would randomly instrument the code differently each time it was compiled. This increased probability of reaching new coverage and subsequent bugs. However, due to complications, this feature was removed by Google‘s team, limiting the fuzzing to a standard set of instrumentation options. This limits the views on coverage and path constraints in OSS-Fuzz.
Corpus Synchronization Issues: OSS-Fuzz uses ClusterFuzz to handle the actual fuzzing workload. An instance works independently, and it’s results are afterwards merged back into the main corpus. This method used for merging is libfuzzer which sees much less coverage than AFL++, especially AFL++ features like COMPCOV, and hence it loses 10-20% of the coverage AFL++ finds each time.
Finally OSS-Fuzz runs ClusterFuzz in a CI-style fashion, running single instances for a few hours only and then terminating. With a large corpus or a slow target the startup time for intelligent fuzzers that first perform a calibration is huge and leaves little time for actual fuzzing.

Some bugs can not be effectively found with CI based fuzzing, and instead need a long running fuzzing campaign, using different techniques to solve path constraints: CMPLOG, COMPCOV, libfuzzer's value profile and in small and medium projects one or two concolic execution frameworks, why also using different coverage options like CTX and/or NGRAM, and running multiple different instances in parallel.