Fuzz Introspector: enabling rapid fuzz introspection tool development

14th February, 2025

David Korczynski,

Security Research & Security Engineering

In this blog post we discuss fuzzing introspection by way of a Python library that enables quick prototyping and tooling development to support the fuzzing workflow. The work discussed here consists of contributions made by the Ada Logics team to the OpenSSF Fuzz Introspector project.

In the blog post we will present two tools that are each a few lines of Python code but perform powerful analyses highlighting interesting targets to fuzz in an arbitrary C code base.

Fuzzing introspection and compiler-based analysis

Fuzzing is a central technique in software security for finding security and reliability issues. It is a key component of vulnerability discovery, particularly in memory-unsafe languages, and is a proven technique that has been widely used for more than a decade. However, many parts of the fuzzing workflow are still time-intensive and inherently manual.

Fuzz Introspector automates parts of the fuzzing workflow and has been developed as open source for a few years. It analyses the large-scale fuzzing efforts of OSS-Fuzz and makes the results available at https://introspector.oss-fuzz.com/. In short, Fuzz Introspector's goals are to assist in selecting good target functions to fuzz within a code base, extract insights by combining code coverage analysis and static analysis, highlight limitations of existing fuzzing set ups, and, more recently, provide output that can be used by further research and analysis tooling in the fuzzing domain. The improvements we show in this blog are tailored to make Fuzz Introspector an easy-to-use library for building custom program analysis tooling for analysing fuzzing set ups.

Originally, Fuzz Introspector relied on static analysis by way of compiler extensions (LLVM/LTO) to extract data about the source code under analysis. To extract data in this way, Fuzz Introspector needs to be able to build the target code. This comes with some limitations, e.g. it’s not always trivial to know how to build an arbitrary codebase, and it’s not always easy to integrate LTO into the build process even if you know how to build the target project. The consequence is that running a Fuzz Introspector analysis on a project always carries an initial burden of building the software, which can be time-consuming for some projects. Furthermore, some builds are not compatible with LTO at all, in which case the compiler-based analysis cannot be applied.

Moving beyond compiler-based analysis: broadening use-cases

Fuzz Introspector has recently begun to support a new strategy for collecting program analysis data by purely static means, so no build system or similar is required. Fuzz Introspector does this by way of tree-sitter, which is a parser generator tool as well as a parsing library. Fuzz Introspector implements analyses on top of tree-sitter's parsing to support, for example, control-flow analysis and type analysis.

There are some distinct benefits to moving in this direction. First, because we no longer rely on compiler-based analysis and Fuzz Introspector operates purely statically, running Fuzz Introspector becomes much simpler. We don’t need a build system, and the whole process now lives within a Python codebase. In contrast, relying on compiler extensions requires that we first extract data by way of e.g. clang executions, which makes the overall workflow more complex. Now everything is done in Python, which means a single process performs the entire program analysis.

Second, because everything is now done within a single Python process, we can expose the full capabilities by way of a Python library, which makes it very easy to develop new reasoning tools and fuzz analysis tooling. For example, Fuzz Introspector is now available as a Python package on PyPI, so installing it is significantly easier than before, when it was required to build a custom version of clang.

Tool 1: finding good fuzz targets in 10 lines of code

In this section we will go through how you can use Fuzz Introspector as a library to develop a tool that finds interesting functions to fuzz in a target codebase. The tool uses a built-in analysis of Fuzz Introspector to identify functions with a lot of reachability, meaning functions from which a large part of the code base can be reached. The idea behind this analysis is that such functions often make up entrypoints to a given library and will likely be exposed to untrusted input. These are good targets for fuzzing in general.
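To make the notion of reachability concrete, here is a minimal sketch that ranks functions by how many other functions they can reach in a call graph. The call graph and the function names in it are hypothetical and chosen only for illustration; this is not how Fuzz Introspector implements its analysis, just the underlying idea.

```python
from collections import deque

# Hypothetical call graph: caller -> list of callees (illustrative names only).
CALL_GRAPH = {
    "parse_document": ["read_header", "read_body"],
    "read_header": ["read_token"],
    "read_body": ["read_token", "decode_value"],
    "read_token": [],
    "decode_value": [],
    "version_string": [],
}


def reachable_from(graph, start):
    """Return the set of functions reachable from start via one or more calls."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        func = queue.popleft()
        if func in seen:
            continue
        seen.add(func)
        queue.extend(graph.get(func, []))
    return seen


def rank_by_reach(graph):
    """Sort functions by the number of functions they reach, widest reach first."""
    return sorted(graph, key=lambda f: len(reachable_from(graph, f)), reverse=True)


ranking = rank_by_reach(CALL_GRAPH)
print(ranking[0])
```

In this toy graph the entrypoint-like function ends up first, while leaf helpers that call nothing end up last.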

The first step is to install fuzz introspector in a virtual environment:

python3.11 -m virtualenv .venv
. .venv/bin/activate
python3 -m pip install fuzz-introspector==0.1.9

# validate it was installed successfully
fuzz-introspector --help

Now, we need to create a tool that uses the Fuzz Introspector API:

import sys
import json

from fuzz_introspector import commands as fi_commands


def log_and_exit(msg):
    print(msg)
    sys.exit(0)


# Run a fuzz introspector analysis on the target directory. Indicate
# that there are no fuzzing harnesses and do not dump output files.
_, report = fi_commands.analyse_end_to_end(arg_language='c',
                                           target_dir=sys.argv[1],
                                           module_only=True,
                                           dump_files=False)

# Extract the merged project profile.
introspector_project = report.get('introspector-project', None)
if not introspector_project:
    log_and_exit('Introspector analysis failed')

# List good targets based on the far-reach-low-coverage analysis.
result_dict = {}
for analysis in introspector_project.optional_analyses:
    if analysis.name == 'FarReachLowCoverageAnalyser':
        result_dict = analysis.json_results
if not result_dict:
    log_and_exit('No far reach analysis')

# Print the JSON output.
print(json.dumps(result_dict['functions'], indent=2))

To run our tool, we will use a simple C library that has around 2000 lines of code. The idea behind using this library is that it's small enough to be intuitive about what to expect, and large enough to not be a complete toy example. We will use https://github.com/dvhar/dateparse. Intuitively, we would expect that there is some function such as parse_date, parse or similar in this library, that would execute a lot of the logic exposed by the library.

To analyse the library we will simply clone the repository and run our tool pointing at the downloaded folder:

git clone https://github.com/dvhar/dateparse ../ref/dateparse
pushd ../ref/dateparse
git checkout 3552f0ff88c510861cd2335f14258f4409621fe5
popd
python3 analysis.py ../ref/dateparse
[
  {
    "project": "UnknownProject",
    "function_name": "dateparse",
    "function_filename": "../ref/dateparse/dateparse.c",
    "raw_function_name": "dateparse",
    "is_reached": false,
    "is_enum_class": false,
    "cyclomatic_complexity": 7,
    "function_argument_names": [
      "datestr",
      "t",
      "offset",
      "stringlen"
    ],
    "function_arguments": [
      "char*",
      "date_t*",
      "int*",
      "int"
    ],
    "function_signature": "int dateparse(const char* datestr, date_t* t, int *offset, int stringlen) ",
    "reached_by_fuzzers": [],
    "return_type": "int",
    "runtime_coverage_percent": 0.0,
    "source_line_begin": 2187,
    "source_line_end": 2195,
    "debug_summary": "",
    "total_cyclomatic_complexity": 570
  },
  {
    "project": "UnknownProject",
    "function_name": "parseTime",
    "function_filename": "../ref/dateparse/dateparse.c",
    "raw_function_name": "parseTime",
    "is_reached": false,
    "is_enum_class": false,
    "cyclomatic_complexity": 366,
    "function_argument_names": [
      "datestr",
      "p",
      "stringlen"
    ],
    "function_arguments": [
      "char*",
      "struct parser*",
      "int"
    ],
    "function_signature": "static int parseTime(const char* datestr, struct parser* p, int stringlen) ",
    "reached_by_fuzzers": [],
    "return_type": "int",
    "runtime_coverage_percent": 0.0,
    "source_line_begin": 421,
    "source_line_end": 2016,
    "debug_summary": "",
    "total_cyclomatic_complexity": 465
  },
  {
    "project": "UnknownProject",
    "function_name": "parse",
    "function_filename": "../ref/dateparse/dateparse.c",
    "raw_function_name": "parse",
    "is_reached": false,
    "is_enum_class": false,
    "cyclomatic_complexity": 10,
    "function_argument_names": [
      "p",
      "dt",
      "offset"
    ],
    "function_arguments": [
      "struct parser*",
      "date_t*",
      "int*"
    ],
    "function_signature": "static int parse(struct parser* p, date_t* dt, int *offset) ",
    "reached_by_fuzzers": [],
    "return_type": "int",
    "runtime_coverage_percent": 0.0,
    "source_line_begin": 2169,
    "source_line_end": 2185,
    "debug_summary": "",
    "total_cyclomatic_complexity": 98
  },
…
]

The top three functions returned by our analysis are:

dateparse (dateparse.c, lines 2187 to 2195)
parseTime (dateparse.c, lines 421 to 2016)
parse (dateparse.c, lines 2169 to 2185)

From our manual analysis validating the results, these are great targets for fuzzing dateparse and are likely the main targets a security researcher would focus on when fuzzing this library. Furthermore, Fuzz Introspector has extracted more analysis about each target, such as the type, cyclomatic complexity, and more, which can be useful for further program analysis tooling.
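As a small illustration of such follow-up tooling, the sketch below post-processes the JSON records. It keeps only functions whose signature is not declared static, on the assumption that a harness living in a separate file can only call functions with external linkage directly. The records are trimmed copies of the three entries printed above, and the helper names is_static and callable_targets are our own, not part of Fuzz Introspector.

```python
# Trimmed copies of the three records printed above; only the fields used here.
functions = [
    {"function_name": "dateparse",
     "function_signature": "int dateparse(const char* datestr, date_t* t, int *offset, int stringlen) ",
     "total_cyclomatic_complexity": 570},
    {"function_name": "parseTime",
     "function_signature": "static int parseTime(const char* datestr, struct parser* p, int stringlen) ",
     "total_cyclomatic_complexity": 465},
    {"function_name": "parse",
     "function_signature": "static int parse(struct parser* p, date_t* dt, int *offset) ",
     "total_cyclomatic_complexity": 98},
]


def is_static(record):
    """Heuristic: treat a signature whose first token is 'static' as file-local."""
    return record["function_signature"].split()[:1] == ["static"]


def callable_targets(records):
    """Keep non-static functions, highest accumulated complexity first."""
    public = [r for r in records if not is_static(r)]
    return sorted(public,
                  key=lambda r: r["total_cyclomatic_complexity"],
                  reverse=True)


for record in callable_targets(functions):
    print(record["function_name"], record["total_cyclomatic_complexity"])
```

Of the three records, only dateparse survives the filter, which matches the intuition that it is the public entrypoint and that the two static functions are exercised through it.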

Tool 2: finding targets not analysed in an already-fuzzed project and ranking them

In this example we will look at developing a simple tool that provides useful information when applied to a project that is already being fuzzed. Specifically, Fuzz Introspector provides features for analysing static reachability of existing harnesses in a given code base. To this end, we'll develop a tool that analyses a given code base, finds the functions that are not reached by existing harnesses, and then sorts those by accumulated cyclomatic complexity. The idea is that you can quickly highlight the functions that would make the best targets for new harnesses, in terms of having the widest reach, and that are not already covered by existing harnesses.

In this tool we won't be using any pre-existing higher-level analysis by Fuzz Introspector, but rather rely on some of the primitive data Fuzz Introspector extracts to perform our own analysis. The main point of doing this is to show an example closer to what one would write for a custom analysis tool.
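Before going through the workflow, it may help to sketch what "accumulated cyclomatic complexity" can mean. The snippet below assumes it is a function's own cyclomatic complexity plus that of every function reachable from it, each counted once. This is an assumption for illustration, not a statement of how Fuzz Introspector computes total_cyclomatic_complexity. That said, the figures in the dateparse output earlier in this post are consistent with such a sum if dateparse calls parseTime and parse (7 + 465 + 98 = 570). The function table is hypothetical.

```python
# Hypothetical functions: name -> (own cyclomatic complexity, callees).
FUNCTIONS = {
    "load_config": (4, ["parse_file", "validate"]),
    "parse_file": (12, ["read_token"]),
    "validate": (6, ["read_token"]),
    "read_token": (3, []),
}


def accumulated_complexity(functions, name):
    """Sum the own complexity of name and of every function reachable from it.

    Each function is counted once, even if it is reachable via several paths.
    Callees missing from the table (e.g. external functions) are ignored.
    """
    seen = set()
    stack = [name]
    total = 0
    while stack:
        current = stack.pop()
        if current in seen or current not in functions:
            continue
        seen.add(current)
        own, callees = functions[current]
        total += own
        stack.extend(callees)
    return total


for name in FUNCTIONS:
    print(name, accumulated_complexity(FUNCTIONS, name))
```

Here read_token is shared by two callers but contributes only once to the total for load_config. Whether shared callees should be counted once or per call site is a design choice that changes the ranking.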

In this example, the workflow is:

Run introspector analysis on the target code base.
Iterate over all functions in the introspector project and save each function that is not reached by existing fuzzers.
Print the saved functions sorted by accumulated cyclomatic complexity.

import sys

from fuzz_introspector import commands as fi_commands


def log_and_exit(msg):
    print(msg)
    sys.exit(0)


print('Performing analysis')

# Run a fuzz introspector analysis on the target directory. Indicate
# that there are fuzzing harnesses and dump output files.
_, report = fi_commands.analyse_end_to_end(arg_language='c',
                                           target_dir=sys.argv[1],
                                           module_only=False,
                                           dump_files=True)

introspector_project = report.get('introspector-project', None)
if not introspector_project:
    log_and_exit('Introspector analysis failed')

# Go through all the functions in the merged project profile and
# capture those that are not reached by any of the existing harnesses.
functions_of_interest = []
for function in introspector_project.proj_profile.all_functions.values():
    if not function.reached_by_fuzzers:
        functions_of_interest.append(function)

# Sort the identified functions by accumulated cyclomatic complexity
# and output their name, source file and accumulated cyclomatic complexity.
print('#' * 35 + ' Results ' + '#' * 35)
for func in list(
        sorted(functions_of_interest,
               key=lambda x: x.total_cyclomatic_complexity,
               reverse=True))[:10]:
    print('%s : %s : %d' % (func.function_name, func.function_source_file,
                            func.total_cyclomatic_complexity))

For this example, we will use an OSS-Fuzz project, which means an open source project with a pre-existing fuzzing set up. We will use clib which has a pre-existing fuzzing harness in test/fuzzing/fuzz_manifest.c. Running this against clib we get:

git clone https://github.com/clibs/clib ../ref/clib
cd ../ref/clib
git checkout bc327b26e669079346daaf76b7ad9e444f903a6f
cd ../../work
python3 ./analyse2.py ../ref/clib
Performing analysis
################################### Results ###################################
install_package : ../ref/clib/src/clib-upgrade.c : 554
install_local_packages : ../ref/clib/src/clib-update.c : 542
install_local_packages_with_package_name : ../ref/clib/src/clib-update.c : 538
clib_package_install_development : ../ref/clib/src/common/clib-package.c : 502
install_packages : ../ref/clib/src/common/clib-package.c : 496
clib_package_install : ../ref/clib/src/common/clib-package.c : 496
clib_package_install_executable : ../ref/clib/src/common/clib-package.c : 496
clib_package_install_dependencies : ../ref/clib/src/common/clib-package.c : 496
build_package_with_manifest_name_thread : ../ref/clib/src/clib-build.c : 366
configure_package_with_manifest_name_thread : ../ref/clib/src/clib-configure.c : 363

The above is a list of functions, ranked by accumulated cyclomatic complexity, that are not covered by the existing fuzzing harness in clib. We can cross-check the coverage details with the upstream OSS-Fuzz coverage report, which is available by way of the OSS-Fuzz introspection web page: https://introspector.oss-fuzz.com/project-profile?project=clib. The coverage report there shows:

install_packages: 0% coverage
clib_package_install_development: 0% coverage
clib_package_install_dependencies: 0% coverage

Interestingly, the clib-upgrade.c file is not part of the OSS-Fuzz report, because the file is not compiled into the resulting fuzz harness. The new version of Fuzz Introspector, which relies purely on static analysis, therefore captures more, since it statically scans all files.

Closing thoughts

Fuzzing is a complex task that involves a lot of manual effort to get right. Fuzzing introspection and program analysis go hand-in-hand, and Fuzz Introspector aims to build bridges in this domain. The next phase of Fuzz Introspector is to expose these capabilities in a pure Python library, to make it easy to use and extend.

The new Python-only Fuzz Introspector is tailored for rapid prototyping of program analysis tools. These developments are even more important in the context of AI-based tools: they can be utilised by AI-based program analysis tooling, which is an exciting area for the future.

Finally, this is still in the early stages, and we envision Fuzz Introspector will stabilise some of its APIs in the near future. At the time of writing, Fuzz Introspector moves at a fast pace, so expect that some APIs may change over time.

Ada Logics offers a broad set of services in security engineering and we are expert software analysis tool builders. If you have tools or research and development opportunities, consider our security engineering services.