Atheris is a coverage-guided Python fuzzing engine. It supports fuzzing of Python code, but also native extensions written for CPython. Atheris is based off of libFuzzer. When fuzzing native code, Atheris can be used in combination with Address Sanitizer or Undefined Behavior Sanitizer to catch extra bugs.
Atheris supports Linux (32- and 64-bit) and Mac OS X, Python versions 3.6-3.9.
You can install prebuilt versions of Atheris with pip:
pip3 install atheris
These wheels come with a built-in libFuzzer, which is fine for fuzzing Python code. If you plan to fuzz native extensions, you may need to build from source to ensure the libFuzzer version in Atheris matches your Clang version.
Atheris relies on libFuzzer, which is distributed with Clang. If you have a sufficiently new version of clang
on your path, installation from source is as simple as:
# Build latest release from source
pip3 install --no-binary atheris atheris
# Build development code from source
git clone https://github.com/google/atheris.git
cd atheris
pip3 install .
If you don't have clang
installed or it's too old, you'll need to download and build the latest version of LLVM. Follow the instructions in Installing Against New LLVM below.
Apple Clang doesn't come with libFuzzer, so you'll need to install a new version of LLVM from head. Follow the instructions in Installing Against New LLVM below.
# Building LLVM
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir build
cd build
cmake -DLLVM_ENABLE_PROJECTS='clang;compiler-rt' -G "Unix Makefiles" ../llvm
make -j 10 # This step is very slow
# Installing Atheris
CLANG_BIN="$(pwd)/bin/clang" pip3 install <whatever>
import atheris
with atheris.instrument_imports():
import some_library
import sys
def TestOneInput(data):
some_library.parse(data)
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
When fuzzing Python, Atheris will report a failure if the Python code under test throws an uncaught exception.
Atheris collects Python coverage information by instrumenting bytecode. There are 3 options for adding this instrumentation to the bytecode:
- You can instrument the libraries you import:
This will cause instrumentation to be added to
with atheris.instrument_imports(): import foo from bar import baz
foo
andbar
, as well as any libraries they import. - Or, you can instrument individual functions:
@atheris.instrument_func def my_function(foo, bar): print("instrumented")
- Or finally, you can instrument everything:
Put this right before
atheris.instrument_all()
atheris.Setup()
. This will find every Python function currently loaded in the interpreter, and instrument it. This might take a while.
You might see this error:
ERROR: no interesting inputs were found. Is the code instrumented for coverage? Exiting.
You'll get this error if the first 2 calls to TestOneInput
didn't produce any
coverage events. Even if you have instrumented some Python code,
this can happen if the instrumentation isn't reached in those first 2 calls.
(For example, because you have a nontrivial TestOneInput
). You can resolve
this by adding an atheris.instrument_func
decorator to TestOneInput
,
using atheris.instrument_all()
, or moving your TestOneInput
function into an
instrumented module.
In order for fuzzing native extensions to be effective, your native extensions must be instrumented. See Native Extension Fuzzing for instructions.
Atheris is fully supported by OSS-Fuzz, Google's continuous fuzzing service for open source projects. For integrating with OSS-Fuzz, please see https://google.github.io/oss-fuzz/getting-started/new-project-guide/python-lang.
The atheris
module provides three key functions: instrument_imports()
, Setup()
and Fuzz()
.
In your source file, import all libraries you wish to fuzz inside a with atheris.instrument_imports():
-block, like this:
# library_a will not get instrumented
import library_a
with atheris.instrument_imports():
# library_b will get instrumented
import library_b
Generally, it's best to import atheris
first and then import all other libraries inside of a with atheris.instrument_imports()
block.
Next, define a fuzzer entry point function and pass it to atheris.Setup()
along with the fuzzer's arguments (typically sys.argv
). Finally, call atheris.Fuzz()
to start fuzzing. You must call atheris.Setup()
before atheris.Fuzz()
.
include
: A list of fully-qualified module names that shall be instrumented.exclude
: A list of fully-qualified module names that shall NOT be instrumented.
This should be used together with a with
-statement. All modules imported in
said statement will be instrumented. However, because Python imports all modules
only once, this cannot be used to instrument any previously imported module,
including modules required by Atheris. To add coverage to those modules, use
instrument_all()
instead.
A full list of unsupported modules can be retrieved as follows:
import sys
import atheris
print(sys.modules.keys())
func
: The function to instrument.
This will instrument the specified Python function and then return func
. This
is typically used as a decorator, but can be used to instrument individual
functions too. Note that the func
is instrumented in-place, so this will
affect all call points of the function.
This cannot be called on a bound method - call it on the unbound version.
This will scan over all objects in the interpreter and call instrument_func
on
every Python function. This works even on core Python interpreter functions,
something which instrument_imports
cannot do.
This function is experimental.
args
: A list of strings: the process arguments to pass to the fuzzer, typicallysys.argv
. This argument list may be modified in-place, to remove arguments consumed by the fuzzer. See the LibFuzzer docs for a list of such options.test_one_input
: your fuzzer's entry point. Must take a singlebytes
argument. This will be repeatedly invoked with a single bytes container.internal_libfuzzer
: Indicates whether libfuzzer will be provided by atheris or by an external library (see using_sanitizers.md). If unspecified, Atheris will determine this automatically. If fuzzing pure Python, leave this asTrue
.
This starts the fuzzer. You must have called Setup()
before calling this function. This function does not return.
In many cases Setup()
and Fuzz()
could be combined into a single function, but they are
separated because you may want the fuzzer to consume the command-line arguments it handles
before passing any remaining arguments to another setup function.
Often, a bytes
object is not convenient input to your code being fuzzed. Similar to libFuzzer, we provide a FuzzedDataProvider to translate these bytes into other input forms.
You can construct the FuzzedDataProvider with:
fdp = atheris.FuzzedDataProvider(input_bytes)
The FuzzedDataProvider then supports the following functions:
def ConsumeBytes(count: int)
Consume count
bytes.
def ConsumeUnicode(count: int)
Consume unicode characters. Might contain surrogate pair characters, which according to the specification are invalid in this situation. However, many core software tools (e.g. Windows file paths) support them, so other software often needs to too.
def ConsumeUnicodeNoSurrogates(count: int)
Consume unicode characters, but never generate surrogate pair characters.
def ConsumeString(count: int)
Alias for ConsumeBytes
in Python 2, or ConsumeUnicode
in Python 3.
def ConsumeInt(int: bytes)
Consume a signed integer of the specified size (when written in two's complement notation).
def ConsumeUInt(int: bytes)
Consume an unsigned integer of the specified size.
def ConsumeIntInRange(min: int, max: int)
Consume an integer in the range [min
, max
].
def ConsumeIntList(count: int, bytes: int)
Consume a list of count
integers of size
bytes.
def ConsumeIntListInRange(count: int, min: int, max: int)
Consume a list of count
integers in the range [min
, max
].
def ConsumeFloat()
Consume an arbitrary floating-point value. Might produce weird values like NaN
and Inf
.
def ConsumeRegularFloat()
Consume an arbitrary numeric floating-point value; never produces a special type like NaN
or Inf
.
def ConsumeProbability()
Consume a floating-point value in the range [0, 1].
def ConsumeFloatInRange(min: float, max: float)
Consume a floating-point value in the range [min
, max
].
def ConsumeFloatList(count: int)
Consume a list of count
arbitrary floating-point values. Might produce weird values like NaN
and Inf
.
def ConsumeRegularFloatList(count: int)
Consume a list of count
arbitrary numeric floating-point values; never produces special types like NaN
or Inf
.
def ConsumeProbabilityList(count: int)
Consume a list of count
floats in the range [0, 1].
def ConsumeFloatListInRange(count: int, min: float, max: float)
Consume a list of count
floats in the range [min
, max
]
def PickValueInList(l: list)
Given a list, pick a random value
def ConsumeBool()
Consume either True
or False
.