Fuzz testing for various parts of the Lua interpreter, mostly for use as a test-case generator for alternate Lua implementations (e.g. Zua). Uses libFuzzer via the Clang compiler. A pre-generated corpus and corresponding outputs can be found on the releases page for quick use as a data set for testing other Lua implementations.
For a writeup of how this was used to test Zua's lexer implementation, see:
Currently supports:
- Lua 5.1.5
  - Lexer (llex.c)
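For context on what a fuzz target looks like: a libFuzzer target is just a C function named `LLVMFuzzerTestOneInput` that receives a byte buffer. The sketch below is a minimal, hypothetical harness that pushes each input through the same lexer via Lua's public `luaL_loadbuffer` API; it is not the repository's actual `fuzz_llex` harness, which exercises `llex.c` more directly. It assumes the file is compiled with `clang -fsanitize=fuzzer,address` and linked against Lua 5.1.

```c
/* Minimal, hypothetical libFuzzer harness sketch -- not the fuzz_llex harness
 * shipped in this repository. It feeds each fuzz input through Lua's public
 * luaL_loadbuffer(), which runs the same lexer (llex.c) under the hood. */
#include <stddef.h>
#include <stdint.h>

#include "lua.h"
#include "lauxlib.h"

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  lua_State *L = luaL_newstate();
  if (L == NULL)
    return 0;
  /* Lex/parse the input as a chunk named "fuzz"; lexer/parser errors are
   * reported via the return value rather than a crash, which is what we want. */
  (void)luaL_loadbuffer(L, (const char *)data, size, "fuzz");
  lua_close(L);
  return 0;
}
```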
- Make sure you have CMake, Clang, and Clang++ installed.
- Run `./setup.sh` to generate `build.fuzz`, `build.cov`, and `build.tools` directories via CMake.
First, `cd build.fuzz`; then, from within that directory:
- To run the fuzzer: `make fuzz_llex_run`
- To minimize the fuzzer's corpus: `make fuzz_llex_minimize`
The lexer fuzzer has an accompanying tool to output the tokens generated when lexing each file in the corpus. This set of result files can then be used to test alternate Lua lexer implementations to ensure that they generate the exact same set of tokens.
The format of the output file is a space-separated list of tokens, printed via `luaX_token2str`. For example, if the input file was:
```
local hello = 'world'
```
then the output file would be:
```
local <name> = <string> <eof>
```
If there is an error while lexing, then the error appears on the second line of the file. For example, if the last `'` in the `'world'` string in the previous example were omitted, then the output would look like:
```
local <name> =
[string "fuzz"]:3: unfinished string near '<eof>'
```
First, run the fuzzer (see Fuzzing above) to create a comprehensive corpus. Then, to generate the output files for each file in the corpus:
```
cd build.tools
make fuzz_llex_output_run
```
The files will be in `build.tools/tools/output/fuzz_llex/`, and the file names will correspond with the file names of the corpus input files.
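For reference, a consumer of these files only needs to handle the two-line format described above: the first line is the space-separated token list, and the optional second line is the lexer error. Below is a small, hypothetical C reader; the file name `example` and the function names are illustrative, not part of this repository.

```c
/* Hypothetical reader for the fuzz_llex output format: line 1 is the
 * space-separated token list, optional line 2 is the lexer error message.
 * Not part of this repository; names and paths are illustrative. */
#include <stdio.h>
#include <string.h>

#define MAX_LINE 8192

/* Reads the expected token line (and error line, if present) from an output
 * file. Returns 0 on success, -1 if the file could not be opened. */
static int read_expected(const char *path, char *tokens, char *error) {
  FILE *f = fopen(path, "r");
  if (f == NULL)
    return -1;
  tokens[0] = '\0';
  error[0] = '\0';
  if (fgets(tokens, MAX_LINE, f) != NULL)
    tokens[strcspn(tokens, "\n")] = '\0'; /* strip trailing newline */
  if (fgets(error, MAX_LINE, f) != NULL)
    error[strcspn(error, "\n")] = '\0';
  fclose(f);
  return 0;
}

int main(void) {
  char tokens[MAX_LINE], error[MAX_LINE];
  if (read_expected("build.tools/tools/output/fuzz_llex/example", tokens, error) == 0) {
    printf("expected tokens: %s\n", tokens);
    if (error[0] != '\0')
      printf("expected error:  %s\n", error);
    /* A real test would lex the matching corpus input file with the alternate
     * implementation and compare its token string against `tokens`. */
  }
  return 0;
}
```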
Given a corpus of input files containing valid unparsed string literals, this tool can generate a corresponding set of output files containing the parsed version of the string. These pairs of input/output files can then be used to test the string parsing of alternate Lua implementations.
For example, given the following input file:
"\\\thello world"
the output file would be:
```
\ hello world
```
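Note that the separator shown between `\` and `hello world` in that output is a real tab character (the parsed `\t` escape), not a space. As a small illustrative check (not part of the repository), the expected bytes written as a C literal are:

```c
/* Illustrative only: the parsed output of "\\\thello world" is a backslash,
 * a tab character, and then the text "hello world". */
#include <assert.h>
#include <string.h>

int main(void) {
  const char *expected = "\\\thello world"; /* '\' then TAB then "hello world" */
  assert(expected[0] == '\\');
  assert(expected[1] == '\t');
  assert(strcmp(expected + 2, "hello world") == 0);
  return 0;
}
```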
A comprehensive set of inputs is included in this repository, so generating the outputs is as simple as:
```
cd build.tools
make fuzz_strings_output_run
```
Because the PUC Lua lexer lexes and parses strings at the same time, it's a bit hard to extract unparsed strings while lexing. Instead, Zua was used along with a corpus generated by `fuzz_llex` to extract all valid strings from the entire corpus and put each one into its own file. That set of strings was then minimized by running it back through `fuzz_llex` with `-merge=1` (libFuzzer's corpus merge mode, which keeps only the inputs that add new coverage). Specifically, the following was done:
To generate an un-minimized set of inputs with Zua:
- Clone zua
- Run `zig build fuzz_strings_gen_run`
Once you have an un-minimized set of inputs, you can minimize it using `fuzz_llex` by doing the following:
- Copy the generated files to `fuzzing-lua/build.fuzz/corpus/fuzz_strings_generated`, then run:

```
cd fuzzing-lua/build.fuzz
make fuzz_llex
mkdir corpus/fuzz_strings
./fuzz_llex corpus/fuzz_strings corpus/fuzz_strings_generated -merge=1
```
Your minimized input set will then be in `fuzzing-lua/build.fuzz/corpus/fuzz_strings`.
This is currently untested. TODO: Adapt this guide.
```
cd build.tools
make package_data
```