Skip to content

A multi-lingual program repair benchmark set based on the Quixey Challenge

License

Notifications You must be signed in to change notification settings

jkoppel/QuixBugs

Repository files navigation

QuixBugs Benchmark CI Status

The QuixBugs benchmark consists of 40 programs from the Quixey Challenge translated into both Python and Java. Each contains a one-line defect, along with passing (when possible) and failing testcases. Defects fall into one of 14 defect classes. Corrected Python programs are also supplied. Quixbugs is intended for investigating cross-language performance by multi-lingual program repair tools.

For more details, see the "QuixBugs: A Multi-Lingual Program Repair Benchmark Set Based on the Quixey Challenge". Researchers at KTH have run 5 repair systems on the Java version of Quixbugs programs, see "A Comprehensive Study of Automatic Program Repair on the QuixBugs Benchmark".

Background: Quixey Challenge

From 2011 to 2013, mobile app search startup Quixey ran a challenge in which programmers were given an implementation of a classic algorithm with a bug on a single line, and had one minute to supply a fix. Success entailed $100 and a possible interview. These programs were developed as challenges for humans by people unaware of program repair.

Installation & Usage

Simply clone the repo.

git clone https://github.com/jkoppel/QuixBugs

The Java programs are already compiled (see *.class files in java_programs). Note the all java programs are in the same package called java_programs. The utility class JavaDeserialization.java requires you to download the external library Gson.

All Python is written in Python3.

To run both defective versions of a program against their tests, as well as the corrected Python version, use the test driver:

python3 tester.py program_name

Output is printed for visual comparison.

Using JUnit tests

There are JUnit tests in the java_testcases/junit folder for the Java version. Running TestsGenerator.java can regenerate them if needed.

To run these tests, you can use Gradle tasks provided by the build.gradle file. First, install Gradle. Then,

  • gradle test can be used to run tests on the buggy programs (Runs JUnit tests from the java_testcases/junit folder);
  • gradle crtTest can be used to run tests on the correct programs (Runs JUnit tests from the java_testcases/junit/crt_program folder).

It is also possible to run tests for a single program with the --tests option:

$ gradle test --tests KNAPSACK_TEST

> Task :test

java_testcases.junit.KNAPSACK_TEST > test_1 FAILED
    java.lang.AssertionError at KNAPSACK_TEST.java:14

java_testcases.junit.KNAPSACK_TEST > test_3 FAILED
    java.lang.AssertionError at KNAPSACK_TEST.java:26

java_testcases.junit.KNAPSACK_TEST > test_4 FAILED
    java.lang.AssertionError at KNAPSACK_TEST.java:32

java_testcases.junit.KNAPSACK_TEST > test_5 FAILED
    java.lang.AssertionError at KNAPSACK_TEST.java:38

java_testcases.junit.KNAPSACK_TEST > test_6 FAILED
    java.lang.AssertionError at KNAPSACK_TEST.java:44

java_testcases.junit.KNAPSACK_TEST > test_7 FAILED
    java.lang.AssertionError at KNAPSACK_TEST.java:50

10 tests completed, 6 failed
$ gradle crtTest --tests KNAPSACK_TEST

BUILD SUCCESSFUL in 4s

Using pytest tests

For the Python version, there are pytest tests for each program in the python_testcases folder. To run them, install pytest using pip and then, from the root of the repository, call pytest to run tests for a single program or target the whole directory to run every test inside it.

pip install pytest
pytest python_testcases/test_quicksort.py
# Or
pytest python_testcases

Tests work for both buggy and correct versions of programs. The default test calls the buggy version, but there is a custom --correct flag that uses the correct version of a program.

pytest --correct python_testcases

Most of the tests run fast and finish in less than a second, but two tests are slow. The first one is the last test case of the knapsack program, and the second one is the fourth test case of the levenshtein program. The default behavior skips both these tests. For the knapsack test case, using the --runslow pytest option will include it in the running tests. However, the levenshtein test case is always skipped since it takes a long time to pass and is ignored by the JUnit tests as well.

$ pytest --correct --runslow python_testcases/test_knapsack.py

collected 10 items
python_testcases/test_knapsack.py ..........     [100%]

========== 10 passed in 240.97s (0:04:00) ========== 
$ pytest --correct python_testcases/test_knapsack.py

collected 10 items
python_testcases/test_knapsack.py ..........     [100%]

========== 9 passed, 1 skipped in 0.08s ========== 

Some tests, such as the bitcount ones, need a timeout. pytest itself doesn't have a timeout mechanism, but there is a pytest-timeout plugin for it. Installing pytest-timeout adds additional options to the pytest CLI so, for example, to timeout bitcount tests after five seconds, you can do like this:

pip install pytest-timeout
pytest --timeout=5 python_testcases/test_bitcount.py

Make sure to check pytest-timeout's documentation to understand its caveats and how it handles timeouts on different systems.

There is also a pytest-xdist plugin that runs tests in parallel and can be used similarly to the timeout plugin.

Structure & Details

The root folder holds the test driver. It deserializes the JSON testcases for a selected program, then runs them against the defective versions located in java_programs/ and python_programs/. The exception is graph-based programs, for which the testcases are located in the same folder as the corresponding program (they are still run with the test driver in the same manner).

For reference, corrected versions of the Python programs are in correct_python_programs/.

Programs include:

  • bitcount
  • breadth_first_search*
  • bucketsort
  • depth_first_search*
  • detect_cycle*
  • find_first_in_sorted
  • find_in_sorted
  • flatten
  • gcd
  • get_factors
  • hanoi
  • is_valid_parenthesization
  • kheapsort
  • knapsack
  • kth
  • lcs_length
  • levenshtein
  • lis
  • longest_common_subsequence
  • max_sublist_sum
  • mergesort
  • minimum_spanning_tree*
  • next_palindrome
  • next_permutation
  • pascal
  • possible_change
  • powerset
  • quicksort
  • reverse_linked_list*
  • rpn_eval
  • shortest_path_length*
  • shortest_path_lengths*
  • shortest_paths*
  • shunting_yard
  • sieve
  • sqrt
  • subsequences
  • to_base
  • topological_ordering*
  • wrap

* - graph-based algorithm

Authors

Contact Derrick Lin @drrckln, Angela Chen @angchen, or James Koppel @jkoppel for questions.

About

A multi-lingual program repair benchmark set based on the Quixey Challenge

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published