This repository contains the H-ARC dataset and preliminary analyses reported in our paper "H-ARC: A Robust Estimate of Human Performance on the Abstraction and Reasoning Corpus Benchmark".
Participant responses, natural language descriptions, errors, and state-space graphs can all be explored visually on our project webpage.
H-ARC consists of action-by-action traces of humans solving ARC tasks from both the training and evaluation sets, using an interface and setup similar to François Chollet's initial proposal. The original dataset can be found here.
If you use H-ARC, please cite:

```bibtex
@article{legris2024harcrobustestimatehuman,
  title={H-ARC: A Robust Estimate of Human Performance on the Abstraction and Reasoning Corpus Benchmark},
  author={Solim LeGris and Wai Keen Vong and Brenden M. Lake and Todd M. Gureckis},
  year={2024},
  journal={arXiv preprint arXiv:2409.01374},
  url={https://arxiv.org/abs/2409.01374},
}
```
- Ensure you have Python 3.10 or later installed on your system.
- Clone this repository to your local machine:

  ```shell
  gh repo clone le-gris/h-arc
  cd h-arc
  ```
- Create a virtual environment:

  ```shell
  python -m venv .venv
  ```
- Activate the virtual environment:
  - On Windows:

    ```shell
    .venv\Scripts\activate
    ```
  - On macOS and Linux:

    ```shell
    source .venv/bin/activate
    ```
- Install the required packages using pip and the requirements.txt file:

  ```shell
  pip install -r requirements.txt
  ```
The H-ARC dataset is provided as a zip archive in the `data` folder. To extract it:
- Navigate to the project root directory if you're not already there.
- Use the following command to extract the dataset:
  - On Windows:

    ```shell
    tar -xf data/h-arc.zip
    ```
  - On macOS and Linux:

    ```shell
    unzip data/h-arc.zip
    ```
After extraction, you should see several CSV files in the `data` folder.
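Alternatively, Python's standard library can perform the extraction, which sidesteps the platform difference between `tar` and `unzip` (a minimal sketch; the archive path follows the repository layout described above):

```python
import zipfile

def extract_archive(archive_path, dest_dir):
    """Extract a zip archive into dest_dir (equivalent to `unzip data/h-arc.zip`)."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)

# From the project root:
#   extract_archive("data/h-arc.zip", "data")
```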
The H-ARC dataset consists of several CSV files, each capturing a different aspect of human performance on ARC tasks. The main files are:
- `clean_data.csv` / `clean_data_incomplete.csv`: all collected data from complete / incomplete participant data
- `clean_errors.csv` / `clean_errors_incomplete.csv`: all unique errors on each task and their counts, from complete / incomplete participant data
- `clean_summary_data.csv` / `clean_summary_data_incomplete.csv`: attempt-by-attempt summary data for complete / incomplete participant data
- `clean_feedback_data.csv`: participant feedback
- `clean_demographics_data.csv`: demographic information
- `clean_withdraw_data.csv`: withdrawal information
For more detailed information about the dataset, see Dataset description.
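The CSVs can be loaded with pandas or, as sketched below, with the standard library alone (the file name comes from the list above; check the actual column headers after extraction):

```python
import csv

def load_harc_csv(path):
    """Load an H-ARC CSV into a list of dict rows, keyed by the header line."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# After extracting the archive, from the project root:
#   rows = load_harc_csv("data/clean_data.csv")
#   print(len(rows), list(rows[0].keys()))
```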
We include in this repository the main Jupyter notebooks used to compute the results reported in our paper.
This notebook looks at some aspects of the ARC dataset structure.
This notebook computes basic performance metrics on the H-ARC dataset, including overall solve rates, action counts, and time-related statistics for both training and evaluation tasks.
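To illustrate the kind of metric this notebook computes, a per-task solve rate can be derived from loaded rows like so (the `task_id` and `solved` column names are hypothetical placeholders, not necessarily the dataset's actual headers):

```python
from collections import defaultdict

def per_task_solve_rates(rows):
    """Fraction of attempts marked solved for each task.

    `rows` is an iterable of dicts with hypothetical 'task_id' and
    'solved' ('0'/'1') fields; the real H-ARC column names may differ.
    """
    counts = defaultdict(lambda: [0, 0])  # task_id -> [n_solved, n_total]
    for row in rows:
        counts[row["task_id"]][0] += int(row["solved"])
        counts[row["task_id"]][1] += 1
    return {task: solved / total for task, (solved, total) in counts.items()}
```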
This notebook looks at some basic demographics data from our pool of participants.
This notebook contains miscellaneous analyses, including participant counts for different experimental conditions and various data processing steps.
This notebook analyzes error patterns in participant responses, including copy errors and other common mistake types across both training and evaluation tasks.
This notebook examines learning effects across tasks using mixed-effects logistic regression models. It analyzes how task success rates change as participants progress through the experiment.
This notebook focuses on analyzing incomplete task attempts, comparing performance metrics between participants who completed all tasks and those who didn't, and examining factors that might contribute to task incompletion.
This notebook compares the performance of human participants with that of algorithmic solutions to evaluation set ARC tasks. It analyzes success rates, error patterns, and solution strategies between humans and AI systems.
Follow these steps to process a Kaggle submission file; this facilitates downstream human-machine comparisons. Here we use the "Claude-3.5 (Baseline)" approach from the ARC Prize leaderboard as an example.
- Create the necessary directories:

  ```shell
  mkdir -p data/kaggle_solutions/claude3_5-langchain
  ```
- Visit the following webpage: Claude 3.5 Langchain ARC Submission
- Download the `submission.json` file from the webpage into the `data/kaggle_solutions/claude3_5-langchain` directory.
- Run the `kaggle_submision_to_csv.py` script with the appropriate submission ID:

  ```shell
  python src/kaggle_submision_to_csv.py --submission_id claude3_5-langchain
  ```
This will process the JSON file and create a CSV file in the same directory with a similar format to our human data.
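The conversion the script performs can be sketched roughly as follows. This sketch assumes the 2024 Kaggle ARC submission layout (one list entry per test input, each with `attempt_1`/`attempt_2` grids) and invents illustrative output columns; the real script's format may differ:

```python
import json

def submission_to_rows(submission):
    """Flatten a Kaggle-style submission dict into CSV-ready row dicts.

    Assumed input layout (not verified against the actual script):
    {task_id: [{"attempt_1": grid, "attempt_2": grid}, ...]},
    one list entry per test input, grids as nested lists of ints.
    """
    rows = []
    for task_id, test_outputs in submission.items():
        for test_idx, attempts in enumerate(test_outputs):
            for attempt_name, grid in attempts.items():
                rows.append({
                    "task_id": task_id,
                    "test_index": test_idx,
                    "attempt": attempt_name,
                    "grid": json.dumps(grid),  # serialize the grid for a CSV cell
                })
    return rows
```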
This dataset is licensed under the Creative Commons Attribution 4.0 International License.