The AI Scientist: Towards Fully Automated
Open-Ended Scientific Discovery 🧑‍🔬

📚 [Paper] | 📝 [Blog Post] | 📂 [Drive Folder]

One of the grand challenges of artificial intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used to aid human scientists—for example, for brainstorming ideas or writing code—they still require extensive manual supervision or are heavily constrained to specific tasks.

We're excited to introduce The AI Scientist, the first comprehensive system for fully automatic scientific discovery, enabling Foundation Models such as Large Language Models (LLMs) to perform research independently.

We provide all runs and data from our paper here, where we run each base model on each template for approximately 50 ideas. We highly recommend reading through some of the Claude papers to get a sense of the system's strengths and weaknesses. Here are some example papers generated by The AI Scientist 📝:

DualScale Diffusion: Adaptive Feature Balancing for Low-Dimensional Generative Models
Multi-scale Grid Noise Adaptation: Enhancing Diffusion Models For Low-dimensional Data
GAN-Enhanced Diffusion: Boosting Sample Quality and Diversity
DualDiff: Enhancing Mode Capture in Low-dimensional Diffusion Models via Dual-expert Denoising
StyleFusion: Adaptive Multi-style Generation in Character-Level Language Models
Adaptive Learning Rates for Transformers via Q-Learning
Unlocking Grokking: A Comparative Study of Weight Initialization Strategies in Transformer Models
Grokking Accelerated: Layer-wise Learning Rates for Transformer Generalization
Grokking Through Compression: Unveiling Sudden Generalization via Minimal Description Length
Accelerating Mathematical Insight: Boosting Grokking Through Strategic Data Augmentation

Note:
Caution! This codebase will execute LLM-written code. There are various risks and challenges associated with this autonomy, including the use of potentially dangerous packages, web access, and potential spawning of processes. Use at your own discretion. Please make sure to containerize and restrict web access appropriately.

Table of Contents

Introduction
Requirements
- Installation
- Supported Models and API Keys
Setting Up the Templates
Run AI Scientist Paper Generation Experiments
Getting an LLM-Generated Paper Review
Making Your Own Template
- Community-Contributed Templates
Template Resources
Citing The AI Scientist
Frequently Asked Questions
Containerization

Introduction

We provide three templates, which were used in our paper, covering the following domains: NanoGPT, 2D Diffusion, and Grokking. These templates enable The AI Scientist to generate ideas and conduct experiments in these areas. We accept contributions of new templates from the community, but please note that they are not maintained by us. All other templates beyond the three provided are community contributions.

Requirements

This code is designed to run on Linux with NVIDIA GPUs using CUDA and PyTorch. Support for other GPU architectures may be possible by following the PyTorch guidelines. The current templates would likely take an infeasible amount of time on CPU-only machines. Running on other operating systems may require significant adjustments.

Installation

conda create -n ai_scientist python=3.11
conda activate ai_scientist
# Install pdflatex
sudo apt-get install texlive-full

# Install PyPI requirements
pip install -r requirements.txt

Note: Installing texlive-full can take a long time. You may need to hold Enter during the installation.

Supported Models and API Keys

We support a wide variety of models, including open-weight and API-only models. In general, we recommend using only frontier models above the capability of the original GPT-4. For a full list of supported models, see here.

OpenAI API (GPT-4o, GPT-4o-mini, o1 models)

By default, this uses the OPENAI_API_KEY environment variable.

Anthropic API (Claude Sonnet 3.5)

By default, this uses the ANTHROPIC_API_KEY environment variable.

Claude Models via Bedrock

For Claude models provided by Amazon Bedrock, please install these additional packages:

pip install "anthropic[bedrock]"

Next, specify a set of valid AWS credentials and the target AWS region by setting the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION_NAME.

Claude Models via Vertex AI

For Claude models provided by Vertex AI Model Garden, please install these additional packages:

pip install google-cloud-aiplatform
pip install "anthropic[vertex]"

Next, set up valid authentication for a Google Cloud project, for example by providing the region and project ID:

export CLOUD_ML_REGION="REGION"           # for Model Garden call
export ANTHROPIC_VERTEX_PROJECT_ID="PROJECT_ID"  # for Model Garden call
export VERTEXAI_LOCATION="REGION"         # for Aider/LiteLLM call
export VERTEXAI_PROJECT="PROJECT_ID"      # for Aider/LiteLLM call

DeepSeek API (DeepSeek-Coder-V2)

By default, this uses the DEEPSEEK_API_KEY environment variable.

OpenRouter API (Llama3.1)

By default, this uses the OPENROUTER_API_KEY environment variable.

Semantic Scholar API (Literature Search)

Our code can also optionally use a Semantic Scholar API Key (S2_API_KEY) for higher throughput if you have one, though it should work without it in principle. If you have problems with Semantic Scholar, you can skip the literature search and citation phases of paper generation.

Be sure to provide the key for the model used for your runs, e.g.:

export OPENAI_API_KEY="YOUR KEY HERE"
export S2_API_KEY="YOUR KEY HERE"
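As a quick pre-flight check, the following sketch reports which of the environment variables named in this section are unset for a given provider. Only the variable names come from this README; the provider labels used as dictionary keys and the helper `missing_env_vars` are hypothetical names chosen for illustration and are not part of the repository. For Vertex AI, the variables listed are the example setup shown above, and other authentication setups may not need all of them.

```python
import os

# Environment variables named in this README, grouped by provider.
# ASSUMPTION: the provider labels (dictionary keys) are illustrative only.
REQUIRED_ENV_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION_NAME"],
    "vertex": [
        "CLOUD_ML_REGION",
        "ANTHROPIC_VERTEX_PROJECT_ID",
        "VERTEXAI_LOCATION",
        "VERTEXAI_PROJECT",
    ],
    "deepseek": ["DEEPSEEK_API_KEY"],
    "openrouter": ["OPENROUTER_API_KEY"],
}


def missing_env_vars(provider, env=None):
    """Return the variables for `provider` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS[provider] if not env.get(name)]


if __name__ == "__main__":
    for provider in REQUIRED_ENV_VARS:
        missing = missing_env_vars(provider)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
    # S2_API_KEY is optional, so it is reported separately rather than required.
    print("S2_API_KEY set:", bool(os.environ.get("S2_API_KEY")))
```

Only the provider used for your runs needs to report "ok".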

Setting Up the Templates

This section provides instructions for setting up each of the three templates used in our paper. Before running The AI Scientist experiments, please ensure you have completed the setup steps for the templates you are interested in.

NanoGPT Template

Description: This template investigates transformer-based autoregressive next-token prediction tasks.

Setup Steps:

Prepare the data:

python data/enwik8/prepare.py
python data/shakespeare_char/prepare.py
python data/text8/prepare.py

Create baseline runs (machine dependent):

# Set up NanoGPT baseline run
# NOTE: YOU MUST FIRST RUN THE PREPARE SCRIPTS ABOVE!
cd templates/nanoGPT
python experiment.py --out_dir run_0
python plot.py

2D Diffusion Template

Description: This template studies improving the performance of diffusion generative models on low-dimensional datasets.

Setup Steps:

Install dependencies:

# Set up 2D Diffusion
git clone https://github.com/gregversteeg/NPEET.git
cd NPEET
pip install .
pip install scikit-learn

Create baseline runs:

# Set up 2D Diffusion baseline run
cd templates/2d_diffusion
python experiment.py --out_dir run_0
python plot.py

Grokking Template

Description: This template investigates questions about generalization and learning speed in deep neural networks.

Setup Steps:

Install dependencies:
```
# Set up Grokking
pip install einops
```

Create baseline runs:

# Set up Grokking baseline run
cd templates/grokking
python experiment.py --out_dir run_0
python plot.py
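Before launching experiments, it may help to confirm that a baseline exists for each template. The sketch below assumes the baseline commands above were run from inside each template directory, so that each baseline lives at templates/<name>/run_0. The helper name `templates_missing_baseline` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Template directory names as they appear in the setup commands above.
PAPER_TEMPLATES = ("nanoGPT", "2d_diffusion", "grokking")


def templates_missing_baseline(templates_root, names=PAPER_TEMPLATES):
    """Return template names that have no run_0 directory under templates_root."""
    root = Path(templates_root)
    return [name for name in names if not (root / name / "run_0").is_dir()]


if __name__ == "__main__":
    # ASSUMPTION: the script is run from the repository root.
    missing = templates_missing_baseline("templates")
    if missing:
        print("No baseline run (run_0) found for:", ", ".join(missing))
    else:
        print("All baseline runs are present.")
```

A template you do not plan to use can be left out by passing a shorter `names` tuple.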

Run AI Scientist Paper Generation Experiments

Note: Please ensure the setup steps above are completed before running these experiments.

conda activate ai_scientist
# Run the paper generation.
python launch_scientist.py --model "gpt-4o-2024-05-13" --experiment nanoGPT_lite --num-ideas 2
python launch_scientist.py --model "claude-3-5-sonnet-20241022" --experiment nanoGPT_lite --num-ideas 2

If you have more than one GPU, use the --parallel option to parallelize ideas across multiple GPUs.
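To repeat the same experiment with several models, the launch commands can be generated programmatically. This is a minimal sketch that uses only the flags shown above (--model, --experiment, --num-ideas); the helper `build_launch_command` is a hypothetical name, and the script prints the commands instead of running them.

```python
import shlex


def build_launch_command(model, experiment, num_ideas):
    """Build the argument list for one launch_scientist.py invocation."""
    return [
        "python", "launch_scientist.py",
        "--model", model,
        "--experiment", experiment,
        "--num-ideas", str(num_ideas),
    ]


if __name__ == "__main__":
    models = ["gpt-4o-2024-05-13", "claude-3-5-sonnet-20241022"]
    for model in models:
        cmd = build_launch_command(model, "nanoGPT_lite", 2)
        # Print rather than execute, so the commands can be reviewed first.
        print(shlex.join(cmd))
```

To execute a command instead of printing it, the list can be passed to `subprocess.run(cmd, check=True)` from the repository root with the conda environment active.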

Getting an LLM-Generated Paper Review

import openai
from ai_scientist.perform_review import load_paper, perform_review

client = openai.OpenAI()
model = "gpt-4o-2024-05-13"

# Load paper from PDF file (raw text)
paper_txt = load_paper("report.pdf")

# Get the review dictionary
review = perform_review(
    paper_txt,
    model,
    client,
    num_reflections=5,
    num_fs_examples=1,
    num_reviews_ensemble=5,
    temperature=0.1,
)

# Inspect review results
review["Overall"]    # Overall score (1-10)
review["Decision"]   # 'Accept' or 'Reject'
review["Weaknesses"] # List of weaknesses (strings)
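The three fields inspected above can be condensed into a short text summary. The sketch below relies only on the "Overall", "Decision", and "Weaknesses" keys shown in this section; `summarize_review` is a hypothetical helper, and the example dictionary is a hand-written placeholder, not real output from perform_review.

```python
def summarize_review(review, max_weaknesses=3):
    """Format the overall score, decision, and first few weaknesses as text."""
    lines = [
        f"Overall: {review['Overall']}/10",
        f"Decision: {review['Decision']}",
    ]
    for weakness in review["Weaknesses"][:max_weaknesses]:
        lines.append(f"- {weakness}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hand-written placeholder values for illustration only.
    example = {
        "Overall": 4,
        "Decision": "Reject",
        "Weaknesses": ["Limited ablations", "Small datasets only"],
    }
    print(summarize_review(example))
```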

To run batch analysis:

cd review_iclr_bench
python iclr_analysis.py --num_reviews 500 --batch_size 100 --num_fs_examples 1 --num_reflections 5 --temperature 0.1 --num_reviews_ensemble 5

Making Your Own Template

If there is an area of study you would like The AI Scientist to explore, it is straightforward to create your own templates. In general, follow the structure of the existing templates, which consist of:

experiment.py — This is the main script where the core content is. It takes an argument --out_dir, which specifies where it should create the folder and save the relevant information from the run.
plot.py — This script takes the information from the run folders and creates plots. The code should be clear and easy to edit.
prompt.json — Put information about your template here.
seed_ideas.json — Place example ideas here. You can also try to generate ideas without any examples and then pick the best one or two to put here.
latex/template.tex — We recommend using our LaTeX folder but be sure to replace the pre-loaded citations with ones that you expect to be more relevant.
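As a starting point, an experiment.py for a new template might look like the skeleton below. It shows only the --out_dir handling described in the list above. The output file name `final_info.json`, the `example_metric` key, and the function `run_experiment` are assumptions for illustration; copy the actual file names and JSON layout from one of the existing templates.

```python
import argparse
import json
import os


def run_experiment(out_dir):
    """Create out_dir and write a placeholder results file into it."""
    os.makedirs(out_dir, exist_ok=True)
    # Placeholder metric: replace with the real training and evaluation code.
    results = {"example_metric": 0.0}
    # ASSUMPTION: the file name and JSON layout are illustrative only;
    # mirror an existing template's output files for a real template.
    path = os.path.join(out_dir, "final_info.json")
    with open(path, "w") as f:
        json.dump(results, f, indent=2)
    return path


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Skeleton experiment script")
    parser.add_argument("--out_dir", type=str, default="run_0")
    args = parser.parse_args()
    print("Wrote", run_experiment(args.out_dir))
```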

The key to making new templates work is matching the base filenames and output JSONs to the existing format; everything else is free to change. You should also ensure that the template.tex file is updated to use the correct citation style / base plots for your template.
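Since matching the base filenames matters, a small check like the following can flag missing files before a run. It tests only for the five files listed in this section; `missing_template_files` is a hypothetical helper, not part of the repository, and it does not validate the contents of the files.

```python
import sys
from pathlib import Path

# File layout listed in this section, relative to the template directory.
EXPECTED_FILES = (
    "experiment.py",
    "plot.py",
    "prompt.json",
    "seed_ideas.json",
    "latex/template.tex",
)


def missing_template_files(template_dir):
    """Return the expected files that are absent from template_dir."""
    root = Path(template_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Usage: python check_template.py path/to/template (defaults to the current directory)
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_template_files(target)
    print("Missing files:", ", ".join(missing) if missing else "none")
```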

Community-Contributed Templates

We welcome community contributions in the form of new templates. While these are not maintained by us, we are delighted to highlight your templates to others. Below, we list community-contributed templates along with links to their pull requests (PRs):

Infectious Disease Modeling (seir) - PR #137
Image Classification with MobileNetV3 (mobilenetV3) - PR #141
Sketch RNN (sketch_rnn) - PR #143

This section is reserved for community contributions. Please submit a pull request to add your template to the list! Please describe the template in the PR description, and also show examples of the generated papers.

Template Resources

We provide three templates, which heavily use code from other repositories, credited below:

NanoGPT Template uses code from NanoGPT and this PR.
2D Diffusion Template uses code from tiny-diffusion, ema-pytorch, and Datasaur.
Grokking Template uses code from Sea-Snell/grokking and danielmamay/grokking.

We would like to thank the developers of the open-source models and packages for their contributions and for making their work available.

Citing The AI Scientist

If you use The AI Scientist in your research, please cite it as follows:

@article{lu2024aiscientist,
  title={The {AI} {S}cientist: Towards Fully Automated Open-Ended Scientific Discovery},
  author={Lu, Chris and Lu, Cong and Lange, Robert Tjarko and Foerster, Jakob and Clune, Jeff and Ha, David},
  journal={arXiv preprint arXiv:2408.06292},
  year={2024}
}

Frequently Asked Questions

We recommend reading our paper first for any questions you have on The AI Scientist.

Why am I missing files when running The AI Scientist?

Ensure you have completed all the setup and preparation steps before the main experiment script.

Why has a PDF or a review not been generated?

The AI Scientist does not finish every idea: its success rate depends on the template, the base foundation model, and the complexity of the idea. See our main paper for details. The highest success rates are observed with Claude Sonnet 3.5. Reviews are best done with GPT-4o; all other models have issues with positivity bias or failure to conform to required outputs.

What is the cost of each idea generated?

Typically less than $15 per paper with Claude Sonnet 3.5. We recommend DeepSeek Coder V2 for a much more cost-effective approach. A good place to look for new models is the Aider leaderboard.

How do I change the base conference format associated with the write-ups?

Change the base template.tex files contained within each template.

How do I run The AI Scientist for different subject fields?

Please refer to the instructions for different templates. In the current iteration, The AI Scientist is restricted to ideas that can be expressed in code. However, lifting this restriction would represent exciting future work! :)

How do I add support for a new foundation model?

You may modify ai_scientist/llm.py to add support for a new foundation model. We do not advise using any model that is significantly weaker than GPT-4 level for The AI Scientist.

Why do I need to run the baseline runs myself?

The baseline runs appear as run_0. Run them on every machine on which you execute The AI Scientist: run times depend on the hardware, so run-time comparisons are only accurate against a baseline produced on the same machine.

What if I have problems accessing the Semantic Scholar API?

We use the Semantic Scholar API to check ideas for novelty and collect citations for the paper write-up. You may be able to skip these phases if you don't have an API key or the API is slow to access.

Containerization

We include a community-contributed Docker image in experimental/Dockerfile that may assist with your containerization efforts.

You can use this image like this:

# Endpoint Script
docker run -e OPENAI_API_KEY="$OPENAI_API_KEY" -v "$(pwd)/templates:/app/AI-Scientist/templates" <AI_SCIENTIST_IMAGE> \
  --model gpt-4o-2024-05-13 \
  --experiment 2d_diffusion \
  --num-ideas 2

# Interactive
docker run -it -e OPENAI_API_KEY=$OPENAI_API_KEY \
  --entrypoint /bin/bash \
  <AI_SCIENTIST_IMAGE>
