Added Dockerfile #21

Merged: 8 commits, Aug 19, 2024
Dockerfile (89 additions, 0 deletions)
@@ -0,0 +1,89 @@
# Use Python 3.11 as the base image
FROM python:3.11-bullseye

# Avoid prompts from apt during the build
ARG DEBIAN_FRONTEND=noninteractive

# Set working directory
WORKDIR /app

# Install system dependencies including texlive-full
RUN apt-get update && apt-get install -y --no-install-recommends \
    wget=1.21-1+deb11u1 \
    git=1:2.30.2-1+deb11u2 \
    build-essential=12.9 \
    libssl-dev=1.1.1w-0+deb11u1 \
    zlib1g-dev=1:1.2.11.dfsg-2+deb11u2 \
    libbz2-dev=1.0.8-4 \
    libreadline-dev=8.1-1 \
    libsqlite3-dev=3.34.1-3 \
    libncursesw5-dev=6.2+20201114-2+deb11u2 \
    xz-utils=5.2.5-2.1~deb11u1 \
    tk-dev=8.6.11+1 \
    libxml2-dev=2.9.10+dfsg-6.7+deb11u4 \
    libxmlsec1-dev=1.2.31-1 \
    libffi-dev=3.3-6 \
    liblzma-dev=5.2.5-2.1~deb11u1 \
    texlive-full=2020.20210202-3 \
    && rm -rf /var/lib/apt/lists/*

# Upgrade pip
RUN pip install --no-cache-dir --upgrade pip==24.2

# Install Python packages
RUN pip install --no-cache-dir \
    anthropic==0.34.0 \
    aider-chat==0.50.1 \
    backoff==2.2.1 \
    openai==1.40.6 \
    matplotlib==3.9.2 \
    pypdf==4.3.1 \
    pymupdf4llm==0.0.10 \
    torch==2.4.0 \
    numpy==1.26.4 \
    transformers==4.44.0 \
    datasets==2.21.0 \
    tiktoken==0.7.0 \
    wandb==0.17.7 \
    tqdm==4.66.5 \
    scikit-learn==1.5.1 \
    einops==0.8.0

# Clone and install NPEET with a specific commit
RUN git clone https://github.com/gregversteeg/NPEET.git
WORKDIR /app/NPEET
RUN git checkout 8b0d9485423f74e5eb199324cf362765596538d3 \
    && pip install .

# Clone the AI-Scientist repository
WORKDIR /app
RUN git clone https://github.com/SakanaAI/AI-Scientist.git

# Set working directory to AI-Scientist
WORKDIR /app/AI-Scientist

# Prepare NanoGPT data
RUN python data/enwik8/prepare.py && \
    python data/shakespeare_char/prepare.py && \
    python data/text8/prepare.py

# Set up baseline runs
RUN for dir in templates/*/; do \
        if [ -f "${dir}experiment.py" ]; then \
            cd "${dir}" || continue; \
            python experiment.py --out_dir run_0 && \
            python plot.py; \
            cd /app/AI-Scientist || exit; \
        fi \
    done

# Create entrypoint script
RUN printf '#!/bin/bash\n\
exec python launch_scientist.py "$@"\n' > /app/entrypoint.sh && \
    chmod +x /app/entrypoint.sh

# Set the entrypoint
ENTRYPOINT ["/app/entrypoint.sh"]

# Set the default command to an empty array
CMD []
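
The baseline step runs during `docker build` and only visits template directories that contain an `experiment.py`. Because `python experiment.py ... && python plot.py` is followed by `;` and a `cd` back to the repository root, a failing experiment skips its plot, the loop moves on, and the `RUN` step still succeeds. The sketch below replays the same loop in a scratch directory to illustrate this; the template names (`good`, `broken`, `no_script`) and the `run_experiment`/`run_plot` functions are hypothetical stand-ins for the real templates and Python commands, so nothing is actually trained.

```bash
#!/bin/sh
# Replay the Dockerfile's baseline loop in a throwaway directory.
# "good", "broken" and "no_script" are hypothetical template names;
# run_experiment/run_plot stand in for the real python commands.
root=$(mktemp -d)
mkdir -p "$root/templates/good" "$root/templates/broken" "$root/templates/no_script"
touch "$root/templates/good/experiment.py" "$root/templates/broken/experiment.py"
log="$root/log.txt"
: > "$log"

run_experiment() {
    name=$(basename "$PWD")
    # Simulate a crash in the "broken" template only.
    if [ "$name" = "broken" ]; then
        echo "experiment failed in $name" >> "$log"
        return 1
    fi
    echo "experiment ok in $name" >> "$log"
}
run_plot() { echo "plot in $(basename "$PWD")" >> "$log"; }

cd "$root" || exit 1
# Same control flow as the RUN step, with $root in place of /app/AI-Scientist.
for dir in templates/*/; do
    if [ -f "${dir}experiment.py" ]; then
        cd "${dir}" || continue
        run_experiment && run_plot
        cd "$root" || exit
    fi
done
status=$?

cat "$log"
echo "loop exit status: $status"
```

Running it prints `experiment failed in broken`, `experiment ok in good` and `plot in good`, followed by `loop exit status: 0`: the broken template is skipped without failing the loop, and the directory without `experiment.py` is never entered.
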
README.md (23 additions, 1 deletion)
@@ -28,7 +28,7 @@ We further provide all runs and data from our paper [here](https://drive.google.
10. [Grokking Through Compression: Unveiling Sudden Generalization via Minimal Description Length](https://github.com/SakanaAI/AI-Scientist/tree/main/example_papers/mdl_grokking_correlation.pdf)
11. [Accelerating Mathematical Insight: Boosting Grokking Through Strategic Data Augmentation](https://github.com/SakanaAI/AI-Scientist/tree/main/example_papers/data_augmentation_grokking.pdf)

**Note**: Caution! This codebase will execute LLM-written code. There are various risks and challenges associated with this autonomy, including the use of potentially dangerous packages, web access, and the potential spawning of processes. Use at your own discretion. Please make sure to [containerize](#containerization) and restrict web access appropriately.

<p align="center">
<a href="https://github.com/SakanaAI/AI-Scientist/blob/main/example_papers/adaptive_dual_scale_denoising/adaptive_dual_scale_denoising.pdf"><img src="https://github.com/SakanaAI/AI-Scientist/blob/main/docs/anim-ai-scientist.gif" alt="Adaptive Dual Scale Denoising" width="80%" />
@@ -43,6 +43,7 @@ We further provide all runs and data from our paper [here](https://drive.google.
5. [Template Resources](#template-resources)
6. [Citing The AI Scientist](#citing-the-ai-scientist)
7. [Frequently Asked Questions](#faq)
8. [Containerization](#containerization)

## Requirements

@@ -270,3 +271,24 @@ Please refer to the instructions for different templates. In this current iterat
### How do I add support for a new foundation model?
Please see this [PR](https://github.com/SakanaAI/AI-Scientist/pull/7) for an example of how to add a new model, in this case Claude via Bedrock.
We advise against using any model significantly weaker than GPT-4 level with The AI Scientist.

## Containerization

We include a [community-contributed](https://github.com/SakanaAI/AI-Scientist/pull/21) Docker image definition in `Dockerfile` that may assist with your containerization efforts.
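
The image has to be built before it can be run. A minimal sketch, assuming the `Dockerfile` is in the current directory and using `ai-scientist` as an arbitrary example tag to substitute for `<AI_SCIENTIST_IMAGE>` below:

```bash
# Build the image from the Dockerfile in the current directory.
# "ai-scientist" is an example tag, not a name defined by this repository.
docker build -t ai-scientist .
```

Note that the build clones AI-Scientist from GitHub rather than copying your local checkout, so local modifications are not included in the image. It also installs `texlive-full`, prepares the NanoGPT datasets and runs the template baselines, so it can take a long time.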

Once built, the image can be used as follows:

```bash
# Entrypoint script
docker run -e OPENAI_API_KEY=$OPENAI_API_KEY <AI_SCIENTIST_IMAGE> \
    --model "gpt-4o-2024-05-13" \
    --experiment 2d_diffusion \
    --num-ideas 1
```
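
Because `CMD` is empty, everything after the image name in `docker run` is appended to the entrypoint and forwarded through `"$@"` to `launch_scientist.py`, which is resolved relative to `/app/AI-Scientist` (the last `WORKDIR` in the Dockerfile). The sketch below recreates the generated script with `echo` placed in front of the command, purely so the forwarded arguments can be inspected without the repository or an API key; it also uses `/bin/sh` instead of `/bin/bash`.

```bash
#!/bin/sh
# Recreate the entrypoint generated by the Dockerfile's printf step,
# with `echo` in front so the call is printed instead of executed.
tmp=$(mktemp -d)
printf '#!/bin/sh\necho python launch_scientist.py "$@"\n' > "$tmp/entrypoint.sh"
chmod +x "$tmp/entrypoint.sh"

# Simulates: docker run ... <AI_SCIENTIST_IMAGE> --model "gpt-4o-2024-05-13" \
#   --experiment 2d_diffusion --num-ideas 1
out=$("$tmp/entrypoint.sh" --model "gpt-4o-2024-05-13" --experiment 2d_diffusion --num-ideas 1)
echo "$out"
# Prints: python launch_scientist.py --model gpt-4o-2024-05-13 --experiment 2d_diffusion --num-ideas 1
```

The quotes around the model name are consumed by the calling shell, so `launch_scientist.py` receives the plain string `gpt-4o-2024-05-13`.
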

```bash
# Interactive
docker run -it -e OPENAI_API_KEY=$OPENAI_API_KEY \
    --entrypoint /bin/bash \
    <AI_SCIENTIST_IMAGE>
```
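
Output generated by a run stays inside the container's filesystem. To keep it on the host, one option is to bind-mount a host directory over the output folder. This is a sketch under two assumptions: that `launch_scientist.py` writes its output under `results/` in the repository root (check the script before relying on this path), and, for `--gpus all`, that the host has an NVIDIA GPU with the NVIDIA Container Toolkit installed.

```bash
# Mount ./results on the host over the (assumed) output directory in the image.
# --gpus all requires the NVIDIA Container Toolkit on the host.
docker run --gpus all \
    -e OPENAI_API_KEY=$OPENAI_API_KEY \
    -v "$(pwd)/results:/app/AI-Scientist/results" \
    <AI_SCIENTIST_IMAGE> \
    --model "gpt-4o-2024-05-13" \
    --experiment 2d_diffusion \
    --num-ideas 1
```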