dojo-synthetic-api

This repository generates question & answer pairs (QAPs), which are retrieved and assessed by the Dojo Subnet.

In terms of implementation, an asynchronous task scheduler is used to schedule QAP generation, and Redis is used to store the generated pairs. Dojo retrieves QAPs by querying the /synthetic-gen endpoint hosted by this service; a successful retrieval deletes that QAP from Redis. When the number of stored pairs falls below a given threshold, the task scheduler generates new QAPs to replenish the database, ensuring that there is always a healthy supply of QAPs for Dojo to use.
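The replenishment behaviour can be pictured with a short sketch. This is a minimal illustration only, assuming a Redis list named qap_queue, a threshold of 10, and a generate_qap() coroutine standing in for the LLM pipeline described below; none of these names are taken from the actual codebase.

import asyncio
import json

import redis.asyncio as redis

QAP_QUEUE = "qap_queue"   # hypothetical key; not the repo's actual key name
THRESHOLD = 10            # hypothetical minimum number of stored QAPs
CHECK_INTERVAL = 30       # seconds between checks

async def generate_qap() -> dict:
    # Placeholder for the LLM prompting pipeline described in the next section.
    return {"question": "...", "answer": "..."}

async def replenish_loop(r: redis.Redis) -> None:
    # Keep the Redis store topped up so /synthetic-gen always has QAPs to serve.
    while True:
        if await r.llen(QAP_QUEUE) < THRESHOLD:
            await r.rpush(QAP_QUEUE, json.dumps(await generate_qap()))
        else:
            await asyncio.sleep(CHECK_INTERVAL)

asyncio.run(replenish_loop(redis.from_url("redis://localhost:6379")))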

QAPs are generated by prompting and querying LLMs. This process is under active improvement and is subject to change. The current flow is as follows (a minimal code sketch follows the steps):

  1. Query an LLM for a list of common objects used in animation.
  2. Prompt an LLM with these objects to create a coding question.
  3. Prompt an LLM to answer the generated question. The generated question and output code make up a QAP.
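A minimal sketch of this three-step flow, using the OpenAI-compatible client pointed at OpenRouter. The prompts and the model id are illustrative only, not the exact ones used by this repository.

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="anthropic/claude-3.5-sonnet",  # any OpenRouter model id works here
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# 1. Query an LLM for a list of common objects used in animation.
objects = ask("List 10 common objects used in animations, one per line.")
# 2. Prompt an LLM with these objects to create a coding question.
question = ask(f"Write a JavaScript animation coding question involving: {objects}")
# 3. Prompt an LLM to answer the generated question.
answer = ask(f"Answer this coding question with working JavaScript:\n{question}")

qap = {"question": question, "answer": answer}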

Currently, synthetic-api only generates code output in JavaScript. We are in the process of adding support for other programming languages.

Setup

Before running the synthetic-api, you will need the following keys.

Copy the .env.example file to a .env file and fill in the blanks; here we use OpenRouter as our LLM API provider.

Docker will create a Redis instance using the specified REDIS_USERNAME and REDIS_PASSWORD. You will need these credentials to interact manually with your Redis instance on Docker.

cp .env.example .env

# env vars that need to be filled
REDIS_USERNAME=
REDIS_PASSWORD=
OPENROUTER_API_KEY=

Run with docker-compose

Install Docker and the Docker Compose plugin:

sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
sudo mkdir -m 0755 -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Launch synthetic-api with Docker Compose:

# Run the service
docker compose up -d
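Once the containers are up, a QAP can be fetched from the /synthetic-gen endpoint. The port and the response shape below are assumptions; check docker-compose.yml for the port the service actually exposes.

import requests

# Port 5003 is an assumption; use the port exposed in docker-compose.yml.
resp = requests.get("http://localhost:5003/synthetic-gen", timeout=30)
resp.raise_for_status()
print(resp.json())  # on success, the returned QAP is deleted from Redis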

Code executor

Rationale

The code executor gives the LLM concrete feedback so that our QAP outputs are good enough by the time they reach the end user (i.e. miners/human labellers). It feeds live runtime errors back to the LLM, rather than having the LLM merely "think" about whether the code works.

How it works

get_feedback is the main function to call to get feedback on any piece of code. It works as follows (a minimal sketch appears after the steps):

  1. Ensures that a Docker image running a Node.js server is built.
  2. Creates a temporary folder inside commons/code_executor/sandbox-workspace/, called run_<run_uuid>.
  3. Injects error-logging JavaScript as inline JavaScript into an HTML file and writes it to index.html, to be served by the Node.js server.
  4. Searches for a free port <port_no> on the host machine in the range 3000-3999, since a port may already be taken by another asyncio coroutine.
  5. Runs a Docker container for the Node.js server with an error-logging endpoint, serving index.html on port <port_no>.
  6. Uses headless Puppeteer to visit the page at http://localhost:<port_no> to trigger rendering of the page.
  7. Errors such as SyntaxError and TypeError are written to app.log when the client-side code calls the error-logging endpoint on the server side.
  8. Reads the contents of app.log (in the same folder as index.html) and returns it to the caller.
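A minimal sketch of that flow, using the Docker SDK for Python and pyppeteer. The image name, endpoint path, injected snippet, and volume layout are illustrative assumptions rather than the repository's exact implementation.

import asyncio
import socket
import uuid
from pathlib import Path

import docker
from pyppeteer import launch

# Hypothetical error-logging snippet posting to a /log-error endpoint on the Node.js server.
ERROR_LOGGER_JS = """
<script>
window.onerror = function (msg, src, line, col, err) {
  fetch('/log-error', {method: 'POST', body: JSON.stringify({msg: msg, line: line})});
};
</script>
"""

def find_free_port(start: int = 3000, end: int = 3999) -> int:
    # 4. A port may already be taken by another asyncio coroutine's container.
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            if s.connect_ex(("localhost", port)) != 0:
                return port
    raise RuntimeError("no free port in range 3000-3999")

async def get_feedback(html: str) -> str:
    client = docker.from_env()
    client.images.build(path="commons/code_executor", tag="sandbox-node")      # 1. build the Node.js image

    workspace = Path("commons/code_executor/sandbox-workspace") / f"run_{uuid.uuid4().hex}"
    workspace.mkdir(parents=True)                                              # 2. temporary run folder
    (workspace / "index.html").write_text(html + ERROR_LOGGER_JS)              # 3. inject error logging

    port = find_free_port()
    container = client.containers.run(                                         # 5. serve index.html on <port_no>
        "sandbox-node",
        detach=True,
        ports={"3000/tcp": port},
        volumes={str(workspace.resolve()): {"bind": "/app/workspace", "mode": "rw"}},
    )
    try:
        browser = await launch(headless=True)                                  # 6. render the page headlessly
        page = await browser.newPage()
        await page.goto(f"http://localhost:{port}", waitUntil="networkidle0")
        await asyncio.sleep(2)                                                 # 7. let client-side errors reach app.log
        await browser.close()
    finally:
        container.remove(force=True)

    log_file = workspace / "app.log"                                           # 8. return the captured errors
    return log_file.read_text() if log_file.exists() else ""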

Types of errors caught so far

  • SyntaxError
  • TypeError
  • ReferenceError

Todo

  • Catch errors that occur during interaction between different components

Code iterator

Based on the ReWOO paper (https://arxiv.org/abs/2305.18323), the code iterator fixes any buggy code that the LLM outputs.

How it works

  1. Given a task (which in our case is always code generation), generate a plan containing the different steps, tool calls, and the inputs (dependent steps) and outputs of each step.
  2. Asynchronously try to execute all steps, where the step currently being executed waits for its dependencies to be resolved (with some timeout).
  3. For each step, execute tools (in commons/code_iterator/tools.py):
    1. web search - uses the HTML version of DuckDuckGo, as it is free
    2. use llm - calls another LLM based on a query
    3. fix code - a single-turn LLM call asking an LLM to fix code, given the feedback from our code executor
  4. Once all steps are resolved, ask a solver LLM whether the task has been completed (a minimal sketch of this loop follows).
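A minimal sketch of the step execution loop, assuming a simple Step dataclass and stubbed tools; the actual classes live in commons/code_iterator and will differ.

import asyncio
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    tool: str                                  # "web_search" | "use_llm" | "fix_code"
    prompt: str
    depends_on: list = field(default_factory=list)

async def web_search(query: str) -> str:       # stub: would hit html.duckduckgo.com
    return f"<search results for {query!r}>"

async def use_llm(query: str) -> str:          # stub: would call another LLM
    return f"<llm answer to {query!r}>"

async def fix_code(query: str) -> str:         # stub: would ask an LLM to fix code given executor feedback
    return f"<fixed code for {query!r}>"

TOOLS = {"web_search": web_search, "use_llm": use_llm, "fix_code": fix_code}

async def run_plan(steps: list[Step], timeout: float = 60.0) -> dict:
    loop = asyncio.get_running_loop()
    results: dict[str, asyncio.Future] = {s.name: loop.create_future() for s in steps}

    async def run_step(step: Step) -> None:
        # Wait for dependent steps to resolve (with a timeout), then call this step's tool.
        deps = [asyncio.wait_for(asyncio.shield(results[d]), timeout) for d in step.depends_on]
        resolved = await asyncio.gather(*deps)
        results[step.name].set_result(await TOOLS[step.tool](step.prompt.format(*resolved)))

    await asyncio.gather(*(run_step(s) for s in steps))
    return {name: fut.result() for name, fut in results.items()}

# A two-step plan where the second step depends on the first; a solver LLM
# would then be asked whether the task is complete (omitted here).
plan = [
    Step("E1", "web_search", "three.js requestAnimationFrame example"),
    Step("E2", "fix_code", "Fix this code using: {0}", depends_on=["E1"]),
]
print(asyncio.run(run_plan(plan)))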
