This repository generates question & answer pairs (QAPs), which are retrieved and assessed by the Dojo Subnet.
In terms of implementation, an asynchronous task scheduler schedules QAP generation, and Redis stores the generated pairs. Dojo retrieves QAPs by querying the `/synthetic-gen` endpoint hosted by this repo; a successful retrieval deletes that QAP from Redis. When the number of stored pairs falls below a given threshold, the task scheduler generates new QAPs to replenish the database, ensuring that there is always a healthy supply of QAPs for use by Dojo.
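A minimal sketch of this buffer-and-replenish pattern, assuming a Redis list as the store; the key name, threshold, sleep interval, and `generate_qap` helper are illustrative stand-ins rather than the repo's actual identifiers:

```python
import asyncio
import json

import redis.asyncio as aioredis

QUEUE_KEY = "synthetic:qa_pairs"   # hypothetical Redis list key
TARGET_SIZE = 20                   # hypothetical replenish threshold


async def generate_qap() -> dict:
    """Placeholder for the LLM-driven QAP generation pipeline."""
    return {"question": "...", "answer": "..."}


async def replenish_loop(r: aioredis.Redis) -> None:
    # Periodically top the buffer back up to TARGET_SIZE.
    while True:
        missing = TARGET_SIZE - await r.llen(QUEUE_KEY)
        for _ in range(max(missing, 0)):
            await r.rpush(QUEUE_KEY, json.dumps(await generate_qap()))
        await asyncio.sleep(30)


async def pop_qap(r: aioredis.Redis) -> dict | None:
    # What /synthetic-gen effectively does: a successful read removes the pair.
    raw = await r.lpop(QUEUE_KEY)
    return json.loads(raw) if raw else None
```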
QAPs are generated by prompting and querying LLMs. This process is being improved and is subject to change in the future. The current flow is:
- Query an LLM for a list of common objects used in animation.
- Prompt an LLM with these objects to create a coding question.
- Prompt an LLM to answer the generated question. The generated question and output code make up a QAP, as sketched below.
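A rough sketch of this prompt chain, assuming OpenRouter's OpenAI-compatible API; the model id, prompts, and `ask` helper are hypothetical and will differ from the repo's actual implementation:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter exposes an OpenAI-compatible API
    api_key=os.environ["OPENROUTER_API_KEY"],
)
MODEL = "anthropic/claude-3.5-sonnet"  # illustrative model id


def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


objects = ask("List 10 common objects used in animation, one per line.")
question = ask(f"Using some of these objects, write a JavaScript coding question:\n{objects}")
answer = ask(f"Answer this coding question with working JavaScript code:\n{question}")
qap = {"question": question, "answer": answer}
```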
Currently, synthetic-api only generates code output in JavaScript. We are in the process of expanding support to other programming languages.
Before running the synthetic-api, you will need the following keys:
- An OpenRouter API key (from https://openrouter.ai/)
Copy the .env.example file to a .env file and fill in the blanks. Here we use OpenRouter as our LLM API provider.
Docker will create a Redis instance using the specified `REDIS_USERNAME` and `REDIS_PASSWORD`. You will need these to manually interact with your Redis instance running in Docker.
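As a quick sanity check, you can connect to the Redis container with these credentials, for example from Python; the host and port below assume the default 6379 mapping, so check docker-compose.yml for the real values:

```python
import os

import redis

r = redis.Redis(
    host="localhost",
    port=6379,  # assumes the default port mapping; check docker-compose.yml
    username=os.environ["REDIS_USERNAME"],
    password=os.environ["REDIS_PASSWORD"],
)
print(r.ping())     # True if the credentials and port are correct
print(r.keys("*"))  # list keys to see what has been stored
```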
cp .env.example .env
# env vars that need to be filled
REDIS_USERNAME=
REDIS_PASSWORD=
OPENROUTER_API_KEY=
Install docker and docker-compose:
sudo apt-get install \
ca-certificates \
curl \
gnupg \
lsb-release
sudo mkdir -m 0755 -p /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Run Docker Compose to launch synthetic-api:
# Run the service
docker compose up -d
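Once the service is up, you can fetch a QAP directly from the endpoint; the port below is an assumption, so check docker-compose.yml for the port the API is actually exposed on:

```python
import requests

# Port 5003 is a guess; use whatever port docker-compose.yml maps for the API.
resp = requests.get("http://localhost:5003/synthetic-gen", timeout=120)
resp.raise_for_status()
print(resp.json())  # a successful retrieval also removes the pair from Redis
```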
This is meant to give the LLM some form of feedback so that our QA pair outputs are good enough by the time they reach the end user (i.e. miners/human labellers), feeding live runtime errors etc. back to the LLM rather than just having the LLM "think" about the code.
The following is how `get_feedback`, the main function to call to get feedback on any piece of code, is supposed to work:
- firstly ensures that a docker image that runs a nodejs server is built
- creates a temporary folder inside of `commons/code_executor/sandbox-workspace/`, called `run_<run_uuid>`
- injects error logging javascript code as inline javascript into an HTML file, and writes it into `index.html` to be served by the nodejs server
- searches for a free port `<port_no>` on the host machine in the range 3000-3999, as a port may be taken due to another asyncio coroutine running
- runs a docker container for the nodejs server with an error logging endpoint, and serves an `index.html` on the port `<port_no>`
- uses headless puppeteer to visit the page at `http://localhost:<port_no>` to trigger rendering of the page
- on errors like SyntaxError, TypeError, etc., these get written into `app.log` when the client side calls the error logging endpoint on the server side
- reads the contents of `app.log` (in the same folder as `index.html`) and returns it to the caller
This captures errors such as:
- Syntax Error
- Type Error
- Reference Error
- errors that occur during the interaction of different components
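A condensed sketch of the flow above; the Docker image name, volume layout, and use of pyppeteer as the headless browser driver are assumptions, not the exact code in `commons/code_executor/`:

```python
import asyncio
import socket
import uuid
from pathlib import Path

from pyppeteer import launch  # assumption: any headless-browser driver would do

WORKSPACE = Path("commons/code_executor/sandbox-workspace")


def find_free_port(start: int = 3000, end: int = 3999) -> int:
    # Another coroutine may already hold a port, so scan until a bind succeeds.
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port
            except OSError:
                continue
    raise RuntimeError("no free port in range 3000-3999")


async def get_feedback(html_with_inline_js: str) -> str:
    run_dir = WORKSPACE / f"run_{uuid.uuid4()}"
    run_dir.mkdir(parents=True)
    # write the HTML (with the injected error-logging JS) for the node server to serve
    (run_dir / "index.html").write_text(html_with_inline_js)
    port = find_free_port()
    # start the nodejs container with the error-logging endpoint (image name is illustrative)
    proc = await asyncio.create_subprocess_exec(
        "docker", "run", "--rm", "-d",
        "-p", f"{port}:3000",
        "-v", f"{run_dir.resolve()}:/app/public",
        "nodejs-error-logger",
    )
    await proc.wait()
    # render the page headlessly so client-side errors hit the logging endpoint
    browser = await launch(headless=True)
    page = await browser.newPage()
    await page.goto(f"http://localhost:{port}")
    await asyncio.sleep(2)  # give runtime errors time to be reported and logged
    await browser.close()
    # the server writes reported errors to app.log next to index.html
    # (container cleanup omitted for brevity)
    log = run_dir / "app.log"
    return log.read_text() if log.exists() else ""
```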
Based on the ReWOO paper (https://arxiv.org/abs/2305.18323), we use this to fix any buggy code that the LLM outputs.
- given a task (which in our case is always code generation), generate a plan containing different steps, tool calls, and inputs (dependent steps) & outputs of each step
- asynchronously try to execute all steps, where the current step being executed will wait for a dependency to be resolved (with some timeout)
- for each step, execute tools (in `commons/code_iterator/tools.py`):
  - web search - using the HTML version of DuckDuckGo as it's free
  - use llm - call another LLM based on a query
  - fix code - a single-turn LLM call to ask an LLM to fix code, given the feedback from our code executor
- once all steps are resolved, ask a solver LLM if the task has been completed
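A loose sketch of this plan/execute/solve loop; the `Step` structure, tool names, and timeout handling are illustrative and not the exact structures in `commons/code_iterator/`:

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    tool: str                       # "web_search" | "use_llm" | "fix_code"
    prompt: str
    depends_on: list[str] = field(default_factory=list)


async def run_tool(tool: str, prompt: str, context: dict) -> str:
    # Placeholder: dispatch to the DuckDuckGo HTML search, an LLM call, or the
    # single-turn "fix code" call that includes the code executor's feedback.
    return f"<{tool} result for: {prompt[:40]}>"


async def execute_plan(steps: list[Step], timeout: float = 60.0) -> dict:
    results: dict[str, str] = {}
    done: dict[str, asyncio.Event] = {s.name: asyncio.Event() for s in steps}

    async def run(step: Step) -> None:
        # Wait for every dependency to resolve before running this step.
        for dep in step.depends_on:
            await asyncio.wait_for(done[dep].wait(), timeout)
        results[step.name] = await run_tool(step.tool, step.prompt, results)
        done[step.name].set()

    await asyncio.gather(*(run(s) for s in steps))
    return results


async def iterate(buggy_code: str, feedback: str) -> str:
    plan = [
        Step("search", "web_search", f"how to fix: {feedback}"),
        Step("patch", "fix_code", buggy_code, depends_on=["search"]),
    ]
    results = await execute_plan(plan)
    # A solver LLM would now be asked whether the task has been completed.
    return results["patch"]
```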