Sandbox Capabilities Framework #1251

rbren · 2024-04-20T13:42:20Z

Summary
We have an existing use case for a Jupyter-aware agent, which always runs in a sandbox where Jupyter is available. There are some other scenarios I can think of where an agent might want some guarantees about what it can do with the sandbox:

- We might want a "postgres migration writer", which needs access to a postgres instance
- We might have a "cypress test creator" agent, which would need access to cypress
- Further down the road, we might want to have an Open Interpreter agent, which needs access to osascript
- etc.

This proposal would allow agents to guarantee that certain programs are available in the sandbox, or that certain services are running in a predictable way.

What if we did something like the approach described under Technical Design below?

Motivation
We want agents to be able to have certain guarantees about the sandbox environment. But we also want our sandbox interface to be generic--something like "you have a bash terminal".

The latter is especially important, because we want users to be able to bring their own sandbox images. E.g. you might use an off-the-shelf haskell image if your project uses haskell--otherwise you'd need to go through the install process every time you start OpenDevin, or maintain a fork of the sandbox.

Technical Design

- For every requirement we support (e.g. jupyter, postgres, cypress), we have a bash script that
  - checks if it's installed
  - if not, installs it
  - maybe starts something in the background
- Let agents specify a list of requirements
  - e.g. CodeActAgent could say requirements: ['jupyter']
- When we start the Agent+Sandbox pair, we run the necessary bash scripts
  - should be pretty quick if the requirement is already built into the image
- Then the agent has some guarantees about the requirement being met, and how it's running
  - e.g. we can put in the prompt "there's a postgres server running on port 5432, user foo, password bar"
- If there are specific ways of interacting with that env (e.g. for jupyter, it seems we have to write to a websocket that's open in the sandbox?) the agent can implement custom Actions, like run_in_jupyter
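
To make the flow above concrete, here is a minimal sketch in Python. Every name in it (`Requirement`, `REGISTRY`, `CodeActAgentStub`, `prepare_sandbox`, `run_in_bash`) is hypothetical and not taken from the OpenDevin codebase, and the shell commands and prompt hints are illustrative assumptions. It only shows the shape of the idea: one check/install/start script per requirement, agents declaring a list of requirement names, and the scripts being run when the agent is paired with a sandbox.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Requirement:
    """Hypothetical record for one sandbox capability (not an OpenDevin class)."""
    name: str
    check_cmd: str         # exits 0 when the capability is already installed
    install_cmd: str       # run only when check_cmd fails
    start_cmd: str = ""    # optional service to launch in the background
    prompt_hint: str = ""  # guarantee that can be stated in the agent prompt

    def to_bash(self) -> str:
        """Render check / install / optional start as a single bash script."""
        lines = [
            "set -e",
            f"if ! {self.check_cmd}; then",
            f"  {self.install_cmd}",
            "fi",
        ]
        if self.start_cmd:
            lines.append(f"{self.start_cmd} &")
        return "\n".join(lines)


# Illustrative registry; the commands and hint text are assumptions.
REGISTRY = {
    "jupyter": Requirement(
        name="jupyter",
        check_cmd="command -v jupyter >/dev/null 2>&1",
        install_cmd="pip install jupyter",
        prompt_hint="Jupyter is installed in the sandbox.",
    ),
    "postgres": Requirement(
        name="postgres",
        check_cmd="command -v psql >/dev/null 2>&1",
        install_cmd="apt-get install -y postgresql",
        start_cmd="service postgresql start",
        prompt_hint="There's a postgres server running on port 5432.",
    ),
}


class CodeActAgentStub:
    """Stand-in for an agent class that declares what it needs."""
    requirements = ["jupyter"]


def run_in_bash(script: str) -> None:
    """Default runner: execute the script locally with bash (assumed available)."""
    subprocess.run(["bash", "-c", script], check=True)


def prepare_sandbox(agent_cls, run: Callable[[str], None] = run_in_bash) -> List[str]:
    """Run the setup script for each declared requirement; return prompt hints."""
    hints = []
    for name in agent_cls.requirements:
        if name not in REGISTRY:
            raise ValueError(f"unsupported requirement: {name}")
        req = REGISTRY[name]
        run(req.to_bash())
        if req.prompt_hint:
            hints.append(req.prompt_hint)
    return hints
```

Calling `prepare_sandbox(CodeActAgentStub, run=print)` prints the generated script instead of executing it, which is a cheap way to inspect what would run. In a real setup the `run` callable would presumably execute the script inside the sandbox rather than on the host, and the returned hints would be appended to the agent's prompt.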

Alternatives to Consider

- Building a bunch of stuff into one big sandbox
- Building special sandboxes that are required by certain agents (e.g. a JupyterSandbox)

Additional context
https://opendevin.slack.com/archives/C06QKSD9UBA/p1713552591042089

rbren · 2024-04-20T13:42:45Z

@mlejva curious to get your thoughts on this one!

Mike-FreeAI · 2024-04-20T14:11:41Z

@PierrunoYT here

PierrunoYT · 2024-04-20T14:12:29Z

Devin has reviewed the Sandbox Capabilities Framework as outlined in issue #1251 and finds the proposal to be comprehensive and well-thought-out. It addresses the key concerns and provides a solid foundation for future development and integration.

Mike-FreeAI · 2024-04-20T14:14:23Z

Devin has reviewed the Sandbox Capabilities Framework as outlined in issue #1251 and finds the proposal to be comprehensive and well-thought-out. It addresses the key concerns and provides a solid foundation for future development and integration.

come to my github repos

rbren · 2024-04-24T22:14:51Z

This is in!

rbren added the enhancement (New feature or request) label Apr 20, 2024

xingyaoww mentioned this issue Apr 20, 2024

feat(sandbox): Candidate Implementation of Sandbox Plugin to Support Jupyter #1255

Merged

rbren closed this as completed Apr 24, 2024
