Productizing bot deployment #1541

dpaiton · 2024-06-18T20:31:52Z

Tasks

responsibility

Core bot & infra team: @ryangoree @slundqui @jalextowle @mcclurejt @jrhea
Secondary team: @sentilesdal @dpaiton @wakamex

All people listed should

know how to (& have credentials to) restart and/or deploy bots
monitor bot-related rollbar notifications; check that any critical bugs are being addressed
understand error prioritization and know the failure playbook

importance (priority)

invariant fails (page @jalextowle @jrhea @mcclurejt)
checkpoint bot tries to checkpoint & fails
checkpoint bot goes down
invariant goes down
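
A minimal sketch of how this ordering could be encoded for triage, assuming the list above runs from most to least urgent and that only an invariant failure pages people. The names `Severity`, `PAGE_ON`, and `triage` are hypothetical and do not exist in the bot code.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical failure classes; a lower value means more urgent."""

    INVARIANT_FAILS = 1
    CHECKPOINT_FAILS = 2
    CHECKPOINT_BOT_DOWN = 3
    INVARIANT_BOT_DOWN = 4


# Assumption: only the top item in the list triggers a page.
PAGE_ON = {Severity.INVARIANT_FAILS}


def triage(events):
    """Return (severity, should_page) pairs, most urgent first."""
    return [(event, event in PAGE_ON) for event in sorted(events)]
```

For example, `triage([Severity.CHECKPOINT_BOT_DOWN, Severity.INVARIANT_FAILS])` puts the invariant failure first and marks only that entry for paging.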

top priorities for mainnet

checkpoint bot & invariance check bot
- runs
- reporting system for when it goes down
- secure credential management
- documentation on how to (re)deploy bots

bots to consider

checkpoint
invariance check
lpandarb
- this should be added after the other two are working well

documentation

README.md in infra

uptime monitoring

easily-accessible location for cloud machine address & status
easily-accessible portal to view all deployed bot wallets

error reporting & notifications

notifications to critical team when bots go down (rollbar?)
system in place to assign responsibility for who should handle errors

easy start & restart

minimal steps to deploy new bots on a pool
ideally we would be able to run this on a mainnet fork on an AWS instance

containerized deployment

setup flag for "service bots"

invariant checks

rollbar filters for each check type

credentials storage

privileged access to private keys for bots
whoever sets this up is fine with making calls -- let's prioritize "easy" and "safe"
- ideally use a free service, but if not then fine
easiest to use env vars
lastpass credentials for pauser

continuous deployment

nice to have
when infra pushes a release we deploy bots on a mainnet fork in AWS?
almost-continuous deployment -- make it easy for a dev to manually test deployment

current status -- checkpoint bot:

running in docker container
- docker can restart automatically on failure (easily set up)
passes credentials via env variables set in infra repo
- registry address, rpc uri (points to anvil node), private key, rollbar api key
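
A minimal sketch of reading those four values from the environment and failing fast when one is missing. The variable names (`REGISTRY_ADDRESS`, `RPC_URI`, `PRIVATE_KEY`, `ROLLBAR_API_KEY`) and the `BotConfig` / `load_config` helpers are assumptions for illustration; the names actually used in the infra repo may differ.

```python
import os
from dataclasses import dataclass

# Hypothetical variable names; the thread lists the values, not their names.
REQUIRED_VARS = ("REGISTRY_ADDRESS", "RPC_URI", "PRIVATE_KEY", "ROLLBAR_API_KEY")


@dataclass(frozen=True, repr=False)
class BotConfig:
    registry_address: str
    rpc_uri: str
    private_key: str
    rollbar_api_key: str

    def __repr__(self) -> str:
        # Redact secrets so they never end up in logs or tracebacks.
        return (
            f"BotConfig(registry_address={self.registry_address!r}, "
            f"rpc_uri={self.rpc_uri!r}, private_key='***', rollbar_api_key='***')"
        )


def load_config(env=None) -> BotConfig:
    """Build the config from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required env vars: " + ", ".join(missing))
    return BotConfig(
        registry_address=env["REGISTRY_ADDRESS"],
        rpc_uri=env["RPC_URI"],
        private_key=env["PRIVATE_KEY"],
        rollbar_api_key=env["ROLLBAR_API_KEY"],
    )
```

Failing at startup with the full list of missing names makes a misconfigured container obvious right away, rather than at the first checkpoint attempt.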

slundqui · 2024-06-18T20:59:58Z

Readme on deploying bots within delvtech/hyperdrive-infra#119

slundqui · 2024-06-18T22:00:15Z

Something to note is that rollbar doesn't have a great way to log "this process is dead". We may need a separate "monitoring" container that logs errors if the service bot containers are stopped, or we allow docker to always restart. Even then, if the aws machine goes down, there's no way of logging a "this is down" error.
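
One way the "monitoring" idea could look is a heartbeat check: each bot records a timestamp on every loop, and a separate process flags any bot whose last timestamp is too old. This is only a sketch; `stale_bots`, the bot names, and the 300-second threshold are assumptions, and how heartbeats are stored and how the alert is sent is left open.

```python
from __future__ import annotations


def stale_bots(
    last_seen: dict[str, float], now: float, max_age_s: float = 300.0
) -> list[str]:
    """Return the names of bots whose last heartbeat is older than max_age_s.

    last_seen maps a bot name to the Unix timestamp of its latest heartbeat.
    """
    return sorted(name for name, ts in last_seen.items() if now - ts > max_age_s)
```

For example, calling `stale_bots(heartbeats, now=time.time())` on a timer and reporting every returned name would cover stopped containers. It would not cover the machine itself going down unless the check runs somewhere else.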

wakamex · 2024-06-20T21:13:12Z

I changed the second-last bullet from "document machine details (ip, port) and make sure everyone has ssh access" to "make sure everyone has access". Originally we envisioned using AWS, but @mcclurejt convinced me fly.io is way easier. We won't need individual ssh keys. But I'll still go through and make sure everyone has access, so I tagged myself to the bullet.
