forrestj edited this page Apr 11, 2016 · 7 revisions

At ease, soldier! I assume you're here because this is your first week on Point Guard rotation. Point Guards at Runnable have two primary duties:

  1. Monitor & Curate Production Issues
  2. Keep Staging Running Smoothly

If that sounds like something you can't handle, don't fret! This guide is here to help you on your journey from Runnable point guard private through first lieutenant and beyond!

REMEMBER: If there is a piece of information that you don't have, you can always ask your neighbor, and if you find a way to improve this document, you have the power! By working together we can make sure that being a point guard is an honor, and not a foo-barred pain in the keister!

First Duty: Production

All of the major Runnable services report errors and performance data, which gives us a detailed view of production. Thus, the first duty of a point guard is to manage and curate what we are reporting.

In order to perform this duty you will need to do the following:

  1. Get access to Runnable's New Relic, Rollbar, PagerDuty, and Datadog accounts (ask a fellow engineer if you don't have access).
  2. Read the Rollbar Handbook; it contains all the information you will need to manage and curate errors in Rollbar.
  3. Daily: Check Rollbar Overall Usage - https://rollbar.com/settings/accounts/Runnable-2/usage/ - Look for spikes in usage.
  4. Check Rollbar Projects - For each project, make sure all error and critical items have been addressed (i.e. muted, resolved, or jira'd). If not, contact the repo owner with a funny story about what you found. Here is a link to the API project, for example: https://rollbar.com/Runnable-2/api/items/
  5. Weekly: Check New Relic - https://rpm.newrelic.com/accounts/384085/applications - What is the most common issue? What is the worst issue? Is it worth investing in better fixes for any issues?
  6. Weekly: Check PagerDuty - https://initme.pagerduty.com/incidents - What is the most common issue? What is the worst issue? Is it worth investing in better fixes for any issues?
  7. Get answers from more seasoned engineers when you are unsure of how to proceed.
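
The two Rollbar checks above (spotting usage spikes and finding items nobody has addressed) can be sketched in a few lines of Python. This is only an illustration, not an official Runnable tool: the function names `find_spikes` and `unaddressed`, the `level` and `status` field names, the status labels, and the "twice the trailing 7-day average" threshold are all assumptions made for the example. It works on data you have already copied out of Rollbar and makes no API calls.

```python
from statistics import mean

# Assumed labels: the guide says items should be muted, resolved, or jira'd.
ADDRESSED = {"muted", "resolved", "jira"}
SERIOUS_LEVELS = {"error", "critical"}


def find_spikes(daily_counts, window=7, factor=2.0):
    """Return indexes of days whose count exceeds `factor` times
    the mean of the preceding `window` days."""
    spikes = []
    for i in range(window, len(daily_counts)):
        baseline = mean(daily_counts[i - window:i])
        # Skip days with an all-zero baseline to avoid flagging noise.
        if baseline > 0 and daily_counts[i] > factor * baseline:
            spikes.append(i)
    return spikes


def unaddressed(items):
    """Return error/critical items that are not muted, resolved, or jira'd.
    Each item is a dict with hypothetical 'level' and 'status' keys."""
    return [
        item for item in items
        if item["level"] in SERIOUS_LEVELS and item["status"] not in ADDRESSED
    ]


# Made-up numbers for illustration only: a quiet week, then a jump.
usage = [100] * 7 + [105, 350]
print(find_spikes(usage))  # [8]
```

A trailing average is a deliberately crude baseline. If usage has a strong weekday/weekend rhythm, comparing each day to the same day of the previous week may give fewer false alarms.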

If you follow the steps above, by the end of your first week on rotation you'll no longer be the rookie. You'll be a stone-cold S-class production curator. Happy hunting, soldier.

Second Duty: Staging

The Runnable staging environment is Runnable running on Runnable. Because our software is meant to be a development/staging environment for engineering teams, we (with a few caveats) can run our own software inside our own software.

(Take a moment to have your mind totally blown away by this fact...)

Ready to continue? Good. Now, I know what you're thinking, soldier: "Just because we can run our infrastructure inside our own infrastructure, it doesn't necessarily mean we should," right? WRONG, and here's why:

Dog-fooding: The Runnable staging environment sandbox allows us to stay ever connected to our product and ensure it is meeting the needs of a real development team (namely ours). In other words: staging is a way for us to eat our own dog food.

As such, it is important that the staging environment always be available for use during testing and development. This leads us to the second duty of the point guard, which is threefold:

  • Monitor the staging environment and ensure it is operating smoothly
  • Report any functional issues immediately
  • Rotate and clean up external components (such as staging docks)

In order to perform this duty you'll need to do the following:

  1. Read the Staging Handbook to learn all the ins, outs, cross-connects, and "caveats" that make it possible for us to run our staging environment.
  2. Stay vigilant (like a vampire hunter)

Keep the staging environment up, soldier! If you do so, you can make sure your brothers-in-arms have an easy way to test and validate our product. If you don't, we'll see you at your court-martial.
