New Time-Series Counter Storage #895

Closed
dcramer opened this issue May 15, 2013 · 3 comments

Comments

@dcramer
Member

dcramer commented May 15, 2013

This ticket documents an implementation plan for a new counter backend that will hold more data at varying granularities.

The one requirement, of course, is that it runs in SQL.

In every solution, we would define rollup intervals (1s, 60s, 60m, 24h). We would also use the buffers to batch writes for efficiency.
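Both shared ideas can be sketched in a few lines: each rollup interval maps a timestamp to the start of its bucket, and a buffer coalesces many increments into a single batch before anything hits the database. This is a minimal sketch under stated assumptions; `bucket_start` and `CounterBuffer` are hypothetical names for illustration, not Sentry's actual buffer API.

```python
from collections import Counter

# Rollup intervals in seconds: 1s, 60s, 60m, 24h (from the proposal above).
INTERVALS = (1, 60, 60 * 60, 24 * 60 * 60)


def bucket_start(timestamp, interval):
    """Round a Unix timestamp down to the start of its bucket for `interval`."""
    ts = int(timestamp)
    return ts - ts % interval


class CounterBuffer:
    """Hypothetical in-memory buffer that coalesces increments before a batch write."""

    def __init__(self):
        self.pending = Counter()

    def incr(self, key, timestamp, count=1):
        # Buffer at the lowest resolution; the storage strategy (A or B)
        # decides which intervals actually get written when the batch is flushed.
        self.pending[(key, bucket_start(timestamp, INTERVALS[0]))] += count

    def flush(self):
        """Return the coalesced increments, sorted, and reset the buffer."""
        batch = sorted(self.pending.items())
        self.pending.clear()
        return batch
```

Three ticks landing in two distinct seconds flush as two rows rather than three writes; the savings grow with traffic.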

Solution A: Periodic Rollups

  • Increment a counter for the lowest interval
  • Periodically (every 1m?), use Celery to roll up the counters written since the last job execution
  • Trim out-of-bounds items

Pros

  • Simple writes that are quick

Cons

  • Requires the queue to function
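Solution A can be sketched against SQL as follows. This assumes SQLite as a stand-in database and a hypothetical `counter` table keyed by (key, interval, bucket start); the retention values are placeholders. Only the lowest interval is written on the hot path, and `rollup()` is the body a periodic Celery task would run over the window since its last execution.

```python
import sqlite3
from collections import Counter

# Hypothetical schema: one row per (key, rollup interval, bucket start).
SCHEMA = """
CREATE TABLE IF NOT EXISTS counter (
    key TEXT NOT NULL,
    rollup INTEGER NOT NULL,
    bucket INTEGER NOT NULL,
    value INTEGER NOT NULL,
    PRIMARY KEY (key, rollup, bucket)
)
"""

LOWEST = 1                    # the only interval written on the hot path
ROLLUPS = (60, 3600, 86400)   # coarser intervals filled in by the periodic job
# Hypothetical retention per interval, in seconds.
RETENTION = {1: 3600, 60: 86400, 3600: 30 * 86400, 86400: 365 * 86400}


def _add(conn, key, interval, bucket, count):
    """Add `count` to one counter row, creating the row if it does not exist."""
    conn.execute(
        "INSERT OR IGNORE INTO counter (key, rollup, bucket, value) VALUES (?, ?, ?, 0)",
        (key, interval, bucket),
    )
    conn.execute(
        "UPDATE counter SET value = value + ? WHERE key = ? AND rollup = ? AND bucket = ?",
        (count, key, interval, bucket),
    )


def incr(conn, key, timestamp, count=1):
    """Write path: increment only the lowest-interval counter."""
    ts = int(timestamp)
    _add(conn, key, LOWEST, ts - ts % LOWEST, count)


def rollup(conn, since, until):
    """Periodic job body: fold lowest-interval rows with bucket in
    [since, until) into every coarser interval."""
    rows = conn.execute(
        "SELECT key, bucket, value FROM counter "
        "WHERE rollup = ? AND bucket >= ? AND bucket < ?",
        (LOWEST, since, until),
    ).fetchall()
    for interval in ROLLUPS:
        totals = Counter()
        for key, bucket, value in rows:
            totals[(key, bucket - bucket % interval)] += value
        for (key, bucket), total in totals.items():
            _add(conn, key, interval, bucket, total)


def trim(conn, now):
    """Delete rows older than their interval's retention window."""
    for interval, keep in RETENTION.items():
        conn.execute(
            "DELETE FROM counter WHERE rollup = ? AND bucket < ?",
            (interval, int(now) - keep),
        )


def get_count(conn, key, interval, bucket):
    """Read one counter value (0 if the row does not exist)."""
    row = conn.execute(
        "SELECT value FROM counter WHERE key = ? AND rollup = ? AND bucket = ?",
        (key, interval, bucket),
    ).fetchone()
    return row[0] if row else 0
```

The job has to roll up a window before trimming it, and the lowest interval's retention must exceed the job period. Otherwise raw counts are deleted before they are folded upward, which is the queue dependency listed under Cons.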

Solution B: Trimming

  • Increment a counter for all intervals at once
  • Periodically trim counters which are no longer used at various intervals

Pros

  • Doesn't actually require the queue to function

Cons

  • Writes can be a lot heavier if we store many intervals
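For comparison, here is a sketch of Solution B under the same assumptions (SQLite standing in for the SQL backend, a hypothetical `counter` table, placeholder retention values). Every increment touches one row per interval, so a tick costs four upserts here. In exchange, trimming is a simple independent delete that needs no rollup job.

```python
import sqlite3

# Hypothetical schema: one row per (key, rollup interval, bucket start).
SCHEMA = """
CREATE TABLE IF NOT EXISTS counter (
    key TEXT NOT NULL,
    rollup INTEGER NOT NULL,
    bucket INTEGER NOT NULL,
    value INTEGER NOT NULL,
    PRIMARY KEY (key, rollup, bucket)
)
"""

INTERVALS = (1, 60, 3600, 86400)
# Hypothetical retention per interval, in seconds.
RETENTION = {1: 3600, 60: 86400, 3600: 30 * 86400, 86400: 365 * 86400}


def incr(conn, key, timestamp, count=1):
    """Write path: increment the counter for every interval at once
    (one upsert per interval, i.e. the heavier writes noted under Cons)."""
    ts = int(timestamp)
    for interval in INTERVALS:
        bucket = ts - ts % interval
        conn.execute(
            "INSERT OR IGNORE INTO counter (key, rollup, bucket, value) VALUES (?, ?, ?, 0)",
            (key, interval, bucket),
        )
        conn.execute(
            "UPDATE counter SET value = value + ? WHERE key = ? AND rollup = ? AND bucket = ?",
            (count, key, interval, bucket),
        )


def trim(conn, now):
    """Periodically delete rows that fall outside each interval's retention."""
    for interval, keep in RETENTION.items():
        conn.execute(
            "DELETE FROM counter WHERE rollup = ? AND bucket < ?",
            (interval, int(now) - keep),
        )


def get_count(conn, key, interval, bucket):
    """Read one counter value (0 if the row does not exist)."""
    row = conn.execute(
        "SELECT value FROM counter WHERE key = ? AND rollup = ? AND bucket = ?",
        (key, interval, bucket),
    ).fetchone()
    return row[0] if row else 0
```

If `trim` is skipped for a while, the data stays correct and the table simply grows, which is why this approach does not depend on the queue.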
@dcramer
Member Author

dcramer commented May 15, 2013

FWIW, B is the easiest option and the winner so far.

@nvie

nvie commented May 15, 2013

FWIW, I use a solution like B for our internal metrics counter, with the difference that I'm using Redis to collect the metrics. Basically, the trick I use when recording a "tick" is to increment a counter (or set a bit when "just" measuring activity/inactivity) for each interval. Since we're using Redis to collect this data, we can leverage its EXPIREAT command to have the recorded data auto-expire after a while. If you choose these expiry times wisely, this provides a write-once, never-think-about-cleaning-again technique for recording your metrics.

Every "tick", thus, leads to the following Redis commands (pipelined). Here I'm recording activity for user 878 in 5m, 1h and 1d intervals. The 5m interval expires after 2 days, the 1h interval after 30 days, and the 1d interval never expires:

MULTI
SETBIT "metrics:active_users:5m:1355511000" "878" "1"
EXPIREAT "metrics:active_users:5m:1355511000" "1355683800"
SETBIT "metrics:active_users:1h:1355508000" "878" "1"
EXPIREAT "metrics:active_users:1h:1355508000" "1358100000"
SETBIT "metrics:active_users:1d:1355443200" "878" "1"
EXEC

(Yes, this keeps setting the EXPIREAT for every tick, but it's much less overhead compared to testing whether that should be set. And yes, choosing not to expire the 1d intervals will require a manual cleanup at some point.)
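The key and expiry arithmetic behind those commands can be sketched in Python. Each key embeds the interval's bucket start, and EXPIREAT is the bucket start plus that interval's time-to-live. `tick_commands` and the `INTERVALS` table are hypothetical names; the function only builds the command list, so it runs without a Redis server.

```python
# (label, bucket length in seconds, time-to-live in seconds or None for no expiry)
INTERVALS = (
    ("5m", 5 * 60, 2 * 24 * 3600),   # kept for 2 days
    ("1h", 3600, 30 * 24 * 3600),    # kept for 30 days
    ("1d", 24 * 3600, None),         # never expires (needs manual cleanup)
)


def tick_commands(metric, member, timestamp, intervals=INTERVALS):
    """Build the SETBIT/EXPIREAT commands that record one tick for `member`."""
    ts = int(timestamp)
    commands = []
    for label, length, ttl in intervals:
        bucket = ts - ts % length  # start of the bucket containing `ts`
        key = f"metrics:{metric}:{label}:{bucket}"
        commands.append(("SETBIT", key, member, 1))
        if ttl is not None:
            commands.append(("EXPIREAT", key, bucket + ttl))
    return commands
```

Calling `tick_commands("active_users", 878, 1355511000)` reproduces the five commands above. With redis-py, they could be queued on `client.pipeline()` via `pipe.setbit(...)` and `pipe.expireat(...)` and sent with `pipe.execute()`; the default transactional pipeline wraps them in MULTI/EXEC.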

@dcramer
Member Author

dcramer commented May 18, 2013

I've started work on this in the tsdb branch.

c04d0d0

https://github.com/getsentry/sentry/tree/tsdb

@dcramer dcramer closed this as completed Jan 5, 2015
@github-actions github-actions bot locked and limited conversation to collaborators Dec 19, 2020