🔥 Rollup time-series data in Rails
Works great with Ahoy and Searchjoy
Add this line to your application’s Gemfile:
gem "rollups"
And run:
bundle install
rails generate rollups
rails db:migrate
Store the number of users created by day in the rollups
table
User.rollup("New users")
Get the series
Rollup.series("New users")
# {
# Sat, 24 May 2020 => 50,
# Sun, 25 May 2020 => 100,
# Mon, 26 May 2020 => 34
# }
Use a rake task or background job to create rollups on a regular basis. Don’t worry too much about naming - you can rename later if needed.
Specify the time column - created_at
by default
User.rollup("New users", column: :joined_at)
Change the default column for a model
class User < ApplicationRecord
self.rollup_column = :joined_at
end
Specify the interval - day
by default
User.rollup("New users", interval: "week")
And when querying
Rollup.series("New users", interval: "week")
Supported intervals are:
- hour
- day
- week
- month
- quarter
- year
Or any number of minutes or seconds:
- 1m, 5m, 15m
- 1s, 30s, 90s
Weeks start on Sunday by default. Change this with:
Rollup.week_start = :monday
The default time zone is Time.zone
. Change this with:
Rollup.time_zone = "Pacific Time (US & Canada)"
or
User.rollup("New users", time_zone: "Pacific Time (US & Canada)")
Time zone objects also work. To see a list of available time zones in Rails, run rake time:zones:all
.
See date storage for how dates are stored.
Rollups use count
by default. For other calculations, use:
Order.rollup("Revenue") { |r| r.sum(:revenue) }
Works with count
, sum
, minimum
, maximum
, and average
. For median
and percentile
, check out ActiveMedian.
PostgreSQL only
Create rollups with dimensions
Order.group(:platform).rollup("Orders by platform")
Works with multiple groups as well
Order.group(:platform, :channel).rollup("Orders by platform and channel")
Dimension names are determined by the group
clause. To set manually, use:
Order.group(:channel).rollup("Orders by source", dimension_names: ["source"])
See how to query dimensions.
When you run a rollup for the first time, the entire series is calculated. When you run it again, newer data is added.
By default, the latest interval stored for a series is recalculated, since it was likely calculated before the interval completed. Earlier intervals aren’t recalculated since the source rows may have been deleted (this also improves performance).
To recalculate the last few intervals, use:
User.rollup("New users", last: 3)
To recalculate a time range, use:
User.rollup("New users", range: 1.week.ago.all_week)
To only store data for completed intervals, use:
User.rollup("New users", current: false)
To clear and recalculate the entire series, use:
User.rollup("New users", clear: true)
To delete a series, use:
Rollup.where(name: "New users", interval: "day").delete_all
Get a series
Rollup.series("New users")
Specify the interval if it’s not day
Rollup.series("New users", interval: "week")
If a series has dimensions, they must match exactly as well
Rollup.series("Orders by platform and channel", dimensions: {platform: "Web", channel: "Search"})
Get a specific time range
Rollup.where(time: Date.current.all_year).series("New Users")
PostgreSQL only
Get multiple series grouped by dimensions
Rollup.multi_series("Orders by platform")
Specify the interval if it’s not day
Rollup.multi_series("Orders by platform", interval: "week")
Filter by dimensions
Rollup.where_dimensions(platform: "Web").multi_series("Orders by platform and channel")
Get a specific time range
Rollup.where(time: Date.current.all_year).multi_series("Orders by platform")
Uses the Rollup
model to query the data directly
Rollup.where(name: "New users", interval: "day")
List names and intervals
Rollup.list
Rollup works great with Chartkick
<%= line_chart Rollup.series("New users") %>
For multiple series, set a name
for each series before charting
series = Rollup.multi_series("Orders by platform")
series.each do |s|
s[:name] = s[:dimensions]["platform"]
end
Use any naming convention you prefer. Some ideas are:
- Human -
New users
- Underscore -
new_users
- Dots -
new_users.count
Rename with:
Rollup.rename("Old name", "New name")
Rollup stores both dates and times in the time
column depending on the interval. For date intervals (day, week, etc), it stores 00:00:00
for the time part. Cast the time
column to a date when querying in SQL to get the correct value.
- PostgreSQL:
time::date
- MySQL:
CAST(time AS date)
- SQLite:
date(time)
Set the default rollup column for your models
class Ahoy::Visit < ApplicationRecord
self.rollup_column = :started_at
end
and
class Ahoy::Event < ApplicationRecord
self.rollup_column = :time
end
Hourly visits
Ahoy::Visit.rollup("Visits", interval: "hour")
Visits by browser
Ahoy::Visit.group(:browser).rollup("Visits by browser")
Unique homepage views
Ahoy::Event.where(name: "Viewed homepage").joins(:visit).rollup("Homepage views") { |r| r.distinct.count(:visitor_token) }
Product views
Ahoy::Event.where(name: "Viewed product").group_prop(:product_id).rollup("Product views")
Daily searches
Searchjoy::Search.rollup("Searches")
Searches by query
Searchjoy::Search.group(:normalized_query).rollup("Searches by query", dimension_names: ["query"])
Conversion rate
Searchjoy::Search.rollup("Search conversion rate") { |r| r.average("(converted_at IS NOT NULL)::int") }
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/rollup.git
cd rollup
bundle install
# create databases
createdb rollup_test
mysqladmin create rollup_test
# run tests
bundle exec rake test