Skip to content

Commit

Permalink
Document all the things
Browse files Browse the repository at this point in the history
  • Loading branch information
benjaminoakes committed Jan 27, 2023
1 parent 9f7751f commit 3f03f3b
Show file tree
Hide file tree
Showing 4 changed files with 161 additions and 26 deletions.
131 changes: 113 additions & 18 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,14 +38,14 @@ The configuration is able to be tuned for the needs of an individual problem. I

Backfills are implemented using ActiveJob classes. SuperSpreader orchestrates running those jobs. Each set of jobs is enqueued by a scheduler using the supplied configuration.

As an example, assume that there's a table with 100,000,000 rows which need Ruby-land logic to be applied using `MyBackfillJob`. The rate (e.g., how many jobs per second) is configurable. Once configured, SuperSpreader would enqueue job in batches like:
As an example, assume that there's a table with 100,000,000 rows which need Ruby-land logic to be applied using `ExampleBackfillJob`. The rate (e.g., how many jobs per second) is configurable. Once configured, SuperSpreader would enqueue job in batches like:

MyBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_901, end_id: 100_000_000
MyBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_801, end_id: 99_999_900
MyBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_701, end_id: 99_999_800
MyBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_601, end_id: 99_999_700
MyBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_501, end_id: 99_999_600
MyBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_401, end_id: 99_999_500
ExampleBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_901, end_id: 100_000_000
ExampleBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_801, end_id: 99_999_900
ExampleBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_701, end_id: 99_999_800
ExampleBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_601, end_id: 99_999_700
ExampleBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_501, end_id: 99_999_600
ExampleBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_401, end_id: 99_999_500

Notice that there are 3 jobs per second, 2 seconds of work were enqueued, and the batch size is 100. Again, this is just an example for illustration, and the configuration can be modified to suit the needs of the problem.

Expand All @@ -55,14 +55,14 @@ After running out of work, SuperSpreader will enqueue more work:

And the work continues:

MyBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_401, end_id: 99_999_500
MyBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_301, end_id: 99_999_400
MyBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_201, end_id: 99_999_300
MyBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_999_101, end_id: 99_999_200
MyBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_999_001, end_id: 99_999_100
MyBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_998_901, end_id: 99_999_000
ExampleBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_401, end_id: 99_999_500
ExampleBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_301, end_id: 99_999_400
ExampleBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_201, end_id: 99_999_300
ExampleBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_999_101, end_id: 99_999_200
ExampleBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_999_001, end_id: 99_999_100
ExampleBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_998_901, end_id: 99_999_000

This process continues until there is no more work to be done.
This process continues until there is no more work to be done. For more detail, please see [Spreader](https://github.com/doximity/super_spreader/blob/master/lib/super_spreader/spreader.rb) and [its spec](https://github.com/doximity/super_spreader/blob/master/spec/spreader_spec.rb).

Additionally, the configuration can be tuned while SuperSpreader is running. The configuration is read each time `SchedulerJob` runs. Does the process need to go faster? Increase the number of jobs per second. Are batches taking too long to complete? Decrease the batch size. Is `SchedulerJob` taking a long time to complete? Decrease the duration so that less work is enqueued in each cycle. Finally, SuperSpreader can be stopped instantly and resumed at a later time, if a need ever arises.

Expand All @@ -74,7 +74,96 @@ To repeat an earlier disclaimer:

> **Please be aware:** SuperSpreader is still fairly early in development. While it can be used effecively by experienced hands, we are aware that it could have a better developer experience (DevX). It was written to solve a specific problem (see "History"). We are working to generalize the tool as the need arises. Pull requests are welcome!
The basic workflow is tested in `spec/integration/backfill_spec.rb`.
If you haven't yet, please read the "How does it work?" section. This basic workflow is tested in `spec/integration/backfill_spec.rb`.

First, write a backfill job. Please see [this example for details](https://github.com/doximity/super_spreader/blob/master/spec/support/example_backfill_job.rb).

Next, configure `SuperSpreader` from the console by saving `SchedulerConfig` to Redis. For documentation on each attribute, please see [SchedulerConfig](https://github.com/doximity/super_spreader/blob/master/lib/super_spreader/scheduler_config.rb). It is recommended that you start slow, with small batches, short durations, and low per-second rates.

**Important:** SuperSpreader currently only supports a _single_ configuration, though removing that limitation is our Roadmap (please see below).

```ruby
# NOTE: This is an example. You should take your situation into account when
# setting these values.
config = SuperSpreader::SchedulerConfig.new

config.batch_size = 10
config.duration = 10
config.job_class_name = "ExampleBackfillJob"

config.per_second_on_peak = 3.0
config.per_second_off_peak = 3.0

config.on_peak_timezone = "America/Los_Angeles"
config.on_peak_wday_begin = 1
config.on_peak_wday_end = 5
config.on_peak_hour_begin = 5
config.on_peak_hour_end = 17

config.save
```

Now the `SchedulerJob` can be started. It will run until it is stopped or runs out of work.

```ruby
SuperSpreader::SchedulerJob.perform_now
```

At this point, you should monitor your database and worker instances using the tooling you have available. You should make adjustments based on the metrics you have available.

Based on those metrics, slowly step up `per_second_on_peak` and `batch_size` while continuing to monitor:

```ruby
config.batch_size = 20
config.save
```

```ruby
config.per_second_on_peak = 4.0
config.save
```

Continue to step up the rates, until you arrive at a rate that is acceptable for your situation.
For our re-encryption project as an example, our jobs ran at this rate:

```ruby
# NOTE: This is an example. You should take your situation into account when
# setting these values.
config = SuperSpreader::SchedulerConfig.new

config.batch_size = 70
config.duration = 180
config.job_class_name = "ReencryptJob"

config.per_second_on_peak = 3.0
config.per_second_off_peak = 7.5

config.on_peak_timezone = "America/Los_Angeles"
config.on_peak_wday_begin = 1
config.on_peak_wday_end = 5
config.on_peak_hour_begin = 5
config.on_peak_hour_end = 17

config.save
```

### Disaster recovery

If at any point you need to stop the background jobs, stop all scheduling using:

```ruby
SuperSpreader::SchedulerJob.stop!
```

Optionally, if it is acceptable to have a partially-processed cycle, you can stop the backfill jobs as well:

```ruby
ExampleBackfillJob.stop!
```

(Recovering from a partially-processed cycle requires manually setting the correct `initial_id` in `SpreadTracker`.)

The jobs will still be present in the job runner, but will all execute instantly because of the early return as demonstrated in [the example job](https://github.com/doximity/super_spreader/blob/master/spec/support/example_backfill_job.rb). After the last scheduler job, the process will be paused.

## Installation

Expand Down Expand Up @@ -107,17 +196,23 @@ SuperSpreader.redis = Redis.new(url: ENV["REDIS_URL"])

## Roadmap

#### Monitoring
This is a rough outline of some ideas we are considering implementing, based on the content in this README.

#### Add end time estimate

TODO
Add a feature to estimate when the last ID will be processed, which is useful to know when tuning the execution of the scheduler.

#### Allow for multiple concurrent backfills

Currently, SuperSpreader can only backfill using a single scheduler. This means that only one backfill can run at a given time, which requires coordination amongst engineers. The scheduler and configuration needs to be changed to allow for multiple concurrent backfills.

#### Monitoring

This document refers to external tooling for monitoring resource usage. Add instrumentation hooks to allow for internal monitoring.

#### Automated tuning based on backpressure

TODO
After adding internal monitoring, we could automate discovery of optimal `batch_size` and `per_second` values, given recommended tolerances such as 100 ms for backfill jobs and 1500 ms for the scheduler. This would be a significant improvement in DevX.

## Development

Expand Down
29 changes: 26 additions & 3 deletions lib/super_spreader/scheduler_config.rb
Original file line number Diff line number Diff line change
Expand Up @@ -5,19 +5,42 @@

module SuperSpreader
class SchedulerConfig < RedisModel
# The job class to enqueue on each run of the scheduler.
attribute :job_class_name, :string
# The number of records to process in each invocation of the job class.
attribute :batch_size, :integer
# The amount of work to enqueue, in seconds.
attribute :duration, :integer
attribute :job_class_name, :string

# The number of jobs to enqueue per second, allowing for fractional amounts
# such as 1 job every other second using `0.5`.
attribute :per_second_on_peak, :float
# The same as per_second_on_peak, but for times that are not identified as
# on-peak.
attribute :per_second_off_peak, :float

# UTC crosses the date boundary in an inconvenient way, so allow specifying
# the timezone
# This section manages the definition "on peak." Compare this terminology
# to bus or train schedules.

# The timezone to use for time calculations.
#
# Example: "America/Los_Angeles" for Pacific time
attribute :on_peak_timezone, :string
# The 24-hour hour on which on-peak application usage starts.
#
# Example: 5 for 5 AM
attribute :on_peak_hour_begin, :integer
# The 24-hour hour on which on-peak application usage ends.
#
# Example: 17 for 5 PM
attribute :on_peak_hour_end, :integer
# The wday value on which on-peak application usage starts.
#
# Example: 1 for Monday
attribute :on_peak_wday_begin, :integer
# The wday value on which on-peak application usage ends.
#
# Example: 5 for Friday
attribute :on_peak_wday_end, :integer

attr_writer :schedule
Expand Down
6 changes: 3 additions & 3 deletions spec/scheduler_job_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@

expect(log).to eq(<<~LOG)
{"subject":"SuperSpreader::SchedulerJob","started_at":"2020-12-16T00:00:00Z"}
{"subject":"SuperSpreader::SchedulerJob","batch_size":80,"duration":3600,"job_class_name":"ExampleBackfillJob","per_second_on_peak":3.0,"per_second_off_peak":3.0,"on_peak_timezone":"America/Los_Angeles","on_peak_hour_begin":5,"on_peak_hour_end":17,"on_peak_wday_begin":1,"on_peak_wday_end":5}
{"subject":"SuperSpreader::SchedulerJob","job_class_name":"ExampleBackfillJob","batch_size":80,"duration":3600,"per_second_on_peak":3.0,"per_second_off_peak":3.0,"on_peak_timezone":"America/Los_Angeles","on_peak_hour_begin":5,"on_peak_hour_end":17,"on_peak_wday_begin":1,"on_peak_wday_end":5}
{"subject":"SuperSpreader::SchedulerJob","next_id":0}
LOG

Expand All @@ -53,7 +53,7 @@

expect(log).to eq(<<~LOG)
{"subject":"SuperSpreader::SchedulerJob","started_at":"2020-12-16T00:00:00Z"}
{"subject":"SuperSpreader::SchedulerJob","batch_size":80,"duration":3600,"job_class_name":"ExampleBackfillJob","per_second_on_peak":3.0,"per_second_off_peak":3.0,"on_peak_timezone":"America/Los_Angeles","on_peak_hour_begin":5,"on_peak_hour_end":17,"on_peak_wday_begin":1,"on_peak_wday_end":5}
{"subject":"SuperSpreader::SchedulerJob","job_class_name":"ExampleBackfillJob","batch_size":80,"duration":3600,"per_second_on_peak":3.0,"per_second_off_peak":3.0,"on_peak_timezone":"America/Los_Angeles","on_peak_hour_begin":5,"on_peak_hour_end":17,"on_peak_wday_begin":1,"on_peak_wday_end":5}
{"subject":"SuperSpreader::SchedulerJob","next_id":0}
LOG

Expand All @@ -78,7 +78,7 @@

expect(log).to eq(<<~LOG)
{"subject":"SuperSpreader::SchedulerJob","started_at":"2020-12-16T00:00:00Z"}
{"subject":"SuperSpreader::SchedulerJob","batch_size":1,"duration":1,"job_class_name":"ExampleBackfillJob","per_second_on_peak":1.0,"per_second_off_peak":1.0,"on_peak_timezone":"America/Los_Angeles","on_peak_hour_begin":5,"on_peak_hour_end":17,"on_peak_wday_begin":1,"on_peak_wday_end":5}
{"subject":"SuperSpreader::SchedulerJob","job_class_name":"ExampleBackfillJob","batch_size":1,"duration":1,"per_second_on_peak":1.0,"per_second_off_peak":1.0,"on_peak_timezone":"America/Los_Angeles","on_peak_hour_begin":5,"on_peak_hour_end":17,"on_peak_wday_begin":1,"on_peak_wday_end":5}
{"subject":"SuperSpreader::SchedulerJob","next_id":#{next_model.id}}
{"subject":"SuperSpreader::SchedulerJob","next_run_at":"2020-12-16T00:00:01Z"}
LOG
Expand Down
21 changes: 19 additions & 2 deletions spec/support/example_backfill_job.rb
Original file line number Diff line number Diff line change
@@ -1,17 +1,34 @@
# frozen_string_literal: true

# This class is an example job that uses the interface that SuperSpreader
# expects. While this job is for backfilling as an example, any problem
# that can be subdivided into small batches can be implemented.
#
# In Rails, your class should be located within under `app/jobs/` and should
# inherit from `ApplicationJob`.
class ExampleBackfillJob < ActiveJob::Base
# This provides support for stopping the job in an emergency. Optional, but
# highly recommended.
extend SuperSpreader::StopSignal

# This is the model class that will be used when tracking the spread of jobs.
# It is expected to be an ActiveRecord class.
def self.model_class
ExampleModel
end

# Batches are executed using this method and are expected to update all IDs
# in the given range.
def perform(begin_id, end_id)
# This line is what makes it possible to stop all instances of the job
# using `ExampleBackfillJob.stop!`. Optional, but highly recommended.
return if self.class.stopped?

# In a real application, this section would make use of the appropriate,
# efficient database queries.
# In a real application, this section would make use appropriate, efficient
# database queries.
#
# Using SuperSpreader isn't a replacement for efficient SQL. Please
# research options such as https://github.com/zdennis/activerecord-import.
ExampleModel.where(id: begin_id..end_id).each do |example_model|
example_model.update(example_attribute: "example value")
end
Expand Down

0 comments on commit 3f03f3b

Please sign in to comment.