From 9bbbd9639a5ae154d4e3e8aae2224ef1023a6b89 Mon Sep 17 00:00:00 2001 From: Benjamin Oakes Date: Wed, 25 Jan 2023 13:22:06 -0600 Subject: [PATCH 1/7] Barebones usage docs --- README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 1b0ca05..518f6df 100644 --- a/README.md +++ b/README.md @@ -34,7 +34,11 @@ The primary criterion to consider is whether the backfill in question is _long-r ## How do I use it? -TODO +To repeat an earlier disclaimer: + +> **Please be aware:** SuperSpreader is still fairly early in development. While it can be used effecively by experienced hands, we are aware that it could have a better developer experience (DevX). It was written to solve a specific problem (see "History"). We are working to generalize the tool as the need arises. Pull requests are welcome! + +The basic workflow is tested in `spec/integration/backfill_spec.rb`. ## Roadmap From 2fa40958c68c360d72a949b2b37045d40ed00f1e Mon Sep 17 00:00:00 2001 From: Benjamin Oakes Date: Wed, 25 Jan 2023 13:39:09 -0600 Subject: [PATCH 2/7] Document setup --- README.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/README.md b/README.md index 518f6df..ef800e6 100644 --- a/README.md +++ b/README.md @@ -70,6 +70,17 @@ Or install it yourself as: $ gem install super_spreader +SuperSpreader requires an ActiveRecord-compatible database, an ActiveJob-compatible job runner, and Redis for bookkeeping. + +For Rails, please set up SuperSpreader using an initializer: + +```ruby +# config/initializers/super_spreader.rb + +SuperSpreader.logger = Rails.logger +SuperSpreader.redis = Redis.new(url: ENV["REDIS_URL"]) +``` + ## Usage TODO: Write usage instructions here From 6d38b52a5f7680f03e7208447d47a7044c8c8e8c Mon Sep 17 00:00:00 2001 From: Benjamin Oakes Date: Wed, 25 Jan 2023 14:40:34 -0600 Subject: [PATCH 3/7] Add "How does it work?" 
--- README.md | 38 +++++++++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index ef800e6..bcf4c9d 100644 --- a/README.md +++ b/README.md @@ -18,7 +18,7 @@ Please also see "Roadmap" for other known limitations that may be relevant to yo ## History -SuperSpreader was originally written to re-encrypt the Dialer database, a key component of Doximity's telehealth offerings. Without SuperSpreader, it would have taken several months to handle many millions of records using a Key Management Service (KMS) that adds an overhead of 11 ms per record. Using SuperSpreader took the time to backfill down to a couple of weeks. This massive backfill happened safely during very high Dialer usage during the winter of 2020. Of course, the name came from the coronavirus pandemic, which had a number of super-spreader events in the news around the same time. Rather than spreading disease, the SuperSpreader gem spreads out telehealth background jobs to support the healthcare professionals that fight disease. +SuperSpreader was originally written to re-encrypt the Dialer database, a key component of Doximity's telehealth offerings. Without SuperSpreader, it would have taken **several months** to handle many millions of records using a Key Management Service (KMS) that adds an overhead of 11 ms per record. Using SuperSpreader took the time to backfill down to a couple of weeks. This massive backfill happened safely during very high Dialer usage during the winter of 2020. Of course, the name came from the coronavirus pandemic, which had a number of super-spreader events in the news around the same time. Rather than spreading disease, the SuperSpreader gem spreads out telehealth background jobs to support the healthcare professionals that fight disease. Since that time, our team has started to use SuperSpreader in many other situations. 
Our hope is that other teams, internal and external, can use it if they have similar problems to solve. @@ -32,6 +32,42 @@ The primary criterion to consider is whether the backfill in question is _long-r ## How does it work? +SuperSpreader enqueues a configurable number of background jobs on a set schedule. These background jobs are executed in small batches such that only a small number of jobs are enqueued at any given time. The jobs start at the most recent record and work back to the first record, based on the auto-incrementing primary key. + +The configuration is able to be tuned for the needs of an individual problem. If the backfill would require months of compute time, it can be run in parallel so that it takes much less time. The resource utilization can be spread out so that shared resources, such as a database, are not overwhelmed with requests. Finally, there is also support for running more jobs during off-peak usage based on a schedule. + +Backfills are implemented using ActiveJob classes. SuperSpreader orchestrates running those jobs. Each set of jobs is enqueued by a scheduler using the supplied configuration. + +As an example, assume that there's a table with 100,000,000 rows which need Ruby-land logic to be applied using `MyBackfillJob`. The rate (e.g., how many jobs per second) is configurable. 
Once configured, SuperSpreader would enqueue job in batches like: + + MyBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_901, end_id: 100_000_000 + MyBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_801, end_id: 99_999_900 + MyBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_701, end_id: 99_999_800 + MyBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_601, end_id: 99_999_700 + MyBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_501, end_id: 99_999_600 + MyBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_401, end_id: 99_999_500 + +Notice that there are 3 jobs per second, 2 seconds of work were enqueued, and the batch size is 100. Again, this is just an example for illustration, and the configuration can be modified to suit the needs of the problem. + +After running out of work, SuperSpreader will enqueue more work: + + SuperSpreader::SchedulerJob run_at: "2020-11-16T22:52:01Z" + +And the work continues: + + MyBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_401, end_id: 99_999_500 + MyBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_301, end_id: 99_999_400 + MyBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_201, end_id: 99_999_300 + MyBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_999_101, end_id: 99_999_200 + MyBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_999_001, end_id: 99_999_100 + MyBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_998_901, end_id: 99_999_000 + +This process continues until there is no more work to be done. + +Additionally, the configuration can be tuned while SuperSpreader is running. The configuration is read each time `SchedulerJob` runs. Does the process need to go faster? Increase the number of jobs per second. Are batches taking too long to complete? Decrease the batch size. Is `SchedulerJob` taking a long time to complete? Decrease the duration so that less work is enqueued in each cycle. 
Finally, SuperSpreader can be stopped instantly and resumed at a later time, if a need ever arises. + +As it stands, each run of SuperSpreader is hand-tuned. It is highly recommended that SuperSpreader resource utilization is monitored during runs. That said, it is designed to run autonomously once a good configuration is found. + ## How do I use it? To repeat an earlier disclaimer: From 9f7751f22d04828b89e8a40c5f7fc7d8aa988bb4 Mon Sep 17 00:00:00 2001 From: Benjamin Oakes Date: Wed, 25 Jan 2023 14:42:42 -0600 Subject: [PATCH 4/7] Move up Installation --- README.md | 30 ++++++++++++++---------------- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/README.md b/README.md index bcf4c9d..da453ce 100644 --- a/README.md +++ b/README.md @@ -76,22 +76,10 @@ To repeat an earlier disclaimer: The basic workflow is tested in `spec/integration/backfill_spec.rb`. -## Roadmap - -#### Monitoring - -TODO - -#### Allow for multiple concurrent backfills - -Currently, SuperSpreader can only backfill using a single scheduler. This means that only one backfill can run at a given time, which requires coordination amongst engineers. The scheduler and configuration needs to be changed to allow for multiple concurrent backfills. - -#### Automated tuning based on backpressure - -TODO - ## Installation +If you've gotten this far and think SuperSpreader is a good fit for your problem, these are the instructions for installing it. + Add this line to your application's Gemfile: ```ruby @@ -117,9 +105,19 @@ SuperSpreader.logger = Rails.logger SuperSpreader.redis = Redis.new(url: ENV["REDIS_URL"]) ``` -## Usage +## Roadmap + +#### Monitoring -TODO: Write usage instructions here +TODO + +#### Allow for multiple concurrent backfills + +Currently, SuperSpreader can only backfill using a single scheduler. This means that only one backfill can run at a given time, which requires coordination amongst engineers. 
The scheduler and configuration needs to be changed to allow for multiple concurrent backfills. + +#### Automated tuning based on backpressure + +TODO ## Development From 3f03f3b3c961210076b523fe86932971af8fe46b Mon Sep 17 00:00:00 2001 From: Benjamin Oakes Date: Wed, 25 Jan 2023 18:44:55 -0600 Subject: [PATCH 5/7] Document all the things --- README.md | 131 +++++++++++++++++++++---- lib/super_spreader/scheduler_config.rb | 29 +++++- spec/scheduler_job_spec.rb | 6 +- spec/support/example_backfill_job.rb | 21 +++- 4 files changed, 161 insertions(+), 26 deletions(-) diff --git a/README.md b/README.md index da453ce..9833dae 100644 --- a/README.md +++ b/README.md @@ -38,14 +38,14 @@ The configuration is able to be tuned for the needs of an individual problem. I Backfills are implemented using ActiveJob classes. SuperSpreader orchestrates running those jobs. Each set of jobs is enqueued by a scheduler using the supplied configuration. -As an example, assume that there's a table with 100,000,000 rows which need Ruby-land logic to be applied using `MyBackfillJob`. The rate (e.g., how many jobs per second) is configurable. Once configured, SuperSpreader would enqueue job in batches like: +As an example, assume that there's a table with 100,000,000 rows which need Ruby-land logic to be applied using `ExampleBackfillJob`. The rate (e.g., how many jobs per second) is configurable. 
Once configured, SuperSpreader would enqueue jobs in batches like: - MyBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_901, end_id: 100_000_000 - MyBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_801, end_id: 99_999_900 - MyBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_701, end_id: 99_999_800 - MyBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_601, end_id: 99_999_700 - MyBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_501, end_id: 99_999_600 - MyBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_401, end_id: 99_999_500 + ExampleBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_901, end_id: 100_000_000 + ExampleBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_801, end_id: 99_999_900 + ExampleBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_701, end_id: 99_999_800 + ExampleBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_601, end_id: 99_999_700 + ExampleBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_501, end_id: 99_999_600 + ExampleBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_401, end_id: 99_999_500 Notice that there are 3 jobs per second, 2 seconds of work were enqueued, and the batch size is 100. Again, this is just an example for illustration, and the configuration can be modified to suit the needs of the problem. 
@@ -55,14 +55,14 @@ After running out of work, SuperSpreader will enqueue more work: And the work continues: - MyBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_401, end_id: 99_999_500 - MyBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_301, end_id: 99_999_400 - MyBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_201, end_id: 99_999_300 - MyBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_999_101, end_id: 99_999_200 - MyBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_999_001, end_id: 99_999_100 - MyBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_998_901, end_id: 99_999_000 + ExampleBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_401, end_id: 99_999_500 + ExampleBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_301, end_id: 99_999_400 + ExampleBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_201, end_id: 99_999_300 + ExampleBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_999_101, end_id: 99_999_200 + ExampleBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_999_001, end_id: 99_999_100 + ExampleBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_998_901, end_id: 99_999_000 -This process continues until there is no more work to be done. +This process continues until there is no more work to be done. For more detail, please see [Spreader](https://github.com/doximity/super_spreader/blob/master/lib/super_spreader/spreader.rb) and [its spec](https://github.com/doximity/super_spreader/blob/master/spec/spreader_spec.rb). Additionally, the configuration can be tuned while SuperSpreader is running. The configuration is read each time `SchedulerJob` runs. Does the process need to go faster? Increase the number of jobs per second. Are batches taking too long to complete? Decrease the batch size. Is `SchedulerJob` taking a long time to complete? Decrease the duration so that less work is enqueued in each cycle. 
Finally, SuperSpreader can be stopped instantly and resumed at a later time, if a need ever arises. @@ -74,7 +74,96 @@ To repeat an earlier disclaimer: > **Please be aware:** SuperSpreader is still fairly early in development. While it can be used effecively by experienced hands, we are aware that it could have a better developer experience (DevX). It was written to solve a specific problem (see "History"). We are working to generalize the tool as the need arises. Pull requests are welcome! -The basic workflow is tested in `spec/integration/backfill_spec.rb`. +If you haven't yet, please read the "How does it work?" section. This basic workflow is tested in `spec/integration/backfill_spec.rb`. + +First, write a backfill job. Please see [this example for details](https://github.com/doximity/super_spreader/blob/master/spec/support/example_backfill_job.rb). + +Next, configure `SuperSpreader` from the console by saving `SchedulerConfig` to Redis. For documentation on each attribute, please see [SchedulerConfig](https://github.com/doximity/super_spreader/blob/master/lib/super_spreader/scheduler_config.rb). It is recommended that you start slow, with small batches, short durations, and low per-second rates. + +**Important:** SuperSpreader currently only supports a _single_ configuration, though removing that limitation is on our Roadmap (please see below). + +```ruby +# NOTE: This is an example. You should take your situation into account when +# setting these values. +config = SuperSpreader::SchedulerConfig.new + +config.batch_size = 10 +config.duration = 10 +config.job_class_name = "ExampleBackfillJob" + +config.per_second_on_peak = 3.0 +config.per_second_off_peak = 3.0 + +config.on_peak_timezone = "America/Los_Angeles" +config.on_peak_wday_begin = 1 +config.on_peak_wday_end = 5 +config.on_peak_hour_begin = 5 +config.on_peak_hour_end = 17 + +config.save +``` + +Now the `SchedulerJob` can be started. It will run until it is stopped or runs out of work. 
+ +```ruby +SuperSpreader::SchedulerJob.perform_now +``` + +At this point, you should monitor your database and worker instances using the tooling you have available. You should make adjustments based on the metrics you have available. + +Based on those metrics, slowly step up `per_second_on_peak` and `batch_size` while continuing to monitor: + +```ruby +config.batch_size = 20 +config.save +``` + +```ruby +config.per_second_on_peak = 4.0 +config.save +``` + +Continue to step up the rates, until you arrive at a rate that is acceptable for your situation. +For our re-encryption project as an example, our jobs ran at this rate: + +```ruby +# NOTE: This is an example. You should take your situation into account when +# setting these values. +config = SuperSpreader::SchedulerConfig.new + +config.batch_size = 70 +config.duration = 180 +config.job_class_name = "ReencryptJob" + +config.per_second_on_peak = 3.0 +config.per_second_off_peak = 7.5 + +config.on_peak_timezone = "America/Los_Angeles" +config.on_peak_wday_begin = 1 +config.on_peak_wday_end = 5 +config.on_peak_hour_begin = 5 +config.on_peak_hour_end = 17 + +config.save +``` + +### Disaster recovery + +If at any point you need to stop the background jobs, stop all scheduling using: + +```ruby +SuperSpreader::SchedulerJob.stop! +``` + +Optionally, if it is acceptable to have a partially-processed cycle, you can stop the backfill jobs as well: + +```ruby +ExampleBackfillJob.stop! +``` + +(Recovering from a partially-processed cycle requires manually setting the correct `initial_id` in `SpreadTracker`.) + +The jobs will still be present in the job runner, but will all execute instantly because of the early return as demonstrated in [the example job](https://github.com/doximity/super_spreader/blob/master/spec/support/example_backfill_job.rb). After the last scheduler job, the process will be paused. 
## Installation @@ -107,17 +196,23 @@ SuperSpreader.redis = Redis.new(url: ENV["REDIS_URL"]) ## Roadmap -#### Monitoring +This is a rough outline of some ideas we are considering implementing, based on the content in this README. + +#### Add end time estimate -TODO +Add a feature to estimate when the last ID will be processed, which is useful to know when tuning the execution of the scheduler. #### Allow for multiple concurrent backfills Currently, SuperSpreader can only backfill using a single scheduler. This means that only one backfill can run at a given time, which requires coordination amongst engineers. The scheduler and configuration needs to be changed to allow for multiple concurrent backfills. +#### Monitoring + +This document refers to external tooling for monitoring resource usage. Add instrumentation hooks to allow for internal monitoring. + #### Automated tuning based on backpressure -TODO +After adding internal monitoring, we could automate discovery of optimal `batch_size` and `per_second` values, given recommended tolerances such as 100 ms for backfill jobs and 1500 ms for the scheduler. This would be a significant improvement in DevX. ## Development diff --git a/lib/super_spreader/scheduler_config.rb b/lib/super_spreader/scheduler_config.rb index 8badae8..9627995 100644 --- a/lib/super_spreader/scheduler_config.rb +++ b/lib/super_spreader/scheduler_config.rb @@ -5,19 +5,42 @@ module SuperSpreader class SchedulerConfig < RedisModel + # The job class to enqueue on each run of the scheduler. + attribute :job_class_name, :string + # The number of records to process in each invocation of the job class. attribute :batch_size, :integer + # The amount of work to enqueue, in seconds. attribute :duration, :integer - attribute :job_class_name, :string + # The number of jobs to enqueue per second, allowing for fractional amounts + # such as 1 job every other second using `0.5`. 
attribute :per_second_on_peak, :float + # The same as per_second_on_peak, but for times that are not identified as + # on-peak. attribute :per_second_off_peak, :float - # UTC crosses the date boundary in an inconvenient way, so allow specifying - # the timezone + # This section manages the definition "on peak." Compare this terminology + # to bus or train schedules. + + # The timezone to use for time calculations. + # + # Example: "America/Los_Angeles" for Pacific time attribute :on_peak_timezone, :string + # The 24-hour hour on which on-peak application usage starts. + # + # Example: 5 for 5 AM attribute :on_peak_hour_begin, :integer + # The 24-hour hour on which on-peak application usage ends. + # + # Example: 17 for 5 PM attribute :on_peak_hour_end, :integer + # The wday value on which on-peak application usage starts. + # + # Example: 1 for Monday attribute :on_peak_wday_begin, :integer + # The wday value on which on-peak application usage ends. + # + # Example: 5 for Friday attribute :on_peak_wday_end, :integer attr_writer :schedule diff --git a/spec/scheduler_job_spec.rb b/spec/scheduler_job_spec.rb index d1d4891..3b5e6e6 100644 --- a/spec/scheduler_job_spec.rb +++ b/spec/scheduler_job_spec.rb @@ -29,7 +29,7 @@ expect(log).to eq(<<~LOG) {"subject":"SuperSpreader::SchedulerJob","started_at":"2020-12-16T00:00:00Z"} - {"subject":"SuperSpreader::SchedulerJob","batch_size":80,"duration":3600,"job_class_name":"ExampleBackfillJob","per_second_on_peak":3.0,"per_second_off_peak":3.0,"on_peak_timezone":"America/Los_Angeles","on_peak_hour_begin":5,"on_peak_hour_end":17,"on_peak_wday_begin":1,"on_peak_wday_end":5} + {"subject":"SuperSpreader::SchedulerJob","job_class_name":"ExampleBackfillJob","batch_size":80,"duration":3600,"per_second_on_peak":3.0,"per_second_off_peak":3.0,"on_peak_timezone":"America/Los_Angeles","on_peak_hour_begin":5,"on_peak_hour_end":17,"on_peak_wday_begin":1,"on_peak_wday_end":5} {"subject":"SuperSpreader::SchedulerJob","next_id":0} LOG @@ -53,7 
+53,7 @@ expect(log).to eq(<<~LOG) {"subject":"SuperSpreader::SchedulerJob","started_at":"2020-12-16T00:00:00Z"} - {"subject":"SuperSpreader::SchedulerJob","batch_size":80,"duration":3600,"job_class_name":"ExampleBackfillJob","per_second_on_peak":3.0,"per_second_off_peak":3.0,"on_peak_timezone":"America/Los_Angeles","on_peak_hour_begin":5,"on_peak_hour_end":17,"on_peak_wday_begin":1,"on_peak_wday_end":5} + {"subject":"SuperSpreader::SchedulerJob","job_class_name":"ExampleBackfillJob","batch_size":80,"duration":3600,"per_second_on_peak":3.0,"per_second_off_peak":3.0,"on_peak_timezone":"America/Los_Angeles","on_peak_hour_begin":5,"on_peak_hour_end":17,"on_peak_wday_begin":1,"on_peak_wday_end":5} {"subject":"SuperSpreader::SchedulerJob","next_id":0} LOG @@ -78,7 +78,7 @@ expect(log).to eq(<<~LOG) {"subject":"SuperSpreader::SchedulerJob","started_at":"2020-12-16T00:00:00Z"} - {"subject":"SuperSpreader::SchedulerJob","batch_size":1,"duration":1,"job_class_name":"ExampleBackfillJob","per_second_on_peak":1.0,"per_second_off_peak":1.0,"on_peak_timezone":"America/Los_Angeles","on_peak_hour_begin":5,"on_peak_hour_end":17,"on_peak_wday_begin":1,"on_peak_wday_end":5} + {"subject":"SuperSpreader::SchedulerJob","job_class_name":"ExampleBackfillJob","batch_size":1,"duration":1,"per_second_on_peak":1.0,"per_second_off_peak":1.0,"on_peak_timezone":"America/Los_Angeles","on_peak_hour_begin":5,"on_peak_hour_end":17,"on_peak_wday_begin":1,"on_peak_wday_end":5} {"subject":"SuperSpreader::SchedulerJob","next_id":#{next_model.id}} {"subject":"SuperSpreader::SchedulerJob","next_run_at":"2020-12-16T00:00:01Z"} LOG diff --git a/spec/support/example_backfill_job.rb b/spec/support/example_backfill_job.rb index fab8c3f..3b2336c 100644 --- a/spec/support/example_backfill_job.rb +++ b/spec/support/example_backfill_job.rb @@ -1,17 +1,34 @@ # frozen_string_literal: true +# This class is an example job that uses the interface that SuperSpreader +# expects. 
While this job is for backfilling as an example, any problem +# that can be subdivided into small batches can be implemented. +# +# In Rails, your class should be located under `app/jobs/` and should +# inherit from `ApplicationJob`. class ExampleBackfillJob < ActiveJob::Base + # This provides support for stopping the job in an emergency. Optional, but + # highly recommended. extend SuperSpreader::StopSignal + # This is the model class that will be used when tracking the spread of jobs. + # It is expected to be an ActiveRecord class. def self.model_class ExampleModel end + # Batches are executed using this method and are expected to update all IDs + # in the given range. def perform(begin_id, end_id) + # This line is what makes it possible to stop all instances of the job + # using `ExampleBackfillJob.stop!`. Optional, but highly recommended. return if self.class.stopped? - # In a real application, this section would make use of the appropriate, - # efficient database queries. + # In a real application, this section would make use of appropriate, efficient + # database queries. + # + # Using SuperSpreader isn't a replacement for efficient SQL. Please + # research options such as https://github.com/zdennis/activerecord-import. ExampleModel.where(id: begin_id..end_id).each do |example_model| example_model.update(example_attribute: "example value") end From cef33361cc30eb3cc21e60a45974dc65cd50ae43 Mon Sep 17 00:00:00 2001 From: Benjamin Oakes Date: Thu, 26 Jan 2023 18:15:53 -0600 Subject: [PATCH 6/7] Edit for clarity per code review --- README.md | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 9833dae..a9f4826 100644 --- a/README.md +++ b/README.md @@ -64,9 +64,15 @@ And the work continues: This process continues until there is no more work to be done. 
For more detail, please see [Spreader](https://github.com/doximity/super_spreader/blob/master/lib/super_spreader/spreader.rb) and [its spec](https://github.com/doximity/super_spreader/blob/master/spec/spreader_spec.rb). -Additionally, the configuration can be tuned while SuperSpreader is running. The configuration is read each time `SchedulerJob` runs. Does the process need to go faster? Increase the number of jobs per second. Are batches taking too long to complete? Decrease the batch size. Is `SchedulerJob` taking a long time to complete? Decrease the duration so that less work is enqueued in each cycle. Finally, SuperSpreader can be stopped instantly and resumed at a later time, if a need ever arises. +Additionally, the configuration can be tuned while SuperSpreader is running. The configuration is read each time `SchedulerJob` runs. As it stands, each run of SuperSpreader is hand-tuned. It is highly recommended that SuperSpreader resource utilization is monitored during runs. That said, it is designed to run autonomously once a good configuration is found. -As it stands, each run of SuperSpreader is hand-tuned. It is highly recommended that SuperSpreader resource utilization is monitored during runs. That said, it is designed to run autonomously once a good configuration is found. +Example tuning: + +- Does the process need to go faster? Increase the number of jobs per second. +- Are batches taking too long to complete? Decrease the batch size. +- Is `SchedulerJob` taking a long time to complete? Decrease the duration so that less work is enqueued in each cycle. + +Finally, SuperSpreader can be stopped instantly and resumed at a later time, if a need ever arises. ## How do I use it? 
From 56db4712e85e66ba2315b6d46f8ea5bfffff151a Mon Sep 17 00:00:00 2001 From: Benjamin Oakes Date: Wed, 8 Feb 2023 18:20:26 -0600 Subject: [PATCH 7/7] Add explanation of restarting after stopping --- README.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/README.md b/README.md index a9f4826..01b22cc 100644 --- a/README.md +++ b/README.md @@ -171,6 +171,16 @@ ExampleBackfillJob.stop! The jobs will still be present in the job runner, but will all execute instantly because of the early return as demonstrated in [the example job](https://github.com/doximity/super_spreader/blob/master/spec/support/example_backfill_job.rb). After the last scheduler job, the process will be paused. +### Restarting + +If you stop the jobs but you wish to restart them later, use the `go!` method and *then* call `SuperSpreader::SchedulerJob.perform_now`. Otherwise, the jobs will not do any work. + +```ruby +ExampleBackfillJob.go! +SuperSpreader::SchedulerJob.go! +SuperSpreader::SchedulerJob.perform_now +``` + ## Installation If you've gotten this far and think SuperSpreader is a good fit for your problem, these are the instructions for installing it.
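
Reviewer sketch for the "How does it work?" text added in patch 3/7: the README describes how `batch_size`, `duration`, and the per-second rate turn into descending ID ranges, and a few lines of plain Ruby may make that arithmetic easier to follow. This is only an illustration under stated assumptions. `plan_cycle` and its keyword arguments are hypothetical names that do not exist in SuperSpreader, and the real logic in `Spreader` may differ.

```ruby
require "time"

# Hypothetical helper (NOT part of SuperSpreader): computes the jobs that a
# single scheduler cycle would enqueue, starting from the newest ID.
def plan_cycle(start_at:, initial_id:, batch_size:, duration:, per_second:)
  jobs = []
  end_id = initial_id
  total_jobs = (duration * per_second).floor

  total_jobs.times do |index|
    # Stop once the first record has been covered.
    break if end_id < 1

    begin_id = [end_id - batch_size + 1, 1].max
    # Spread the jobs across the cycle: `per_second` jobs share each second.
    run_at = start_at + (index / per_second).floor
    jobs << { run_at: run_at.utc.iso8601, begin_id: begin_id, end_id: end_id }
    end_id = begin_id - 1
  end

  # `next_id` is where the following cycle would resume.
  { jobs: jobs, next_id: end_id }
end

plan = plan_cycle(
  start_at: Time.utc(2020, 11, 16, 22, 51, 59),
  initial_id: 100_000_000,
  batch_size: 100,
  duration: 2,
  per_second: 3.0
)

plan[:jobs].each { |job| puts job.inspect }
puts "next_id: #{plan[:next_id]}"
```

With the values from the README example (3 jobs per second, 2 seconds of work, batch size 100), the sketch yields six ranges from 99_999_901..100_000_000 down to 99_999_401..99_999_500, so the next cycle would resume at 99_999_400.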