Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #224. I believe this is a Postgres only issue #231

Merged
merged 4 commits into from
Aug 6, 2024

Conversation

hms
Copy link
Contributor

@hms hms commented May 22, 2024

Per all of my earlier discussion, Postgres transactions become invalid upon any database contraint invalidating an operation and make all follow on database interactions invalid until a rollback is executed.

Semephore#attempt_creation uses a rather common database pattern to rely on database locks to reliably syncrhronize based on uniqueness of values. The catch is that Postgresql requires a little more handholding to use this pattern. By wrapping just the Semaphore.create statement in a SAVEPOINT (a pretend nested transaction), we have what appears to be a transparent operation that works across all of the other supported databases without any known consequences AND allows Postgres to continue to work as expected per the existing code.

This change works across the Solid supported databases (at least based on the tests) and I believe the extra overhead for the non-postgres databases is small enough that special casing the code or resorting to the complexity of dependency injection is just not worth it.

I added two tests to the concurrency_controls_test.rb. One to show the error is real and is easy to reproduce. The other to show the fix fixes it.

@hms
Copy link
Contributor Author

hms commented May 26, 2024

@rosa

I'm pausing on this for just a little as I did a little more digging (probably should have checked before coding a fix....) and there are some significant concerns around adding sub-transactions / savepoint for high volume deployments, especially those with replication / failover.

While my proposed solution is database agnostic (ignoring the Postgres performance issues referenced above), the tests cannot be. Both SqLite and mySQL work without a testable failure with either my PR or the original SolidQueue implementation.

There look to be some non-nested transaction / savepoint implementations, but they all require different syntax for each of the 3 supported databases. I'm open to investigating those alternatives, but it would help to understand if the SolidQueue team would be open to a PR based on database specific syntax and how to organize the code for any PRs with this approach.

Open issues that I could use some direction on:

  • Does the SolidQueue team consider this a SolidQueue problem, a Rails upstream issue, or somewhere else I'm not looking at yet?
    • If the SolidQueue tam would prefer to defer to the upstream work that appears to be under consideration, is the team open to a temporary solution for the problem that does exist now?
    • If yes to a SolidQueue fix now, can I get some direction on the team's requirements for how to structure / organize / call out database specific implementations where and when necessary?

Thanks,

Hal

@hms hms force-pushed the postgres_transaction_issue branch from 32758f8 to 6411faa Compare June 27, 2024 23:38
@hms
Copy link
Contributor Author

hms commented Jun 27, 2024

@rosa

I've revamped this PR to avoid nested transaction.

Postgres, unlike mysql and sqlite, invalidates the current transaction on
a database contraint violation such as an attempt to insert
a duplicate key into a unique index.

SolidQueue uses a pretty standard design pattern of application level
conflict detection and resolution via error handling as part of it's
concurrency management in Semaphore::Proxy#attempt create (via create! and rescue).

For Postgres based implementations when:
* enqueing a job inside a transaction
* under load such that the race condition triggerng the simultanious conflicted insert

the transaction would become unrecoverable at the application level.

There are two simple paths to address this issue in Postgres:
* Nested transactions
* Insert using 'on conflict do nothing'

Nested transaction are the easiest to implement and are portable across all 3
of the Rails standard databases. Unfortunately, nested transactions have a
known and very signifant performance problem, especially for databases under
load and/or replicated databases with long running queries.

Each of the big three database inplements the ANSI standard Insert on conflict do nothing
in a very different way.

Because of this, this PR only replaces the current implementation of
application level conflict handling for Postgres.  It takes advantage
of the Rails standard ActiveRecord::Persistence::Insert method supporting on
conflict semantics.

@rosa
Copy link
Member

rosa commented Aug 2, 2024

Hey @hms, so sorry for the delay here! I've been totally focused on other stuff (and other projects) but I'm finally trying to wrap up everything for v1.0 and would like to include this fix. This last approach looks great to me. I have some suggestions for the implementation, but it's possible that, being so long ago, you no longer have time (or interest!) to implement these. In that case just let me know, happy to add these myself.

Thank you so much for figuring out this one 🙏

@hms
Copy link
Contributor Author

hms commented Aug 2, 2024

@rosa

I'd love any feedback / suggestions to make the PR better and more durable for the project. I'm happy to get this wrapped up to releaseable standards.

Hal

@@ -183,7 +183,46 @@ class ConcurrencyControlsTest < ActiveSupport::TestCase
assert job.reload.ready?
end

if ActiveRecord::Base.connection.adapter_name == "PostgreSQL"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have TARGET_DB, I think I'd use that here, like here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is better. I'll submit the change.

SolidQueue::Semaphore.create!(key: key, value: limit - 1, expires_at: expires_at)
end
end
end
Copy link
Member

@rosa rosa Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... actually, I'm not completely sure if we need this test at all, since it seems it's testing more something internal of Active Record, rather than something we have in Solid Queue 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this really is more of a test of AR, its value is that we have a confirmation of outcomes with the two different ways to add records to the Semaphore table and we show the two mechanisms "side-by-side". I realized that it was missing an assert to meet that goal. See what you think with the extra line.

In the end, if you still feel strongly about cleaning this test up a little, I'll remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I don't feel strongly about this one 😊

@@ -41,16 +41,30 @@ def signal
end

private

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this project I'm following the style of not having a space after private when the class has public and private methods, so this one was on purpose 😅

limit == 1 ? false : attempt_decrement
end

if ActiveRecord::Base.connection.adapter_name == "PostgreSQL"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there's a method that checks directly whether the adapter supports unique_by exposed in Active Record's Abstract adapter, that we could use simply as

connection.supports_insert_conflict_target?

I think this might be more robust than checking the adapter name, because other adapters for PostgreSQL might have a different name, like the PostGIS adapter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So funny and just goes to show a bit of tunnel vision. I spent a fair amount of time digging around for an alternative to the original nested transactions methodology and in that digging I came across this method in the Rails adaptors.

The tunnel vision part was no thinking to use those methods as the much, much better way to test which of the code paths to take.

Thanks for this -- it's a great reminder to slow down and think before coding.

if ActiveRecord::Base.connection.adapter_name == "PostgreSQL"
alias attempt_creation attempt_creation_with_insert_on_conflict
else
alias attempt_creation attempt_creation_with_create_and_exception_handling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that this all could perhaps live in the Semaphore class in a simpler way. Something like this method:

diff --git a/app/models/solid_queue/semaphore.rb b/app/models/solid_queue/semaphore.rb
index 34bf8dc..1448e67 100644
--- a/app/models/solid_queue/semaphore.rb
+++ b/app/models/solid_queue/semaphore.rb
@@ -17,6 +17,17 @@ module SolidQueue
       def signal_all(jobs)
         Proxy.signal_all(jobs)
       end
+
+
+      def create_unique_by(attributes)
+        if connection.supports_insert_conflict_target?
+          insert(attributes, unique_by: :key).any?
+        else
+          create!(attributes)
+        end
+      rescue ActiveRecord::RecordNotUnique
+        false
+      end
     end

     class Proxy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then that could be used in Proxy class like this (BTW I might remove that class in the future and leave everything in Semaphore... it was not to have to pass all the job attributes around, but still... anyway, not the goal of this PR):

        def attempt_creation
          if Semaphore.create_unique_by(key: key, value: limit - 1, expires_at: expires_at)
            true
          else
            check_limit_or_decrement
          end
        end

        def check_limit_or_decrement
          if limit == 1 then false
          else
            attempt_decrement
          end
        end

And you wouldn't need to define aliases.

@hms
Copy link
Contributor Author

hms commented Aug 4, 2024

@rosa

Thank you for your feedback and suggestions. The PR is significantly better because of them.

I have a PR ready. How do you feel about rebasing and squishing the older commits that, given the last set of changes, don't add much value from a history perspective? The only downside is I think your comments might get lost in that process and given how important they are to the finished PR, I'm not sure how I feel about that.

Hal

@rosa
Copy link
Member

rosa commented Aug 5, 2024

Hey @hms! Thanks a lot for that! 🙏 No need to squash the commits, totally as you prefer. Feel free to push things and keep the original history.

hms added 3 commits August 5, 2024 11:24
Postgres, unlike mysql and sqlite, invalidates the current transaction on
a database contraint violation such as an attempt to insert
a duplicate key into a unique index.

SolidQueue uses a pretty standard design pattern of application level
conflict detection and resolution via error handling as part of it's
concurrency management in Semaphore::Proxy#attempt create (via create! and rescue).

For Postgres based implementations when:
* enqueing a job inside a transaction
* under load such that the race condition triggerng the simultanious conflicted insert

the transaction would become unrecoverable at the application level.

There are two simple paths to address this issue in Postgres:
* Nested transactions
* Insert using 'on conflict do nothing'

Nested transaction are the easiest to implement and are portable across all 3
of the Rails standard databases. Unfortunately, nested transactions have a
known and very signifant performance problem, especially for databases under
load and/or replicated databases with long running queries.

Each of the big three database inplements the ANSI standard Insert on conflict do nothing
in a very different way.

Because of this, this PR only replaces the current implementation of
application level conflict handling for Postgres.  It takes advantage
of the Rails standard ActiveRecord::Persistence::Insert method supporting on
conflict semantics.
Postgres only Issue:
Jobs that meet all of the following conditions will result in the database connection entering an invalid and
unrecoverable state, requiring a rollback before any further database requests (reads or writes) can be made:

1. Use SolidQueue's 'limits_concurrency' macro
2. Are enqueued inside an application-level transaction
3. The limits_concurrency key already exists in the Semaphore table (i.e., this is job 2 - N for a given key)

SolidQueue uses the following design pattern to implement the conflict detection of the limits_concurrency
macro which works 100% correctly, 100% of the time, with MySQL and SQLite:
  begin
    Semaphore.create!(...)
      no_conflict_path()
    rescue ActiveRecord::RecordNotUnique
      handle_concurrency_conflict()
    end

The problem is Postgres:
It's not possible to rescue and recover from an insert failing due to an conflict on unique index (or any other
database constraint). Until a rollback is executed, the database connection is in an invalid state and unusable.

Possible Solutions:
1. Nested transactions
   + Easiest to implement
   + Portable across all three standard Rails databases
   - Has significant performance issues, especially for databases under load or replicated databases with
     long-running queries (Postgres specific)
   - Requires using Raise and Rescue for flow of control (performance, less than ideal coding practice)
   - Requires issuing a rollback (performance, additional command that must round trip from the client to
     database and back)

2. Insert into the Semaphore table using 'INSERT INTO table (..) VALUES (...) ON CONFLICT DO NOTHING' syntax
   + ANSI standard syntax (not a Postgres 'one off' solution)
   + Rails natively supports identifying database adaptors that support this syntax, making the implementation
     portable and maintainable
   + Supported by Postgres and allows this issue to be addressed
   + Does not require Raise and Rescue for flow of control
   + Performant (native, fast path database functionality)

Solution:
Leverage Rails connection adaptors allowing code to easily identifying supported database features/functionality to
implement strategy rails#2 (INSERT ON CONFLICT) for those databases that support it (i.e., Postgres) and leave the
original implementation of rescuing RecordNotUnique for those that do not.

Note: Although I'd love to take credit for the "quality" of this implementation, all that credit belongs to @rosa who's
excellent feedback on an earlier implementation resulted in this significantly better implementation.
@hms hms force-pushed the postgres_transaction_issue branch from 096ae0e to c817349 Compare August 5, 2024 18:28
@hms
Copy link
Contributor Author

hms commented Aug 5, 2024

@rosa I could use some help here. The tests run clean in my local environment over multiple runs.

Interestingly, at least part of the error output refers to the Postgres transactional error that was the root cause for this PR. I'm a RSpec person, so I'm not nearly as familiar with mini-test as I should be. Does it wrap tests in transactions? If so, then any test that hits a database managed constraint (unique, checks, foreign keys, etc) are all going to be at risk for trigging: PG::InFailedSqlTransaction: ERROR: current transaction is aborted, commands ignored until end of transaction block (ActiveRecord::StatementInvalid)

Hal

@rosa
Copy link
Member

rosa commented Aug 6, 2024

Ohhh yes! I think some of those failures are legit and not related to your changes. I've got a similar fix to what you did here in the works in 3d40b1a

@rosa
Copy link
Member

rosa commented Aug 6, 2024

Thanks again for this, @hms, amazing work 🙏 🙇

@rosa rosa merged commit addd870 into rails:main Aug 6, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants