Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock when using batches #141

Open
mschuurmans opened this issue Jan 10, 2020 · 11 comments
Open

Deadlock when using batches #141

mschuurmans opened this issue Jan 10, 2020 · 11 comments
Labels

Comments

@mschuurmans
Copy link

mschuurmans commented Jan 10, 2020

Hangfire.Core: v1.7.8
Hangfire.PostgreSql: v1.6.3
Hangfire.Pro: v2.2.1
Dotnet Core: v2.2

Possible deadlock in the postgresql storage connector.

I don't see anything when using Trace logging for Hangfire.I think there is something wrong with the locking but don't know how to debug this. To test this i made a empty project with just hangfire and those 2 jobs.

The first enqueue is enqueued by BackgroundJob:

BackgroundJob.Enqueue(() => StartCache.Execute(null));

The StartCache job then inserts a batch job:

BatchJob.StartNew(x => {
    foreach(var id in customerIds) {
        x.Enqueue(() => CacheCustomer.Execute(id, null));
    }
});

The Batchjob is inserted en the jobs are enqueued. It says they are fetched but nothing happens:
image

On the batches page it says that it's processing 1 job but nothing is happening:
image

Let me know if you require any information or have some pointers on how to find what the issue is here.

@mschuurmans
Copy link
Author

The lock table in the database shows the following (screenshot is taken at 7:45 UTC)

image

@frankhommers
Copy link
Collaborator

Is this still a problem here? Since you closed it on the Hangfire project itself?

@mschuurmans
Copy link
Author

It still is an issue. I closed it on the Hangfire project itself because the issue itself is not a "problem" with the Hangfire project. Currently Hangfire pro is only supported on SqlServer and Redis.pro.

I had some contact with Sergey and he told me that it's the way locking works in de Hangfire.Postgresql that is causing deadlocks. But it will not be an easy task to change this i think. On the other side you have the question how many people would use it, since afaict nobody is using Hangfire.postgresql with hangfire.pro batches (i could be very wrong here..).

Sergey also told me that he is considering contributing to postgresql/mysql but he currently doesn't know when so you don't want to give any promises.

@frankhommers
Copy link
Collaborator

How does it need to be changed according to Sergey then?

@mschuurmans
Copy link
Author

Quoting Sergey

Batches in Hangfire.Pro use all the methods available in JobStorageTransaction and JobStorageConnection classes, so all the available methods should be implemented. But there are a lot of non-functional requirements, which I attempted to document here – HangfireIO/Hangfire#1296 2. So things are much more complicated than I thought initially. And even worse, there’s no documented wisdom on the topic of storage abstractions other than serializability, so this will require a lot of time unfortunately.

As far as I know, the worst moment in the Hangfire.Postgresql implementation is distributed locking. Unlike Hangfire.SqlServer that uses application locks (some kind of advisory locks in PostgreSQL), which are completely in-memory data structures and bound to connections, in the current PostgreSQL implementation locks are based on a dedicated table and fully materializable, which make them hard to be released on a process failure and prone to deadlocks.

Second mail

These years I’ve studied hard on this topic in order to get everything as much simple as possible. I know the answer now, but it will take a lot of time to implement everything without revolution, i.e. in a non-backward-compatible way. I’m even considering to contribute to Hangfire.PostgreSQL and Hangfire.MySQL, but currently don’t know when, so don’t want to give any promises.

I didn’t do this before, because there are so many nuances between different storages, between their transaction isolation semantics (despite they are all ACID-compliant), especially when applied to storage abstractions, that my mind was completely blown up.

@frankhommers
Copy link
Collaborator

Moving to locks like these: https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS wouldn't be that hard I guess. AFAIK that will require PostgreSQL 9.0 and upwards..

But I don't know if that's a good move.

@mschuurmans
Copy link
Author

I'm don't know either, i don't know which versions people are running this on.

But besides that there were some earlier comments why advisory locks shouldn't be used here #82

This is from a few years back i can't really tell if this is still relevant.

@frankhommers
Copy link
Collaborator

Yeah. The current locks are cross server. Advisory locks are not. But it could be an option if you're running just one hangfire server.

@mattgenious
Copy link

Hi, any news on this?

@frankhommers
Copy link
Collaborator

Is this still the case with 1.9.3

@LirazHaim86
Copy link

LirazHaim86 commented Jan 25, 2022

Yes and no, I will explaim
Yes: the exception is still in there
No: it's work

But becuase the exception we cannot be sure that the code is working so bottom line I think its not working

The exception is:

(Hangfire.PostgreSql.PostgreSqlDistributedLock) hangfire:batch:086beed2-3a13-4eff-8693-fd117f43de34:lock: Failed to lock with transaction
Npgsql.PostgresException
40001: could not serialize access due to concurrent update
at Npgsql.Internal.NpgsqlConnector.g__ReadMessageLong|213_0(NpgsqlConnector connector, Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
at Npgsql.NpgsqlDataReader.NextResult()
at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteNonQuery(Boolean async, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteNonQuery()
at Dapper.SqlMapper.ExecuteCommand(IDbConnection cnn, CommandDefinition& command, Action2 paramReader) in /_/Dapper/SqlMapper.cs:line 2858 at Dapper.SqlMapper.ExecuteImpl(IDbConnection cnn, CommandDefinition& command) in /_/Dapper/SqlMapper.cs:line 581 at Hangfire.PostgreSql.PostgreSqlDistributedLock.TransactionLockHandler.Lock(String resource, TimeSpan timeout, IDbConnection connection, PostgreSqlStorageOptions options) 2022-01-26 10:14:09 [WARN] (Hangfire.PostgreSql.PostgreSqlDistributedLock) hangfire:batch:086beed2-3a13-4eff-8693-fd117f43de34:lock: Failed to lock with transaction Npgsql.PostgresException 40001: could not serialize access due to concurrent update at Npgsql.Internal.NpgsqlConnector.<ReadMessage>g__ReadMessageLong|213_0(NpgsqlConnector connector, Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage) at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken) at Npgsql.NpgsqlDataReader.NextResult() at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlCommand.ExecuteNonQuery(Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlCommand.ExecuteNonQuery() at Dapper.SqlMapper.ExecuteCommand(IDbConnection cnn, CommandDefinition& command, Action2 paramReader) in //Dapper/SqlMapper.cs:line 2858
at Dapper.SqlMapper.ExecuteImpl(IDbConnection cnn, CommandDefinition& command) in /
/Dapper/SqlMapper.cs:line 581
at Hangfire.PostgreSql.PostgreSqlDistributedLock.TransactionLockHandler.Lock(String resource, TimeSpan timeout, IDbConnection connection, PostgreSqlStorageOptions options)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants