Hello - we encountered an issue recently where we had a Hangfire instance that had collected a large number of failed jobs (over 6,000) due to some issues in our system. Luckily this was a staging environment, so they were accumulating unnoticed, but when we tried to load the dashboard it was incredibly slow to perform any action. This issue did not appear to affect the creation or execution of jobs themselves, thankfully, but our monitoring tooling is showing that the top 3 slowest queries in our system currently are all Hangfire dashboard / cleanup related:
```sql
SELECT j.id Id, j.invocationdata InvocationData, j.arguments Arguments,
       j.createdat CreatedAt, j.expireat ExpireAt, ? FetchedAt,
       j.statename StateName, s.reason StateReason, s.data StateData
FROM hangfire.job j
LEFT JOIN hangfire.state s ON j.stateid = ?.id
WHERE j.statename = @StateName
ORDER BY j.id DESC
LIMIT @Limit OFFSET @Offset
```

```sql
SELECT statename State, COUNT(id) Count
FROM hangfire.job
WHERE statename IS NOT ?
GROUP BY statename;

SELECT COUNT(*) FROM hangfire.server;

SELECT SUM(value) FROM (
    SELECT SUM(value) FROM hangfire.counter WHERE key = ?
    UNION ALL
    SELECT SUM(value) FROM hangfire.aggregatedcounter WHERE key = ?
) c;

SELECT SUM(value) FROM (
    SELECT SUM(value) FROM hangfire.counter WHERE key = ?
    UNION ALL
    SELECT SUM(value) FROM hangfire.aggregatedcounter WHERE key = ?
) c;

SELECT COUNT(*) FROM hangfire.set WHERE key = ?;
```

```sql
DELETE FROM hangfire.job
WHERE id IN (SELECT id FROM hangfire.job WHERE expireat < NOW() LIMIT ?)
```
Perhaps there are missing indexes, or existing indexes need adjustment, to handle the situation where a large number of failed jobs accumulates?
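For what it's worth, here is a rough sketch of the kind of indexes I'd guess might help, assuming this is the Hangfire.PostgreSql storage (the lowercase `hangfire` schema, `NOW()`, and `LIMIT/OFFSET` in the captured queries suggest PostgreSQL). The index names below are hypothetical, and I haven't verified these against the schema the library ships with — they just mirror the predicates in the slow queries above:

```sql
-- Hypothetical sketch only, not verified against Hangfire.PostgreSql's
-- actual schema or migrations.

-- The dashboard's paging query filters on statename and orders by id DESC;
-- a composite index matching that shape could let PostgreSQL walk the index
-- instead of scanning and sorting all failed jobs.
CREATE INDEX CONCURRENTLY ix_hangfire_job_statename_id
    ON hangfire.job (statename, id DESC);

-- The expiration-cleanup DELETE selects by expireat; an index on that
-- column could avoid repeated full scans of hangfire.job.
CREATE INDEX CONCURRENTLY ix_hangfire_job_expireat
    ON hangfire.job (expireat);
```

Running `EXPLAIN (ANALYZE, BUFFERS)` on the captured queries before and after would confirm whether these (or something like them) actually change the plans.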
We have wiped out the current state on our staging server for now, but we have backups for that time period if you need any more specific information.
Thanks in advance for any help you can provide and for all you do to maintain this library - it's greatly appreciated!