Batch tokens deletion in cleartokens command #969

merito · 2021-04-24T22:54:50Z

Fixes #651

Description of the Change

Use batches when deleting expired tokens in the cleartokens management command. This introduces two settings to configure the batch size and the interval between deletions. The defaults were tested on fairly low-powered machines with ~1.5M tokens deleted. Batching allows cleartokens to run without any downtime.
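A minimal sketch of the batching idea, written against the standard-library `sqlite3` module rather than the Django ORM so it runs on its own. The `access_token` table, its integer `expires` column and `clear_expired_in_batches` are hypothetical names, and `batch_size` and `interval` only stand in for the two new settings. This is not the code merged in this PR: it deletes through an `id IN (SELECT ... LIMIT ?)` subquery and commits each batch separately, so memory use and transaction size stay bounded by the batch size.

```python
import sqlite3
import time


def clear_expired_in_batches(conn, now, batch_size=10000, interval=0.1):
    """Delete rows of a hypothetical ``access_token`` table whose ``expires``
    value is earlier than ``now``, at most ``batch_size`` rows at a time.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        with conn:  # one short transaction per batch: commit on success
            cur = conn.execute(
                "DELETE FROM access_token WHERE id IN "
                "(SELECT id FROM access_token WHERE expires < ? LIMIT ?)",
                (now, batch_size),
            )
        deleted = cur.rowcount
        total += deleted
        if deleted < batch_size:
            # A short (or empty) batch means no further expired rows matched,
            # so return without sleeping.
            return total
        time.sleep(interval)  # pause so other queries get a turn


# Usage: 25 expired tokens and 5 still-valid ones, deleted 10 at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_token (id INTEGER PRIMARY KEY, expires INTEGER)")
now = 1000
conn.executemany(
    "INSERT INTO access_token (expires) VALUES (?)",
    [(now - 1,)] * 25 + [(now + 60,)] * 5,
)
conn.commit()
print(clear_expired_in_batches(conn, now, batch_size=10, interval=0))  # 25
```

Because every batch commits on its own, an interrupted run leaves only already-expired rows deleted, and re-running the function simply picks up whatever is left.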

Checklist

PR only contains one change (considered splitting up PR)
unit-test added
documentation updated
CHANGELOG.md updated (only for user relevant changes)
author name in AUTHORS

codecov · 2021-04-25T05:56:24Z

Codecov Report

Merging #969 (725c3c9) into master (e4c98c7) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #969      +/-   ##
==========================================
+ Coverage   96.64%   96.67%   +0.03%     
==========================================
  Files          31       31              
  Lines        1756     1775      +19     
==========================================
+ Hits         1697     1716      +19     
  Misses         59       59

Impacted Files	Coverage Δ
oauth2_provider/settings.py	`100.00% <ø> (ø)`
oauth2_provider/models.py	`98.76% <100.00%> (+0.07%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e4c98c7...725c3c9. Read the comment docs.

auvipy

Unit and integration tests for the changed logic are needed. Please also explain why an atomic transaction is not needed here.

oauth2_provider/models.py

MattBlack85 · 2021-10-20T09:38:23Z

I am a bit worried about how this may perform on large PostgreSQL databases because of the `.count()` and `id__in`. Can we measure and report a run against a big table to see how it performs?

merito · 2021-10-22T23:06:21Z

> I am a bit worried about the performance that this may have on large PostgreSQL databases because of the .count() and id__in, can we measure and report a run against a big table to see how it performs?

> Note: If you only need to determine the number of records in the set (and don’t need the actual objects), it’s much more efficient to handle a count at the database level using SQL’s SELECT COUNT(*). Django provides a count() method for precisely this reason.

source: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#when-querysets-are-evaluated

Performance is the reason this PR exists. I tried to remove ~1.5M stale tokens and it failed because of very high RAM usage. After I introduced batching, the run completed without any resource spikes on a PostgreSQL database hosted on a very basic AWS instance. Because the AccessToken model is simple, in my testing the default batch size of 10000 items gave the best balance between speed and low resource usage.
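The quoted Django note can be illustrated in miniature, again with `sqlite3` standing in for PostgreSQL and a hypothetical `access_token` table. `SELECT COUNT(*)` hands back a single integer, while counting in Python first materialises every matching row. A `LIMIT`ed id query, assumed here to be what feeds one `id__in` batch, never returns more rows than the batch size. The sketch does not measure PostgreSQL performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_token (id INTEGER PRIMARY KEY, expires INTEGER)")
conn.executemany("INSERT INTO access_token (expires) VALUES (?)", [(999,)] * 50_000)

# Counted by the database: only one integer crosses into Python.
(db_count,) = conn.execute(
    "SELECT COUNT(*) FROM access_token WHERE expires < 1000"
).fetchone()

# Counted in Python: every matching row is loaded into memory first.
rows = conn.execute("SELECT id FROM access_token WHERE expires < 1000").fetchall()
py_count = len(rows)

# One batch of ids (hypothetically the list handed to id__in): at most LIMIT rows.
batch = conn.execute(
    "SELECT id FROM access_token WHERE expires < 1000 LIMIT 10000"
).fetchall()

print(db_count, py_count, len(batch))  # 50000 50000 10000
```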

n2ygk

Sorry for the delayed follow-up on this PR. It looks good. Please rebase and resolve the conflicts. I'm targeting the next minor release for this. Thanks! (I'm afraid to check our production DB to see how many expired tokens are in it ;-)

merito · 2021-12-22T17:31:13Z

I had to exclude AUTHORS from check-merge-conflict because it causes this failure:

Check for merge conflicts................................................Failed
- hook id: check-merge-conflict
- exit code: 1

Merge conflict string "=======
" found in AUTHORS:2

You can keep this change or change the separator in the AUTHORS file.
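The failure can be reproduced with a simplified re-implementation of the check, assuming it flags any line that starts with a git conflict marker (`<<<<<<< `, `======= `, a bare `=======` line, or `>>>>>>> `). `find_conflict_markers` and the sample file contents are hypothetical, and the sample assumes AUTHORS opens with an `Authors` title underlined in reStructuredText style. A seven-letter title gets a seven-character `=======` underline, which is exactly a conflict separator; a longer separator avoids the match.

```python
# Line prefixes treated as conflict markers (a simplified assumption modelled
# on the check-merge-conflict hook, not its actual source).
CONFLICT_PREFIXES = ("<<<<<<< ", "======= ", "=======\n", ">>>>>>> ")


def find_conflict_markers(text):
    """Return the 1-based numbers of lines that start with a conflict marker."""
    hits = []
    for lineno, line in enumerate(text.splitlines(keepends=True), start=1):
        if line.startswith(CONFLICT_PREFIXES):
            hits.append(lineno)
    return hits


# "Authors" has seven letters, so its RST underline is exactly "=======".
print(find_conflict_markers("Authors\n=======\n\nJane Doe\n"))  # [2]

# An eight-character underline no longer matches any marker prefix.
print(find_conflict_markers("Authors\n========\n"))  # []
```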

.pre-commit-config.yaml

n2ygk · 2021-12-23T20:08:00Z

I think I'll approach this by fixing AUTHORS separately as the problem doesn't "belong" to this PR.

Also, since I'm targeting 1.7.0 for this as it's more than a patch, let's hold it off a bit until 1.6.1 is published.

n2ygk · 2021-12-23T20:16:10Z

pre-commit/pre-commit-hooks#100 describes the problems with RST underlines.

Change made but not marked as resolved.

n2ygk · 2022-01-01T17:05:50Z

@auvipy No test for this command existed prior to this PR, so let's accept it as is and put writing a test for the command on the backlog.

n2ygk · 2022-01-11T19:23:12Z

oauth2_provider/settings.py

@@ -101,6 +101,8 @@
    # Whether to re-create OAuthlibCore on every request.
    # Should only be required in testing.
    "ALWAYS_RELOAD_OAUTHLIB_CORE": False,
+    "CLEAR_EXPIRED_TOKENS_BATCH_SIZE": 10000,
+    "CLEAR_EXPIRED_TOKENS_BATCH_INTERVAL": 0.1,


@merito Sorry this is after the fact, but wouldn't a default value of 0 be best, especially since the sleep is always executed, even when the batch is tiny?
https://github.com/merito/django-oauth-toolkit/blob/725c3c9d8927379c9808abd1badb4fcd9ff1cbaa/oauth2_provider/models.py#L636
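To see what the default interval costs, assume (as the comment describes) one sleep per batch, including a final tiny batch. At the defaults, 1.5M expired tokens means 150 batches of 10000 and about 15 seconds spent only sleeping, and a run with a handful of expired tokens still pays the full 0.1 s. The sketch shows a hypothetical project override that sets the interval to 0 through the `OAUTH2_PROVIDER` settings dict, plus a hypothetical helper, `total_sleep_seconds`, for the arithmetic.

```python
import math

# Hypothetical project settings: django-oauth-toolkit reads its options from
# the OAUTH2_PROVIDER dict in the Django settings module.
OAUTH2_PROVIDER = {
    "CLEAR_EXPIRED_TOKENS_BATCH_SIZE": 10000,
    "CLEAR_EXPIRED_TOKENS_BATCH_INTERVAL": 0,  # no pause between batches
}


def total_sleep_seconds(expired_tokens, batch_size, interval):
    """Time spent sleeping if the loop pauses once after every batch,
    including the last, possibly tiny, one."""
    batches = math.ceil(expired_tokens / batch_size)
    return batches * interval


print(total_sleep_seconds(1_500_000, 10000, 0.1))  # about 15 seconds
print(total_sleep_seconds(3, 10000, 0.1))          # one tiny batch, one full sleep
```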

merito marked this pull request as ready for review April 24, 2021 23:03

auvipy approved these changes Apr 25, 2021

auvipy previously requested changes Oct 19, 2021

oauth2_provider/models.py (resolved)

oauth2_provider/models.py (resolved)

n2ygk reviewed Dec 22, 2021

n2ygk added this to the 1.7.0 milestone Dec 22, 2021

dawidwolski-identt added 2 commits December 22, 2021 17:52

Batch tokens deletion in cleartokens command

def3adf

CHANGELOG.md and AUTHORS

2340bdf

Do not check for merge conflicts in AUTHORS file, because ======= in 2nd line triggers the error.

merito force-pushed the feature-improve-cleartokens-performance branch from e4428bc to 2340bdf December 22, 2021 17:14

Merge branch 'master' into feature-improve-cleartokens-performance

ce70f80

merito requested a review from auvipy December 22, 2021 17:33

Merge branch 'master' into feature-improve-cleartokens-performance

a41e6f6

n2ygk requested changes Dec 23, 2021

.pre-commit-config.yaml (outdated, resolved)

merito and others added 3 commits December 27, 2021 12:43

Merge branch 'master' into feature-improve-cleartokens-performance

a0474d9

Issue with AUTHORS file fixed in 1.6.1

caf5679

Merge branch 'master' into feature-improve-cleartokens-performance

725c3c9

n2ygk approved these changes Jan 1, 2022

n2ygk merged commit c42423c into jazzband:master Jan 1, 2022

n2ygk mentioned this pull request Jan 1, 2022

Write a test for cleartokens management command #1065

Closed

n2ygk reviewed Jan 11, 2022

n2ygk mentioned this pull request Jan 11, 2022

Default value for CLEAR_EXPIRED_TOKENS_BATCH_INTERVAL #1087

Closed
