Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor performance due to datetime column is not indexed #239

Closed
3cp opened this issue Sep 20, 2022 · 13 comments · Fixed by #240
Closed

Poor performance due to datetime column is not indexed #239

3cp opened this issue Sep 20, 2022 · 13 comments · Fixed by #240

Comments

@3cp
Copy link
Contributor

3cp commented Sep 20, 2022

Hi, we noticed very slow performance on showing the logs in production.
This is due to the default sorting on "datetime" column, but it's not indexed.

Normally people would not notice this issue unless they have massive logs like us.

Please add the index. Thx!

@jheld
Copy link
Collaborator

jheld commented Sep 21, 2022

When I read this, it did surprise me a bit at first.

Curious, what kinds of queries are you performing where you see the slowdown?

https://github.com/soynatan/django-easy-audit/blob/master/easyaudit/models.py#L60 shows the index is composite, object_id and then content_type_id.

So, for any specific object id (and in tandem for that kind of object, e.g. the content type), is that one of the slow queries?

I don't disagree that adding an index including the datetime column would be overall a bad thing -- that said, i do want to make sure that we're reasonably certain that by doing so we wouldn't cause a regression in performance for queries which would grab the time indexed field when it would have been much faster to grab from the obj id & ct id index.

@3cp
Copy link
Contributor Author

3cp commented Sep 21, 2022

Not hand made query, it's just the default django admin ui showing paginated result. Note you have default ordering on "datetime" column which is not indexed.

I would say it's pretty common to sort logs based on time, or search logs based on time. It's bit surprised to me that column is not indexed.

@jheld
Copy link
Collaborator

jheld commented Sep 22, 2022

Yes, that's what I'm asking about. For all changes for a particular object on a model/table, i guess i would be surprised if the django admin pagination was particularly slow. The total count and total matching count queries, however, potentially could be slow and there are ways to get around that which do not require changes to the schema.

Pagination for a particular content type only -- definitely. And similarly for all objects matching a given id (that is, if you have 5 tables and you are looking at the paginated list of crud event for objects with ID 19 -- how useful/meaningful useful that is is a different story).

Would you be willing to install django debug toolbar, load the admin page in question, and post the query timing and generated sql/query plan explain?

Again, we can add another index (ideally a composite index, but am open to it being just the one column), but I am curious about you are experiencing.

@3cp
Copy link
Contributor Author

3cp commented Sep 22, 2022

Sure, I will try to load a dump of production table to my local server for the debug.

@3cp
Copy link
Contributor Author

3cp commented Sep 23, 2022

image

The generated SQL looks expected.

SQL explained
Executed SQL
SELECT "easyaudit_requestevent"."id",
       "easyaudit_requestevent"."url",
       "easyaudit_requestevent"."method",
       "easyaudit_requestevent"."query_string",
       "easyaudit_requestevent"."user_id",
       "easyaudit_requestevent"."remote_ip",
       "easyaudit_requestevent"."datetime"
  FROM "easyaudit_requestevent"
 ORDER BY "easyaudit_requestevent"."datetime" DESC, "easyaudit_requestevent"."id" DESC
 LIMIT 100
Time
3887.160062789917 ms
Database
default
QUERY PLAN
Limit  (cost=658040.85..658052.52 rows=100 width=28) (actual time=4793.420..4793.549 rows=100 loops=1)
  ->  Gather Merge  (cost=658040.85..3016629.19 rows=20215060 width=28) (actual time=4793.418..4793.539 rows=100 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=657040.83..682309.65 rows=10107530 width=28) (actual time=4781.120..4781.125 rows=81 loops=3)
              Sort Key: datetime DESC, id DESC
              Sort Method: top-N heapsort  Memory: 49kB
              Worker 0:  Sort Method: top-N heapsort  Memory: 68kB
              Worker 1:  Sort Method: top-N heapsort  Memory: 56kB
              ->  Parallel Seq Scan on easyaudit_requestevent  (cost=0.00..270738.30 rows=10107530 width=28) (actual time=6.613..2170.459 rows=8093324 loops=3)
Planning Time: 0.312 ms
Execution Time: 4795.226 ms

@3cp
Copy link
Contributor Author

3cp commented Sep 23, 2022

For reference, after I manually indexed datetime column, it became normal. Using the new index as expected.

SQL explained
Executed SQL
SELECT "easyaudit_requestevent"."id",
       "easyaudit_requestevent"."url",
       "easyaudit_requestevent"."method",
       "easyaudit_requestevent"."query_string",
       "easyaudit_requestevent"."user_id",
       "easyaudit_requestevent"."remote_ip",
       "easyaudit_requestevent"."datetime"
  FROM "easyaudit_requestevent"
 ORDER BY "easyaudit_requestevent"."datetime" DESC, "easyaudit_requestevent"."id" DESC
 LIMIT 100
Time
0.9999275207519531 ms
Database
default
QUERY PLAN
Limit  (cost=0.48..8.28 rows=100 width=28) (actual time=0.095..0.154 rows=100 loops=1)
  ->  Incremental Sort  (cost=0.48..1892816.63 rows=24279976 width=28) (actual time=0.095..0.148 rows=100 loops=1)
        Sort Key: datetime DESC, id DESC
        Presorted Key: datetime
        Full-sort Groups: 4  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
        ->  Index Scan Backward using re_datetime_idx on easyaudit_requestevent  (cost=0.44..800217.71 rows=24279976 width=28) (actual time=0.015..0.040 rows=101 loops=1)
Planning Time: 0.246 ms
Execution Time: 0.183 ms

@jheld
Copy link
Collaborator

jheld commented Sep 24, 2022

Thank you for working through that.

I didn't realize you were accessing the request event objects (as opposed to crud event). Yes definitely it makes sense to add that column index. Potentially we would also want to do so as part of another index to make it a composite one but for now that seems outside the scope.

Can you submit a PR with your migration?

@3cp
Copy link
Contributor Author

3cp commented Sep 24, 2022

Sure. Will do. Add index on datetime on all 3 models?

@jheld
Copy link
Collaborator

jheld commented Sep 24, 2022

I'm a bit hesitant to do it on crud event just yet unless others chime in with support or experience with having tried it themselves (e.g. i don't want to negatively affect performance for anyone for queries which are already pretty performant, as can happen when there are more indexes to choose from).

likewise i suppose i'm not surprised that no one has mentioned that the login event is slow given that login events tend to be significantly less write heavy and thus although that table can be big, it should never be anywhere near the size of the others.

so for now, just request event.

@steverecio
Copy link
Contributor

Just came here to create an issue based on the same problem I'm having and happy to see someone else with the same issue. Django admin is incredibly slow due to the datetime column being used for ordering but not being indexed.

Is there anything holding up #240 from being merged? Ideally the CRUDEvent table would be indexed as well (that admin page throws a 504 timeout every time).

@diogosilva30
Copy link

Commenting to report that CRUDEvent page is also unusable. Our production environment has +1 million logs and we if try to search anything our website crashes.

@jheld
Copy link
Collaborator

jheld commented Jan 26, 2023

I do want to be careful about changes to the schema which are solely for issues relating to usage on Django admin. Small targeted changes (e.g. to one model in this case) are a safer bet as a practice when possible even if it means more PRs .

@steverecio
Copy link
Contributor

Could we just change the default ordering to ID instead of datetime? The datetime is auto_now_add so shouldn't ID provide the same sort order anyways?

This would speed up CRUDEvent too without requiring an extra index.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants