Slow queries against table 'extras_objectchange' #3392
Labels
status: accepted
type: bug
Environment
Steps to Reproduce
Expected Behavior
I expected things to function without issues.
Observed Behavior
I believe the web API issues SQL statements that filter on column 'time' of table 'extras_objectchange', a column with no index. This creates a bottleneck, because each such statement forces a sequential scan of the table. Table 'extras_objectchange' needs an index on column 'time'. Here's the long version of what was observed:
My company is migrating off a DCIM product built entirely in-house. As one step in the migration, we've written Python scripts that read our old DCIM tool and populate NetBox with the fetched data. The migration script uses pynetbox to send requests to the NetBox web API. Since we haven't gotten the migrated data quite right on the first, second, or nth try, we've produced a good number of rows in table 'extras_objectchange'. The CHANGELOG_RETENTION value is kept at the default of 90 days. We noticed that at a certain point the migration script started running much slower. On investigation, gunicorn workers were queueing requests and taking up memory, eventually leading to swapping. Enabling PostgreSQL statement logging showed a large number of long-running statements against table 'extras_objectchange', such as:
SELECT "extras_objectchange"."id", "extras_objectchange"."time", "extras_objectchange"."user_id", "extras_objectchange"."user_name", "extras_objectchange"."request_id", "extras_objectchange"."action", "extras_objectchange"."changed_object_type_id", "extras_objectchange"."changed_object_id", "extras_objectchange"."related_object_type_id", "extras_objectchange"."related_object_id", "extras_objectchange"."object_repr", "extras_objectchange"."object_data" FROM "extras_objectchange" WHERE "extras_objectchange"."time" < '2019-04-26T14:44:52.229708+00:00'::timestamptz;
The static timestamp in the above statement was exactly 90 days (our retention period, which is the default) before the time of execution. Although I haven't traced these statements to a definite source, I don't see any PostgreSQL table triggers that would issue them, and thus assume they are generated internally by NetBox. In any case, we see a lot of these statements when running our sync script. The problem is that their WHERE clause compares the 'time' column to a static value, and there appears to be no index on the 'time' column of table 'extras_objectchange'. Combined with the number of rows we have in the table, this makes for really poor performance: we're sequentially scanning a 1,000,000-row table.
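To illustrate the effect described above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. It is not NetBox code: the table is reduced to three columns, the index name idx_objectchange_time is hypothetical, and the date is only illustrative. SQLite's planner is not PostgreSQL's, but it shows the same shift from a full table scan to an index search once the 'time' column is indexed.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # same value as the default CHANGELOG_RETENTION in the report

# In-memory stand-in for the real table; only three columns are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE extras_objectchange ("
    "id INTEGER PRIMARY KEY, time TEXT NOT NULL, object_repr TEXT)"
)

# Illustrative "now"; one change record per hour going back in time.
now = datetime(2019, 7, 25, 14, 44, 52, tzinfo=timezone.utc)
rows = [
    ((now - timedelta(hours=i)).isoformat(), f"object {i}")
    for i in range(5000)
]
conn.executemany(
    "INSERT INTO extras_objectchange (time, object_repr) VALUES (?, ?)", rows
)

# Retention cutoff: everything older than this matches the WHERE clause.
cutoff = (now - timedelta(days=RETENTION_DAYS)).isoformat()

QUERY = (
    "SELECT id, time, object_repr FROM extras_objectchange WHERE time < ?"
)


def query_plan(connection, value):
    """Return SQLite's plan description for the retention-style query."""
    plan_rows = connection.execute(
        "EXPLAIN QUERY PLAN " + QUERY, (value,)
    ).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[-1] for row in plan_rows)


plan_before = query_plan(conn, cutoff)
count_before = len(conn.execute(QUERY, (cutoff,)).fetchall())

# Hypothetical index name; on PostgreSQL the analogous statement would be
# a CREATE INDEX on extras_objectchange ("time").
conn.execute("CREATE INDEX idx_objectchange_time ON extras_objectchange (time)")

plan_after = query_plan(conn, cutoff)
count_after = len(conn.execute(QUERY, (cutoff,)).fetchall())

print("before index:", plan_before)
print("after index: ", plan_after)
```

The query returns the same rows either way. Only the access path changes, which is why an index on 'time' looks like a safe remedy for the statements above.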
Thanks for all the hard work creating and maintaining an awesome DCIM/IPAM product.