Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: naive take on adding an index #1466

Merged
merged 2 commits into from
Jun 16, 2022
Merged

Conversation

drewdelano
Copy link
Contributor

@drewdelano drewdelano commented Jun 13, 2022

user_id on uploads should probably be indexed...

completes #61

@drewdelano drewdelano linked an issue Jun 13, 2022 that may be closed by this pull request
@drewdelano drewdelano force-pushed the fix/61-uploads-missing-index branch from 063f885 to 638b86f Compare June 13, 2022 19:05
@drewdelano drewdelano marked this pull request as ready for review June 13, 2022 19:05
Copy link
Contributor

@vasco-santos vasco-santos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @drewdelano
Can we know more about the motivation for this index? We currently have UNIQUE (user_id, source_cid) for upload table, which already provides an index for user_id behind the scenes. Here you filter out non deleted uploads, but I would say we don't need it unless we are experiencing slower queries with the already existing index.

@drewdelano
Copy link
Contributor Author

drewdelano commented Jun 14, 2022

@vasco-santos This is the issue I'm looking at:
#61

let query = this._client

We seem to be seeing a steady increase over time which in my mind points to indexing.

This seems related to list all uploads from a given user (that aren't deleted). That said, it might be worth running that query against staging to see how long it takes to return. It's entirely possible I'm wrong here.

@@ -0,0 +1 @@
CREATE INDEX IF NOT EXISTS upload_user_id_deleted_at_idx ON upload (user_id) WHERE deleted_at IS NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CREATE INDEX IF NOT EXISTS upload_user_id_deleted_at_idx ON upload (user_id) WHERE deleted_at IS NULL;
CREATE INDEX CONCURRENTLY IF NOT EXISTS upload_user_id_deleted_at_idx ON upload (user_id) WHERE deleted_at IS NULL;

This needs to concurrently so that the operation does not lock the database

@vasco-santos
Copy link
Contributor

That said, it might be worth running that query against staging to see how long it takes to return. It's entirely possible I'm wrong here.

We can definitely try it. Maybe the deleted is the key difference for users with a large number of uploads

@ribasushi
Copy link

@vasco-santos @drewdelano why not test all this ( EXPLAIN ANALYZE in hand ) on one of your replicas? The indices are all replicated and identical.

@vasco-santos
Copy link
Contributor

@vasco-santos @drewdelano why not test all this ( EXPLAIN ANALYZE in hand ) on one of your replicas? The indices are all replicated and identical.

Yes, good call @ribasushi . I was thinking about trying that before we move forward here. @drewdelano does not have access so must be me to try it

@vasco-santos
Copy link
Contributor

It looks like this helps a lot! @drewdelano

EXPLAIN ANALYZE 
SELECT * FROM upload
WHERE user_id = XXXXXXXXXXXXXXXXX
AND deleted_at IS NULL;

Before Index

 Bitmap Heap Scan on upload  (cost=6443.97..279801.88 rows=203188 width=279) (actual time=5260.205..68250.982 rows=187339 loops=1)
   Recheck Cond: (user_id = 'XXXXXXXXXXXXXXXXX'::bigint)
   Filter: (deleted_at IS NULL)
   Heap Blocks: exact=131036
   ->  Bitmap Index Scan on upload_user_id_source_cid_key  (cost=0.00..6433.81 rows=203799 width=0) (actual time=5228.591..5228.592 rows=187391 loops=1)
         Index Cond: (user_id = 'XXXXXXXXXXXXXXXXX'::bigint)
 Planning Time: 0.118 ms
 JIT:
   Functions: 4
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.488 ms, Inlining 0.000 ms, Optimization 0.519 ms, Emission 3.815 ms, Total 4.822 ms
 Execution Time: 68301.216 ms
(12 rows)

After Index

 Index Scan using upload_user_id_deleted_at_idx on upload  (cost=0.09..216730.20 rows=202877 width=279) (actual time=0.071..525.612 rows=187339 loops=1)
   Index Cond: (user_id = 'XXXXXXXXXXXXXXXXX'::bigint)
 Planning Time: 3.819 ms
 JIT:
   Functions: 2
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.194 ms, Inlining 0.000 ms, Optimization 0.000 ms, Emission 0.000 ms, Total 0.194 ms
 Execution Time: 564.808 ms
(8 rows)

@@ -0,0 +1 @@
CREATE INDEX IF NOT EXISTS upload_user_id_deleted_at_idx ON upload (user_id) WHERE deleted_at IS NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also name number will need to be updated

@drewdelano drewdelano force-pushed the fix/61-uploads-missing-index branch from 638b86f to 56bd69f Compare June 16, 2022 12:33
Copy link
Contributor

@vasco-santos vasco-santos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@drewdelano drewdelano merged commit ebe9bd0 into main Jun 16, 2022
@drewdelano drewdelano deleted the fix/61-uploads-missing-index branch June 16, 2022 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Listing user uploads could be faster
3 participants