Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inverse column order on syncv3_nid_room_state_idx index #187

Closed
wants to merge 2 commits into from

Conversation

MatMaul
Copy link

@MatMaul MatMaul commented Jun 27, 2023

The order matters because columns that are evaluated with equality need to come before the ones that deal with inequality.

From PG doc:

The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned.

Here event_nid is used with <= operator while room_id and is_state are used with =, so event_nid should be last.

some EXPLAIN ANALYZE reflecting that:

 CREATE INDEX IF NOT EXISTS syncv3_nid_room_state_idx ON syncv3_events(room_id, event_nid, is_state);
 
 Nested Loop  (cost=12839.59..13353.77 rows=288 width=39) (actual time=2633.530..2978.064 rows=382 loops=1)
   ->  HashAggregate  (cost=12839.17..12841.17 rows=200 width=8) (actual time=2627.059..2627.576 rows=382 loops=1)
         Group Key: max(syncv3_events_1.event_nid)
         Batches: 1  Memory Usage: 77kB
         ->  GroupAggregate  (cost=0.42..12835.57 rows=288 width=39) (actual time=14.813..2626.020 rows=382 loops=1)
               Group Key: syncv3_events_1.room_id
               ->  Index Only Scan using syncv3_nid_room_state_idx on syncv3_events syncv3_events_1  (cost=0.42..11367.61 rows=293016 width=39) (actual time=7.112..2519.861 rows=439061 loops=1)
                     Heap Fetches: 18205
   ->  Index Scan using syncv3_events_pkey on syncv3_events  (cost=0.42..2.56 rows=1 width=39) (actual time=0.914..0.914 rows=1 loops=382)
         Index Cond: (event_nid = (max(syncv3_events_1.event_nid)))
 Planning Time: 59.901 ms
 Execution Time: 2978.703 ms

CREATE INDEX IF NOT EXISTS syncv3_events_room_nid_idx ON syncv3_events(room_id, event_nid);

 Nested Loop  (cost=10967.33..11481.51 rows=288 width=39) (actual time=1011.844..1282.976 rows=382 loops=1)
   ->  HashAggregate  (cost=10966.91..10968.91 rows=200 width=8) (actual time=1008.189..1008.760 rows=382 loops=1)
         Group Key: max(syncv3_events_1.event_nid)
         Batches: 1  Memory Usage: 77kB
         ->  GroupAggregate  (cost=0.42..10963.31 rows=288 width=39) (actual time=10.244..1007.461 rows=382 loops=1)
               Group Key: syncv3_events_1.room_id
               ->  Index Only Scan using syncv3_events_room_nid_idx on syncv3_events syncv3_events_1  (cost=0.42..9495.37 rows=293012 width=39) (actual time=6.152..922.990 rows=439061 loops=1)
                     Heap Fetches: 18151
   ->  Index Scan using syncv3_events_pkey on syncv3_events  (cost=0.42..2.56 rows=1 width=39) (actual time=0.714..0.714 rows=1 loops=382)
         Index Cond: (event_nid = (max(syncv3_events_1.event_nid)))
 Planning Time: 64.852 ms
 Execution Time: 1283.806 ms

CREATE INDEX IF NOT EXISTS syncv3_nid_room_state_idx ON syncv3_events(room_id, is_state, event_nid);

 Nested Loop  (cost=10968.43..11482.61 rows=288 width=39) (actual time=980.547..1257.002 rows=382 loops=1)
   ->  HashAggregate  (cost=10968.01..10970.01 rows=200 width=8) (actual time=976.311..976.803 rows=382 loops=1)
         Group Key: max(syncv3_events_1.event_nid)
         Batches: 1  Memory Usage: 77kB
         ->  GroupAggregate  (cost=0.42..10964.41 rows=288 width=39) (actual time=7.876..975.659 rows=382 loops=1)
               Group Key: syncv3_events_1.room_id
               ->  Index Only Scan using syncv3_nid_room_state_idx on syncv3_events syncv3_events_1  (cost=0.42..9496.47 rows=293012 width=39) (actual time=4.703..902.883 rows=439061 loops=1)
                     Heap Fetches: 18151
   ->  Index Scan using syncv3_events_pkey on syncv3_events  (cost=0.42..2.56 rows=1 width=39) (actual time=0.730..0.730 rows=1 loops=382)
         Index Cond: (event_nid = (max(syncv3_events_1.event_nid)))
 Planning Time: 50.047 ms
 Execution Time: 1257.509 ms

Signed-off-by: Mathieu Velten mathieuv@matrix.org

Pull Request Checklist

@MatMaul
Copy link
Author

MatMaul commented Jun 27, 2023

The tested query was LatestEventNIDInRooms but it should also improves SelectLatestEventsBetween.

@DMRobertson
Copy link
Contributor

DMRobertson commented Jun 27, 2023

Excellent finding, thank you for persisting.

We should sanity check all the queries that use this index though.

I wonder why the new index performs better, i.e. what work have we skipped? I can't spot much difference in the query planner output (similar numbers of heap fetches?)

@MatMaul
Copy link
Author

MatMaul commented Jun 27, 2023

I forgot to say that those numbers are from a fully cold DB (right after a reboot) so are worst case scenario. The difference seems negligible if running the query a second time.

However it does happen to me IRL and pretty often, after 30mn with no sync this query is cold enough to reliably take ~2+s to run.

It's not clear to me either why the difference is that dramatic.

@kegsay kegsay self-assigned this Jul 18, 2023
@kegsay
Copy link
Member

kegsay commented Jul 18, 2023

Finding SELECT statements which aren't just scanning for ids/nids; potential queries affected:

  • state/event_table.go:308 and similarly state/event_table.go:323: SELECT event_nid, room_id, event_replaces_nid, before_state_snapshot_id, event_type, state_key, event FROM syncv3_events WHERE event_nid IN (SELECT max(event_nid) FROM syncv3_events WHERE event_nid <= $1 AND room_id = ANY($2) GROUP BY room_id)
  • state/event_table.go:341: SELECT event_nid, event FROM syncv3_events WHERE event_nid > $1 AND event_nid <= $2 AND room_id = $3 ORDER BY event_nid ASC LIMIT $4
  • state/event_table.go:350 SELECT event_nid, event FROM syncv3_events WHERE event_nid > $1 AND event_nid <= $2 AND room_id = $3 AND is_state=FALSE ORDER BY event_nid DESC LIMIT $4
  • state/event_table.go:369 SELECT room_id, event_nid, event FROM syncv3_events WHERE event_nid in (SELECT MAX(event_nid) FROM syncv3_events GROUP BY room_id, event_type)
  • state/event_table.go:402 SELECT event_nid, room_id, event FROM syncv3_events WHERE event_nid > ? AND event_nid <= ? AND event_type = ? AND state_key = ? AND room_id IN (?)
  • state/event_table.go:419 SELECT event_nid FROM syncv3_events WHERE event_nid > $1 AND event_nid <= $2 AND event_type = $3 AND room_id = $4 ORDER BY event_nid ASC LIMIT $5

Checking which queries make use of that index today..

@kegsay
Copy link
Member

kegsay commented Jul 18, 2023

First query is the one which this PR intentionally modifies:

postgres=> EXPLAIN ANALYZE SELECT event_nid, room_id, event_replaces_nid, before_state_snapshot_id, event_type, state_key, event FROM syncv3_events WHERE event_nid IN (SELECT max(event_nid) FROM syncv3_events WHERE event_nid <=34243434 AND room_
id = ANY('{"!zqWYtxyQuodYTDeICs:matrix.org","!zNXpeoPYQVxEGetnRx:matrix.org","!zKsFKOSXEvQgTXQoAF:matrix.org"}') GROUP BY room_id);
                                                                                        QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=5782.38..7474.94 rows=2643 width=542) (actual time=0.110..0.122 rows=3 loops=1)
   ->  HashAggregate  (cost=5781.94..5783.94 rows=200 width=8) (actual time=0.100..0.102 rows=3 loops=1)
         Group Key: max(syncv3_events_1.event_nid)
         Batches: 1  Memory Usage: 40kB
         ->  GroupAggregate  (cost=0.56..5748.90 rows=2643 width=39) (actual time=0.039..0.096 rows=3 loops=1)
               Group Key: syncv3_events_1.room_id
               ->  Index Only Scan using syncv3_nid_room_state_idx on syncv3_events syncv3_events_1  (cost=0.56..5697.42 rows=5010 width=39) (actual time=0.024..0.078 rows=111 loops=1)
                     Index Cond: ((room_id = ANY ('{!zqWYtxyQuodYTDeICs:matrix.org,!zNXpeoPYQVxEGetnRx:matrix.org,!zKsFKOSXEvQgTXQoAF:matrix.org}'::text[])) AND (event_nid <= 34243434))
                     Heap Fetches: 0
   ->  Index Scan using syncv3_events_pkey on syncv3_events  (cost=0.44..8.46 rows=1 width=542) (actual time=0.005..0.005 rows=1 loops=3)
         Index Cond: (event_nid = (max(syncv3_events_1.event_nid)))
 Planning Time: 0.210 ms
 Execution Time: 0.158 ms
(13 rows)

Second query is affected too:

postgres=> EXPLAIN ANALYZE SELECT event_nid, event FROM syncv3_events WHERE event_nid > 342432 AND event_nid <= 34343434 AND room_id = '!zKsFKOSXEvQgTXQoAF:matrix.org' ORDER BY event_nid ASC LIMIT 5;
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..20.98 rows=5 width=469) (actual time=0.641..0.646 rows=5 loops=1)
   ->  Index Scan using syncv3_nid_room_state_idx on syncv3_events  (cost=0.56..6833.61 rows=1673 width=469) (actual time=0.640..0.644 rows=5 loops=1)
         Index Cond: ((room_id = '!zKsFKOSXEvQgTXQoAF:matrix.org'::text) AND (event_nid > 342432) AND (event_nid <= 34343434))
 Planning Time: 0.148 ms
 Execution Time: 0.660 ms
(5 rows)

but is hilariously never used, only in tests.

Third query is SelectLatestEventsBetween, which is affected:

postgres=> EXPLAIN ANALYZE SELECT event_nid, event FROM syncv3_events WHERE event_nid > 342432 AND event_nid <= 34343434 AND room_id = '!zKsFKOSXEvQgTXQoAF:matrix.org' AND is_state=FALSE ORDER BY event_nid DESC LIMIT 5;
                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..21.17 rows=5 width=469) (actual time=0.020..0.022 rows=1 loops=1)
   ->  Index Scan Backward using syncv3_nid_room_state_idx on syncv3_events  (cost=0.56..4716.69 rows=1144 width=469) (actual time=0.020..0.021 rows=1 loops=1)
         Index Cond: ((room_id = '!zKsFKOSXEvQgTXQoAF:matrix.org'::text) AND (event_nid > 342432) AND (event_nid <= 34343434) AND (is_state = false))
 Planning Time: 0.172 ms
 Execution Time: 0.035 ms
(5 rows)

Fourth query is not affected.

Fifth query is affected:

postgres=> EXPLAIN ANALYZE SELECT event_nid, room_id, event FROM syncv3_events WHERE event_nid > 23223 AND event_nid <=  34343434 AND event_type =  'm.room.member' AND state_key = 'foo' AND room_id IN ('{"!zKsFKOSXEvQgTXQoAF:matrix.org","!zqWYtxyQuodYTDeICs:matrix.org"}');
                                                                                                              QUERY PLAN                                                                                                              
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on syncv3_events  (cost=132.07..136.09 rows=1 width=500) (actual time=0.020..0.021 rows=0 loops=1)
   Recheck Cond: ((event_type = 'm.room.member'::text) AND (state_key = 'foo'::text) AND (room_id = '{"!zKsFKOSXEvQgTXQoAF:matrix.org","!zqWYtxyQuodYTDeICs:matrix.org"}'::text) AND (event_nid > 23223) AND (event_nid <= 34343434))
   ->  BitmapAnd  (cost=132.07..132.07 rows=1 width=0) (actual time=0.019..0.019 rows=0 loops=1)
         ->  Bitmap Index Scan on syncv3_events_type_sk_idx  (cost=0.00..6.34 rows=178 width=0) (actual time=0.019..0.019 rows=0 loops=1)
               Index Cond: ((event_type = 'm.room.member'::text) AND (state_key = 'foo'::text))
         ->  Bitmap Index Scan on syncv3_nid_room_state_idx  (cost=0.00..125.47 rows=1673 width=0) (never executed)
               Index Cond: ((room_id = '{"!zKsFKOSXEvQgTXQoAF:matrix.org","!zqWYtxyQuodYTDeICs:matrix.org"}'::text) AND (event_nid > 23223) AND (event_nid <= 34343434))
 Planning Time: 0.136 ms
 Execution Time: 0.040 ms
(9 rows)

Last query is not affected.

state/event_table.go Outdated Show resolved Hide resolved
@kegsay
Copy link
Member

kegsay commented Jul 18, 2023

I wonder why the new index performs better, i.e. what work have we skipped? I can't spot much difference in the query planner output (similar numbers of heap fetches?)

The actual number of table lookups (heap fetches) are the same with both indexes. The value comes from not having to faff about with as much of the index.

With an index of room_id THEN event_nid THEN is_state (which is effectively what the old index did), it can very quickly find the right room, but then has to go through the nids in that room checking is_state. By swapping to room_id THEN is_state THEN event_nid it can rapidly find the events with the correct is_state value, then only needs to scan the nid range. The time saved will correlate with the number of is_state=true events, which in rooms like Matrix HQ is 99% of the events.

In the future when we jump snapshots ahead for gappy state, we'll make more use of is_state=true so this index will help more and more.

I've applied this index manually for now and will sit for a bit to see how metrics behave, and specifically to see if EXPLAIN ANALYZE starts using the new index instead of the old one.

@kegsay
Copy link
Member

kegsay commented Jul 18, 2023

Sadly, only 1 of the queries has started using the new index syncv3_events_nid_room_state_idx, the others are happy using the old one syncv3_nid_room_state_idx:

postgres=> EXPLAIN ANALYZE SELECT event_nid, room_id, event_replaces_nid, before_state_snapshot_id, event_type, state_key, event FROM syncv3_events WHERE event_nid IN (SELECT max(event_nid) FROM syncv3_events WHERE event_nid <=34243434 AND room_id = ANY('{"!zqWYtxyQuodYTDeICs:matrix.org","!zNXpeoPYQVxEGetnRx:matrix.org","!zKsFKOSXEvQgTXQoAF:matrix.org"}') GROUP BY room_id);
                                                                                        QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=5897.52..7590.08 rows=2636 width=542) (actual time=0.127..0.139 rows=3 loops=1)
   ->  HashAggregate  (cost=5897.08..5899.08 rows=200 width=8) (actual time=0.114..0.115 rows=3 loops=1)
         Group Key: max(syncv3_events_1.event_nid)
         Batches: 1  Memory Usage: 40kB
         ->  GroupAggregate  (cost=0.56..5864.13 rows=2636 width=39) (actual time=0.052..0.110 rows=3 loops=1)
               Group Key: syncv3_events_1.room_id
               ->  Index Only Scan using syncv3_nid_room_state_idx on syncv3_events syncv3_events_1  (cost=0.56..5812.86 rows=4982 width=39) (actual time=0.035..0.090 rows=111 loops=1)
                     Index Cond: ((room_id = ANY ('{!zqWYtxyQuodYTDeICs:matrix.org,!zNXpeoPYQVxEGetnRx:matrix.org,!zKsFKOSXEvQgTXQoAF:matrix.org}'::text[])) AND (event_nid <= 34243434))
                     Heap Fetches: 0
   ->  Index Scan using syncv3_events_pkey on syncv3_events  (cost=0.44..8.46 rows=1 width=542) (actual time=0.007..0.007 rows=1 loops=3)
         Index Cond: (event_nid = (max(syncv3_events_1.event_nid)))
 Planning Time: 0.253 ms
 Execution Time: 0.180 ms
(13 rows)

postgres=> 
postgres=> 
postgres=> EXPLAIN ANALYZE SELECT event_nid, room_id, event FROM syncv3_events WHERE event_nid > 23223 AND event_nid <=  34343434 AND event_type =  'm.room.member' AND state_key = 'foo' AND room_id IN ('{"!zKsFKOSXEvQgTXQoAF:matrix.org","!zqWYtxyQuodYTDeICs:matrix.org"}');
                                                                                                              QUERY PLAN                                                                                                              
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on syncv3_events  (cost=131.93..135.96 rows=1 width=500) (actual time=0.018..0.019 rows=0 loops=1)
   Recheck Cond: ((event_type = 'm.room.member'::text) AND (state_key = 'foo'::text) AND (room_id = '{"!zKsFKOSXEvQgTXQoAF:matrix.org","!zqWYtxyQuodYTDeICs:matrix.org"}'::text) AND (event_nid > 23223) AND (event_nid <= 34343434))
   ->  BitmapAnd  (cost=131.93..131.93 rows=1 width=0) (actual time=0.017..0.018 rows=0 loops=1)
         ->  Bitmap Index Scan on syncv3_events_type_sk_idx  (cost=0.00..6.33 rows=177 width=0) (actual time=0.017..0.017 rows=0 loops=1)
               Index Cond: ((event_type = 'm.room.member'::text) AND (state_key = 'foo'::text))
         ->  Bitmap Index Scan on syncv3_nid_room_state_idx  (cost=0.00..125.35 rows=1663 width=0) (never executed)
               Index Cond: ((room_id = '{"!zKsFKOSXEvQgTXQoAF:matrix.org","!zqWYtxyQuodYTDeICs:matrix.org"}'::text) AND (event_nid > 23223) AND (event_nid <= 34343434))
 Planning Time: 0.127 ms
 Execution Time: 0.036 ms
(9 rows)

postgres=> 
postgres=> EXPLAIN ANALYZE SELECT event_nid, event FROM syncv3_events WHERE event_nid > 342432 AND event_nid <= 34343434 AND room_id = '!zKsFKOSXEvQgTXQoAF:matrix.org' AND is_state=FALSE ORDER BY event_nid DESC LIMIT 5;
                                                                              QUERY PLAN                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..20.84 rows=5 width=469) (actual time=0.019..0.021 rows=1 loops=1)
   ->  Index Scan Backward using syncv3_events_nid_room_state_idx on syncv3_events  (cost=0.56..4612.59 rows=1137 width=469) (actual time=0.018..0.019 rows=1 loops=1)
         Index Cond: ((room_id = '!zKsFKOSXEvQgTXQoAF:matrix.org'::text) AND (is_state = false) AND (event_nid > 342432) AND (event_nid <= 34343434))
 Planning Time: 0.154 ms
 Execution Time: 0.032 ms
(5 rows)

I see no impact on DB load with this new index. Inclined to close this.

@kegsay
Copy link
Member

kegsay commented Jul 19, 2023

SELECT event_nid, room_id, event_replaces_nid, before_state_snapshot_id, event_type, state_key, event FROM syncv3_events WHERE event_nid IN (SELECT max(event_nid) FROM syncv3_events WHERE event_nid <= 93323766 AND room_id = ANY(...) GROUP BY room_id);

On a more realistic query where the WHERE event_nid = is closer to the max value in that table, it swaps to using the new index with 9 rooms:

 Planning Time: 0.267 ms
 Execution Time: 2450.660 ms

EXPLAIN ANALYZE SELECT event_nid, room_id, event FROM syncv3_events WHERE event_nid > 0 AND event_nid <= 93323766 AND event_type = 'm.room.topic' AND state_key = '' AND room_id IN (...);

This never uses the new query, preferring:

 Index Scan using syncv3_events_type_room_nid_idx on syncv3_events  (cost=0.56..11.89 rows=1 width=500) (actual time=0.450..0.451 rows=0 loops=1)
   Index Cond: ((event_type = 'm.room.topic'::text) AND (room_id =...)

EXPLAIN ANALYZE SELECT event_nid, event FROM syncv3_events WHERE event_nid > 0 AND event_nid <= 93323766 AND room_id = '!vpdMrhHjzaPbBUSgOs:matrix.org' AND is_state=FALSE ORDER BY event_nid DESC LIMIT 5;

This does use the new index:

 Limit  (cost=0.56..20.83 rows=5 width=469) (actual time=0.020..0.027 rows=5 loops=1)
   ->  Index Scan Backward using syncv3_events_nid_room_state_idx on syncv3_events  (cost=0.56..9393.67 rows=2317 width=469) (actual time=0.019..0.025 rows=5 loops=1)
         Index Cond: ((room_id = '!vpdMrhHjzaPbBUSgOs:matrix.org'::text) AND (is_state = false) AND (event_nid > 0) AND (event_nid <= 93323766))
 Planning Time: 0.173 ms
 Execution Time: 0.039 ms
(5 rows)

Unsure how to interpret this..

@kegsay
Copy link
Member

kegsay commented Sep 19, 2023

I'm going to close this for now. The testing back in July didn't make this an obvious improvement. We can look into this some more if/when performance becomes an issue.

@kegsay kegsay closed this Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants