300 comment limit. #3306
@Nutomic tagging you since I'm unable to approve this. @dessalines once this is merged, I'm looking into using this package. I think it makes sense to validate at our API endpoints more so than it does on the DB view.
This comment was marked as abuse.
I had a look at the PR, and I think this issue may result from OFFSET and LIMIT clauses, most likely paired with an ORDER BY somewhere in the generated statement. From the view of a database, this is a worst-case scenario. What the database does is:

1. Fetch every row that matches the filter.
2. Sort the entire result set according to the ORDER BY clause.
3. Discard everything before the OFFSET and after the LIMIT.

This works well enough for smaller tables, but for large tables, this can be the death of a production database. I will try to reproduce this issue locally, and maybe I can come up with something better here.
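To make the pattern concrete, here is a minimal sketch. The `comment` table appears in the plan below, but the `published` column and the cursor values are illustrative assumptions, not Lemmy's actual generated statement:

```sql
-- Worst case: OFFSET/LIMIT paired with ORDER BY. The whole filtered set
-- must be fetched and sorted before a single row can be returned.
SELECT id, content
FROM comment
ORDER BY published DESC
LIMIT 20 OFFSET 100000;

-- Keyset pagination as an alternative: filter on the last-seen sort key,
-- so an index on (published, id) can serve the query without a full sort.
SELECT id, content
FROM comment
WHERE (published, id) < ('2023-07-01 00:00:00', 123456)
ORDER BY published DESC, id DESC
LIMIT 20;
```

Keyset pagination is not a drop-in fix for tree-ordered comments, but it illustrates why a large OFFSET on a sorted result is so expensive.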
This comment was marked as abuse.
Oh dear, sorry I have to say that, but that is bad. Really bad. There is no index that would help here. The database schema MUST change for this. A better solution would be a column with parent IDs and then using a recursive CTE. I can provide an example schema and a statement to do this, maybe even a migration path, but first I have to understand how this works currently.
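For illustration, a sketch of what that could look like. The `parent_id` column, the index name, and the root ID are hypothetical, not the current Lemmy schema:

```sql
-- Hypothetical schema change: each comment stores only its direct parent.
-- ALTER TABLE comment ADD COLUMN parent_id integer REFERENCES comment (id);
-- CREATE INDEX idx_comment_parent ON comment (parent_id);

-- Fetch a comment subtree with a recursive CTE, capping the depth.
WITH RECURSIVE thread AS (
    SELECT id, parent_id, content, 1 AS depth
    FROM comment
    WHERE id = 42                -- hypothetical root comment of the subtree
    UNION ALL
    SELECT c.id, c.parent_id, c.content, t.depth + 1
    FROM comment c
    JOIN thread t ON c.parent_id = t.id
    WHERE t.depth < 9            -- depth cap, analogous to nlevel(path) <= 9
)
SELECT * FROM thread;
```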
OK, I had a look at the query mentioned by @RocketDerp. The column is of type ltree, so there are no issues with substring parsing or things like that. But there is no index either, and for ordering, it is a bad idea for large sets. A recursive CTE would be far more efficient. The good news: as we have an ltree, we can adapt the schema quickly to just store the parent comment ID. This may have side effects: for example, if a user deletes a comment, or their account, what happens? Is the parent comment preserved as a placeholder? @dessalines, can you provide this information?
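As a reference point, ltree columns can be indexed in PostgreSQL (index names below are illustrative). Note that neither index would serve the plan's actual sort key, `subpath(comment.path, 0, -1)` combined with `comment_aggregates.hot_rank`, which is part of why the ordering is so expensive here:

```sql
-- GiST index: supports ltree operators such as @>, <@ and lquery matching.
CREATE INDEX idx_comment_path_gist ON comment USING GIST (path);

-- Plain B-tree index: supports equality and ordering on the full path.
CREATE INDEX idx_comment_path_btree ON comment (path);
```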
This comment was marked as abuse.
The more I think about these queries, the more my brain hurts. I will open a new issue to track this kind of problem.
@tbe I'll comment on that issue.
I want to mention this query plan from lemmy.world for GetComments:
This comment was marked as abuse.
Can you provide me with the contents of `pg_stats` and `pg_stat_user_tables`? I struggle to reproduce these seq scans with generated data. Having the stats for a database where this issue occurs may help a lot :)
On 3 July 2023 at 21:34:16 CEST, phiresky ***@***.***> wrote:
…I want to mention this query plan from lemmy.world: As you can see, it does a full seq scan on comments and sometimes even posts.
```
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=159533.43..186883.34 rows=231126 width=3420) (actual time=76852.137..110888.431 rows=710209 loops=1)
-> Gather Merge (cost=159533.43..186883.34 rows=231126 width=3420) (actual time=76756.348..110724.832 rows=710209 loops=1)
Workers Planned: 3
Workers Launched: 0
-> Sort (cost=158533.39..158725.99 rows=77042 width=3420) (actual time=76753.723..110541.803 rows=710209 loops=1)
Sort Key: (subpath(comment.path, 0, '-1'::integer)), comment_aggregates.hot_rank DESC
Sort Method: external merge Disk: 2011568kB
-> Hash Left Join (cost=54341.29..85804.78 rows=77042 width=3420) (actual time=16185.176..23083.199 rows=710209 loops=1)
Hash Cond: (comment.id = comment_like.comment_id)
-> Hash Left Join (cost=54180.22..85248.88 rows=77042 width=3386) (actual time=16185.121..22625.439 rows=710209 loops=1)
Hash Cond: (comment.creator_id = person_block.target_id)
-> Hash Left Join (cost=54176.74..85043.10 rows=77042 width=3366) (actual time=16185.072..22270.849 rows=710209 loops=1)
Hash Cond: ((community.id = community_person_ban.community_id) AND (comment.creator_id = community_person_ban.person_id))
-> Parallel Hash Join (cost=54170.99..84632.88 rows=77042 width=3338) (actual time=16184.949..21893.240 rows=710209 loops=1)
Hash Cond: (post.community_id = community.id)
Join Filter: ((NOT community.hidden) OR (community_follower.person_id = '-1'::integer))
-> Nested Loop (cost=50896.35..81155.95 rows=77042 width=2214) (actual time=16132.831..21034.913 rows=710209 loops=1)
-> Parallel Hash Join (cost=50895.92..67779.85 rows=77042 width=1274) (actual time=16132.760..19096.555 rows=710209 loops=1)
Hash Cond: (comment_aggregates.comment_id = comment.id)
-> Parallel Seq Scan on comment_aggregates (cost=0.00..15827.25 rows=231125 width=48) (actual time=0.063..151.783 rows=716681 loops=1)
-> Parallel Hash (cost=50159.82..50159.82 rows=58888 width=1226) (actual time=14781.457..14781.466 rows=710209 loops=1)
Buckets: 131072 (originally 262144) Batches: 8 (originally 1) Memory Usage: 77696kB
-> Hash Left Join (cost=26.92..50159.82 rows=58888 width=1226) (actual time=0.226..3148.840 rows=710209 loops=1)
Hash Cond: (comment.id = comment_saved.comment_id)
-> Hash Left Join (cost=23.53..50001.84 rows=58888 width=1206) (actual time=0.171..2966.432 rows=710209 loops=1)
Hash Cond: (post.community_id = community_follower.community_id)
-> Nested Loop (cost=0.43..49823.58 rows=58888 width=1185) (actual time=0.129..2772.617 rows=710209 loops=1)
-> Parallel Seq Scan on comment (cost=0.00..34704.93 rows=58888 width=304) (actual time=0.048..438.164 rows=710209 loops=1)
Filter: (nlevel(path) <= 9)
Rows Removed by Filter: 6472
-> Memoize (cost=0.43..0.52 rows=1 width=881) (actual time=0.003..0.003 rows=1 loops=710209)
Cache Key: comment.post_id
Cache Mode: logical
Hits: 630557 Misses: 79652 Evictions: 0 Overflows: 0 Memory Usage: 45666kB
-> Index Scan using post_pkey on post (cost=0.42..0.51 rows=1 width=881) (actual time=0.005..0.005 rows=1 loops=79652)
Index Cond: (id = comment.post_id)
-> Hash (cost=22.67..22.67 rows=34 width=21) (actual time=0.026..0.028 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Index Scan using idx_community_follower_person on community_follower (cost=0.43..22.67 rows=34 width=21) (actual time=0.026..0.026 rows=0 loops=1)
Index Cond: (person_id = '-1'::integer)
-> Hash (cost=3.37..3.37 rows=2 width=20) (actual time=0.032..0.033 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Index Scan using idx_comment_saved_person on comment_saved (cost=0.29..3.37 rows=2 width=20) (actual time=0.032..0.032 rows=0 loops=1)
Index Cond: (person_id = '-1'::integer)
-> Memoize (cost=0.43..0.55 rows=1 width=940) (actual time=0.002..0.002 rows=1 loops=710209)
Cache Key: comment.creator_id
Cache Mode: logical
Hits: 645209 Misses: 65000 Evictions: 0 Overflows: 0 Memory Usage: 52029kB
-> Index Scan using person__pkey on person (cost=0.42..0.54 rows=1 width=940) (actual time=0.008..0.008 rows=1 loops=65000)
Index Cond: (id = comment.creator_id)
-> Parallel Hash (cost=3167.06..3167.06 rows=8606 width=1124) (actual time=51.733..51.734 rows=20180 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 19232kB
-> Parallel Seq Scan on community (cost=0.00..3167.06 rows=8606 width=1124) (actual time=0.019..21.085 rows=20180 loops=1)
-> Hash (cost=3.50..3.50 rows=150 width=28) (actual time=0.091..0.092 rows=167 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 18kB
-> Seq Scan on community_person_ban (cost=0.00..3.50 rows=150 width=28) (actual time=0.018..0.037 rows=167 loops=1)
-> Hash (cost=3.46..3.46 rows=2 width=20) (actual time=0.026..0.027 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Index Scan using idx_person_block_person on person_block (cost=0.28..3.46 rows=2 width=20) (actual time=0.025..0.026 rows=0 loops=1)
Index Cond: (person_id = '-1'::integer)
-> Hash (cost=158.73..158.73 rows=187 width=6) (actual time=0.035..0.036 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Index Scan using idx_comment_like_person on comment_like (cost=0.43..158.73 rows=187 width=6) (actual time=0.035..0.035 rows=0 loops=1)
Index Cond: (person_id = '-1'::integer)
Planning Time: 29.311 ms
JIT:
Functions: 92
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 9.333 ms, Inlining 0.000 ms, Optimization 6.144 ms, Emission 89.868 ms, Total 105.345 ms
Execution Time: 112037.707 ms
(70 rows)
```
Ah, the above query plan is just for LIMIT i64::MAX. It's natural that it will do a sequential scan there, since it's literally fetching all comments from the database. It's not a query-planning issue in this case, just an issue because the limit is set badly (as fixed in this PR).
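In other words, the difference is only the bound on the limit. A sketch with illustrative column names, using the 300-comment cap from this PR's title:

```sql
-- Before the fix: i64::MAX was passed through as the limit, so every
-- comment matching the filter was fetched and sorted.
SELECT id FROM comment ORDER BY published DESC LIMIT 9223372036854775807;

-- After the fix: the limit is capped, so the scan and sort stay bounded.
SELECT id FROM comment ORDER BY published DESC LIMIT 300;
```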
Note that this hack pretty much sets a hard limit on the number of comments in a post. But it's necessary until we have a good solution for tree paging.
This comment was marked as abuse.
Probably not; this problem is with fetching comments, not updating them.
People are currently DDoSing Lemmy servers by attacking a vulnerability related to high-comment threads.