300 comment limit. #3306

Merged: 3 commits, Jul 3, 2023

Conversation

dessalines
Member

People are currently DDoSing Lemmy servers by exploiting a vulnerability related to threads with very high comment counts.

@dessalines requested a review from Nutomic as a code owner June 23, 2023 20:46
@SleeplessOne1917
Member

@Nutomic tagging you since I'm unable to approve this.

@dessalines once this is merged, I'm looking into using this package. I think it makes more sense to validate at our API endpoints than on the DB view.

@RocketDerp

This comment was marked as abuse.

@tbe

tbe commented Jun 23, 2023

I had a look at the PR, and I think this issue may result from OFFSET and LIMIT clauses, most likely paired with an ORDER BY somewhere in the generated statement.

From the database's point of view, this is a worst-case scenario. What the database does is:

  • Select ALL matching entries
  • Order them ALL (and if there is not enough memory, do it on disk)
  • Return only the entries that match LIMIT and OFFSET

This works well enough for smaller tables, but for large tables, it can be the death of a production database.
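For illustration, a minimal sketch of the pattern being described; the table, column, and values here are hypothetical stand-ins, not Lemmy's actual generated query:

    SELECT c.*
    FROM comment c
    WHERE c.post_id = 42                  -- hypothetical post id
    ORDER BY subpath(c.path, 0, -1)       -- ordering on an expression; no plain index helps
    LIMIT 20 OFFSET 10000;
    -- PostgreSQL still has to produce and sort every matching row,
    -- then discard the first 10000 before returning 20.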

I will try to reproduce this issue locally, and maybe I can come up with something better here.

@RocketDerp

This comment was marked as abuse.

@tbe

tbe commented Jun 23, 2023

ORDER BY subpath("comment"."path", $10, $11)

Oh dear, sorry I have to say this, but that is bad. Really bad. There is no index that can help here.

The database schema MUST change for this. A better solution would be a column with parent IDs, queried via a recursive CTE.

I can provide an example schema and a statement to do this, maybe even a migration path, but first I have to understand how this currently works.
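For illustration only, a sketch of the kind of schema and statement being described here; the column and index names are assumptions, not Lemmy's actual schema:

    -- Assumed schema change (not part of this PR):
    --   ALTER TABLE comment ADD COLUMN parent_id int REFERENCES comment (id);
    --   CREATE INDEX idx_comment_parent_id ON comment (parent_id);

    WITH RECURSIVE thread AS (
        SELECT id, parent_id, 1 AS depth
        FROM comment
        WHERE post_id = 42 AND parent_id IS NULL   -- top-level comments of one post
      UNION ALL
        SELECT c.id, c.parent_id, t.depth + 1
        FROM comment c
        JOIN thread t ON c.parent_id = t.id
        WHERE t.depth < 9                          -- bound the recursion depth
    )
    SELECT * FROM thread;

Each recursion step is then an index lookup on parent_id, so the cost scales with the size of the thread actually returned rather than with the size of the whole table.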

@tbe

tbe commented Jun 23, 2023

OK, I had a look at the query mentioned by @RocketDerp . The column is of type ltree, so there are no issues with substring parsing or the like.

But: there is no index either, and ordering by it is a bad idea for large sets. A recursive CTE would be far more efficient.

The good news: since we have an ltree, we can adapt the schema quickly to just store the parent comment ID. This may have side effects. For example, if a user deletes a comment, or their account, what happens? Is the parent comment preserved as a placeholder?

@dessalines , can you provide this information?
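A hedged sketch of what such a backfill could look like, assuming the path convention '0.<ancestor ids>.<own id>' (so top-level comments have nlevel 2); this is an illustration, not a tested migration:

    -- Sketch only; assumes a path like '0.15.27' means comment 27 under parent 15.
    ALTER TABLE comment ADD COLUMN parent_id int;

    UPDATE comment
    SET parent_id = ltree2text(subpath(path, -2, 1))::int
    WHERE nlevel(path) > 2;   -- '0.<id>' rows are top-level; parent_id stays NULL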

@RocketDerp

This comment was marked as abuse.

@tbe

tbe commented Jun 23, 2023

The more I think about these queries, the more my brain hurts. I will open a new issue to track this kind of problem.

@tbe mentioned this pull request Jun 24, 2023
@dessalines
Member Author

@tbe I'll comment on that issue.

@phiresky
Collaborator

phiresky commented Jul 3, 2023

I want to mention this query plan from lemmy.world for GetComments.
As you can see, it does a full seq scan on comments and sometimes even posts. Adding a limit of 300 would fix that:

                                                                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=159533.43..186883.34 rows=231126 width=3420) (actual time=76852.137..110888.431 rows=710209 loops=1)
   ->  Gather Merge  (cost=159533.43..186883.34 rows=231126 width=3420) (actual time=76756.348..110724.832 rows=710209 loops=1)
         Workers Planned: 3
         Workers Launched: 0
         ->  Sort  (cost=158533.39..158725.99 rows=77042 width=3420) (actual time=76753.723..110541.803 rows=710209 loops=1)
               Sort Key: (subpath(comment.path, 0, '-1'::integer)), comment_aggregates.hot_rank DESC
               Sort Method: external merge  Disk: 2011568kB
               ->  Hash Left Join  (cost=54341.29..85804.78 rows=77042 width=3420) (actual time=16185.176..23083.199 rows=710209 loops=1)
                     Hash Cond: (comment.id = comment_like.comment_id)
                     ->  Hash Left Join  (cost=54180.22..85248.88 rows=77042 width=3386) (actual time=16185.121..22625.439 rows=710209 loops=1)
                           Hash Cond: (comment.creator_id = person_block.target_id)
                           ->  Hash Left Join  (cost=54176.74..85043.10 rows=77042 width=3366) (actual time=16185.072..22270.849 rows=710209 loops=1)
                                 Hash Cond: ((community.id = community_person_ban.community_id) AND (comment.creator_id = community_person_ban.person_id))
                                 ->  Parallel Hash Join  (cost=54170.99..84632.88 rows=77042 width=3338) (actual time=16184.949..21893.240 rows=710209 loops=1)
                                       Hash Cond: (post.community_id = community.id)
                                       Join Filter: ((NOT community.hidden) OR (community_follower.person_id = '-1'::integer))
                                       ->  Nested Loop  (cost=50896.35..81155.95 rows=77042 width=2214) (actual time=16132.831..21034.913 rows=710209 loops=1)
                                             ->  Parallel Hash Join  (cost=50895.92..67779.85 rows=77042 width=1274) (actual time=16132.760..19096.555 rows=710209 loops=1)
                                                   Hash Cond: (comment_aggregates.comment_id = comment.id)
                                                   ->  Parallel Seq Scan on comment_aggregates  (cost=0.00..15827.25 rows=231125 width=48) (actual time=0.063..151.783 rows=716681 loops=1)
                                                   ->  Parallel Hash  (cost=50159.82..50159.82 rows=58888 width=1226) (actual time=14781.457..14781.466 rows=710209 loops=1)
                                                         Buckets: 131072 (originally 262144)  Batches: 8 (originally 1)  Memory Usage: 77696kB
                                                         ->  Hash Left Join  (cost=26.92..50159.82 rows=58888 width=1226) (actual time=0.226..3148.840 rows=710209 loops=1)
                                                               Hash Cond: (comment.id = comment_saved.comment_id)
                                                               ->  Hash Left Join  (cost=23.53..50001.84 rows=58888 width=1206) (actual time=0.171..2966.432 rows=710209 loops=1)
                                                                     Hash Cond: (post.community_id = community_follower.community_id)
                                                                     ->  Nested Loop  (cost=0.43..49823.58 rows=58888 width=1185) (actual time=0.129..2772.617 rows=710209 loops=1)
                                                                           ->  Parallel Seq Scan on comment  (cost=0.00..34704.93 rows=58888 width=304) (actual time=0.048..438.164 rows=710209 loops=1)
                                                                                 Filter: (nlevel(path) <= 9)
                                                                                 Rows Removed by Filter: 6472
                                                                           ->  Memoize  (cost=0.43..0.52 rows=1 width=881) (actual time=0.003..0.003 rows=1 loops=710209)
                                                                                 Cache Key: comment.post_id
                                                                                 Cache Mode: logical
                                                                                 Hits: 630557  Misses: 79652  Evictions: 0  Overflows: 0  Memory Usage: 45666kB
                                                                                 ->  Index Scan using post_pkey on post  (cost=0.42..0.51 rows=1 width=881) (actual time=0.005..0.005 rows=1 loops=79652)
                                                                                       Index Cond: (id = comment.post_id)
                                                                     ->  Hash  (cost=22.67..22.67 rows=34 width=21) (actual time=0.026..0.028 rows=0 loops=1)
                                                                           Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                                                           ->  Index Scan using idx_community_follower_person on community_follower  (cost=0.43..22.67 rows=34 width=21) (actual time=0.026..0.026 rows=0 loops=1)
                                                                                 Index Cond: (person_id = '-1'::integer)
                                                               ->  Hash  (cost=3.37..3.37 rows=2 width=20) (actual time=0.032..0.033 rows=0 loops=1)
                                                                     Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                                                     ->  Index Scan using idx_comment_saved_person on comment_saved  (cost=0.29..3.37 rows=2 width=20) (actual time=0.032..0.032 rows=0 loops=1)
                                                                           Index Cond: (person_id = '-1'::integer)
                                             ->  Memoize  (cost=0.43..0.55 rows=1 width=940) (actual time=0.002..0.002 rows=1 loops=710209)
                                                   Cache Key: comment.creator_id
                                                   Cache Mode: logical
                                                   Hits: 645209  Misses: 65000  Evictions: 0  Overflows: 0  Memory Usage: 52029kB
                                                   ->  Index Scan using person__pkey on person  (cost=0.42..0.54 rows=1 width=940) (actual time=0.008..0.008 rows=1 loops=65000)
                                                         Index Cond: (id = comment.creator_id)
                                       ->  Parallel Hash  (cost=3167.06..3167.06 rows=8606 width=1124) (actual time=51.733..51.734 rows=20180 loops=1)
                                             Buckets: 32768  Batches: 1  Memory Usage: 19232kB
                                             ->  Parallel Seq Scan on community  (cost=0.00..3167.06 rows=8606 width=1124) (actual time=0.019..21.085 rows=20180 loops=1)
                                 ->  Hash  (cost=3.50..3.50 rows=150 width=28) (actual time=0.091..0.092 rows=167 loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 18kB
                                       ->  Seq Scan on community_person_ban  (cost=0.00..3.50 rows=150 width=28) (actual time=0.018..0.037 rows=167 loops=1)
                           ->  Hash  (cost=3.46..3.46 rows=2 width=20) (actual time=0.026..0.027 rows=0 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                 ->  Index Scan using idx_person_block_person on person_block  (cost=0.28..3.46 rows=2 width=20) (actual time=0.025..0.026 rows=0 loops=1)
                                       Index Cond: (person_id = '-1'::integer)
                     ->  Hash  (cost=158.73..158.73 rows=187 width=6) (actual time=0.035..0.036 rows=0 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 8kB
                           ->  Index Scan using idx_comment_like_person on comment_like  (cost=0.43..158.73 rows=187 width=6) (actual time=0.035..0.035 rows=0 loops=1)
                                 Index Cond: (person_id = '-1'::integer)
 Planning Time: 29.311 ms
 JIT:
   Functions: 92
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 9.333 ms, Inlining 0.000 ms, Optimization 6.144 ms, Emission 89.868 ms, Total 105.345 ms
 Execution Time: 112037.707 ms
(70 rows)

@RocketDerp

This comment was marked as abuse.

@tbe

tbe commented Jul 3, 2023 via email

@phiresky
Collaborator

phiresky commented Jul 3, 2023

Ah, the above query plan is just for LIMIT i64::MAX. It's natural that it does a sequential scan there, since it's literally fetching all comments from the database. It's not a query-planning issue in this case, just an issue caused by the limit being set badly (as fixed in this PR).
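For illustration, the difference comes down to the bound on the final sort. The table and column names below appear in the plan above, but the query shape is a simplified stand-in for the generated one:

    -- Effectively unbounded: every row is fetched, and the sort spills to disk
    -- (the ~2 GB external merge seen in the plan above).
    SELECT c.id
    FROM comment c
    JOIN comment_aggregates ca ON ca.comment_id = c.id
    ORDER BY ca.hot_rank DESC
    LIMIT 9223372036854775807;   -- i64::MAX

    -- Bounded: PostgreSQL can use a top-N heapsort, keeping only 300 rows in memory.
    SELECT c.id
    FROM comment c
    JOIN comment_aggregates ca ON ca.comment_id = c.id
    ORDER BY ca.hot_rank DESC
    LIMIT 300;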

@dessalines enabled auto-merge (squash) July 3, 2023 21:30
@dessalines
Member Author

Note that this hack pretty much sets a hard limit on the number of comments in a post. But it's necessary until we have a good solution for tree paging.

@RocketDerp

This comment was marked as abuse.

@phiresky
Collaborator

Probably not; this problem is with fetching comments, not updating them.

@RocketDerp

This comment was marked as abuse.
