300 comment limit. #3306
@Nutomic tagging you since I'm unable to approve this. @dessalines once this is merged, I'm looking into using this package. I think it makes sense to validate at our API endpoints more so than it does on the DB view.
This comment was marked as abuse.
I had a look at the PR, and I think this issue may result from OFFSET and LIMIT clauses, most likely paired with an ORDER BY somewhere in the generated statement. From the view of a database, this is a worst-case scenario. What the database does is:

1. Fetch every row that matches the filter.
2. Sort the entire result set according to the ORDER BY clause.
3. Discard everything before the OFFSET and after the LIMIT.

This works well enough for smaller tables, but for large tables, this can be the death of a production database. I will try to reproduce this issue locally, and maybe I can come up with something better here.
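To make the pattern concrete, here is a minimal sketch. The `comment` table appears in the plan below, but the `published` column and the cursor values are illustrative assumptions, not Lemmy's actual generated statement:

```sql
-- Worst case: OFFSET/LIMIT paired with ORDER BY. The whole filtered set
-- must be fetched and sorted before a single row can be returned.
SELECT id, content
FROM comment
ORDER BY published DESC
LIMIT 20 OFFSET 100000;

-- Keyset pagination as an alternative: filter on the last-seen sort key,
-- so an index on (published, id) can serve the query without a full sort.
SELECT id, content
FROM comment
WHERE (published, id) < ('2023-07-01 00:00:00', 123456)
ORDER BY published DESC, id DESC
LIMIT 20;
```

Keyset pagination is not a drop-in fix for tree-ordered comments, but it illustrates why a large OFFSET on a sorted result is so expensive.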
This comment was marked as abuse.
Oh dear, sorry I have to say that, but that is bad. Really bad. There is no index that would help here. The database schema MUST change for this. A better solution would be a column with parent IDs and then using a recursive CTE. I can provide an example schema and a statement to do this, maybe even a migration path, but first I have to understand how this works currently.
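For illustration, a sketch of what that could look like. The `parent_id` column, the index name, and the root ID are hypothetical, not the current Lemmy schema:

```sql
-- Hypothetical schema change: each comment stores only its direct parent.
-- ALTER TABLE comment ADD COLUMN parent_id integer REFERENCES comment (id);
-- CREATE INDEX idx_comment_parent ON comment (parent_id);

-- Fetch a comment subtree with a recursive CTE, capping the depth.
WITH RECURSIVE thread AS (
    SELECT id, parent_id, content, 1 AS depth
    FROM comment
    WHERE id = 42                -- hypothetical root comment of the subtree
    UNION ALL
    SELECT c.id, c.parent_id, c.content, t.depth + 1
    FROM comment c
    JOIN thread t ON c.parent_id = t.id
    WHERE t.depth < 9            -- depth cap, analogous to nlevel(path) <= 9
)
SELECT * FROM thread;
```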
OK, I had a look at the query mentioned by @RocketDerp. The column is of type ltree, so there are no issues with substring parsing or things like that. But there is no index either, and for ordering, it is a bad idea for large sets. A recursive CTE would be far more efficient. The good news: as we have an ltree, we can adapt the schema quickly to just store the parent comment ID. This may have side effects: for example, if a user deletes a comment, or their account, what happens? Is the parent comment preserved as a placeholder? @dessalines, can you provide this information?
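As a reference point, ltree columns can be indexed in PostgreSQL (index names below are illustrative). Note that neither index would serve the plan's actual sort key, `subpath(comment.path, 0, -1)` combined with `comment_aggregates.hot_rank`, which is part of why the ordering is so expensive here:

```sql
-- GiST index: supports ltree operators such as @>, <@ and lquery matching.
CREATE INDEX idx_comment_path_gist ON comment USING GIST (path);

-- Plain B-tree index: supports equality and ordering on the full path.
CREATE INDEX idx_comment_path_btree ON comment (path);
```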
This comment was marked as abuse.
The more I think about these queries, the more my brain hurts. I will open a new issue to track this kind of problem.
@tbe I'll comment on that issue.
I want to mention this query plan from lemmy.world for GetComments:
This comment was marked as abuse.
Can you provide me with the contents of `pg_stats` and `pg_stat_user_tables`? I struggle to reproduce these seq scans with generated data. Having the stats for a database where this issue occurs may help a lot :)
On 3 July 2023 at 21:34:16 CEST, phiresky ***@***.***> wrote:
…I want to mention this query plan from lemmy.world: As you can see, it does a full seq scan on comments and sometimes even posts.
```
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=159533.43..186883.34 rows=231126 width=3420) (actual time=76852.137..110888.431 rows=710209 loops=1)
-> Gather Merge (cost=159533.43..186883.34 rows=231126 width=3420) (actual time=76756.348..110724.832 rows=710209 loops=1)
Workers Planned: 3
Workers Launched: 0
-> Sort (cost=158533.39..158725.99 rows=77042 width=3420) (actual time=76753.723..110541.803 rows=710209 loops=1)
Sort Key: (subpath(comment.path, 0, '-1'::integer)), comment_aggregates.hot_rank DESC
Sort Method: external merge Disk: 2011568kB
-> Hash Left Join (cost=54341.29..85804.78 rows=77042 width=3420) (actual time=16185.176..23083.199 rows=710209 loops=1)
Hash Cond: (comment.id = comment_like.comment_id)
-> Hash Left Join (cost=54180.22..85248.88 rows=77042 width=3386) (actual time=16185.121..22625.439 rows=710209 loops=1)
Hash Cond: (comment.creator_id = person_block.target_id)
-> Hash Left Join (cost=54176.74..85043.10 rows=77042 width=3366) (actual time=16185.072..22270.849 rows=710209 loops=1)
Hash Cond: ((community.id = community_person_ban.community_id) AND (comment.creator_id = community_person_ban.person_id))
-> Parallel Hash Join (cost=54170.99..84632.88 rows=77042 width=3338) (actual time=16184.949..21893.240 rows=710209 loops=1)
Hash Cond: (post.community_id = community.id)
Join Filter: ((NOT community.hidden) OR (community_follower.person_id = '-1'::integer))
-> Nested Loop (cost=50896.35..81155.95 rows=77042 width=2214) (actual time=16132.831..21034.913 rows=710209 loops=1)
-> Parallel Hash Join (cost=50895.92..67779.85 rows=77042 width=1274) (actual time=16132.760..19096.555 rows=710209 loops=1)
Hash Cond: (comment_aggregates.comment_id = comment.id)
-> Parallel Seq Scan on comment_aggregates (cost=0.00..15827.25 rows=231125 width=48) (actual time=0.063..151.783 rows=716681 loops=1)
-> Parallel Hash (cost=50159.82..50159.82 rows=58888 width=1226) (actual time=14781.457..14781.466 rows=710209 loops=1)
Buckets: 131072 (originally 262144) Batches: 8 (originally 1) Memory Usage: 77696kB
-> Hash Left Join (cost=26.92..50159.82 rows=58888 width=1226) (actual time=0.226..3148.840 rows=710209 loops=1)
Hash Cond: (comment.id = comment_saved.comment_id)
-> Hash Left Join (cost=23.53..50001.84 rows=58888 width=1206) (actual time=0.171..2966.432 rows=710209 loops=1)
Hash Cond: (post.community_id = community_follower.community_id)
-> Nested Loop (cost=0.43..49823.58 rows=58888 width=1185) (actual time=0.129..2772.617 rows=710209 loops=1)
-> Parallel Seq Scan on comment (cost=0.00..34704.93 rows=58888 width=304) (actual time=0.048..438.164 rows=710209 loops=1)
Filter: (nlevel(path) <= 9)
Rows Removed by Filter: 6472
-> Memoize (cost=0.43..0.52 rows=1 width=881) (actual time=0.003..0.003 rows=1 loops=710209)
Cache Key: comment.post_id
Cache Mode: logical
Hits: 630557 Misses: 79652 Evictions: 0 Overflows: 0 Memory Usage: 45666kB
-> Index Scan using post_pkey on post (cost=0.42..0.51 rows=1 width=881) (actual time=0.005..0.005 rows=1 loops=79652)
Index Cond: (id = comment.post_id)
-> Hash (cost=22.67..22.67 rows=34 width=21) (actual time=0.026..0.028 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Index Scan using idx_community_follower_person on community_follower (cost=0.43..22.67 rows=34 width=21) (actual time=0.026..0.026 rows=0 loops=1)
Index Cond: (person_id = '-1'::integer)
-> Hash (cost=3.37..3.37 rows=2 width=20) (actual time=0.032..0.033 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Index Scan using idx_comment_saved_person on comment_saved (cost=0.29..3.37 rows=2 width=20) (actual time=0.032..0.032 rows=0 loops=1)
Index Cond: (person_id = '-1'::integer)
-> Memoize (cost=0.43..0.55 rows=1 width=940) (actual time=0.002..0.002 rows=1 loops=710209)
Cache Key: comment.creator_id
Cache Mode: logical
Hits: 645209 Misses: 65000 Evictions: 0 Overflows: 0 Memory Usage: 52029kB
-> Index Scan using person__pkey on person (cost=0.42..0.54 rows=1 width=940) (actual time=0.008..0.008 rows=1 loops=65000)
Index Cond: (id = comment.creator_id)
-> Parallel Hash (cost=3167.06..3167.06 rows=8606 width=1124) (actual time=51.733..51.734 rows=20180 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 19232kB
-> Parallel Seq Scan on community (cost=0.00..3167.06 rows=8606 width=1124) (actual time=0.019..21.085 rows=20180 loops=1)
-> Hash (cost=3.50..3.50 rows=150 width=28) (actual time=0.091..0.092 rows=167 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 18kB
-> Seq Scan on community_person_ban (cost=0.00..3.50 rows=150 width=28) (actual time=0.018..0.037 rows=167 loops=1)
-> Hash (cost=3.46..3.46 rows=2 width=20) (actual time=0.026..0.027 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Index Scan using idx_person_block_person on person_block (cost=0.28..3.46 rows=2 width=20) (actual time=0.025..0.026 rows=0 loops=1)
Index Cond: (person_id = '-1'::integer)
-> Hash (cost=158.73..158.73 rows=187 width=6) (actual time=0.035..0.036 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Index Scan using idx_comment_like_person on comment_like (cost=0.43..158.73 rows=187 width=6) (actual time=0.035..0.035 rows=0 loops=1)
Index Cond: (person_id = '-1'::integer)
Planning Time: 29.311 ms
JIT:
Functions: 92
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 9.333 ms, Inlining 0.000 ms, Optimization 6.144 ms, Emission 89.868 ms, Total 105.345 ms
Execution Time: 112037.707 ms
(70 rows)
```
Ah, the above query plan is just for LIMIT i64::MAX. It's natural that it will do a sequential scan there, since it's literally fetching all comments from the database. It's not a query-planning issue in this case, just an issue because the limit is set badly (as fixed in this PR).
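In other words, the difference is only the bound on the limit. A sketch with illustrative column names, using the 300-comment cap from this PR's title:

```sql
-- Before the fix: i64::MAX was passed through as the limit, so every
-- comment matching the filter was fetched and sorted.
SELECT id FROM comment ORDER BY published DESC LIMIT 9223372036854775807;

-- After the fix: the limit is capped, so the scan and sort stay bounded.
SELECT id FROM comment ORDER BY published DESC LIMIT 300;
```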
Note that this hack pretty much sets a hard limit on the number of comments in a post. But it's necessary until we have a good solution for tree paging.
This comment was marked as abuse.
Probably not; this problem is with fetching comments, not updating them.
People are currently DDoSing Lemmy servers by attacking a vulnerability related to high-comment threads.