Summary
Lemmy tries to extract OpenGraph metadata from URLs referenced in posts. If the post URL is a direct link to a large binary file, it still downloads the whole file, strips all non-UTF-8 characters, and runs an HTML parser on the result.
Suggested fixes:
1. Always fetch only the first 16 kB of a URL, not the whole thing. I think this is common practice for metadata extraction, but I'm not 100% sure.
2. Check whether the returned data is binary. I would simply check whether it contains at least one null byte (ripgrep uses the same method to detect binary data). If it is binary, skip the extraction.
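The two suggestions above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not Lemmy's actual code: `MAX_METADATA_BYTES`, `read_prefix`, `looks_binary`, and `metadata_candidate` are hypothetical names, and a real implementation would apply the same cap to the streamed HTTP response body rather than to a generic reader.

```rust
use std::io::{self, Read};

/// Hypothetical cap on how much of a response body is inspected (16 KiB).
const MAX_METADATA_BYTES: u64 = 16 * 1024;

/// Reads at most `limit` bytes from `reader`, ignoring anything beyond that.
fn read_prefix<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    reader.take(limit).read_to_end(&mut buf)?;
    Ok(buf)
}

/// Heuristic: treat the data as binary if it contains at least one NUL byte.
fn looks_binary(data: &[u8]) -> bool {
    data.contains(&0)
}

/// Returns the text that would be handed to the HTML parser,
/// or `None` if the prefix looks like binary data.
fn metadata_candidate<R: Read>(reader: R) -> io::Result<Option<String>> {
    let prefix = read_prefix(reader, MAX_METADATA_BYTES)?;
    if looks_binary(&prefix) {
        return Ok(None);
    }
    Ok(Some(String::from_utf8_lossy(&prefix).into_owned()))
}
```

With this shape, a 20 MB GIF costs at most 16 KiB of reading and one byte scan, and the HTML parser never runs on it.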
The relevant code was restructured in #4035 but I'm not sure whether it existed before or not.
lemmy/crates/api_common/src/request.rs
Lines 45 to 60 in d09854a
lemmy/crates/api_common/src/request.rs
Lines 129 to 132 in d09854a
This is a very expensive call for large binary files.
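To give a rough feel for the cost, here is a small sketch that assumes the cleanup step is a lossy UTF-8 decode such as `String::from_utf8_lossy` (an assumption; the referenced code is not reproduced here). Under that assumption, invalid bytes are not dropped for free: the whole buffer is scanned and copied, and invalid bytes are replaced by the three-byte U+FFFD character before the HTML parser sees the text. `lossy_text` is a hypothetical helper name.

```rust
/// Lossily decodes `bytes` as UTF-8. Invalid sequences are replaced
/// with U+FFFD, which takes three bytes in the resulting string.
fn lossy_text(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

For input consisting only of `0xFF` bytes, which are never valid in UTF-8, the decoded string is three times as large as the input, so a multi-megabyte binary file can turn into an even larger string that the parser then has to walk.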
Steps to Reproduce
curl -v 'localhost:8536/api/v3/post/site_metadata?url=https://i.redd.it/tdnjprab04gd1.gif'
(warning: that's a 20MB NSFW gif)
Technical Details
Happens locally, but Tiff (reddthat.com) observes this regularly in production.
Version
0.19.5
Lemmy Instance URL
reddthat.com