Intermittent server crashes on html5ever tokenizer #1964

Closed
dessalines opened this issue Nov 30, 2021 · 9 comments
Labels
bug Something isn't working

Comments

@dessalines
Member

The server will time out, and you'll only see this repeating forever in the logs:

lemmy_1     |   2021-11-30T23:27:14.390776Z DEBUG html5ever::tokenizer: got character �
lemmy_1     |     at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.22.5/src/tokenizer/mod.rs:273
@dessalines dessalines added the bug Something isn't working label Nov 30, 2021
@dessalines
Member Author

dessalines commented Dec 1, 2021

It wasn't a segfault, but it still locks the server into an infinite loop.

I'm fairly certain the main offender is the retry fetch function, which gets stuck in an infinite loop unless request_counter is checked. The html5ever errors are just what the log keeps repeating, coming from either the parse_html function or html_to_site_metadata. Here's one place where request_counter isn't checked: https://github.com/LemmyNet/lemmy/blob/main/crates/apub/src/fetcher/webfinger.rs#L64

The solution is to:

  • Remove every retry call that has no request counter.
  • Where there is a request counter, make sure the retry adds a check like here.
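
A minimal sketch of what such a check could look like, for illustration only. The names MAX_REQUESTS, FetchError, check_request_counter and retry_with_counter are hypothetical and are not taken from the Lemmy codebase; the point is just that every attempt bumps a shared counter and the retry gives up once a cap is passed:

```rust
/// Hypothetical cap on outgoing fetches per incoming request (assumption).
const MAX_REQUESTS: u32 = 25;

#[derive(Debug, PartialEq)]
enum FetchError {
    /// The shared request counter went past MAX_REQUESTS.
    TooManyRequests,
    /// Any other fetch failure, with a message.
    Failed(String),
}

/// Bump the shared counter and fail once the cap is exceeded.
fn check_request_counter(request_counter: &mut u32) -> Result<(), FetchError> {
    *request_counter += 1;
    if *request_counter > MAX_REQUESTS {
        return Err(FetchError::TooManyRequests);
    }
    Ok(())
}

/// Call `fetch` at most `max_attempts` times. Every attempt is counted,
/// so the loop ends either on success, on the attempt limit, or on the cap.
fn retry_with_counter<T, F>(
    request_counter: &mut u32,
    max_attempts: u32,
    mut fetch: F,
) -> Result<T, FetchError>
where
    F: FnMut() -> Result<T, FetchError>,
{
    let mut last_err = FetchError::Failed("no attempts made".to_string());
    for _ in 0..max_attempts {
        check_request_counter(request_counter)?;
        match fetch() {
            Ok(value) => return Ok(value),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

With this shape, a retry without a counter cannot be written by accident, since the counter is a required argument.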

dessalines added a commit that referenced this issue Dec 1, 2021
@dessalines
Member Author

dessalines commented Dec 5, 2021

We haven't deployed the fix yet, but lemmy is timing out again on these errors:

lemmy_1     |   2021-12-05T14:27:09.693014Z DEBUG html5ever::tokenizer: got character �
lemmy_1     |     at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.22.5/src/tokenizer/mod.rs:273
lemmy_1     |     in tracing_actix_web::root_span_builder::HTTP request with , http.method: POST, http.route: /inbox, http.flavor: 1.0, http.scheme: http, http.host: lemmy.ml, http.client_ip: xxx, http.user_agent: , http.target: /inbox, otel.kind: "server", request_id: 129e3efd-a149-489a-8026-3832eba2aff4

or

lemmy_1     |   2021-12-05T14:56:38.539147Z DEBUG html5ever::tokenizer::char_ref: char ref tokenizer stepping in state Named
lemmy_1     |     at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.22.5/src/tokenizer/char_ref/mod.rs:123
lemmy_1     |     in tracing_actix_web::root_span_builder::HTTP request with , http.method: GET, http.route: /api/v3/post/site_metadata, http.flavor: 1.1, http.scheme: http, http.host: lemmy.ml, http.client_ip: xxx, http.user_agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.0.0 Safari/537.36, http.target: /api/v3/post/site_metadata?url=https%3A%2F%2Fgithub.com%2FIldaron%2FEEGwithRaspberryPI, otel.kind: "server", request_id: 576d78c0-13bb-4767-8530-3a8f890c3bac

I think this is the issue: the call to webpage::HTML::from_string.

Its fetch is being called from here, and somewhere up the chain the from_apub method is being called again and again in an infinite loop.
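
To illustrate how a call chain like this can loop forever, and how a counter threaded through the recursion stops it, here is a small self-contained sketch. It does not use Lemmy's real types: the graph of object ids, the resolve function and the MAX_REQUESTS value are all assumptions made up for the example.

```rust
use std::collections::HashMap;

/// Hypothetical cap on fetches triggered by one incoming request (assumption).
const MAX_REQUESTS: u32 = 25;

/// Resolve `id` and, recursively, every object it references.
/// `graph` stands in for remote objects: id -> ids it points to.
/// The same `request_counter` is passed down the whole chain, so a
/// reference cycle ends with an error instead of recursing forever.
fn resolve(
    id: &str,
    graph: &HashMap<&str, Vec<&str>>,
    request_counter: &mut u32,
) -> Result<(), String> {
    *request_counter += 1;
    if *request_counter > MAX_REQUESTS {
        return Err(format!("too many requests while resolving {}", id));
    }
    if let Some(refs) = graph.get(id) {
        for next in refs {
            resolve(next, graph, request_counter)?;
        }
    }
    Ok(())
}
```

If two objects reference each other, resolve returns an error after MAX_REQUESTS fetches; without the counter the same input would recurse until the stack overflows or the server stops responding.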

This is pretty urgent, so I'll merge, then do a cherry-pick to the release/0.14 branch to get this on the server ASAP.

dessalines added a commit that referenced this issue Dec 5, 2021
* Fix retry infinite loops. Fixes #1964

* Moving retry_limit to settings
@dessalines dessalines reopened this Dec 5, 2021
@dessalines
Member Author

Reopening in case this PR didn't fix it.

@dessalines
Member Author

Ugh, now it's getting errors on the other one.

lemmy_1     |   2021-12-05T16:06:34.910521Z DEBUG html5ever::tree_builder: processing CharacterTokens(NotSplit, Tendril<UTF8>(inline: \"\\u{1b}\")) in insertion mode InBody
lemmy_1     |     at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.25.1/src/tree_builder/mod.rs:312

@dessalines
Member Author

Bah, this bug is still occurring, and it's quite critical, as it locks the server while repeated fetches are done:

lemmy_1     |   2021-12-12T03:26:39.022650Z DEBUG html5ever::tokenizer: processing in state AttributeName
lemmy_1     |     at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.22.5/src/tokenizer/mod.rs:677
lemmy_1     |     in tracing_actix_web::root_span_builder::HTTP request with , http.method: GET, http.route: /api/v3/post/list, http.flavor: 1.1, http.scheme: http, http.host: lemmy.ml, http.client_ip: 87.250.224.190, http.user_agent: node-fetch/1.0 (+https://github.com/bitinn/node-fetch), http.target: /api/v3/post/list?page=1&limit=20&sort=Active&type_=Community&saved_only=false&community_name=genzedong%40lemmygrad.ml

Nutomic pushed a commit that referenced this issue Dec 12, 2021
* Trying out an upgraded version of html5ever. #1964

* New release of webpage.
dessalines added a commit that referenced this issue Dec 13, 2021
* Trying out an upgraded version of html5ever. #1964

* New release of webpage.
@dessalines
Member Author

The upgraded version of webpage is now deployed to prod as 0.14.6-rc.1. I'll keep an eye on it and see if this fixes the timing out issue. This is one of our 2 critical bugs right now, the other being background jobs.

@dessalines
Member Author

dessalines commented Dec 14, 2021

Didn't fix :(

It could be in this from_apub function. I noticed it's happening only on routes like /api/v3/post/list...community_name=genzedong%40lemmygrad.ml or api/v3/community/list...community_name=name@lemmygrad.ml. It's pretty difficult to track down where this is happening.

Possible line: https://github.com/LemmyNet/lemmy/blob/main/crates/apub/src/objects/post.rs#L167

dessalines added a commit that referenced this issue Dec 20, 2021
dessalines added a commit that referenced this issue Dec 21, 2021
dessalines added a commit that referenced this issue Dec 21, 2021
@dessalines
Member Author

This is deployed now, 0.14.6-rc.2, and it seems to have fixed the issue. I think I also found the URL the spammer was using to lock up lemmy.

@dessalines
Member Author

It's probable that #2015 will also fix this.
