Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scriptdata bad testcase #181

Merged
merged 2 commits into from
Oct 29, 2024
Merged

scriptdata bad testcase #181

merged 2 commits into from
Oct 29, 2024

Conversation

untitaker
Copy link
Owner

No description provided.

@untitaker
Copy link
Owner Author

seems that lychee handles those cases by excluding the script tag from scraping, but that's actually a bug in html5gum. the contents of script tags shouldn't be interpreted as HTML at all.

https://github.com/lycheeverse/lychee/blob/eff77d634283610dbdd481db2d9b1d1162cdea84/lychee-lib/src/extract/html/mod.rs#L44

cc @mre

@untitaker
Copy link
Owner Author

untitaker/html5gum#98

@mre
Copy link

mre commented Oct 29, 2024

Thanks for the info. Could you remind me once this is merged or maybe even update the code in lychee? 🙃

@untitaker
Copy link
Owner Author

sure

@untitaker
Copy link
Owner Author

html5gum 0.6.1 is out, and can be bumped in lychee. I guess dependabot will do it, there shouldn't be any code changes required. You may want to check out CallbackEmitter if you want to simplify some code, though it has a bit of overhead (still way faster than most other things)

For some reason it's not possible to hit this bug in lychee. I don't know why, but the code I linked earlier is not related. I guess it doesn't matter.

@untitaker untitaker marked this pull request as ready for review October 29, 2024 21:33
@untitaker untitaker merged commit 055a110 into main Oct 29, 2024
11 checks passed
@untitaker untitaker deleted the json-script branch October 29, 2024 21:33
@mre
Copy link

mre commented Oct 29, 2024

Cool. Let's wait until the update comes through, then. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants