Switch linkchecker to use html5ever for html parsing.
The existing regex-based HTML parsing was just too primitive to
correctly handle HTML content. Some books have legitimate `href="…"`
text which should not be validated because it is part of the text, not
actual HTML.
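
The failure mode described above can be illustrated with a small, standard-library-only sketch. It is not the linkchecker code and does not use html5ever's API: both `naive_hrefs` and `tag_aware_hrefs` are hypothetical helpers written for this example. The first scans the raw input the way a simple pattern match would; the second only looks inside `<...>` tags, as a crude stand-in for what a real HTML parser provides.

```rust
/// Hypothetical naive scan: collects every `href="..."` occurrence anywhere
/// in the input, whether it sits inside a tag or in ordinary text.
fn naive_hrefs(html: &str) -> Vec<String> {
    const NEEDLE: &str = "href=\"";
    let mut found = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find(NEEDLE) {
        let after = &rest[start + NEEDLE.len()..];
        match after.find('"') {
            Some(end) => {
                found.push(after[..end].to_string());
                rest = &after[end + 1..];
            }
            None => break, // unterminated attribute value; stop scanning
        }
    }
    found
}

/// Hypothetical tag-aware scan: only inspects the text between `<` and the
/// next `>`. This is a toy stand-in for a real parser and ignores comments,
/// scripts, and `>` inside attribute values.
fn tag_aware_hrefs(html: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut rest = html;
    while let Some(open) = rest.find('<') {
        let after_open = &rest[open + 1..];
        match after_open.find('>') {
            Some(close) => {
                found.extend(naive_hrefs(&after_open[..close]));
                rest = &after_open[close + 1..];
            }
            None => break, // unterminated tag; stop scanning
        }
    }
    found
}

fn main() {
    // The first `href="..."` is prose in a paragraph, the second is a real link.
    let page = r#"<p>Use href="missing.html" in an anchor.</p><a href="real.html">link</a>"#;

    // prints ["missing.html", "real.html"]
    println!("{:?}", naive_hrefs(page));

    // prints ["real.html"]
    println!("{:?}", tag_aware_hrefs(page));
}
```

The naive scan reports `missing.html` as a link even though it is only text. The tag-aware version avoids that particular false positive, but its shortcuts are exactly the kind of thing a spec-following parser such as html5ever handles properly.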
ehuss committed Feb 8, 2024
1 parent bf6a1b1 commit 776590b
Showing 3 changed files with 233 additions and 194 deletions.
1 change: 1 addition & 0 deletions Cargo.lock
@@ -2274,6 +2274,7 @@ dependencies = [
 name = "linkchecker"
 version = "0.1.0"
 dependencies = [
+ "html5ever",
  "once_cell",
  "regex",
 ]
1 change: 1 addition & 0 deletions src/tools/linkchecker/Cargo.toml
@@ -10,3 +10,4 @@ path = "main.rs"
 [dependencies]
 regex = "1"
 once_cell = "1"
+html5ever = "0.26.0"