-
-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTML5 parsing and error checking #362
Conversation
Wow, I like a lot of this. I especially like the removal of the Nokogiri hack method--thank you for that! I'm a bit slammed at work at the moment so I won't be able to get this for a few days. For this task item:
I think we should just switch to HTML5 checking. I'd rather documents be modern and correct then continue to support separate HTML4/5 checks. |
0473f47
to
0a55dd4
Compare
ping? |
I like a lot of the clean-up here, thank you very much! I am a bit concerned at some of the test changes, primarily where items were failing tests and have now flipped to not fail. Could you explain why that is? Is it because the HTML5 parsing is laxer than the previous HTML4 parsing? |
Precisely! Reflects where HTML5 rules differ from XHTML1/HTML4. |
missingLinkHrefFilepath = "#{FIXTURES_DIR}/links/missingLinkHref.html" | ||
proofer = run_proofer(missingLinkHrefFilepath, :file) | ||
expect(proofer.failed_tests.first).to match(/anchor has no href attribute/) | ||
expect(proofer.failed_tests).to eq [] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HTML5 allows links with no href
attribute.
head_link = "#{FIXTURES_DIR}/links/head_link_href_absent.html" | ||
proofer = run_proofer(head_link, :file) | ||
expect(proofer.failed_tests.first).to match(/anchor has no href attribute/) | ||
expect(proofer.failed_tests).to eq [] | ||
end | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto. HTML5 allows links with no href
attribute.
@jeremy Would you mind merging master to fix the merge conflict? After that I think this should be good to go. Thank you! |
@gjtorikian Rebased. |
I completely forgot about this! Sorry! |
Unfortunately I still can't accept this just yet. |
@jeremy The issue is fixed upstream and now Nokogumbo should be providing the line numbers that you need for this fork to be successful. Would you be able to continue working on this PR? Is there any assistance you require? |
Nokogumbo doesn't include a gemspec, so we can't bundle it from git, and the fix isn't included in the recent 2.0.0.pre.alpha release. That means we can't depend on a fixed version to test this PR, so I branched nokogumbo (rubys/nokogumbo#104) to provide a non-Rakefile-embedded gemspec for temporary use. |
Note that a bunch of line numbers are still incorrect Another route is to support what we can and defer to the underlying library on specifics, allowing users to parse HTML5 today despite the incomplete support. |
Introduces `--report-missing-doctype`, defaulting to false, to report on missing `<!DOCTYPE html>` at the beginning of the document. Depends on Nokogumbo >= 1.4.10 for rubys/nokogumbo#46 Fixes gjtorikian#318.
Switched nokogumbo dep to track git master pending 2.0.0 release. |
I filed a PR with Nokogiri to add a line setting API (referenced above) and I added rubys/nokogumbo#122 to track the issue in Nokogumbo. |
Thank you for providing the update. May rubys/nokogumbo#122 please be referenced as a blocker at top? |
We have a path forward with rubys/nokogumbo#130 |
In Nokogumbo we are waiting for rubys/nokogumbo#130 so that we can use it here. |
Fixed merge conflicts. And still have a few failing tests. |
Requesting help here to fix unit tests and review this PR |
For some reason I cannot push to this PR? But here is the diff necessary to fix the tests: diff --git a/lib/html-proofer/middleware.rb b/lib/html-proofer/middleware.rb
index b1e7eab..a154261 100644
--- a/lib/html-proofer/middleware.rb
+++ b/lib/html-proofer/middleware.rb
@@ -67,7 +67,7 @@ module HTMLProofer
'response',
Middleware.options
).check_parsed(
- Nokogiri::HTML(Utils.clean_content(html)), 'response'
+ Nokogiri::HTML5(html), 'response'
)
if parsed[:failures].length > 0 |
@gjtorikian I was able to put that in here jeremy@35f85cb |
@fulldecent Do you have push access to his repo? Normally I can make commits directly to PRs but I can't seem to manage it with this one. Looks like there's just one failure left. |
@gjtorikian Yes, but I can only do it with the web interface. Here is the link I'm using: |
@gjtorikian does this work for you now? |
I do believe so, but I haven't had a chance to look into it a fix as yet. |
I believe that should do it. I'll take a look at the merge conflicts as soon as I am able. |
Thanks everyone. This is looking ready to go. I will merge, release, and write release notes as soon as possible--I only just had a quick ten minutes right now to do all the merge conflict fixes. |
Codecov Report
@@ Coverage Diff @@
## master #362 +/- ##
=========================================
- Coverage 98.41% 98.2% -0.21%
=========================================
Files 30 30
Lines 1951 1949 -2
=========================================
- Hits 1920 1914 -6
- Misses 31 35 +4
Continue to review full report at Codecov.
|
Thanks for taking this to the finish line @gjtorikian!! 🎉 |
Switches from Nokogiri to Nokogumbo. Depends on Nokogumbo >= 1.4.10 for error reporting.
Introduces
--report-missing-doctype
, defaulting to false, to report on missing<!DOCTYPE html>
at the beginning of the document.Fixes #318
Pending:
Nokogiri::HTML5
does not support line numbers rubys/nokogumbo#53)