Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML5 parsing and error checking #362

Merged
merged 11 commits into from
Dec 13, 2019
Merged

HTML5 parsing and error checking #362

merged 11 commits into from
Dec 13, 2019

Conversation

jeremy
Copy link
Contributor

@jeremy jeremy commented Nov 1, 2016

Switches from Nokogiri to Nokogumbo. Depends on Nokogumbo >= 1.4.10 for error reporting.

Introduces --report-missing-doctype, defaulting to false, to report on missing <!DOCTYPE html> at the beginning of the document.

Fixes #318

Pending:

@gjtorikian
Copy link
Owner

Wow, I like a lot of this. I especially like the removal of the Nokogiri hack method--thank you for that!

I'm a bit slammed at work at the moment so I won't be able to get this for a few days. For this task item:

whether to switch to HTML5 checking or introduce it as a new check

I think we should just switch to HTML5 checking. I'd rather documents be modern and correct then continue to support separate HTML4/5 checks.

@matkoniecz
Copy link
Contributor

I'm a bit slammed at work at the moment so I won't be able to get this for a few days.

ping?

Gemfile Show resolved Hide resolved
html-proofer.gemspec Outdated Show resolved Hide resolved
lib/html-proofer/check/html.rb Show resolved Hide resolved
spec/html-proofer/html_spec.rb Outdated Show resolved Hide resolved
@gjtorikian
Copy link
Owner

I like a lot of the clean-up here, thank you very much!

I am a bit concerned at some of the test changes, primarily where items were failing tests and have now flipped to not fail. Could you explain why that is? Is it because the HTML5 parsing is laxer than the previous HTML4 parsing?

@jeremy
Copy link
Contributor Author

jeremy commented Feb 4, 2017

Could you explain why that is? Is it because the HTML5 parsing is laxer than the previous HTML4 parsing?

Precisely! Reflects where HTML5 rules differ from XHTML1/HTML4.

missingLinkHrefFilepath = "#{FIXTURES_DIR}/links/missingLinkHref.html"
proofer = run_proofer(missingLinkHrefFilepath, :file)
expect(proofer.failed_tests.first).to match(/anchor has no href attribute/)
expect(proofer.failed_tests).to eq []
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTML5 allows links with no href attribute.

head_link = "#{FIXTURES_DIR}/links/head_link_href_absent.html"
proofer = run_proofer(head_link, :file)
expect(proofer.failed_tests.first).to match(/anchor has no href attribute/)
expect(proofer.failed_tests).to eq []
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. HTML5 allows links with no href attribute.

@gjtorikian
Copy link
Owner

@jeremy Would you mind merging master to fix the merge conflict? After that I think this should be good to go. Thank you!

@jeremy
Copy link
Contributor Author

jeremy commented Feb 15, 2017

@gjtorikian Rebased.

@gjtorikian
Copy link
Owner

I completely forgot about this! Sorry!

@gjtorikian gjtorikian dismissed their stale review March 4, 2017 00:08

it's from me

@gjtorikian
Copy link
Owner

Unfortunately I still can't accept this just yet. Nokogiri::HTML5 does not appear to preserve the line numbers correctly. I opened rubys/nokogumbo#53 and am blocked on that.

@fulldecent
Copy link
Collaborator

fulldecent commented Sep 4, 2018

@jeremy The issue is fixed upstream and now Nokogumbo should be providing the line numbers that you need for this fork to be successful.

Would you be able to continue working on this PR? Is there any assistance you require?

@jeremy
Copy link
Contributor Author

jeremy commented Sep 4, 2018

Nokogumbo doesn't include a gemspec, so we can't bundle it from git, and the fix isn't included in the recent 2.0.0.pre.alpha release. That means we can't depend on a fixed version to test this PR, so I branched nokogumbo (rubys/nokogumbo#104) to provide a non-Rakefile-embedded gemspec for temporary use.

Gemfile Outdated Show resolved Hide resolved
html-proofer.gemspec Outdated Show resolved Hide resolved
@jeremy
Copy link
Contributor Author

jeremy commented Sep 4, 2018

Note that a bunch of line numbers are still incorrect (line 0), so the upstream fix is likely insufficient.

Another route is to support what we can and defer to the underlying library on specifics, allowing users to parse HTML5 today despite the incomplete support.

Introduces `--report-missing-doctype`, defaulting to false, to report on
missing `<!DOCTYPE html>` at the beginning of the document.

Depends on Nokogumbo >= 1.4.10 for rubys/nokogumbo#46

Fixes gjtorikian#318.
@jeremy
Copy link
Contributor Author

jeremy commented Sep 4, 2018

Switched nokogumbo dep to track git master pending 2.0.0 release.

@stevecheckoway
Copy link
Contributor

I filed a PR with Nokogiri to add a line setting API (referenced above) and I added rubys/nokogumbo#122 to track the issue in Nokogumbo.

@fulldecent
Copy link
Collaborator

Thank you for providing the update. May rubys/nokogumbo#122 please be referenced as a blocker at top?

@fulldecent
Copy link
Collaborator

We have a path forward with rubys/nokogumbo#130

@fulldecent
Copy link
Collaborator

In Nokogumbo we are waiting for rubys/nokogumbo#130 so that we can use it here.

@fulldecent
Copy link
Collaborator

Fixed merge conflicts. And still have a few failing tests.

@fulldecent
Copy link
Collaborator

Requesting help here to fix unit tests and review this PR

@gjtorikian
Copy link
Owner

For some reason I cannot push to this PR? But here is the diff necessary to fix the tests:

diff --git a/lib/html-proofer/middleware.rb b/lib/html-proofer/middleware.rb
index b1e7eab..a154261 100644
--- a/lib/html-proofer/middleware.rb
+++ b/lib/html-proofer/middleware.rb
@@ -67,7 +67,7 @@ module HTMLProofer
           'response',
           Middleware.options
         ).check_parsed(
-          Nokogiri::HTML(Utils.clean_content(html)), 'response'
+          Nokogiri::HTML5(html), 'response'
         )
 
         if parsed[:failures].length > 0

@fulldecent
Copy link
Collaborator

@gjtorikian I was able to put that in here jeremy@35f85cb

@gjtorikian
Copy link
Owner

@fulldecent Do you have push access to his repo? Normally I can make commits directly to PRs but I can't seem to manage it with this one. Looks like there's just one failure left.

@fulldecent
Copy link
Collaborator

@gjtorikian Yes, but I can only do it with the web interface. Here is the link I'm using:

https://github.com/jeremy/html-proofer/tree/html5

@fulldecent
Copy link
Collaborator

@gjtorikian does this work for you now?

@gjtorikian
Copy link
Owner

I do believe so, but I haven't had a chance to look into it a fix as yet.

@gjtorikian
Copy link
Owner

I believe that should do it. I'll take a look at the merge conflicts as soon as I am able.

@gjtorikian
Copy link
Owner

Thanks everyone. This is looking ready to go. I will merge, release, and write release notes as soon as possible--I only just had a quick ten minutes right now to do all the merge conflict fixes.

@codecov-io
Copy link

codecov-io commented Dec 10, 2019

Codecov Report

Merging #362 into master will decrease coverage by 0.2%.
The diff coverage is 91.83%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #362      +/-   ##
=========================================
- Coverage   98.41%   98.2%   -0.21%     
=========================================
  Files          30      30              
  Lines        1951    1949       -2     
=========================================
- Hits         1920    1914       -6     
- Misses         31      35       +4
Impacted Files Coverage Δ
lib/html-proofer/configuration.rb 88.88% <ø> (ø) ⬆️
lib/html-proofer/middleware.rb 96.55% <ø> (ø) ⬆️
lib/html-proofer.rb 100% <ø> (ø) ⬆️
spec/html-proofer/command_spec.rb 100% <100%> (ø) ⬆️
lib/html-proofer/url_validator.rb 94.07% <100%> (ø) ⬆️
spec/html-proofer/links_spec.rb 99.3% <100%> (ø) ⬆️
lib/html-proofer/check/links.rb 92.59% <100%> (-0.1%) ⬇️
spec/html-proofer/element_spec.rb 100% <100%> (ø) ⬆️
lib/html-proofer/utils.rb 100% <100%> (ø) ⬆️
spec/html-proofer/scripts_spec.rb 100% <100%> (ø) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c120b7c...a71aac0. Read the comment docs.

@jeremy
Copy link
Contributor Author

jeremy commented Dec 10, 2019

Thanks for taking this to the finish line @gjtorikian!! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace Nokogiri with something that validates HTML5
6 participants