Ignore embedded scripts when asked #234

Merged
merged 7 commits into from
Sep 1, 2015
5 changes: 3 additions & 2 deletions README.md
@@ -140,6 +140,8 @@ The `HTML::Proofer` constructor takes an optional hash of additional options:
| `alt_ignore` | An array of Strings or RegExps containing `img`s whose missing `alt` tags are safe to ignore. | `[]` |
| `empty_alt_ignore` | If `true`, ignores images with empty alt tags. | `false` |
| `check_external_hash` | Checks whether external hashes exist (even if the website exists). This slows the checker down. | `false` |
+| `check_favicon` | Enables the favicon checker. | `false` |
+| `check_html` | Enables HTML validation errors from Nokogiri | `false` |
| `checks_to_ignore` | An array of Strings indicating which checks you'd like to skip. | `[]` |
| `directory_index_file` | Sets the file to look for when a link refers to a directory. | `index.html` |
| `disable_external` | If `true`, does not run the external link checker, which can take a lot of time. | `false` |
@@ -148,10 +150,9 @@ The `HTML::Proofer` constructor takes an optional hash of additional options:
| `file_ignore` | An array of Strings or RegExps containing file paths that are safe to ignore. | `[]` |
| `href_ignore` | An array of Strings or RegExps containing `href`s that are safe to ignore. Note that non-HTTP(S) URIs are always ignored. | `[]` |
| `href_swap` | A hash containing key-value pairs of `RegExp => String`. It transforms links that match `RegExp` into `String` via `gsub`. | `{}` |
+| `ignore_script_embeds` | When `check_html` is enabled, `script` tags containing markup [are reported as errors](http://git.io/vOovv). Enabling this option (passed inside the `validation` hash) ignores those errors. | `false` |
| `only_4xx` | Only reports errors for links that fall within the 4xx status code range. | `false` |
| `url_ignore` | An array of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Note that non-HTTP(S) URIs are always ignored. | `[]` |
-| `check_favicon` | Enables the favicon checker. | `false` |
-| `check_html` | Enables HTML validation errors from Nokogiri | `false` |
| `verbose` | If `true`, outputs extra information as the checking happens. Useful for debugging. | `false` |

### Configuring Typhoeus and Hydra
5 changes: 3 additions & 2 deletions bin/htmlproof
@@ -27,16 +27,17 @@ Mercenary.program(:htmlproof) do |p|
p.option 'empty_alt_ignore', '--empty-alt-ignore', 'Ignores images with empty alt tags.'
p.option 'checks_to_ignore', '--checks-to-ignore check1,[check2,...]', Array, 'An array of Strings indicating which checks you\'d like to not perform.'
p.option 'check_external_hash', '--check-external-hash', 'Checks whether external hashes exist (even if the website exists). This slows the checker down (default: `false`).'
+p.option 'check_favicon', '--check-favicon', 'Enables the favicon checker (default: `false`).'
+p.option 'check_html', '--check-html', 'Enables HTML validation errors from Nokogiri (default: `false`).'
p.option 'directory_index_file', '--directory-index-file', String, 'Sets the file to look for when a link refers to a directory. (default: `index.html`)'
p.option 'disable_external', '--disable-external', 'Disables the external link checker (default: `false`)'
p.option 'error_sort', '--error-sort SORT', 'Defines the sort order for error output. Can be `path`, `desc`, or `status` (default: `path`).'
p.option 'ext', '--ext EXT', String, 'The extension of your HTML files (default: `.html`)'
p.option 'file_ignore', '--file-ignore file1,[file2,...]', Array, 'Comma-separated list of Strings or RegExps containing file paths that are safe to ignore'
p.option 'href_ignore', '--href-ignore link1,[link2,...]', Array, 'Comma-separated list of Strings or RegExps containing `href`s that are safe to ignore.'
p.option 'href_swap', '--href-swap re:string,[re:string,...]', Array, 'Comma-separated list of key-value pairs of `RegExp:String`. Transforms links matching `RegExp` into `String`'
+p.option 'ignore_script_errors', '--ignore-script-errors', 'Ignore `check_html` errors associated with `script`s (default: `false`)'
p.option 'only_4xx', '--only-4xx', 'Only reports errors for links that fall within the 4xx status code range.'
-p.option 'check_favicon', '--check-favicon', 'Enables the favicon checker (default: `false`).'
-p.option 'check_html', '--check-html', 'Enables HTML validation errors from Nokogiri (default: `false`).'
p.option 'url_ignore', '--url-ignore link1,[link2,...]', Array, 'Comma-separated list of Strings or RegExps containing URLs that are safe to ignore.'
p.option 'verbose', '--verbose', 'Enables more verbose logging.'

7 changes: 5 additions & 2 deletions lib/html/proofer.rb
@@ -21,7 +21,7 @@ module HTML
class Proofer
include Utils

-attr_reader :options, :typhoeus_opts, :hydra_opts, :parallel_opts
+attr_reader :options, :typhoeus_opts, :hydra_opts, :parallel_opts, :validation_opts

TYPHOEUS_DEFAULTS = {
:followlocation => true,
@@ -62,6 +62,9 @@ def initialize(src, opts = {})
@parallel_opts = opts[:parallel] || {}
opts.delete(:parallel)

+@validation_opts = opts[:validation] || {}
+opts.delete(:validation)
+
@options = @proofer_opts.merge(opts)

@failed_tests = []
@@ -124,7 +127,7 @@ def check_files_for_internal_woes

checks.each do |klass|
logger.log :debug, :yellow, "Checking #{klass.to_s.downcase} on #{path} ..."
-check = Object.const_get(klass).new(@src, path, html, @options, @typhoeus_opts, @hydra_opts, @parallel_opts)
+check = Object.const_get(klass).new(@src, path, html, @options, @typhoeus_opts, @hydra_opts, @parallel_opts, @validation_opts)
check.run
result[:external_urls].merge!(check.external_urls)
result[:failed_tests].concat(check.issues) if check.issues.length > 0
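
The `initialize` hunk pulls the nested `:validation` hash out of `opts` before the remaining keys are merged into the proofer's options. A minimal standalone sketch of that pattern follows. The `split_options` helper and the `DEFAULTS` hash are hypothetical names used only for illustration and are not part of html-proofer; unlike the diff, the sketch copies the hash first so the caller's options are not mutated.

```ruby
# Hypothetical defaults; a stand-in for the proofer's own option defaults.
DEFAULTS = { :check_html => false, :verbose => false }.freeze

# Hypothetical helper: separate the nested :validation hash from the flat
# options, then merge what is left over the defaults.
def split_options(opts = {})
  opts = opts.dup                                   # leave the caller's hash untouched
  validation_opts = opts.delete(:validation) || {}  # nil when the key is absent
  options = DEFAULTS.merge(opts)
  [options, validation_opts]
end

options, validation_opts = split_options({
  :check_html => true,
  :validation => { :ignore_script_embeds => true }
})
```

Keeping the validation settings in their own hash means they never leak into the flat option set that every check receives.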
5 changes: 3 additions & 2 deletions lib/html/proofer/check_runner.rb
@@ -6,16 +6,17 @@ class Proofer
class CheckRunner

attr_reader :issues, :src, :path, :options, :typhoeus_opts, :hydra_opts, :parallel_opts, \
-:external_urls, :href_ignores, :url_ignores, :alt_ignores, :empty_alt_ignore
+:validation_opts, :external_urls, :href_ignores, :url_ignores, :alt_ignores, :empty_alt_ignore

-def initialize(src, path, html, options, typhoeus_opts, hydra_opts, parallel_opts)
+def initialize(src, path, html, options, typhoeus_opts, hydra_opts, parallel_opts, validation_opts)
@src = src
@path = path
@html = remove_ignored(html)
@options = options
@typhoeus_opts = typhoeus_opts
@hydra_opts = hydra_opts
@parallel_opts = parallel_opts
+@validation_opts = validation_opts
@issues = []
@href_ignores = @options[:href_ignore]
@url_ignores = @options[:url_ignore]
10 changes: 7 additions & 3 deletions lib/html/proofer/checks/html.rb
@@ -1,7 +1,6 @@
# encoding: utf-8

class HtmlCheck < ::HTML::Proofer::CheckRunner
-
# new html5 tags (source: http://www.w3schools.com/html/html5_new_elements.asp)
# and svg child tags (source: https://developer.mozilla.org/en-US/docs/Web/SVG/Element)
HTML5_TAGS = %w(article aside bdi details dialog figcaption
@@ -30,11 +29,16 @@ class HtmlCheck < ::HTML::Proofer::CheckRunner

def run
@html.errors.each do |e|
+message = e.message
+line = e.line
# Nokogiri (or rather libxml2 under the hood) only recognizes html4 tags,
# so we need to skip errors caused by the new tags in html5
-next if HTML5_TAGS.include? e.to_s[/Tag ([\w-]+) invalid/o, 1]
+next if HTML5_TAGS.include? message[/Tag ([\w-]+) invalid/o, 1]
+
+# tags embedded in scripts are used in templating languages: http://git.io/vOovv
+next if @validation_opts[:ignore_script_embeds] && message =~ /Element script embeds close tag/

-add_issue(e.to_s, e.line)
+add_issue(message, line)
end
end
end
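
The filtering in `run` can be tried without Nokogiri. The sketch below is an illustration, not the project's code: `FakeError` stands in for the parser's error objects, `reportable_errors` is a hypothetical helper, and `HTML5_TAGS` is shortened to three of the tags from the list in the diff. The two regular expressions are the ones used in the hunk.

```ruby
# Stand-in for the parser's error objects; only the two fields that the
# check reads are modelled.
FakeError = Struct.new(:message, :line)

# Shortened, illustrative subset of the HTML5_TAGS list from the diff.
HTML5_TAGS = %w(article aside figcaption).freeze

# Hypothetical helper: returns the errors that would still be reported.
def reportable_errors(errors, validation_opts = {})
  errors.reject do |e|
    message = e.message
    # Drop "Tag <name> invalid" errors raised for known HTML5 tags.
    next true if HTML5_TAGS.include?(message[/Tag ([\w-]+) invalid/, 1])
    # Drop script-embed errors only when the option is switched on.
    validation_opts[:ignore_script_embeds] &&
      message =~ /Element script embeds close tag/
  end
end

errors = [
  FakeError.new("Tag article invalid", 3),
  FakeError.new("Element script embeds close tag", 2),
  FakeError.new("Couldn't find end of Start Tag a", 6)
]

strict  = reportable_errors(errors)
lenient = reportable_errors(errors, { :ignore_script_embeds => true })
```

With the option off, the script-embed error is still reported; with it on, only the unclosed `a` tag error remains. The HTML5 tag error is dropped in both cases.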
3 changes: 3 additions & 0 deletions spec/html/proofer/fixtures/html/ignore_script_embeds.html
@@ -0,0 +1,3 @@
+<script type="text/html" id="navbar-logged-in">
+<li><a href="https://www.github.com/features">Home</a></li>
+</script>
7 changes: 7 additions & 0 deletions spec/html/proofer/html_spec.rb
@@ -54,4 +54,11 @@
proofer = run_proofer(html, { :check_html => true })
expect(proofer.failed_tests.to_s).to match(/Couldn't find end of Start Tag a \(line 6\)/)
end
+
+it 'ignores embedded scripts when asked' do
+opts = { :check_html => true, :validation => { :ignore_script_embeds => true } }
+ignorableScript = "#{FIXTURES_DIR}/html/ignore_script_embeds.html"
+proofer = run_proofer(ignorableScript, opts)
+expect(proofer.failed_tests).to eq []
+end
end
1 change: 0 additions & 1 deletion spec/html/proofer/scripts_spec.rb
@@ -43,5 +43,4 @@
proofer = run_proofer(ignorableLinks, { :url_ignore => [/\/assets\/.*(js|css|png|svg)/] })
expect(proofer.failed_tests).to eq []
end
-
end