Commit
Merge pull request #234 from gjtorikian/ignore-scripts
Ignore embedded scripts when asked
gjtorikian committed Sep 1, 2015
2 parents 23ed786 + 8ae9cd7 commit b206f8d
Showing 8 changed files with 31 additions and 12 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -140,6 +140,8 @@ The `HTML::Proofer` constructor takes an optional hash of additional options:
| `alt_ignore` | An array of Strings or RegExps containing `img`s whose missing `alt` tags are safe to ignore. | `[]` |
| `empty_alt_ignore` | If `true`, ignores images with empty alt tags. | `false` |
| `check_external_hash` | Checks whether external hashes exist (even if the website exists). This slows the checker down. | `false` |
| `check_favicon` | Enables the favicon checker. | `false` |
| `check_html` | Enables HTML validation errors from Nokogiri | `false` |
| `checks_to_ignore` | An array of Strings indicating which checks you'd like to not perform. | `[]` |
| `directory_index_file` | Sets the file to look for when a link refers to a directory. | `index.html` |
| `disable_external` | If `true`, does not run the external link checker, which can take a lot of time. | `false` |
@@ -148,10 +150,9 @@ The `HTML::Proofer` constructor takes an optional hash of additional options:
| `file_ignore` | An array of Strings or RegExps containing file paths that are safe to ignore. | `[]` |
| `href_ignore` | An array of Strings or RegExps containing `href`s that are safe to ignore. Note that non-HTTP(S) URIs are always ignored. | `[]` |
| `href_swap` | A hash containing key-value pairs of `RegExp => String`. It transforms links that match `RegExp` into `String` via `gsub`. | `{}` |
| `ignore_script_embeds` | When `check_html` is enabled, `script` tags containing markup [are reported as errors](http://git.io/vOovv). Enabling this option ignores those errors. | `false` |
| `only_4xx` | Only reports errors for links that fall within the 4xx status code range. | `false` |
| `url_ignore` | An array of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Note that non-HTTP(S) URIs are always ignored. | `[]` |
| `check_favicon` | Enables the favicon checker. | `false` |
| `check_html` | Enables HTML validation errors from Nokogiri | `false` |
| `verbose` | If `true`, outputs extra information as the checking happens. Useful for debugging. | `false` |
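
As a hedged illustration of the table above: the spec added in this commit passes `ignore_script_embeds` nested under a `:validation` key rather than as a top-level option. The sketch below only builds a hash in that shape. The `./out` path is a placeholder, and the commented-out constructor call assumes the `html-proofer` gem of this era is installed.

```ruby
# Options hash in the shape used by the spec in this commit.
opts = {
  :check_html => true,                              # enable Nokogiri validation errors
  :validation => { :ignore_script_embeds => true }  # nested under :validation, not top-level
}

# Assumption: with the gem installed, the hash would be handed to the
# constructor roughly like this ('./out' is a placeholder directory):
#   require 'html/proofer'
#   HTML::Proofer.new('./out', opts).run
puts opts.inspect
```

Setting `ignore_script_embeds` without `check_html` would have no visible effect, since the option only filters errors that `check_html` produces.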

### Configuring Typhoeus and Hydra
5 changes: 3 additions & 2 deletions bin/htmlproof
@@ -27,16 +27,17 @@ Mercenary.program(:htmlproof) do |p|
p.option 'empty_alt_ignore', '--empty-alt-ignore', 'Ignores images with empty alt tags.'
p.option 'checks_to_ignore', '--checks-to-ignore check1,[check2,...]', Array, ' An array of Strings indicating which checks you\'d like to not perform.'
p.option 'check_external_hash', '--check-external-hash', 'Checks whether external hashes exist (even if the website exists). This slows the checker down (default: `false`).'
p.option 'check_favicon', '--check-favicon', 'Enables the favicon checker (default: `false`).'
p.option 'check_html', '--check-html', 'Enables HTML validation errors from Nokogiri (default: `false`).'
p.option 'directory_index_file', '--directory-index-file', String, 'Sets the file to look for when a link refers to a directory. (default: `index.html`)'
p.option 'disable_external', '--disable-external', 'Disables the external link checker (default: `false`)'
p.option 'error_sort', '--error-sort SORT', 'Defines the sort order for error output. Can be `path`, `desc`, or `status` (default: `path`).'
p.option 'ext', '--ext EXT', String, 'The extension of your HTML files (default: `.html`)'
p.option 'file_ignore', '--file-ignore file1,[file2,...]', Array, 'Comma-separated list of Strings or RegExps containing file paths that are safe to ignore'
p.option 'href_ignore', '--href-ignore link1,[link2,...]', Array, 'Comma-separated list of Strings or RegExps containing `href`s that are safe to ignore.'
p.option 'href_swap', '--href-swap re:string,[re:string,...]', Array, 'Comma-separated list of key-value pairs of `RegExp:String`. Transforms links matching `RegExp` into `String`'
p.option 'ignore_script_errors', '--ignore-script-errors', 'Ignore `check_html` errors associated with `script`s (default: `false`)'
p.option 'only_4xx', '--only-4xx', 'Only reports errors for links that fall within the 4xx status code range.'
p.option 'check_favicon', '--check-favicon', 'Enables the favicon checker (default: `false`).'
p.option 'check_html', '--check-html', 'Enables HTML validation errors from Nokogiri (default: `false`).'
p.option 'url_ignore', '--url-ignore link1,[link2,...]', Array, 'Comma-separated list of Strings or RegExps containing URLs that are safe to ignore.'
p.option 'verbose', '--verbose', 'Enables more verbose logging.'

7 changes: 5 additions & 2 deletions lib/html/proofer.rb
@@ -21,7 +21,7 @@ module HTML
class Proofer
include Utils

attr_reader :options, :typhoeus_opts, :hydra_opts, :parallel_opts
attr_reader :options, :typhoeus_opts, :hydra_opts, :parallel_opts, :validation_opts

TYPHOEUS_DEFAULTS = {
:followlocation => true,
@@ -62,6 +62,9 @@ def initialize(src, opts = {})
@parallel_opts = opts[:parallel] || {}
opts.delete(:parallel)

@validation_opts = opts[:validation] || {}
opts.delete(:validation)

@options = @proofer_opts.merge(opts)

@failed_tests = []
@@ -124,7 +127,7 @@ def check_files_for_internal_woes

checks.each do |klass|
logger.log :debug, :yellow, "Checking #{klass.to_s.downcase} on #{path} ..."
check = Object.const_get(klass).new(@src, path, html, @options, @typhoeus_opts, @hydra_opts, @parallel_opts)
check = Object.const_get(klass).new(@src, path, html, @options, @typhoeus_opts, @hydra_opts, @parallel_opts, @validation_opts)
check.run
result[:external_urls].merge!(check.external_urls)
result[:failed_tests].concat(check.issues) if check.issues.length > 0
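
The `initialize` hunk above separates the nested `:validation` hash from the remaining options before merging them into the defaults. A minimal sketch of that pattern follows. `extract_sub_options` and the `defaults` hash are hypothetical names introduced for illustration, not part of the library.

```ruby
# Hypothetical helper mirroring the pattern in `initialize`:
# pull a nested sub-hash out of opts, defaulting to an empty hash.
# Note that it mutates the hash it is given, as the original code does.
def extract_sub_options(opts, key)
  sub = opts[key] || {}
  opts.delete(key)
  sub
end

# Stand-in for @proofer_opts; the real defaults contain many more keys.
defaults = { :check_html => false, :verbose => false }
opts = { :check_html => true, :validation => { :ignore_script_embeds => true } }

validation_opts = extract_sub_options(opts, :validation)
options = defaults.merge(opts)  # :validation is gone, so it never reaches options

p validation_opts
p options
```

Deleting the key before the merge keeps check-specific settings out of the general options hash, which is presumably why the commit follows the same approach already used for `:parallel`.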
5 changes: 3 additions & 2 deletions lib/html/proofer/check_runner.rb
@@ -6,16 +6,17 @@ class Proofer
class CheckRunner

attr_reader :issues, :src, :path, :options, :typhoeus_opts, :hydra_opts, :parallel_opts, \
:external_urls, :href_ignores, :url_ignores, :alt_ignores, :empty_alt_ignore
:validation_opts, :external_urls, :href_ignores, :url_ignores, :alt_ignores, :empty_alt_ignore

def initialize(src, path, html, options, typhoeus_opts, hydra_opts, parallel_opts)
def initialize(src, path, html, options, typhoeus_opts, hydra_opts, parallel_opts, validation_opts)
@src = src
@path = path
@html = remove_ignored(html)
@options = options
@typhoeus_opts = typhoeus_opts
@hydra_opts = hydra_opts
@parallel_opts = parallel_opts
@validation_opts = validation_opts
@issues = []
@href_ignores = @options[:href_ignore]
@url_ignores = @options[:url_ignore]
10 changes: 7 additions & 3 deletions lib/html/proofer/checks/html.rb
@@ -1,7 +1,6 @@
# encoding: utf-8

class HtmlCheck < ::HTML::Proofer::CheckRunner

# new html5 tags (source: http://www.w3schools.com/html/html5_new_elements.asp)
# and svg child tags (source: https://developer.mozilla.org/en-US/docs/Web/SVG/Element)
HTML5_TAGS = %w(article aside bdi details dialog figcaption
@@ -30,11 +29,16 @@ class HtmlCheck < ::HTML::Proofer::CheckRunner

def run
@html.errors.each do |e|
message = e.message
line = e.line
# Nokogiri (or rather libxml2 under the hood) only recognizes html4 tags,
# so we need to skip errors caused by the new tags in html5
next if HTML5_TAGS.include? e.to_s[/Tag ([\w-]+) invalid/o, 1]
next if HTML5_TAGS.include? message[/Tag ([\w-]+) invalid/o, 1]

# tags embedded in scripts are used in templating languages: http://git.io/vOovv
next if @validation_opts[:ignore_script_embeds] && message =~ /Element script embeds close tag/

add_issue(e.to_s, e.line)
add_issue(message, line)
end
end
end
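
The filtering in `run` can be sketched without Nokogiri, assuming each parser error exposes `message` and `line`. In the sketch below, `ParseError`, `reportable_errors`, and the shortened `HTML5_TAGS` list are stand-ins introduced for illustration. The message strings are the ones matched in the diff and in the existing spec.

```ruby
# Stand-in for a parser error object; only the two fields used by `run`.
ParseError = Struct.new(:message, :line)

# Abbreviated, illustrative list; the real constant is much longer.
HTML5_TAGS = %w(article aside nav section)

# Returns the errors that would still be reported as issues.
def reportable_errors(errors, validation_opts)
  errors.reject do |e|
    # Skip "Tag <name> invalid" errors caused by HTML5 tags.
    tag = e.message[/Tag ([\w-]+) invalid/, 1]
    next true if HTML5_TAGS.include?(tag)

    # Skip script-embed errors only when the option is enabled.
    validation_opts[:ignore_script_embeds] &&
      e.message =~ /Element script embeds close tag/
  end
end

errors = [
  ParseError.new('Tag nav invalid', 3),
  ParseError.new('Element script embeds close tag', 2),
  ParseError.new("Couldn't find end of Start Tag a", 6)
]

kept = reportable_errors(errors, { :ignore_script_embeds => true })
kept.each { |e| puts "#{e.message} (line #{e.line})" }
```

With an empty `validation_opts` hash the script-embed error stays in the result, which matches the default of `false` documented in the README.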
3 changes: 3 additions & 0 deletions spec/html/proofer/fixtures/html/ignore_script_embeds.html
@@ -0,0 +1,3 @@
<script type="text/html" id="navbar-logged-in">
<li><a href="https://www.github.com/features">Home</a></li>
</script>
7 changes: 7 additions & 0 deletions spec/html/proofer/html_spec.rb
@@ -54,4 +54,11 @@
proofer = run_proofer(html, { :check_html => true })
expect(proofer.failed_tests.to_s).to match(/Couldn't find end of Start Tag a \(line 6\)/)
end

it 'ignores embedded scripts when asked' do
opts = { :check_html => true, :validation => { :ignore_script_embeds => true } }
ignorableScript = "#{FIXTURES_DIR}/html/ignore_script_embeds.html"
proofer = run_proofer(ignorableScript, opts)
expect(proofer.failed_tests).to eq []
end
end
1 change: 0 additions & 1 deletion spec/html/proofer/scripts_spec.rb
@@ -43,5 +43,4 @@
proofer = run_proofer(ignorableLinks, { :url_ignore => [/\/assets\/.*(js|css|png|svg)/] })
expect(proofer.failed_tests).to eq []
end

end