
Commit

Merge pull request #292 from gjtorikian/clean-up-attrs
Clean up attr_accessor and handling of checks
gjtorikian committed Jan 4, 2016
2 parents d596b62 + 46c9eb8 commit 9433ffb
Showing 25 changed files with 449 additions and 403 deletions.
35 changes: 14 additions & 21 deletions README.md
@@ -168,11 +168,12 @@ The `HTMLProofer` constructor takes an optional hash of additional options:
| `ext` | The extension of your HTML files including the dot. | `.html`
| `external_only` | Only checks problems with external references. | `false`
| `file_ignore` | An array of Strings or RegExps containing file paths that are safe to ignore. | `[]` |
| `http_status_ignore` | An array of numbers representing status codes to ignore. | `[]` |
| `log_level` | Sets the logging level, as determined by [Yell](https://github.com/rudionrails/yell). | `:info` |
| `only_4xx` | Only reports errors for links that fall within the 4xx status code range. | `false` |
| `url_ignore` | An array of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Note that non-HTTP(S) URIs are always ignored. | `[]` |
| `url_swap` | A hash containing key-value pairs of `RegExp => String`. It transforms URLs that match `RegExp` into `String` via `gsub`. | `{}` |
| `verbose` | If `true`, outputs extra information as the checking happens. Useful for debugging. **Will be deprecated in a future release.**| `false` |
| `verbosity` | Sets the logging level, as determined by [Yell](https://github.com/rudionrails/yell). | `:info`
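
The `url_swap` row says that matching URLs are transformed via `gsub`. A minimal sketch of that idea follows; `swap_url` is a hypothetical helper written for illustration, not a method from HTML-Proofer, and the `example.com` pattern is an assumed input.

```ruby
# Hypothetical helper (not part of HTML-Proofer's API): applies every
# `Regexp => String` pair to a URL with `gsub`, in hash order.
def swap_url(url, url_swap)
  url_swap.reduce(url) do |swapped, (pattern, replacement)|
    swapped.gsub(pattern, replacement)
  end
end

# Assumed example: strip a production host so links resolve locally.
url_swap = { %r{\Ahttps?://example\.com} => '' }
swap_url('http://example.com/about.html', url_swap) # => "/about.html"
```

Because every pair is applied in turn, a later pattern can match text produced by an earlier replacement, so the order of the pairs can matter.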

In addition, there are a few "namespaced" options. These are:

@@ -252,44 +253,36 @@ The cache operates on external links only.

## Logging

HTML-Proofer can be as noisy or as quiet as you'd like. There are two ways to log information:

* If you set the `:verbose` option to `true`, HTML-Proofer will provide some debug information.
* If you set the `:verbosity` option, you can better define the level of logging. See the configuration table above for more information.

`:verbosity` is newer and offers better configuration. `:verbose` will be deprecated in a future 3.x.x release.
HTML-Proofer can be as noisy or as quiet as you'd like. If you set the `:log_level` option, you can better define the level of logging.

## Custom tests

Want to write your own test? Sure! Just create two classes--one that inherits from `HTMLProofer::Checkable`, and another that inherits from `HTMLProofer::CheckRunner`.
Want to write your own test? Sure, that's possible!

The `CheckRunner` subclass must define one method called `run`. This is called on your content, and is responsible for performing the validation on whatever elements you like. When you catch a broken issue, call `add_issue(message)` to explain the error.
Just create a class that inherits from `HTMLProofer::Check`. This subclass must define one method called `run`. It is called on your content and is responsible for performing the validation on whatever elements you like. When you find a problem, call `add_issue(message, line_number: line)` to explain the error.

The `Checkable` subclass defines various helper methods you can use as part of your test. Usually, you'll want to instantiate it within `run`. You have access to all of your element's attributes.
If you're working with the element's attributes (as most checks do), you'll also want to call `create_element(node)` as part of your suite. This constructs an object that contains all the attributes of the HTML element you're iterating on.

Here's an example custom test that protects against `mailto` links that point to `octocat@github.com`:
Here's an example custom test demonstrating these concepts. It reports `mailto` links that point to `octocat@github.com`:

``` ruby
class OctocatLinkCheck < ::HTMLProofer::Checkable
class MailToOctocat < ::HTMLProofer::Check
def mailto?
return false if @data_ignore_proofer || @href.nil? || @href.empty?
return @href.match /^mailto\:/
return false if @link.data_ignore_proofer || blank?(@link.href)
return @link.href.match /^mailto\:/
end

def octocat?
return @href.match /\:octocat@github.com\Z/
return @link.href.match /\:octocat@github.com\Z/
end

end

class MailToOctocat < ::HTMLProofer::CheckRunner
def run
@html.css('a').each do |node|
link = OctocatLinkCheck.new(node, self)
@link = create_element(node)
line = node.line

if link.mailto? && link.octocat?
return add_issue("Don't email the Octocat directly!", line)
if mailto? && octocat?
return add_issue("Don't email the Octocat directly!", line_number: line)
end
end
end
4 changes: 2 additions & 2 deletions bin/htmlproofer
@@ -28,12 +28,12 @@ Mercenary.program(:htmlproofer) do |p|
p.option 'ext', '--ext EXT', String, 'The extension of your HTML files including the dot. (default: `.html`)'
p.option 'external_only', '--external_only', 'Only checks problems with external references'
p.option 'file_ignore', '--file-ignore file1,[file2,...]', Array, 'A comma-separated list of Strings or RegExps containing file paths that are safe to ignore'
p.option 'http_status_ignore', '--http-status-ignore 123,[xxx, ...]', Array, 'A comma-separated list of numbers representing status codes to ignore.'
p.option 'ignore_script_embeds', '--ignore-script-embeds', 'Ignore `check_html` errors associated with `script`s (default: `false`)'
p.option 'log_level', '--log-level <level>', String, 'Sets the logging level, as determined by Yell'
p.option 'only_4xx', '--only-4xx', 'Only reports errors for links that fall within the 4xx status code range'
p.option 'url_ignore', '--url-ignore link1,[link2,...]', Array, 'A comma-separated list of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Note that non-HTTP(S) URIs are always ignored'
p.option 'url_swap', '--url-swap re:string,[re:string,...]', Array, 'A comma-separated list containing key-value pairs of `RegExp => String`. It transforms URLs that match `RegExp` into `String` via `gsub`.'
p.option 'verbose', '--verbose', 'If `true`, outputs extra information as the checking happens. Useful for debugging. **Will be deprecated in a future release.**'
p.option 'verbosity', '--verbosity', String, 'Sets the logging level, as determined by Yell'

p.action do |args, opts|
args = ['.'] if args.empty?
100 changes: 40 additions & 60 deletions lib/html-proofer.rb
@@ -6,8 +6,7 @@ def require_all(path)
end

require_all 'html-proofer'
require_all 'html-proofer/check_runner'
require_all 'html-proofer/checks'
require_all 'html-proofer/check'

require 'parallel'
require 'fileutils'
@@ -19,52 +18,40 @@ def require_all(path)
class HTMLProofer
include HTMLProofer::Utils

attr_reader :options, :typhoeus_opts, :hydra_opts, :parallel_opts, :validation_opts, :external_urls, :iterable_external_urls
attr_reader :options, :external_urls

def initialize(src, opts = {})
FileUtils.mkdir_p(STORAGE_DIR) unless File.exist?(STORAGE_DIR)

@src = src

if opts[:verbose]
warn '`@options[:verbose]` will be removed in a future 3.x.x release: http://git.io/vGHHh'
end

@proofer_opts = HTMLProofer::Configuration::PROOFER_DEFAULTS

@typhoeus_opts = HTMLProofer::Configuration::TYPHOEUS_DEFAULTS.merge(opts[:typhoeus] || {})
opts.delete(:typhoeus)
@options = HTMLProofer::Configuration::PROOFER_DEFAULTS.merge(opts)

@hydra_opts = HTMLProofer::Configuration::HYDRA_DEFAULTS.merge(opts[:hydra] || {})
opts.delete(:hydra)
@options[:typhoeus] = HTMLProofer::Configuration::TYPHOEUS_DEFAULTS.merge(opts[:typhoeus] || {})
@options[:hydra] = HTMLProofer::Configuration::HYDRA_DEFAULTS.merge(opts[:hydra] || {})

# fall back to parallel defaults
@parallel_opts = opts[:parallel] || {}
opts.delete(:parallel)
@options[:parallel] = HTMLProofer::Configuration::PARALLEL_DEFAULTS.merge(opts[:parallel] || {})
@options[:validation] = HTMLProofer::Configuration::VALIDATION_DEFAULTS.merge(opts[:validation] || {})
@options[:cache] = HTMLProofer::Configuration::CACHE_DEFAULTS.merge(opts[:cache] || {})

@validation_opts = opts[:validation] || {}
opts.delete(:validation)

@options = @proofer_opts.merge(opts)
@logger = HTMLProofer::Log.new(@options[:log_level])

@failed_tests = []
end

def logger
@logger ||= HTMLProofer::Log.new(@options[:verbose], @options[:verbosity])
end

def run
logger.log :info, :blue, "Running #{checks} on #{@src} on *#{@options[:ext]}... \n\n"
@logger.log :info, "Running #{checks} on #{@src} on *#{@options[:ext]}... \n\n"

if @src.is_a?(Array) && !@options[:disable_external]
check_list_of_links
else
check_directory_of_files
check_files_in_directory
file_text = pluralize(files.length, 'file', 'files')
@logger.log :info, "Ran on #{file_text}!\n\n"
end

if @failed_tests.empty?
logger.log :info, :green, 'HTML-Proofer finished successfully.'
@logger.log_with_color :info, :green, 'HTML-Proofer finished successfully.'
else
print_failed_tests
end
@@ -81,13 +68,12 @@ def check_list_of_links
end

# Collects any external URLs found in a directory of files. Also collects
# every failed test from check_files_for_internal_woes.
# every failed test from process_files.
# Sends the external URLs to Typhoeus for batch processing.
def check_directory_of_files
def check_files_in_directory
@external_urls = {}
results = check_files_for_internal_woes

results.each do |item|
process_files.each do |item|
@external_urls.merge!(item[:external_urls])
@failed_tests.concat(item[:failed_tests])
end
@@ -101,49 +87,45 @@ def check_directory_of_files
elsif !@options[:disable_external]
validate_urls
end

count = files.length
file_text = pluralize(count, 'file', 'files')
logger.log :info, :blue, "Ran on #{file_text}!\n\n"
end

# Walks over each implemented check and runs them on the files, in parallel.
def check_files_for_internal_woes
Parallel.map(files, @parallel_opts) do |path|
html = create_nokogiri(path)
def process_files
Parallel.map(files, @options[:parallel]) do |path|
result = { :external_urls => {}, :failed_tests => [] }
html = create_nokogiri(path)

checks.each do |klass|
logger.log :debug, :yellow, "Checking #{klass.to_s.downcase} on #{path} ..."
check = Object.const_get(klass).new(@src, path, html, @options, @typhoeus_opts, @hydra_opts, @parallel_opts, @validation_opts)
@logger.log :debug, "Checking #{klass.to_s.downcase} on #{path} ..."
check = Object.const_get(klass).new(@src, path, html, @options)
check.run
result[:external_urls].merge!(check.external_urls)
result[:failed_tests].concat(check.issues) if check.issues.length > 0
result[:failed_tests].concat(check.issues)
end
result
end
end

def validate_urls
url_validator = HTMLProofer::UrlValidator.new(logger, @external_urls, @options, @typhoeus_opts, @hydra_opts)
url_validator = HTMLProofer::UrlValidator.new(@logger, @external_urls, @options)
@failed_tests.concat(url_validator.run)
@iterable_external_urls = url_validator.iterable_external_urls
@external_urls = url_validator.external_urls
end

def files
if File.directory? @src
pattern = File.join(@src, '**', "*#{@options[:ext]}")
files = Dir.glob(pattern).select { |fn| File.file? fn }
files.reject { |f| ignore_file?(f) }
elsif File.extname(@src) == @options[:ext]
[@src].reject { |f| ignore_file?(f) }
else
[]
end
@files ||= if File.directory? @src
pattern = File.join(@src, '**', "*#{@options[:ext]}")
files = Dir.glob(pattern).select { |fn| File.file? fn }
files.reject { |f| ignore_file?(f) }
elsif File.extname(@src) == @options[:ext]
[@src].reject { |f| ignore_file?(f) }
else
[]
end
end

def ignore_file?(file)
options[:file_ignore].each do |pattern|
@options[:file_ignore].each do |pattern|
return true if pattern.is_a?(String) && pattern == file
return true if pattern.is_a?(Regexp) && pattern =~ file
end
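
Review note: `ignore_file?` treats the two pattern types differently. A String must equal the whole path, while a Regexp may match anywhere in it. The sketch below restates that rule as a standalone function; `ignored?` and its arguments are hypothetical names used only for illustration.

```ruby
# Hypothetical standalone version of the matching rule in `ignore_file?`.
def ignored?(file, patterns)
  patterns.any? do |pattern|
    case pattern
    when String then pattern == file          # exact path match only
    when Regexp then !(pattern =~ file).nil?  # match anywhere in the path
    else false
    end
  end
end

ignored?('out/vendor/lib.html', [/vendor/])  # => true
ignored?('out/vendor/lib.html', ['vendor'])  # => false, not the full path
```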
@@ -153,28 +135,26 @@ def ignore_file?(file)

def checks
return @checks unless @checks.nil?
@checks = HTMLProofer::CheckRunner.checks.map(&:name)
@checks = HTMLProofer::Check.subchecks.map(&:name)
@checks.delete('FaviconCheck') unless @options[:check_favicon]
@checks.delete('HtmlCheck') unless @options[:check_html]
@options[:checks_to_ignore].each do |ignored|
@checks.delete(ignored)
end
@options[:checks_to_ignore].each { |ignored| @checks.delete(ignored) }
@checks
end

def failed_tests
return [] if @failed_tests.empty?
result = []
return result if @failed_tests.empty?
@failed_tests.each { |f| result << f.to_s }
result
end

def print_failed_tests
sorted_failures = HTMLProofer::CheckRunner::SortedIssues.new(@failed_tests, @options[:error_sort], logger)
sorted_failures = SortedIssues.new(@failed_tests, @options[:error_sort], @logger)

sorted_failures.sort_and_report
count = @failed_tests.length
failure_text = pluralize(count, 'failure', 'failures')
fail logger.colorize :red, "HTML-Proofer found #{failure_text}!"
fail @logger.colorize :red, "HTML-Proofer found #{failure_text}!"
end
end
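
Review note: the new `initialize` merges the top-level options first and then merges each namespaced hash (`:typhoeus`, `:hydra`, `:parallel`, `:validation`, `:cache`) against its own defaults. A minimal sketch of why the second step is needed follows. The `build_options` helper and the `:followlocation` and `:verbose` keys are assumptions for illustration; the real defaults live in `HTMLProofer::Configuration` and are not shown in this diff.

```ruby
# Hypothetical helper mirroring the merge pattern in `initialize`.
def build_options(opts, proofer_defaults, typhoeus_defaults)
  options = proofer_defaults.merge(opts)
  # Hash#merge is shallow: without this line a user-supplied :typhoeus hash
  # would replace the namespace defaults instead of extending them.
  options[:typhoeus] = typhoeus_defaults.merge(opts[:typhoeus] || {})
  options
end

proofer_defaults  = { :ext => '.html', :log_level => :info }
typhoeus_defaults = { :followlocation => true } # assumed value

options = build_options({ :log_level => :debug, :typhoeus => { :verbose => true } },
                        proofer_defaults, typhoeus_defaults)
# options[:typhoeus] now holds both :followlocation and :verbose.
```

With this pattern the rest of the code can read everything from a single `@options` hash, which is what lets the commit drop the separate `@typhoeus_opts`, `@hydra_opts`, and similar instance variables.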
28 changes: 22 additions & 6 deletions lib/html-proofer/cache.rb
@@ -11,7 +11,7 @@ class Cache

FILENAME = File.join(STORAGE_DIR, 'cache.log')

attr_accessor :exists, :load, :cache_log, :cache_time
attr_reader :exists, :load, :cache_log

def initialize(logger, options)
@logger = logger
@@ -21,7 +21,7 @@ def initialize(logger, options)
@load = false
else
@load = true
@parsed_timeframe = parsed_timeframe(options[:timeframe] || '30d')
@parsed_timeframe = parsed_timeframe(options[:timeframe])
end
@cache_time = Time.now

@@ -42,6 +42,10 @@ def urls
@cache_log['urls'] || []
end

def size
@cache_log.length
end

def parsed_timeframe(timeframe)
time, date = timeframe.match(/(\d+)(\D)/).captures
time = time.to_f
@@ -80,21 +84,21 @@ def detect_url_changes(found)
if existing_urls.include?(url)
true
else
@logger.log :debug, :yellow, "Adding #{url} to cache check"
@logger.log :debug, "Adding #{url} to cache check"
false
end
end

new_link_count = additions.length
new_link_text = pluralize(new_link_count, 'link', 'links')
@logger.log :info, :blue, "Adding #{new_link_text} to the cache..."
@logger.log :info, "Adding #{new_link_text} to the cache..."

# remove from cache URLs that no longer exist
del = 0
@cache_log.delete_if do |url, _|
url = clean_url(url)
if !found_urls.include?(url)
@logger.log :debug, :yellow, "Removing #{url} from cache check"
@logger.log :debug, "Removing #{url} from cache check"
del += 1
true
else
@@ -103,7 +107,7 @@ def detect_url_changes(found)
end

del_link_text = pluralize(del, 'link', 'links')
@logger.log :info, :blue, "Removing #{del_link_text} from the cache..."
@logger.log :info, "Removing #{del_link_text} from the cache..."

additions
end
@@ -116,6 +120,18 @@ def load?
@load.nil?
end

def retrieve_urls(external_urls)
urls_to_check = detect_url_changes(external_urls)
@cache_log.each_pair do |url, cache|
if within_timeframe?(cache['time'])
next if cache['message'].empty? # these were successes to skip
urls_to_check[url] = cache['filenames'] # these are failures to retry
else
urls_to_check[url] = cache['filenames'] # pass or fail, recheck expired links
end
end
urls_to_check
end

# FIXME: there seems to be some discrepancy where Typhoeus occasionally adds
# a trailing slash to URL strings, which causes issues with the cache
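
Review note: the new `retrieve_urls` skips a cached URL only when it is both within the timeframe and was a success (empty `message`). Failures and expired entries are rechecked. The sketch below isolates that decision under stated assumptions: `urls_to_recheck` is a hypothetical name, `cutoff` stands in for `within_timeframe?` (whose body is not shown in this diff), cached times are assumed to be `Time` objects, and the `detect_url_changes` step for newly found URLs is left out.

```ruby
# Hypothetical sketch of the recheck decision in `retrieve_urls`.
def urls_to_recheck(cache_log, cutoff)
  cache_log.each_with_object({}) do |(url, entry), recheck|
    fresh = entry['time'] >= cutoff
    next if fresh && entry['message'].empty? # recent success: skip
    recheck[url] = entry['filenames']        # failure or expired: recheck
  end
end

day = 24 * 60 * 60
now = Time.now
cache_log = {
  'http://a.test' => { 'time' => now - day,      'message' => '',    'filenames' => ['a.html'] },
  'http://b.test' => { 'time' => now - day,      'message' => '404', 'filenames' => ['b.html'] },
  'http://c.test' => { 'time' => now - 60 * day, 'message' => '',    'filenames' => ['c.html'] }
}
recheck = urls_to_recheck(cache_log, now - 30 * day)
# recheck holds the b.test and c.test entries; a.test is skipped.
```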