Clean up attr_accessor and handling of checks #292

Merged
merged 21 commits into from
Jan 4, 2016
35 changes: 14 additions & 21 deletions README.md
@@ -168,11 +168,12 @@ The `HTMLProofer` constructor takes an optional hash of additional options:
| `ext` | The extension of your HTML files including the dot. | `.html`
| `external_only` | Only checks problems with external references. | `false`
| `file_ignore` | An array of Strings or RegExps containing file paths that are safe to ignore. | `[]` |
| `http_status_ignore` | An array of numbers representing status codes to ignore. | `[]`
| `log_level` | Sets the logging level, as determined by [Yell](https://github.com/rudionrails/yell). | `:info`
| `only_4xx` | Only reports errors for links that fall within the 4xx status code range. | `false` |
| `url_ignore` | An array of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Note that non-HTTP(S) URIs are always ignored. | `[]` |
| `url_swap` | A hash containing key-value pairs of `RegExp => String`. It transforms URLs that match `RegExp` into `String` via `gsub`. | `{}` |
| `verbose` | If `true`, outputs extra information as the checking happens. Useful for debugging. **Will be deprecated in a future release.**| `false` |
| `verbosity` | Sets the logging level, as determined by [Yell](https://github.com/rudionrails/yell). | `:info`
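As a rough illustration of the `url_swap` row in the table, the sketch below applies a `RegExp => String` hash to a URL with `gsub`. The helper name `swap_urls` and the sample URLs are hypothetical, and this is not HTML-Proofer's internal implementation, only the transformation the table describes.

```ruby
# Hypothetical helper (not part of HTML-Proofer): applies every
# RegExp => String pair to the URL, one after another, via gsub.
def swap_urls(url, url_swap)
  url_swap.reduce(url) do |current, (pattern, replacement)|
    current.gsub(pattern, replacement)
  end
end

# Sample data, assumed for illustration only.
swaps = { %r{\Ahttps?://example\.com} => 'http://localhost:4000' }
puts swap_urls('https://example.com/about.html', swaps)
```

URLs that match none of the patterns pass through unchanged, since `gsub` returns a copy of the original string when nothing matches.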

In addition, there are a few "namespaced" options. These are:

@@ -252,44 +253,36 @@ The cache operates on external links only.

## Logging

HTML-Proofer can be as noisy or as quiet as you'd like. There are two ways to log information:

* If you set the `:verbose` option to `true`, HTML-Proofer will provide some debug information.
* If you set the `:verbosity` option, you can better define the level of logging. See the configuration table above for more information.

`:verbosity` is newer and offers better configuration. `:verbose` will be deprecated in a future 3.x.x release.
HTML-Proofer can be as noisy or as quiet as you'd like. If you set the `:log_level` option, you can better define the level of logging.

## Custom tests

Want to write your own test? Sure! Just create two classes--one that inherits from `HTMLProofer::Checkable`, and another that inherits from `HTMLProofer::CheckRunner`.
Want to write your own test? Sure, that's possible!

The `CheckRunner` subclass must define one method called `run`. This is called on your content, and is responsible for performing the validation on whatever elements you like. When you catch a broken issue, call `add_issue(message)` to explain the error.
Just create a class that inherits from `HTMLProofer::Check`. This subclass must define one method called `run`. This is called on your content, and is responsible for performing the validation on whatever elements you like. When you find a problem, call `add_issue(message, line_number: line)` to explain the error.

The `Checkable` subclass defines various helper methods you can use as part of your test. Usually, you'll want to instantiate it within `run`. You have access to all of your element's attributes.
If you're working with the element's attributes (as most checks do), you'll also want to call `create_element(node)` as part of your suite. This constructs an object that contains all the attributes of the HTML element you're iterating on.

Here's an example custom test that protects against `mailto` links that point to `octocat@github.com`:
Here's an example custom test demonstrating these concepts. It reports `mailto` links that point to `octocat@github.com`:

``` ruby
class OctocatLinkCheck < ::HTMLProofer::Checkable
class MailToOctocat < ::HTMLProofer::Check
def mailto?
return false if @data_ignore_proofer || @href.nil? || @href.empty?
return @href.match /^mailto\:/
return false if @link.data_ignore_proofer || blank?(@link.href)
return @link.href.match /^mailto\:/
end

def octocat?
return @href.match /\:octocat@github.com\Z/
return @link.href.match /\:octocat@github.com\Z/
end

end

class MailToOctocat < ::HTMLProofer::CheckRunner
def run
@html.css('a').each do |node|
link = OctocatLinkCheck.new(node, self)
@link = create_element(node)
line = node.line

if link.mailto? && link.octocat?
return add_issue("Don't email the Octocat directly!", line)
if mailto? && octocat?
return add_issue("Don't email the Octocat directly!", line_number: line)
end
end
end
4 changes: 2 additions & 2 deletions bin/htmlproofer
@@ -28,12 +28,12 @@ Mercenary.program(:htmlproofer) do |p|
p.option 'ext', '--ext EXT', String, 'The extension of your HTML files including the dot. (default: `.html`)'
p.option 'external_only', '--external_only', 'Only checks problems with external references'
p.option 'file_ignore', '--file-ignore file1,[file2,...]', Array, 'A comma-separated list of Strings or RegExps containing file paths that are safe to ignore'
p.option 'http_status_ignore', '--http-status-ignore 123,[xxx, ...]', Array, 'A comma-separated list of numbers representing status codes to ignore.'
p.option 'ignore_script_embeds', '--ignore-script-embeds', 'Ignore `check_html` errors associated with `script`s (default: `false`)'
p.option 'log_level', '--log-level <level>', String, 'Sets the logging level, as determined by Yell'
p.option 'only_4xx', '--only-4xx', 'Only reports errors for links that fall within the 4xx status code range'
p.option 'url_ignore', '--url-ignore link1,[link2,...]', Array, 'A comma-separated list of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Note that non-HTTP(S) URIs are always ignored'
p.option 'url_swap', '--url-swap re:string,[re:string,...]', Array, 'A comma-separated list containing key-value pairs of `RegExp => String`. It transforms URLs that match `RegExp` into `String` via `gsub`.'
p.option 'verbose', '--verbose', 'If `true`, outputs extra information as the checking happens. Useful for debugging. **Will be deprecated in a future release.**'
p.option 'verbosity', '--verbosity', String, 'Sets the logging level, as determined by Yell'

p.action do |args, opts|
args = ['.'] if args.empty?
100 changes: 40 additions & 60 deletions lib/html-proofer.rb
@@ -6,8 +6,7 @@ def require_all(path)
end

require_all 'html-proofer'
require_all 'html-proofer/check_runner'
require_all 'html-proofer/checks'
require_all 'html-proofer/check'

require 'parallel'
require 'fileutils'
@@ -19,52 +18,40 @@ def require_all(path)
class HTMLProofer
include HTMLProofer::Utils

attr_reader :options, :typhoeus_opts, :hydra_opts, :parallel_opts, :validation_opts, :external_urls, :iterable_external_urls
attr_reader :options, :external_urls

def initialize(src, opts = {})
FileUtils.mkdir_p(STORAGE_DIR) unless File.exist?(STORAGE_DIR)

@src = src

if opts[:verbose]
warn '`@options[:verbose]` will be removed in a future 3.x.x release: http://git.io/vGHHh'
end

@proofer_opts = HTMLProofer::Configuration::PROOFER_DEFAULTS

@typhoeus_opts = HTMLProofer::Configuration::TYPHOEUS_DEFAULTS.merge(opts[:typhoeus] || {})
opts.delete(:typhoeus)
@options = HTMLProofer::Configuration::PROOFER_DEFAULTS.merge(opts)

@hydra_opts = HTMLProofer::Configuration::HYDRA_DEFAULTS.merge(opts[:hydra] || {})
opts.delete(:hydra)
@options[:typhoeus] = HTMLProofer::Configuration::TYPHOEUS_DEFAULTS.merge(opts[:typhoeus] || {})
@options[:hydra] = HTMLProofer::Configuration::HYDRA_DEFAULTS.merge(opts[:hydra] || {})

# fall back to parallel defaults
@parallel_opts = opts[:parallel] || {}
opts.delete(:parallel)
@options[:parallel] = HTMLProofer::Configuration::PARALLEL_DEFAULTS.merge(opts[:parallel] || {})
@options[:validation] = HTMLProofer::Configuration::VALIDATION_DEFAULTS.merge(opts[:validation] || {})
@options[:cache] = HTMLProofer::Configuration::CACHE_DEFAULTS.merge(opts[:cache] || {})

@validation_opts = opts[:validation] || {}
opts.delete(:validation)

@options = @proofer_opts.merge(opts)
@logger = HTMLProofer::Log.new(@options[:log_level])

@failed_tests = []
end

def logger
@logger ||= HTMLProofer::Log.new(@options[:verbose], @options[:verbosity])
end

def run
logger.log :info, :blue, "Running #{checks} on #{@src} on *#{@options[:ext]}... \n\n"
@logger.log :info, "Running #{checks} on #{@src} on *#{@options[:ext]}... \n\n"

if @src.is_a?(Array) && !@options[:disable_external]
check_list_of_links
else
check_directory_of_files
check_files_in_directory
file_text = pluralize(files.length, 'file', 'files')
@logger.log :info, "Ran on #{file_text}!\n\n"
end

if @failed_tests.empty?
logger.log :info, :green, 'HTML-Proofer finished successfully.'
@logger.log_with_color :info, :green, 'HTML-Proofer finished successfully.'
else
print_failed_tests
end
@@ -81,13 +68,12 @@ def check_list_of_links
end

# Collects any external URLs found in a directory of files. Also collects
# every failed test from check_files_for_internal_woes.
# every failed test from process_files.
# Sends the external URLs to Typhoeus for batch processing.
def check_directory_of_files
def check_files_in_directory
@external_urls = {}
results = check_files_for_internal_woes

results.each do |item|
process_files.each do |item|
@external_urls.merge!(item[:external_urls])
@failed_tests.concat(item[:failed_tests])
end
@@ -101,49 +87,45 @@ def check_directory_of_files
elsif !@options[:disable_external]
validate_urls
end

count = files.length
file_text = pluralize(count, 'file', 'files')
logger.log :info, :blue, "Ran on #{file_text}!\n\n"
end

# Walks over each implemented check and runs them on the files, in parallel.
def check_files_for_internal_woes
Parallel.map(files, @parallel_opts) do |path|
html = create_nokogiri(path)
def process_files
Parallel.map(files, @options[:parallel]) do |path|
result = { :external_urls => {}, :failed_tests => [] }
html = create_nokogiri(path)

checks.each do |klass|
logger.log :debug, :yellow, "Checking #{klass.to_s.downcase} on #{path} ..."
check = Object.const_get(klass).new(@src, path, html, @options, @typhoeus_opts, @hydra_opts, @parallel_opts, @validation_opts)
@logger.log :debug, "Checking #{klass.to_s.downcase} on #{path} ..."
check = Object.const_get(klass).new(@src, path, html, @options)
check.run
result[:external_urls].merge!(check.external_urls)
result[:failed_tests].concat(check.issues) if check.issues.length > 0
result[:failed_tests].concat(check.issues)
end
result
end
end

def validate_urls
url_validator = HTMLProofer::UrlValidator.new(logger, @external_urls, @options, @typhoeus_opts, @hydra_opts)
url_validator = HTMLProofer::UrlValidator.new(@logger, @external_urls, @options)
@failed_tests.concat(url_validator.run)
@iterable_external_urls = url_validator.iterable_external_urls
@external_urls = url_validator.external_urls
end

def files
if File.directory? @src
pattern = File.join(@src, '**', "*#{@options[:ext]}")
files = Dir.glob(pattern).select { |fn| File.file? fn }
files.reject { |f| ignore_file?(f) }
elsif File.extname(@src) == @options[:ext]
[@src].reject { |f| ignore_file?(f) }
else
[]
end
@files ||= if File.directory? @src
pattern = File.join(@src, '**', "*#{@options[:ext]}")
files = Dir.glob(pattern).select { |fn| File.file? fn }
files.reject { |f| ignore_file?(f) }
elsif File.extname(@src) == @options[:ext]
[@src].reject { |f| ignore_file?(f) }
else
[]
end
end

def ignore_file?(file)
options[:file_ignore].each do |pattern|
@options[:file_ignore].each do |pattern|
return true if pattern.is_a?(String) && pattern == file
return true if pattern.is_a?(Regexp) && pattern =~ file
end
@@ -153,28 +135,26 @@ def ignore_file?(file)

def checks
return @checks unless @checks.nil?
@checks = HTMLProofer::CheckRunner.checks.map(&:name)
@checks = HTMLProofer::Check.subchecks.map(&:name)
@checks.delete('FaviconCheck') unless @options[:check_favicon]
@checks.delete('HtmlCheck') unless @options[:check_html]
@options[:checks_to_ignore].each do |ignored|
@checks.delete(ignored)
end
@options[:checks_to_ignore].each { |ignored| @checks.delete(ignored) }
@checks
end
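The diff does not show how `HTMLProofer::Check.subchecks` is populated. One common Ruby approach, sketched here as an assumption, is to record subclasses through the `inherited` hook and then filter the resulting names the way `checks` does. The `Check` class below is a hypothetical stand-in, not the gem's code, and the option values are made up.

```ruby
# Hypothetical stand-in for HTMLProofer::Check; not the gem's actual code.
class Check
  # Registry of every class that inherits from Check.
  def self.subchecks
    @subchecks ||= []
  end

  # Ruby calls this hook whenever a subclass is defined.
  def self.inherited(klass)
    super
    Check.subchecks << klass
  end
end

class FaviconCheck < Check; end
class HtmlCheck < Check; end

# Assumed option values, for illustration only.
options = { check_favicon: false, check_html: true, checks_to_ignore: [] }

# Same filtering steps as the `checks` method in the diff.
checks = Check.subchecks.map(&:name)
checks.delete('FaviconCheck') unless options[:check_favicon]
checks.delete('HtmlCheck') unless options[:check_html]
options[:checks_to_ignore].each { |ignored| checks.delete(ignored) }
p checks
```

With a registry like this, a custom check only has to inherit from the base class to be picked up; no separate registration call is needed.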

def failed_tests
return [] if @failed_tests.empty?
result = []
return result if @failed_tests.empty?
@failed_tests.each { |f| result << f.to_s }
result
end

def print_failed_tests
sorted_failures = HTMLProofer::CheckRunner::SortedIssues.new(@failed_tests, @options[:error_sort], logger)
sorted_failures = SortedIssues.new(@failed_tests, @options[:error_sort], @logger)

sorted_failures.sort_and_report
count = @failed_tests.length
failure_text = pluralize(count, 'failure', 'failures')
fail logger.colorize :red, "HTML-Proofer found #{failure_text}!"
fail @logger.colorize :red, "HTML-Proofer found #{failure_text}!"
end
end
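The reworked `initialize` folds the namespaced options (`:typhoeus`, `:hydra`, and so on) into a single `@options` hash, each merged over its own defaults. The sketch below shows that merge pattern in isolation. The constant values and the `build_options` helper are hypothetical; the real defaults live in `HTMLProofer::Configuration`.

```ruby
# Hypothetical defaults; the real values live in HTMLProofer::Configuration.
PROOFER_DEFAULTS = { ext: '.html', log_level: :info }.freeze
NAMESPACED_DEFAULTS = {
  typhoeus: { followlocation: true },
  hydra: { max_concurrency: 50 }
}.freeze

# Hypothetical helper mirroring the merge order used in initialize:
# top-level options first, then each namespace merged over its defaults.
def build_options(opts = {})
  options = PROOFER_DEFAULTS.merge(opts)
  NAMESPACED_DEFAULTS.each do |key, defaults|
    options[key] = defaults.merge(opts[key] || {})
  end
  options
end

options = build_options(ext: '.htm', typhoeus: { timeout: 5 })
p options
```

Because every namespace is merged over its defaults, a caller who overrides one nested key keeps the remaining defaults for that namespace, and downstream classes only need the one hash.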
28 changes: 22 additions & 6 deletions lib/html-proofer/cache.rb
@@ -11,7 +11,7 @@ class Cache

FILENAME = File.join(STORAGE_DIR, 'cache.log')

attr_accessor :exists, :load, :cache_log, :cache_time
attr_reader :exists, :load, :cache_log

def initialize(logger, options)
@logger = logger
@@ -21,7 +21,7 @@ def initialize(logger, options)
@load = false
else
@load = true
@parsed_timeframe = parsed_timeframe(options[:timeframe] || '30d')
@parsed_timeframe = parsed_timeframe(options[:timeframe])
end
@cache_time = Time.now

@@ -42,6 +42,10 @@ def urls
@cache_log['urls'] || []
end

def size
@cache_log.length
end

def parsed_timeframe(timeframe)
time, date = timeframe.match(/(\d+)(\D)/).captures
time = time.to_f
@@ -80,21 +84,21 @@ def detect_url_changes(found)
if existing_urls.include?(url)
true
else
@logger.log :debug, :yellow, "Adding #{url} to cache check"
@logger.log :debug, "Adding #{url} to cache check"
false
end
end

new_link_count = additions.length
new_link_text = pluralize(new_link_count, 'link', 'links')
@logger.log :info, :blue, "Adding #{new_link_text} to the cache..."
@logger.log :info, "Adding #{new_link_text} to the cache..."

# remove from cache URLs that no longer exist
del = 0
@cache_log.delete_if do |url, _|
url = clean_url(url)
if !found_urls.include?(url)
@logger.log :debug, :yellow, "Removing #{url} from cache check"
@logger.log :debug, "Removing #{url} from cache check"
del += 1
true
else
@@ -103,7 +107,7 @@ def detect_url_changes(found)
end

del_link_text = pluralize(del, 'link', 'links')
@logger.log :info, :blue, "Removing #{del_link_text} from the cache..."
@logger.log :info, "Removing #{del_link_text} from the cache..."

additions
end
@@ -116,6 +120,18 @@ def load?
@load.nil?
end

def retrieve_urls(external_urls)
urls_to_check = detect_url_changes(external_urls)
@cache_log.each_pair do |url, cache|
if within_timeframe?(cache['time'])
next if cache['message'].empty? # these were successes to skip
urls_to_check[url] = cache['filenames'] # these are failures to retry
else
urls_to_check[url] = cache['filenames'] # pass or fail, recheck expired links
end
end
urls_to_check
end
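The decision rule in `retrieve_urls` can be sketched on its own: recent successes are skipped, while failures and expired entries are rechecked. The example below is a simplified, self-contained version under assumptions. It omits the `detect_url_changes` step, replaces `within_timeframe?` with a plain age comparison, and uses made-up cache entries; the helper name `urls_to_recheck` is hypothetical.

```ruby
# Hypothetical helper: returns the cached URLs that need another check.
# An entry is skipped only if it is both recent and a past success.
def urls_to_recheck(cache_log, now:, max_age:)
  cache_log.each_with_object({}) do |(url, entry), recheck|
    fresh = (now - entry['time']) <= max_age
    next if fresh && entry['message'].empty? # recent success: skip
    recheck[url] = entry['filenames']         # failure or expired: recheck
  end
end

# Made-up cache contents, for illustration only.
now = Time.at(1_000_000)
day = 24 * 60 * 60
cache_log = {
  'http://a.example' => { 'time' => now - day, 'message' => '', 'filenames' => ['a.html'] },
  'http://b.example' => { 'time' => now - day, 'message' => '404', 'filenames' => ['b.html'] },
  'http://c.example' => { 'time' => now - 40 * day, 'message' => '', 'filenames' => ['c.html'] }
}

result = urls_to_recheck(cache_log, now: now, max_age: 30 * day)
p result.keys
```

Here the first URL is a recent success and is left out, the second is a recent failure and is retried, and the third has expired and is rechecked regardless of its earlier result.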

# FIXME: there seems to be some discrepancy where Typhoeus occasionally adds
# a trailing slash to URL strings, which causes issues with the cache