Merge pull request #292 from gjtorikian/clean-up-attrs

Clean up attr_accessor and handling of checks
gjtorikian · Jan 4, 2016 · 9433ffb
2 parents d596b62 + 46c9eb8
Showing 25 changed files with 449 additions and 403 deletions.
diff --git a/README.md b/README.md
@@ -168,11 +168,12 @@ The `HTMLProofer` constructor takes an optional hash of additional options:
 | `ext` | The extension of your HTML files including the dot. | `.html`
 | `external_only` | Only checks problems with external references. | `false`
 | `file_ignore` | An array of Strings or RegExps containing file paths that are safe to ignore. | `[]` |
+| `http_status_ignore` | An array of numbers representing status codes to ignore. | `[]`
+| `log_level` | Sets the logging level, as determined by [Yell](https://github.com/rudionrails/yell). | `:info`
 | `only_4xx` | Only reports errors for links that fall within the 4xx status code range. | `false` |
 | `url_ignore` | An array of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Note that non-HTTP(S) URIs are always ignored. | `[]` |
 | `url_swap` | A hash containing key-value pairs of `RegExp => String`. It transforms URLs that match `RegExp` into `String` via `gsub`. | `{}` |
 | `verbose` | If `true`, outputs extra information as the checking happens. Useful for debugging. **Will be deprecated in a future release.**| `false` |
-| `verbosity` | Sets the logging level, as determined by [Yell](https://github.com/rudionrails/yell). | `:info`
 
 In addition, there are a few "namespaced" options. These are:
 
@@ -252,44 +253,36 @@ The cache operates on external links only.
 
 ## Logging
 
-HTML-Proofer can be as noisy or as quiet as you'd like. There are two ways to log information:
-
-* If you set the `:verbose` option to `true`, HTML-Proofer will provide some debug information.
-* If you set the `:verbosity` option, you can better define the level of logging. See the configuration table above for more information.
-
-`:verbosity` is newer and offers better configuration. `:verbose` will be deprecated in a future 3.x.x release.
+HTML-Proofer can be as noisy or as quiet as you'd like. Set the `:log_level` option to choose how much is logged; see the configuration table above for details.
 
 ## Custom tests
 
-Want to write your own test? Sure! Just create two classes--one that inherits from `HTMLProofer::Checkable`, and another that inherits from `HTMLProofer::CheckRunner`.
+Want to write your own test? Sure, that's possible!
 
-The `CheckRunner` subclass must define one method called `run`. This is called on your content, and is responsible for performing the validation on whatever elements you like. When you catch a broken issue, call `add_issue(message)` to explain the error.
+Just create a class that inherits from `HTMLProofer::Check`. This subclass must define one method called `run`. It is called on your content and is responsible for validating whatever elements you like. When you find a broken element, call `add_issue(message, line_number: line)` to explain the error.
 
-The `Checkable` subclass defines various helper methods you can use as part of your test. Usually, you'll want to instantiate it within `run`. You have access to all of your element's attributes.
+If you're working with an element's attributes (as most checks do), you'll also want to call `create_element(node)` as part of your check. This constructs an object that contains all the attributes of the HTML element you're iterating on.
 
-Here's an example custom test that protects against `mailto` links that point to `octocat@github.com`:
+Here's an example custom test demonstrating these concepts. It reports `mailto` links that point to `octocat@github.com`:
 
 ``` ruby
-class OctocatLinkCheck < ::HTMLProofer::Checkable
+class MailToOctocat < ::HTMLProofer::Check
   def mailto?
-    return false if @data_ignore_proofer || @href.nil? || @href.empty?
-    return @href.match /^mailto\:/
+    return false if @link.data_ignore_proofer || blank?(@link.href)
+    return @link.href.match /^mailto\:/
   end
 
   def octocat?
-    return @href.match /\:octocat@github.com\Z/
+    return @link.href.match /\:octocat@github.com\Z/
   end
 
-end
-
-class MailToOctocat < ::HTMLProofer::CheckRunner
   def run
     @html.css('a').each do |node|
-      link = OctocatLinkCheck.new(node, self)
+      @link = create_element(node)
       line = node.line
 
-      if link.mailto? && link.octocat?
-        return add_issue("Don't email the Octocat directly!", line)
+      if mailto? && octocat?
+        return add_issue("Don't email the Octocat directly!", line_number: line)
       end
     end
   end

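The README example needs the gem and Nokogiri to run. The sketch below swaps both out so the pattern can be tried in isolation: `SketchCheck` is a hypothetical stand-in for `HTMLProofer::Check` (not the gem's class), and plain hashes replace Nokogiri nodes. Only the shape of `run`, `create_element`, and `add_issue(message, line_number:)` is meant to carry over.

```ruby
# Hypothetical stand-in for HTMLProofer::Check; NOT the gem's real class.
# It mimics only the interface described above: run, add_issue, create_element.
class SketchCheck
  Issue = Struct.new(:message, :line_number)
  Element = Struct.new(:href, :data_ignore_proofer)

  attr_reader :issues

  def initialize(nodes)
    @nodes = nodes # plain hashes standing in for @html.css('a')
    @issues = []
  end

  def add_issue(message, line_number: nil)
    @issues << Issue.new(message, line_number)
  end

  # Wraps a node's attributes in an object, like create_element(node).
  def create_element(node)
    Element.new(node[:href], node[:data_ignore_proofer])
  end

  def blank?(value)
    value.nil? || value.empty?
  end
end

class MailToOctocat < SketchCheck
  def mailto?
    return false if @link.data_ignore_proofer || blank?(@link.href)
    @link.href.start_with?('mailto:')
  end

  def octocat?
    @link.href.end_with?(':octocat@github.com')
  end

  def run
    @nodes.each do |node|
      @link = create_element(node)
      # Records every match instead of returning after the first one.
      add_issue("Don't email the Octocat directly!", line_number: node[:line]) if mailto? && octocat?
    end
  end
end

check = MailToOctocat.new([
  { href: 'mailto:octocat@github.com', line: 3 },
  { href: 'https://github.com', line: 5 },
  { href: 'mailto:octocat@github.com', line: 9, data_ignore_proofer: true }
])
check.run
check.issues.map(&:line_number) # => [3]
```

One design difference: the README version uses `return add_issue(...)`, which stops scanning a page after the first offending link, while this sketch records every match.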
diff --git a/bin/htmlproofer b/bin/htmlproofer
@@ -28,12 +28,12 @@ Mercenary.program(:htmlproofer) do |p|
   p.option 'ext', '--ext EXT', String, 'The extension of your HTML files including the dot. (default: `.html`)'
   p.option 'external_only', '--external_only', 'Only checks problems with external references'
   p.option 'file_ignore', '--file-ignore file1,[file2,...]', Array, 'A comma-separated list of Strings or RegExps containing file paths that are safe to ignore'
+  p.option 'http_status_ignore', '--http-status-ignore 123,[xxx, ...]', Array, 'A comma-separated list of numbers representing status codes to ignore.'
   p.option 'ignore_script_embeds', '--ignore-script-embeds', 'Ignore `check_html` errors associated with `script`s (default: `false`)'
+  p.option 'log_level', '--log-level <level>', String, 'Sets the logging level, as determined by Yell'
   p.option 'only_4xx', '--only-4xx', 'Only reports errors for links that fall within the 4xx status code range'
   p.option 'url_ignore', '--url-ignore link1,[link2,...]', Array, 'A comma-separated list of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Note that non-HTTP(S) URIs are always ignored'
   p.option 'url_swap', '--url-swap re:string,[re:string,...]', Array, 'A comma-separated list containing key-value pairs of `RegExp => String`. It transforms URLs that match `RegExp` into `String` via `gsub`.'
-  p.option 'verbose', '--verbose', 'If `true`, outputs extra information as the checking happens. Useful for debugging. **Will be deprecated in a future release.**'
-  p.option 'verbosity', '--verbosity', String, 'Sets the logging level, as determined by Yell'
 
   p.action do |args, opts|
     args = ['.'] if args.empty?

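Assuming Mercenary passes these flags through OptionParser's `Array` and `String` coercions, the new `--http-status-ignore` and `--log-level` values arrive as strings, while the README documents an array of numbers and a symbol such as `:info`. The hunk above doesn't show how `p.action` bridges that gap, so `normalize_cli_opts` below is a hypothetical sketch of one way to do it in plain Ruby.

```ruby
# Hypothetical helper, not html-proofer's code: converts CLI strings into the
# types the README documents for http_status_ignore and log_level.
def normalize_cli_opts(opts)
  options = {}
  if opts['http_status_ignore']
    # An Array-typed option yields strings such as ["999", " 403"].
    options[:http_status_ignore] = Array(opts['http_status_ignore']).map { |code| Integer(code.strip) }
  end
  # The README documents log_level as a symbol such as :info.
  options[:log_level] = opts['log_level'].to_sym if opts['log_level']
  options
end

cli = { 'http_status_ignore' => ['999', ' 403'], 'log_level' => 'debug' }
options = normalize_cli_opts(cli)
options[:http_status_ignore] # => [999, 403]
options[:log_level]          # => :debug
```

Using `Integer()` rather than `to_i` makes a typo such as `40x` raise an `ArgumentError` instead of silently becoming `0`.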
diff --git a/lib/html-proofer.rb b/lib/html-proofer.rb
@@ -6,8 +6,7 @@ def require_all(path)
 end
 
 require_all 'html-proofer'
-require_all 'html-proofer/check_runner'
-require_all 'html-proofer/checks'
+require_all 'html-proofer/check'
 
 require 'parallel'
 require 'fileutils'
@@ -19,52 +18,40 @@ def require_all(path)
 class HTMLProofer
   include HTMLProofer::Utils
 
-  attr_reader :options, :typhoeus_opts, :hydra_opts, :parallel_opts, :validation_opts, :external_urls, :iterable_external_urls
+  attr_reader :options, :external_urls
 
   def initialize(src, opts = {})
     FileUtils.mkdir_p(STORAGE_DIR) unless File.exist?(STORAGE_DIR)
 
     @src = src
 
-    if opts[:verbose]
-      warn '`@options[:verbose]` will be removed in a future 3.x.x release: http://git.io/vGHHh'
-    end
-
-    @proofer_opts = HTMLProofer::Configuration::PROOFER_DEFAULTS
-
-    @typhoeus_opts = HTMLProofer::Configuration::TYPHOEUS_DEFAULTS.merge(opts[:typhoeus] || {})
-    opts.delete(:typhoeus)
+    @options = HTMLProofer::Configuration::PROOFER_DEFAULTS.merge(opts)
 
-    @hydra_opts = HTMLProofer::Configuration::HYDRA_DEFAULTS.merge(opts[:hydra] || {})
-    opts.delete(:hydra)
+    @options[:typhoeus] = HTMLProofer::Configuration::TYPHOEUS_DEFAULTS.merge(opts[:typhoeus] || {})
+    @options[:hydra] = HTMLProofer::Configuration::HYDRA_DEFAULTS.merge(opts[:hydra] || {})
 
-    # fall back to parallel defaults
-    @parallel_opts = opts[:parallel] || {}
-    opts.delete(:parallel)
+    @options[:parallel] = HTMLProofer::Configuration::PARALLEL_DEFAULTS.merge(opts[:parallel] || {})
+    @options[:validation] = HTMLProofer::Configuration::VALIDATION_DEFAULTS.merge(opts[:validation] || {})
+    @options[:cache] = HTMLProofer::Configuration::CACHE_DEFAULTS.merge(opts[:cache] || {})
 
-    @validation_opts = opts[:validation] || {}
-    opts.delete(:validation)
-
-    @options = @proofer_opts.merge(opts)
+    @logger = HTMLProofer::Log.new(@options[:log_level])
 
     @failed_tests = []
   end
 
-  def logger
-    @logger ||= HTMLProofer::Log.new(@options[:verbose], @options[:verbosity])
-  end
-
   def run
-    logger.log :info, :blue, "Running #{checks} on #{@src} on *#{@options[:ext]}... \n\n"
+    @logger.log :info, "Running #{checks} on #{@src} on *#{@options[:ext]}... \n\n"
 
     if @src.is_a?(Array) && !@options[:disable_external]
       check_list_of_links
     else
-      check_directory_of_files
+      check_files_in_directory
+      file_text = pluralize(files.length, 'file', 'files')
+      @logger.log :info, "Ran on #{file_text}!\n\n"
     end
 
     if @failed_tests.empty?
-      logger.log :info, :green, 'HTML-Proofer finished successfully.'
+      @logger.log_with_color :info, :green, 'HTML-Proofer finished successfully.'
     else
       print_failed_tests
     end
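The constructor now folds every option group into a single `@options` hash. A minimal sketch of that merge pattern follows; the default constants and their values are hypothetical placeholders, not the contents of `HTMLProofer::Configuration`.

```ruby
# Hypothetical defaults; the real ones live in HTMLProofer::Configuration.
PROOFER_DEFAULTS  = { ext: '.html', log_level: :info, file_ignore: [] }.freeze
TYPHOEUS_DEFAULTS = { followlocation: true, timeout: 30 }.freeze
HYDRA_DEFAULTS    = { max_concurrency: 50 }.freeze

def build_options(opts = {})
  # Top-level keys: user values win over defaults.
  options = PROOFER_DEFAULTS.merge(opts)
  # Nested hashes are merged separately so a partial user hash keeps the remaining defaults.
  options[:typhoeus] = TYPHOEUS_DEFAULTS.merge(opts[:typhoeus] || {})
  options[:hydra]    = HYDRA_DEFAULTS.merge(opts[:hydra] || {})
  options
end

user_opts = { log_level: :debug, typhoeus: { timeout: 5 } }
options = build_options(user_opts)
options[:ext]                       # => ".html"
options[:log_level]                 # => :debug
options[:typhoeus][:followlocation] # => true
options[:typhoeus][:timeout]        # => 5
user_opts.key?(:typhoeus)           # => true
```

Two details matter here. Merging the nested hashes separately keeps defaults the user didn't override: a plain top-level merge would replace the whole `:typhoeus` hash with `{ timeout: 5 }`. And because `merge` returns a new hash, the caller's `opts` is left intact, unlike the removed `opts.delete(:typhoeus)` calls.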
@@ -81,13 +68,12 @@ def check_list_of_links
   end
 
   # Collects any external URLs found in a directory of files. Also collects
-  # every failed test from check_files_for_internal_woes.
+  # every failed test from process_files.
   # Sends the external URLs to Typhoeus for batch processing.
-  def check_directory_of_files
+  def check_files_in_directory
     @external_urls = {}
-    results = check_files_for_internal_woes
 
-    results.each do |item|
+    process_files.each do |item|
       @external_urls.merge!(item[:external_urls])
       @failed_tests.concat(item[:failed_tests])
     end
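`check_files_in_directory` folds each file's `{ external_urls:, failed_tests: }` result into shared collections. The sketch below shows that map-then-merge shape; `check_file` is hypothetical, and `Array#map` stands in for `Parallel.map`.

```ruby
# Hypothetical per-file check: every file links to example.com,
# and only bad.html produces a failure.
def check_file(path)
  failed = path.end_with?('bad.html') ? ["#{path}: broken link"] : []
  { external_urls: { 'https://example.com' => [path] }, failed_tests: failed }
end

def collect(results)
  external_urls = {}
  failed_tests = []
  results.each do |item|
    # The block runs only for duplicate keys: keep every file that links to the URL.
    external_urls.merge!(item[:external_urls]) { |_url, old, current| old + current }
    failed_tests.concat(item[:failed_tests])
  end
  [external_urls, failed_tests]
end

# Array#map stands in for Parallel.map(files, @options[:parallel]).
results = %w[a.html bad.html].map { |path| check_file(path) }
urls, failures = collect(results)
urls['https://example.com'] # => ["a.html", "bad.html"]
failures                    # => ["bad.html: broken link"]
```

The diff calls `merge!` without a block, so when two files link to the same URL the later file's value replaces the earlier one. If those values are lists of filenames (as `cache['filenames']` in the cache hunk suggests, though this diff doesn't confirm it), the block form above keeps all of them.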
@@ -101,49 +87,45 @@ def check_directory_of_files
     elsif !@options[:disable_external]
       validate_urls
     end
-
-    count = files.length
-    file_text = pluralize(count, 'file', 'files')
-    logger.log :info, :blue, "Ran on #{file_text}!\n\n"
   end
 
   # Walks over each implemented check and runs them on the files, in parallel.
-  def check_files_for_internal_woes
-    Parallel.map(files, @parallel_opts) do |path|
-      html = create_nokogiri(path)
+  def process_files
+    Parallel.map(files, @options[:parallel]) do |path|
       result = { :external_urls => {}, :failed_tests => [] }
+      html = create_nokogiri(path)
 
       checks.each do |klass|
-        logger.log :debug, :yellow, "Checking #{klass.to_s.downcase} on #{path} ..."
-        check = Object.const_get(klass).new(@src, path, html, @options, @typhoeus_opts, @hydra_opts, @parallel_opts, @validation_opts)
+        @logger.log :debug, "Checking #{klass.to_s.downcase} on #{path} ..."
+        check = Object.const_get(klass).new(@src, path, html, @options)
         check.run
         result[:external_urls].merge!(check.external_urls)
-        result[:failed_tests].concat(check.issues) if check.issues.length > 0
+        result[:failed_tests].concat(check.issues)
       end
       result
     end
   end
 
   def validate_urls
-    url_validator = HTMLProofer::UrlValidator.new(logger, @external_urls, @options, @typhoeus_opts, @hydra_opts)
+    url_validator = HTMLProofer::UrlValidator.new(@logger, @external_urls, @options)
     @failed_tests.concat(url_validator.run)
-    @iterable_external_urls = url_validator.iterable_external_urls
+    @external_urls = url_validator.external_urls
   end
 
   def files
-    if File.directory? @src
-      pattern = File.join(@src, '**', "*#{@options[:ext]}")
-      files = Dir.glob(pattern).select { |fn| File.file? fn }
-      files.reject { |f| ignore_file?(f) }
-    elsif File.extname(@src) == @options[:ext]
-      [@src].reject { |f| ignore_file?(f) }
-    else
-      []
-    end
+    @files ||= if File.directory? @src
+                 pattern = File.join(@src, '**', "*#{@options[:ext]}")
+                 files = Dir.glob(pattern).select { |fn| File.file? fn }
+                 files.reject { |f| ignore_file?(f) }
+               elsif File.extname(@src) == @options[:ext]
+                 [@src].reject { |f| ignore_file?(f) }
+               else
+                 []
+               end
   end
 
   def ignore_file?(file)
-    options[:file_ignore].each do |pattern|
+    @options[:file_ignore].each do |pattern|
       return true if pattern.is_a?(String) && pattern == file
       return true if pattern.is_a?(Regexp) && pattern =~ file
     end
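`files` is now memoized with `@files ||=`, which matters because `run` calls it again for the `Ran on N files` message after `process_files` has already walked the list. A standalone sketch of the same lookup follows; `FileFinder` is a hypothetical class, not part of html-proofer.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical finder mirroring the shape of HTMLProofer#files and #ignore_file?.
class FileFinder
  def initialize(src, ext: '.html', file_ignore: [])
    @src = src
    @ext = ext
    @file_ignore = file_ignore
  end

  # ||= caches the first result, so repeated calls don't glob again.
  def files
    @files ||= if File.directory?(@src)
                 pattern = File.join(@src, '**', "*#{@ext}")
                 Dir.glob(pattern).select { |fn| File.file?(fn) }.reject { |f| ignore_file?(f) }
               elsif File.extname(@src) == @ext
                 [@src].reject { |f| ignore_file?(f) }
               else
                 []
               end
  end

  # Strings must match the whole path; Regexps may match any part of it.
  def ignore_file?(file)
    @file_ignore.any? do |pattern|
      (pattern.is_a?(String) && pattern == file) || (pattern.is_a?(Regexp) && pattern =~ file)
    end
  end
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'drafts'))
  %w[index.html about.html drafts/wip.html notes.txt].each { |f| File.write(File.join(dir, f), '') }
  finder = FileFinder.new(dir, file_ignore: [%r{/drafts/}])
  p finder.files.map { |f| File.basename(f) }.sort # prints ["about.html", "index.html"]
end
```

Since `[]` is truthy in Ruby, an empty result is cached as well; only `nil` or `false` would trigger a second glob.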
@@ -153,28 +135,26 @@ def ignore_file?(file)
 
   def checks
     return @checks unless @checks.nil?
-    @checks = HTMLProofer::CheckRunner.checks.map(&:name)
+    @checks = HTMLProofer::Check.subchecks.map(&:name)
     @checks.delete('FaviconCheck') unless @options[:check_favicon]
     @checks.delete('HtmlCheck') unless @options[:check_html]
-    @options[:checks_to_ignore].each do |ignored|
-      @checks.delete(ignored)
-    end
+    @options[:checks_to_ignore].each { |ignored| @checks.delete(ignored) }
     @checks
   end
 
   def failed_tests
-    return [] if @failed_tests.empty?
     result = []
+    return result if @failed_tests.empty?
     @failed_tests.each { |f| result << f.to_s }
     result
   end
 
   def print_failed_tests
-    sorted_failures = HTMLProofer::CheckRunner::SortedIssues.new(@failed_tests, @options[:error_sort], logger)
+    sorted_failures = SortedIssues.new(@failed_tests, @options[:error_sort], @logger)
 
     sorted_failures.sort_and_report
     count = @failed_tests.length
     failure_text = pluralize(count, 'failure', 'failures')
-    fail logger.colorize :red, "HTML-Proofer found #{failure_text}!"
+    fail @logger.colorize :red, "HTML-Proofer found #{failure_text}!"
   end
 end
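`checks` now starts from `HTMLProofer::Check.subchecks` and removes disabled or ignored names. The diff doesn't show how `subchecks` is implemented; the sketch below uses Ruby's `inherited` hook as one possible way to build such a registry, with a hypothetical `Check` base class and check names borrowed from the diff.

```ruby
# Hypothetical base class: every subclass registers itself when it is defined.
class Check
  def self.subchecks
    @subchecks ||= []
  end

  def self.inherited(subclass)
    super
    Check.subchecks << subclass
  end
end

class LinkCheck < Check; end
class FaviconCheck < Check; end
class HtmlCheck < Check; end

# Mirrors the filtering in HTMLProofer#checks.
def enabled_checks(options)
  checks = Check.subchecks.map(&:name)
  checks.delete('FaviconCheck') unless options[:check_favicon]
  checks.delete('HtmlCheck') unless options[:check_html]
  options[:checks_to_ignore].each { |ignored| checks.delete(ignored) }
  checks
end

enabled_checks(check_favicon: false, check_html: false, checks_to_ignore: [])
# => ["LinkCheck"]
enabled_checks(check_favicon: true, check_html: true, checks_to_ignore: ['LinkCheck'])
# => ["FaviconCheck", "HtmlCheck"]
```

Because `map(&:name)` returns a fresh array, deleting entries from it never touches the registry itself.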
diff --git a/lib/html-proofer/cache.rb b/lib/html-proofer/cache.rb
@@ -11,7 +11,7 @@ class Cache
 
     FILENAME = File.join(STORAGE_DIR, 'cache.log')
 
-    attr_accessor :exists, :load, :cache_log, :cache_time
+    attr_reader :exists, :load, :cache_log
 
     def initialize(logger, options)
       @logger = logger
@@ -21,7 +21,7 @@ def initialize(logger, options)
         @load = false
       else
         @load = true
-        @parsed_timeframe = parsed_timeframe(options[:timeframe] || '30d')
+        @parsed_timeframe = parsed_timeframe(options[:timeframe])
       end
       @cache_time = Time.now
 
@@ -42,6 +42,10 @@ def urls
       @cache_log['urls'] || []
     end
 
+    def size
+      @cache_log.length
+    end
+
     def parsed_timeframe(timeframe)
       time, date = timeframe.match(/(\d+)(\D)/).captures
       time = time.to_f
@@ -80,21 +84,21 @@ def detect_url_changes(found)
         if existing_urls.include?(url)
           true
         else
-          @logger.log :debug, :yellow, "Adding #{url} to cache check"
+          @logger.log :debug, "Adding #{url} to cache check"
           false
         end
       end
 
       new_link_count = additions.length
       new_link_text = pluralize(new_link_count, 'link', 'links')
-      @logger.log :info, :blue, "Adding #{new_link_text} to the cache..."
+      @logger.log :info, "Adding #{new_link_text} to the cache..."
 
       # remove from cache URLs that no longer exist
       del = 0
       @cache_log.delete_if do |url, _|
         url = clean_url(url)
         if !found_urls.include?(url)
-          @logger.log :debug, :yellow, "Removing #{url} from cache check"
+          @logger.log :debug, "Removing #{url} from cache check"
           del += 1
           true
         else
@@ -103,7 +107,7 @@ def detect_url_changes(found)
       end
 
       del_link_text = pluralize(del, 'link', 'links')
-      @logger.log :info, :blue, "Removing #{del_link_text} from the cache..."
+      @logger.log :info, "Removing #{del_link_text} from the cache..."
 
       additions
     end
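Throughout these hunks, `@logger.log` loses its color argument, and color moves to `log_with_color` and `colorize` (as in `run` and `print_failed_tests`). Here is a sketch of a logger with that interface; `SketchLog` is hypothetical and uses a hand-rolled level filter in place of Yell.

```ruby
require 'stringio'

# Hypothetical stand-in for HTMLProofer::Log; a simple level filter replaces Yell.
class SketchLog
  LEVELS = %i[debug info warn error fatal].freeze
  COLORS = { red: 31, green: 32, yellow: 33, blue: 34 }.freeze

  def initialize(log_level = :info, io = $stdout)
    @threshold = LEVELS.index(log_level)
    raise ArgumentError, "unknown log level: #{log_level}" if @threshold.nil?
    @io = io
  end

  # Prints the message only if its level is at or above the configured threshold.
  def log(level, message)
    @io.puts(message) if LEVELS.index(level) >= @threshold
  end

  def log_with_color(level, color, message)
    log(level, colorize(color, message))
  end

  # Wraps the message in ANSI color escape codes.
  def colorize(color, message)
    "\e[#{COLORS.fetch(color)}m#{message}\e[0m"
  end
end

out = StringIO.new
logger = SketchLog.new(:info, out)
logger.log :debug, 'Removing https://example.com from cache check'
logger.log :info, 'Removing 1 link from the cache...'
out.string # => "Removing 1 link from the cache...\n"
```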
@@ -116,6 +120,18 @@ def load?
       @load.nil?
     end
 
+    def retrieve_urls(external_urls)
+      urls_to_check = detect_url_changes(external_urls)
+      @cache_log.each_pair do |url, cache|
+        if within_timeframe?(cache['time'])
+          next if cache['message'].empty? # these were successes to skip
+          urls_to_check[url] = cache['filenames'] # these are failures to retry
+        else
+          urls_to_check[url] = cache['filenames'] # pass or fail, recheck expired links
+        end
+      end
+      urls_to_check
+    end
 
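`retrieve_urls` decides which cached URLs to recheck. The sketch below models that loop with plain data; integer timestamps and the `now:`/`max_age:` keywords are assumptions standing in for `within_timeframe?`, and `new_urls` stands in for the result of `detect_url_changes`.

```ruby
# Hypothetical model of Cache#retrieve_urls using integer timestamps (seconds).
def retrieve_urls(cache_log, new_urls, now:, max_age:)
  urls_to_check = new_urls.dup
  cache_log.each_pair do |url, entry|
    within_timeframe = now - entry['time'] <= max_age
    # Skip only recent successes; recent failures and expired entries are rechecked.
    next if within_timeframe && entry['message'].empty?
    urls_to_check[url] = entry['filenames']
  end
  urls_to_check
end

cache_log = {
  'https://ok.example'     => { 'time' => 900, 'message' => '',    'filenames' => ['a.html'] },
  'https://broken.example' => { 'time' => 900, 'message' => '404', 'filenames' => ['b.html'] },
  'https://stale.example'  => { 'time' => 100, 'message' => '',    'filenames' => ['c.html'] }
}
new_urls = { 'https://new.example' => ['d.html'] }
result = retrieve_urls(cache_log, new_urls, now: 1000, max_age: 500)
result.keys.sort # => ["https://broken.example", "https://new.example", "https://stale.example"]
```

Both branches of the original `if/else` assign the same value, so the sketch collapses them into a single guard: only a recent success is skipped.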
     # FIXME: there seems to be some discrepancy where Typhoeus occasionally adds
     # a trailing slash to URL strings, which causes issues with the cache