Caching breaks proofing of internal links #640

Closed
CarterPape opened this issue Apr 27, 2021 · 7 comments

CarterPape commented Apr 27, 2021

The output I'm getting

For SEO purposes, the main error output is:

htmlproofer 3.19.1 | Error:  undefined method `first' for nil:NilClass
/usr/local/lib/ruby/gems/3.0.0/gems/html-proofer-3.19.1/lib/html-proofer/runner.rb:155:in `block in validate_internal_urls': undefined method `first' for nil:NilClass (NoMethodError)

The command I'm running:

htmlproofer _site --internal_domains moab.news --assume_extension --enforce_https --check_favicon --check_html --check_opengraph --check_img_http --timeframe 2d

The contents of my tmp/.htmlproofer/cache.log

I have not been able to isolate the error to a particular link or set of links. I believe that I did not get this error when I ran the script within the specified timeframe but I did after. (As in: I ran the command a bunch on the first day then started getting errors on the third or fourth day after the first run.)

Possible steps to reproduce

  1. git clone https://github.com/CarterPape/moab.news.git (the project from which I'm working. It has a few large files, so you can just download this .zip of it instead.)
    1. When prompted for a git-lfs-related password (if cloning), provide your GitHub username and a token that solely grants read-only access to public repositories. Revoke the token later if you want; I'm not saving it. This part is just a WIP.
  2. bundle install
  3. ./scripts/moab.news test
  4. edit the time attribute on any cached internal link with a query parameter (e.g. "/assets/favicons/apple-touch-icon-114x114.png?v=2020-12-10-1") in tmp/.htmlproofer/cache.log to be more than two days ago (e.g. "time":"2021-01-01 17:14:27 -0600"). See note below for shortcut.
  5. ./scripts/moab.news test again

As I mentioned in a comment, I have narrowed the problem down to cached internal links that contain query parameters (i.e. links that end with ?something=something, emphasis on the ?).

So, at step 4 above, you could just run this command to do what I described:

sed -i \
    's;\("/[^"]\+?v=[^"]\+"\):{"time":"[[:digit:]: -]\+";\1:{"time":"2021-01-01 12:00:00 -0600";' \
    tmp/.htmlproofer/cache.log

Note that, if you're on macOS, the built-in sed is not GNU sed, and I don't have the patience to figure out how they're different. I use gsed from Homebrew.
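For anyone who would rather avoid the sed differences, here is a minimal Ruby sketch of the same backdating step. It is not part of html-proofer: `backdate_query_links` is a hypothetical helper, and it assumes the cache is a single JSON object keyed by URL whose internal keys start with "/". Unlike the sed one-liner, it matches any query string rather than only `?v=`.

```ruby
require "json"

# Hypothetical helper: backdate every cached internal link that carries a
# query string. Assumes internal keys start with "/" (as in this issue).
def backdate_query_links(cache, stale_time = "2021-01-01 12:00:00 -0600")
  cache.to_h do |url, entry|
    if url.start_with?("/") && url.include?("?")
      # merge returns a new hash, so the input entry is left untouched
      [url, entry.merge("time" => stale_time)]
    else
      [url, entry]
    end
  end
end

# In-memory example; for a real run, read and write
# tmp/.htmlproofer/cache.log with File.read / File.write instead.
sample = JSON.parse(<<~JSON)
  {
    "/assets/favicons/apple-touch-icon-114x114.png?v=2020-12-10-1": {
      "time": "2021-04-27 09:00:00 -0600", "filenames": [], "status": 200, "message": ""
    },
    "/about/": {
      "time": "2021-04-27 09:00:00 -0600", "filenames": [], "status": 200, "message": ""
    }
  }
JSON

backdated = backdate_query_links(sample)
puts JSON.pretty_generate(backdated)
```

Only the favicon entry should come back with the 2021-01-01 timestamp; the "/about/" entry in the sample is made up for contrast and keeps its original time.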

CarterPape changed the title from "Caching breaks proofing of internal links in particular circumstance" to "Caching breaks proofing of internal links" on Apr 27, 2021
@CarterPape
Author

CarterPape commented Apr 27, 2021

browser hit "enter" before I was ready; I'm still editing…

Edit: done

@gjtorikian
Owner

I haven't been able to reproduce this. Is it still occurring for you?

@CarterPape
Author

CarterPape commented May 5, 2021

Yes. For now, I am using a workaround to remove all the internal links from the cache before testing. I inserted a step (number 4) that should help in reproducing. I am still investigating myself why this is affecting internal links, specifically.

I suspected that it might happen when I change the slug on a post, but I don't think that's it. I am digging now and will post an update if I get one.

Edits:

I moved the script file, so I fixed that path in the steps to reproduce. I tested this again on my local machine and got the same error. I am thinking through which other machine I can test from to isolate anything environment-specific, if that's the problem.
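The workaround mentioned in this comment (removing all internal links from the cache before testing) could be sketched roughly as follows. This is an illustration only, not the script actually used: `drop_internal_links` is a hypothetical name, and it assumes internal links are the cache keys that start with "/" while external ones are absolute URLs.

```ruby
require "json"

# Hypothetical helper, not part of html-proofer: keep only external entries.
# Assumes internal links are keyed by a path starting with "/".
def drop_internal_links(cache)
  cache.reject { |url, _entry| url.start_with?("/") }
end

# In-memory example; the external URL is a placeholder.
cache = {
  "/assets/favicons/apple-touch-icon-114x114.png?v=2020-12-10-1" => { "status" => 200 },
  "https://example.com/" => { "status" => 200 }
}

pruned = drop_internal_links(cache)
puts JSON.generate(pruned)
```

External entries stay cached, so only the internal links are re-checked from scratch on the next run.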

@CarterPape
Author

CarterPape commented May 6, 2021

I believe I have narrowed the error down to internal links that include query parameters.

So, for example, when I remove all internal links except the following from cache.log, I get the error (whitespace formatted for ease of reading):

{
    "/assets/favicons/apple-touch-icon-114x114.png?v=2020-12-10-1": {
        "time": "2021-01-01 12:00:22 -0600",
        "filenames": [
            a,
            bunch,
            of,
            paths,
        ],
        "status": 200,
        "message": ""
    }
}

Note the time is before the cache cutoff, so the link gets picked up as one that needs to be verified again.

When I remove that link or change the date to be after the cache cutoff (i.e. still valid in the cache), the error goes away. Doing this for the other internal links that have no query string does not produce the error.
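To make the cutoff reasoning concrete, here is a small Ruby sketch of how an entry's time compares against a two-day timeframe. It is not html-proofer's implementation: `stale?` is a hypothetical helper, and the "now" value and the recent timestamp are made up for the example.

```ruby
require "time"

SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical check: an entry is stale when its recorded time is older
# than `days` days before `now`.
def stale?(entry_time, now, days)
  Time.parse(entry_time) < now - (days * SECONDS_PER_DAY)
end

# Assumed "current" time for the example.
now = Time.parse("2021-05-06 12:00:00 -0600")

old_is_stale = stale?("2021-01-01 12:00:22 -0600", now, 2)
recent_is_stale = stale?("2021-05-05 18:00:00 -0600", now, 2)

puts "old entry stale: #{old_is_stale}"
puts "recent entry stale: #{recent_is_stale}"
```

With `--timeframe 2d`, the January entry falls before the cutoff and would be re-verified, while an entry from the previous evening would still be treated as valid.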

@gjtorikian
Owner

I still can't reproduce this. Here's my test case. Basically, I:

  • Have a file internal_query_link.html
  • Which internally links to ./gpl.png?v=2020-12-10-1
  • I set the date to some time in the past (2020, 0o1, 27, 12, 0, 0)
  • The link gets added to the cache as a 200
  • I set the date to the future (2021, 0o6, 20, 12, 0, 0)
  • And the link gets re-added (because the cache date is out of range)

Everything works as expected. What am I missing?

@CarterPape
Author

Hmm… I will investigate further and let you know. Thanks for the update.

@gjtorikian
Owner

The next version of HTMLProofer has completely rewritten the cache and I don't believe this is a problem any more. I am going to close this issue out, as I'm just one person and can't maintain both the 3.x and 4.x features. ✌️ 4.x should release within the month.
