Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reporting broken links when using relative internal links reported as broken #792

Closed
bnorberg opened this issue Feb 25, 2023 · 11 comments
Closed

Comments

@bnorberg
Copy link

Hi, I am running into an issue where HTML Proofer is reporting working relative links as broken.

Example link

<a href="../admin/configuring-load-balancer-endpoints.html">

Example feedback from HTML Proofer

Screenshot 2023-02-25 at 11 05 53 AM

Any suggestions on what could be causing the broken link message?

My only solution to ensure I don't get false positives on relative links is to add '../' links in the ignore-urls option or changing all links to root relative. Neither solution is ideal.

@gjtorikian
Copy link
Owner

Hm, I can't reproduce this issue. Are you sure configuring-load-balancer-endpoints.html exists? Could you zip up your output for me to examine it, or is there some other way I can reproduce it?

@bnorberg
Copy link
Author

Thanks, for the quick response, gjtorikian.

That page does exist because the link works on the deploy site. Here's the page in the gh-pages branch of the deploy, https://github.com/NetAppDocs/storagegrid-116/blob/gh-pages/admin/configuring-load-balancer-endpoints.html and here is the page that is reference the above link through the relative https://github.com/NetAppDocs/storagegrid-116/blob/gh-pages/admin/configuring-custom-server-certificate-for-storage-node-or-clb.html. Again this is just one example. We are getting the broken link on all .. relative links. If we change <a href="../admin/configuring-load-balancer-endpoints.html"> to <a href="./admin/configuring-load-balancer-endpoints.html"> or <a href="/admin/configuring-load-balancer-endpoints.html">, the link checker will not return a broken link message.

We are using a mix of private and public GitHub build runners and different deployment environments for site builds so it will probably not be possible to replicate what I am seeing.

Is there a possibility that this could be related to a site prefix causing the .. in the relative link to not actually be in the correct directory to find the linked file?? We use a language prefix on our site.

Thanks again.

@gjtorikian
Copy link
Owner

Sorry, but this is probably going to be a bit of back and forth. 😓

There are already a few tests that check to make sure relative (../) links works; I've just pushed a new test up that tries to mimic your issue and it, too, passes: bd89111

But as I said there are many tests for this. One thing I noticed is that the GitHub page you link to for configuring-load-balancer-endpoints.html has 0 links for ../admin/configuring-load-balancer-endpoints.html, but it does have links for /us-en/storagegrid-116/admin/configuring-load-balancer-endpoints.html. Is this expected?

@riccardoporreca
Copy link
Collaborator

@gjtorikian, are the tests also covering the case of running on a directory instead of an individual file?

By looking at

path_dot_ext = ""
path_dot_ext = path + @runner.options[:assume_extension] unless blank?(@runner.options[:assume_extension])
base = if absolute_path?(path) # path relative to root
# either overwrite with root_dir; or, if source is directory, use that; or, just get the current file's dirname
@runner.options[:root_dir] || (File.directory?(@runner.current_source) ? @runner.current_source : File.dirname(@runner.current_source))
# relative links, path is a file
elsif File.exist?(File.expand_path(
path,
@runner.current_source,
)) || File.exist?(File.expand_path(path_dot_ext, @runner.current_source))
File.dirname(@runner.current_filename)
# relative links in nested dir, path is a file
elsif File.exist?(File.join(
File.dirname(@runner.current_filename),
path,
)) || File.exist?(File.join(File.dirname(@runner.current_filename), path_dot_ext))
File.dirname(@runner.current_filename)
# relative link, path is a directory
else
@runner.current_filename
end
file = File.join(base, path)

It seems the logic could be different in the two cases since for a "file" check I think @runner.current_source is the same as @runner.current_filename.

Looking at the code, there is perhaps an issue with the following lines, where it would seem more reasonable to use File.dirname(@runner.current_filename) instead of @runner.current_source in File.expand_path :

elsif File.exist?(File.expand_path(
path,
@runner.current_source,
)) || File.exist?(File.expand_path(path_dot_ext, @runner.current_source))
File.dirname(@runner.current_filename)

@bnorberg
Copy link
Author

bnorberg commented Mar 2, 2023

No worries, @gjtorikian. I totally understand and really appreciate @riccardoporreca's and your help.

We are running the check on a directory. And we do add a prefix during the build process to the the root url of the page for the language code (ie, us-en) and the name of the repository. That is why you are seeing root url like /us-en/storagegrid-116 and not ../admin/configuring-load-balancer-endpoints.html.

I am so sorry. I probably confused you with a bad relative link example on the previous post. This would be a relative link example for the configuring-load-balancer-endpoints.html page that is not working , <a href="../admin/web-browser-requirements.html">.

Maybe this prefix is the problem. When a writer uses a relative link to another file in that repository (ie, ../admin/web-browser-requirements.html") the prefix might be throwing off where in the directory the check is actually running. And maybe @riccardoporreca suggestion can fix this?

Anyway, thanks again for your help.

@riccardoporreca
Copy link
Collaborator

@bnorberg, I did a quick test myself and the "directory" check seem to support correctly "../" links.

Any chance you can share even a small part of the actual website you run on and see the failures, which as I understand is not the public repo you are pointing to? On the public repo, html-proofer has no issue with ../admin/web-browser-requirements.html (I ran on a local checkout and there is no error on that link). Also important is sharing the options you call html-proofer with, as well as the version you are using. It is hard to investigate without a way of reproducing the error.

@bnorberg
Copy link
Author

bnorberg commented Mar 6, 2023

@riccardoporreca @gjtorikian thanks again for testing and confirming all's working for you.

We have Ruby 2.6 in our build environment and so are using HTML Proofer version 4.4.3. Here is a gist of our configuration, https://gist.github.com/bnorberg/f22875beb1c5fc4cdee7f7db91f3ee72. You'll see that right now we add ../ links to the ignore-urls options to avoid getting failure reports. We wrap HTML Proofer into a homegrown application and run it as a utility in a component step of a GitHub Actions workflow.

Unfortunately, I cannot share any content from the content repository in which we are seeing the failures because it is pre-public release. But the repo I have shared already, https://github.com/NetAppDocs/storagegrid-116/tree/gh-pages, is very similar content-wise. We build our public sites using GitHub free Ubuntu build runners and GitHub pages. For our internal content, we have private build runners (built identical to Ubuntu GitHub build runners), that deploy to VM behind a firewall.

I will do some testing on our public content with relative links turned on tomorrow to see if this might be a problem limited to our internal environment setup and report back. Thanks again.

@riccardoporreca
Copy link
Collaborator

riccardoporreca commented Mar 6, 2023

Thanks @bnorberg, going in the right direction.

Are you able to reproduce the failure on https://github.com/NetAppDocs/storagegrid-116/tree/gh-pages with a similar configuration, of course without ignoring ../?

If you can provide us with another gist, which you could use at your end to reproduce the issue on https://github.com/NetAppDocs/storagegrid-116/tree/gh-pages, we then have a starting point to reproduce at our end and investigate. Fort this exercise, you may want to specify disable_external: true and run with only checks: ['Links'] to speed things up.

While you do that, it would be great if you could check the error also occurs with the latest v5 html-proofer.

@riccardoporreca
Copy link
Collaborator

@gjtorikian, @bnorberg, I had an intuition by looking at the way the error is reported in the issue's description at the very top: html-proofer is actually looking for a link without leading ../, paired with my special interest for the html-proofer caching.

With this in mind, I could reproduce the issue even with the same test data introduced in bd89111 by just enabling the cache.

The issue is with cleaned_url, which was introduced in #686 consistently for the external URLs

def retrieve_urls(urls_detected, type)
# if there are no urls, bail
return {} if urls_detected.empty?
urls_detected = urls_detected.transform_keys do |url|
cleaned_url(url)
end
urls_to_check = detect_url_changes(urls_detected, type)
urls_to_check
end

however introducing an inconsistency for the internal URLs, where cleaning is not done when adding to the cache, and should not be done as it seems to strip leading ../.

I have extended the test and am ready for a simple fix

@gjtorikian
Copy link
Owner

Amazing sleuthing @riccardoporreca, thank you so much.

Released as 5.0.5.

@bnorberg
Copy link
Author

bnorberg commented Mar 7, 2023

@gjtorikian, @riccardoporreca I created a branch with the latest html proofer release and can confirm that this change does solve our relative link error issue. Thank you so much for your effort and amazing response time. You gentlemen are awesome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants