Anchor Link Checking Only Checks One Link Per Thread #513

Open
harding opened this issue May 17, 2014 · 0 comments

Anchor link checking effectively checks only one link per thread: every other anchor checked by the same thread reports the same result (error or no error) as the first link that thread checked. For example:

## Anchor checking is the only non-default setting
> cat ~/.linkchecker/linkcheckerrc 
[AnchorCheck]

## An example destination page with an anchor
> lynx -source http://localhost/destination
<html><head></head><body>
<span id="working">working anchor</span>
</body></html>

## An example origin page with two links, both pointing to the page
## above---but only one of them working
> lynx -source http://localhost/origin
<html><head></head><body>
<a href="/destination#broken">broken link</a>
<a href="/destination#working">working link</a>
</body></html>

## The first link gets checked and is correctly reported as broken,
## but the second link gets incorrectly reported as broken.
> linkchecker -t -1 http://localhost/origin | egrep '^(URL  |Warning)'
URL        `/destination#broken'
Warning    [None] Anchor `broken' not found. Available anchors:
URL        `/destination#working'
Warning    [None] Anchor `broken' not found. Available anchors:

## Reversed the order of links in the origin
> lynx -source http://localhost/origin
<html><head></head><body>
<a href="/destination#working">working link</a>
<a href="/destination#broken">broken link</a>
</body></html>

## In this case, the first link gets checked and is correctly reported
## as working, but the second link gets incorrectly reported as also
## working
> linkchecker -t -1 http://localhost/origin | egrep '^(URL  |Warning)'
[no output]

(Tests with multiple threads are consistent with the output above, although the actual output on a large site varies dramatically depending on whether a particular thread sees a broken link or a working link first.)
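
To make the symptom concrete: in the first run, the warning printed for `/destination#working` names the anchor `broken`, which would be consistent with something from the first check being reused for later checks. The sketch below is not LinkChecker's code. `StickyChecker`, `IndependentChecker`, `AnchorCollector` and `find_anchors` are hypothetical names invented for illustration, and the sketch only shows one way that kind of reuse could produce exactly the output above.

```python
from html.parser import HTMLParser

# The destination page from the report.
DESTINATION = (
    '<html><head></head><body>\n'
    '<span id="working">working anchor</span>\n'
    '</body></html>'
)


class AnchorCollector(HTMLParser):
    """Collect every id/name attribute value found in a page."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key in ("id", "name") and value:
                self.anchors.add(value)


def find_anchors(html):
    """Return the set of anchor names defined in the given HTML."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.anchors


class StickyChecker:
    """Hypothetical buggy checker: keeps the first anchor it was asked about."""

    def __init__(self):
        self._anchor = None

    def check(self, url, html):
        # Assumed bug: the anchor is stored once and never refreshed.
        if self._anchor is None:
            self._anchor = url.partition("#")[2]
        return self._anchor in find_anchors(html)


class IndependentChecker:
    """Hypothetical fixed checker: derives the anchor from each URL."""

    def check(self, url, html):
        return url.partition("#")[2] in find_anchors(html)


urls = ["/destination#broken", "/destination#working"]
sticky = StickyChecker()
print([sticky.check(u, DESTINATION) for u in urls])
fixed = IndependentChecker()
print([fixed.check(u, DESTINATION) for u in urls])
```

With one `StickyChecker` instance standing in for one thread, both links come back as broken when the broken link is first, and both come back as working when the order is reversed. `IndependentChecker` gives the same verdict for each link in either order.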

According to git blame, the checkanchor plugin was introduced in 7b34be5. Running the tests above on that version does not produce the correct output either, but the output is subtly different. The second test (working link first) again produces no warnings, but the first test (broken link first) produces only one warning instead of two.

Note: python -m pytest tests/checker/test_anchor.py passes because that test only checks one URL.
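
A test that would catch this needs at least two anchors on the same destination, checked in both orders. The following is only a sketch of that shape, not a patch for tests/checker/test_anchor.py. It does not use LinkChecker's test helpers. `anchor_exists` is a hypothetical regex-based stand-in for the real checker, and `run_in_order` is likewise invented for illustration.

```python
import itertools
import re

# The destination page from the report.
DESTINATION = (
    '<html><head></head><body>\n'
    '<span id="working">working anchor</span>\n'
    '</body></html>'
)

# Expected verdict for each link on the origin page.
EXPECTED = {
    "/destination#working": True,
    "/destination#broken": False,
}


def anchor_exists(url, html):
    """Stand-in checker (hypothetical): is the URL fragment an id in html?"""
    fragment = url.partition("#")[2]
    return fragment in re.findall(r'\bid="([^"]+)"', html)


def run_in_order(urls, check=anchor_exists):
    """Check the URLs sequentially, as a single worker thread would."""
    return [(url, check(url, DESTINATION)) for url in urls]


def test_verdicts_do_not_depend_on_order(check=anchor_exists):
    # Try every ordering of the links; each verdict must match EXPECTED.
    for order in itertools.permutations(EXPECTED):
        for url, found in run_in_order(order, check):
            assert found == EXPECTED[url], (order, url)


test_verdicts_do_not_depend_on_order()
```

Passing a checker that carries state between calls as `check` should make the assertion fail for at least one ordering, which is the behavior described in this report.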

System: Linux (Debian Unstable), Python 2.7.6, glibc 2.18 amd64
