Add a linkchecker #356

Closed
projektir opened this issue Jun 24, 2017 · 11 comments
Labels
A-CLI Area: CLI A-HTML Area: HTML Rendering A-Links Area: Issues with links A-Tests Area: `mbdook test` related tests C-enhancement Category: Enhancement or feature request S-Wishlist Status: Wishlist
Comments

@projektir
Contributor

A linkchecker is a convenient tool for avoiding errors and keeping track of dead links. It would also be a nice-to-have for #308.

The RBE linkchecker and the one used in rust-lang/rust both scan the files line-by-line while applying a regex to find links and check them. I don't know if it'd be OK for us to adopt the rust-lang/rust one? @steveklabnik
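A minimal sketch of the line-by-line regex approach described above. It is not the code of either linkchecker: the pattern `INLINE_LINK` and the function `find_links` are hypothetical names, and the pattern only covers simple inline links of the form `[text](target)` (no reference-style links, link titles, or nested brackets).

```python
import re

# Hypothetical pattern for inline Markdown links: [text](target).
# Assumption: targets contain no whitespace or closing parenthesis.
INLINE_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]*)\)")


def find_links(text):
    """Return (line_number, target) pairs for every inline link in `text`."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in INLINE_LINK.finditer(line):
            found.append((number, match.group(1)))
    return found


sample = "See [the guide](format/summary.html).\nNothing here.\nAn [empty]() link."
links = find_links(sample)
print(links)
```

Keeping the line number alongside each target is what lets a checker report a location for every broken link.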

I think a good place for this would be mdbook test.

@Michael-F-Bryan
Contributor

This would be a good candidate for the plugin system (#163). Ideally, after the rendering stage you'd be able to make a plugin which gets passed the rendered output's location and then checks all the links in all the *.html files it can find.
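The idea of a post-render check can be sketched roughly as follows, assuming only that the checker receives the directory containing the rendered output. This is not the mdbook plugin API: `HrefCollector` and `broken_local_links` are hypothetical names, and the sketch checks local file targets only. Web links, in-page anchors, and links starting with `/` are not handled.

```python
import os
import tempfile
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def broken_local_links(output_dir):
    """Return sorted (html_file, href) pairs whose local target is missing."""
    broken = []
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            if not name.endswith(".html"):
                continue
            page = os.path.join(root, name)
            with open(page, encoding="utf-8") as handle:
                collector = HrefCollector()
                collector.feed(handle.read())
            for href in collector.hrefs:
                parts = urlsplit(href)
                # Skip web links and pure in-page anchors; only files are checked.
                if parts.scheme or parts.netloc or not parts.path:
                    continue
                target = os.path.normpath(os.path.join(root, parts.path))
                if not os.path.exists(target):
                    broken.append((os.path.relpath(page, output_dir), href))
    return sorted(broken)


# Tiny demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as out:
    with open(os.path.join(out, "index.html"), "w", encoding="utf-8") as f:
        f.write('<a href="intro.html">ok</a> <a href="missing.html">bad</a>'
                ' <a href="https://example.com/">web</a>')
    with open(os.path.join(out, "intro.html"), "w", encoding="utf-8") as f:
        f.write('<a href="index.html#top">back</a>')
    result = broken_local_links(out)

print(result)
```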

We're planning on refactoring the current system to make it a lot easier to write your own plugins and renderers.

@azerupi
Contributor

azerupi commented Jun 24, 2017

> I think a good place for this would be mdbook test

Definitely in mdbook test

> This would be a good candidate for the plugin system

I agree, this could potentially be written as a plugin in the future :)
I emphasised "in the future" because I don't want to stall progress on changes that are coming soon-ish. We don't have a deadline for the plugin system, so if someone wants to contribute a solution right now, I wouldn't want to break their inertia.

However, we can keep this use case in the back of our minds when doing the refactorings, to make it indeed possible to implement this as a plugin later. :)

@azerupi azerupi added A-CLI Area: CLI A-HTML Area: HTML Rendering A-Tests Area: `mbdook test` related tests S-Wishlist Status: Wishlist C-enhancement Category: Enhancement or feature request labels Jun 24, 2017
@steveklabnik
Member

steveklabnik commented Jun 24, 2017 via email

@budziq
Contributor

budziq commented Jun 24, 2017

> Definitely in mdbook test

It might also be nice to have it as a warning at the mdbook build stage.

> I've thought about trying to put the rust-lang one on crates.io so others
> could use it too, to be honest.

@steveklabnik That would be awesome!

@Michael-F-Bryan
Contributor

Michael-F-Bryan commented Jan 13, 2018

For anyone interested, I've started playing around with an mdbook-linkcheck backend for checking links. You'll need to install mdbook directly from master, and it isn't 100% finished yet, but it may be useful for some people.

EDIT: It looks like the tool works, because I've already found my first batch of broken links, rust-lang/rust-by-example#990 🎉

Example Output

This is the output (when logging very verbosely) when the tool is run over mdbook's user guide:

$ RUST_LOG=mdbook_linkcheck cargo run -- -s ~/Documents/forks/mdBook/book-example
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/mdbook-linkcheck -s /home/michael/Documents/forks/mdBook/book-example`
 INFO 2018-01-13T13:38:49Z: mdbook_linkcheck: Checking for broken links
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck: Config {
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck:     follow_web_links: false
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck: }
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck: Finding all links
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "http://www.rust-lang.org" in README.md#3
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook" in README.md#7
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook/issues" in README.md#7
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://docs.rs/mdbook/*/mdbook/" in README.md#11
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://www.mozilla.org/MPL/2.0/" in README.md#15
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://crates.io/crates/mdbook" in cli/cli-tool.md#3
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://www.rust-lang.org/" in cli/cli-tool.md#10
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://www.rust-lang.org/downloads.html" in cli/cli-tool.md#10
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://crates.io/" in cli/cli-tool.md#20
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook" in cli/cli-tool.md#27
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "format/summary.html" in cli/init.md#25
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook/issues" in cli/watch.md#26
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook/issues" in cli/serve.md#40
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://doc.rust-lang.org/stable/book/" in cli/test.md#3
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "http://handlebarsjs.com/" in format/theme/theme.md#3
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook/issues" in format/theme/index-hbs.md#90
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://highlightjs.org" in format/theme/syntax-highlighting.md#3
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook/issues" in format/theme/syntax-highlighting.md#56
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://www.mathjax.org/" in format/mathjax.md#3
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "" in format/rust.md#38
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://docs.rs/mdbook" in lib/index.md#11
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://docs.rs/mdbook/*/mdbook/renderer/struct.RenderContext.html" in lib/index.md#33
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "http://www.linfo.org/rule_of_silence.html" in lib/index.md#165
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/mdinger" in misc/contributors.md#7
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/kbknapp" in misc/contributors.md#8
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/steveklabnik" in misc/contributors.md#9
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/asolove" in misc/contributors.md#10
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/waynenilsen" in misc/contributors.md#11
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/funkill" in misc/contributors.md#12
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/FuGangqiang" in misc/contributors.md#13
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/Michael-F-Bryan" in misc/contributors.md#14
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/cspiegel" in misc/contributors.md#15
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck: Found 32 links
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "http://www.rust-lang.org" in README.md#3
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "http://www.rust-lang.org/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook" in README.md#7
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook/issues" in README.md#7
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook/issues"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://docs.rs/mdbook/*/mdbook/" in README.md#11
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://docs.rs/mdbook/*/mdbook/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://www.mozilla.org/MPL/2.0/" in README.md#15
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://www.mozilla.org/MPL/2.0/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://crates.io/crates/mdbook" in cli/cli-tool.md#3
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://crates.io/crates/mdbook"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://www.rust-lang.org/" in cli/cli-tool.md#10
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://www.rust-lang.org/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://www.rust-lang.org/downloads.html" in cli/cli-tool.md#10
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://www.rust-lang.org/downloads.html"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://crates.io/" in cli/cli-tool.md#20
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://crates.io/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook" in cli/cli-tool.md#27
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "format/summary.html" in cli/init.md#25
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Searching for /home/michael/Documents/forks/mdBook/book-example/src/format/summary.md
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook/issues" in cli/watch.md#26
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook/issues"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook/issues" in cli/serve.md#40
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook/issues"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://doc.rust-lang.org/stable/book/" in cli/test.md#3
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://doc.rust-lang.org/stable/book/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "http://handlebarsjs.com/" in format/theme/theme.md#3
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "http://handlebarsjs.com/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook/issues" in format/theme/index-hbs.md#90
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook/issues"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://highlightjs.org" in format/theme/syntax-highlighting.md#3
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://highlightjs.org/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook/issues" in format/theme/syntax-highlighting.md#56
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook/issues"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://www.mathjax.org/" in format/mathjax.md#3
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://www.mathjax.org/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "" in format/rust.md#38
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck: Error for "" in format/rust.md#38, The link is empty
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://docs.rs/mdbook" in lib/index.md#11
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://docs.rs/mdbook"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://docs.rs/mdbook/*/mdbook/renderer/struct.RenderContext.html" in lib/index.md#33
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://docs.rs/mdbook/*/mdbook/renderer/struct.RenderContext.html"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "http://www.linfo.org/rule_of_silence.html" in lib/index.md#165
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "http://www.linfo.org/rule_of_silence.html"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/mdinger" in misc/contributors.md#7
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/mdinger"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/kbknapp" in misc/contributors.md#8
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/kbknapp"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/steveklabnik" in misc/contributors.md#9
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/steveklabnik"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/asolove" in misc/contributors.md#10
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/asolove"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/waynenilsen" in misc/contributors.md#11
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/waynenilsen"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/funkill" in misc/contributors.md#12
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/funkill"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/FuGangqiang" in misc/contributors.md#13
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/FuGangqiang"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/Michael-F-Bryan" in misc/contributors.md#14
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/Michael-F-Bryan"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/cspiegel" in misc/contributors.md#15
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/cspiegel"
There were 1 broken links

format/rust.md#38: The link is empty
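The behaviour visible in this log can be approximated by the sketch below. It is inferred from the log lines only and is not the mdbook-linkcheck source: the function name `check_link` and the "File not found" message are hypothetical, while "The link is empty" and the `follow_web_links` setting come from the output above. As in the log, a local link such as `format/summary.html` is looked up as `format/summary.md` under the book's `src/` directory.

```python
import os
import tempfile
from urllib.parse import urlsplit


def check_link(link, src_dir, follow_web_links=False):
    """Return None if the link passes, otherwise a short error message."""
    if link == "":
        return "The link is empty"
    parts = urlsplit(link)
    if parts.scheme in ("http", "https"):
        if not follow_web_links:
            return None  # skipped, like the "Ignoring ..." lines in the log
        raise NotImplementedError("fetching web links is outside this sketch")
    # Map the rendered name back to its source file,
    # e.g. format/summary.html -> format/summary.md
    stem, ext = os.path.splitext(parts.path)
    source = stem + ".md" if ext == ".html" else parts.path
    if os.path.exists(os.path.join(src_dir, source)):
        return None
    return "File not found: " + source  # hypothetical message


with tempfile.TemporaryDirectory() as src:
    os.makedirs(os.path.join(src, "format"))
    open(os.path.join(src, "format", "summary.md"), "w").close()
    results = [
        check_link("https://crates.io/", src),
        check_link("format/summary.html", src),
        check_link("format/missing.html", src),
        check_link("", src),
    ]

print(results)
```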

@projektir
Contributor Author

@Michael-F-Bryan so rust-lang/rust already has a linkchecker, which is the one we originally wanted to pull out and turn into a crate (I'm not sure what that means for plugins). It has some problems, though, that yours doesn't have (for instance, this fix is really needed), but it also does some things yours doesn't (check for absolute paths).

Idk if we want to have these out of sync given that rust-lang/rust's linkchecker would run on every x.py build for all the books that it manages.

@Michael-F-Bryan
Contributor

> Idk if we want to have these out of sync given that rust-lang/rust's linkchecker would run on every x.py build for all the books that it manages.

My original hopes were that this could supplement (or even succeed?) their link checker, although on further inspection they do a lot of cross-site linking (i.e. using links to files outside the book such as ../../std/prelude/index.html). My linkchecker works purely with the source book and doesn't take into account the fact that other things exist on the Rust S3 bucket, so I don't know whether this is still possible.

That said, the entire idea behind enabling alternate backends is that people can write their own tools to suit their exact use case. For example, it was almost trivial to knock up a backend which runs everything through rust-skeptic, which is something Rust By Example currently needs to do manually.

> but it also does some things yours doesn't (check for absolute paths).

This part was tricky. I originally treated relative and absolute paths separately (relative links are relative to the chapter's directory, absolute links are relative to src/) but found that most of the links in Rust By Example used a completely different convention. We use the <base> tag to tweak how links get resolved by your browser, so what I detected as a "broken link" turned out to still work fine when viewing online.
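To make the two conventions concrete, here is a small sketch of how the same link can resolve differently. The function names are hypothetical, and `resolve_with_base` rests on an assumption not stated in the thread: that the `<base>` tag points at the book root, so every link resolves from `src/`.

```python
import posixpath


def resolve(link, chapter):
    """Resolve `link`, found in `chapter`, to a path relative to src/."""
    if link.startswith("/"):
        # Absolute links are taken relative to src/.
        return posixpath.normpath(link.lstrip("/"))
    # Relative links are taken relative to the chapter's own directory.
    return posixpath.normpath(posixpath.join(posixpath.dirname(chapter), link))


def resolve_with_base(link):
    """Assumed <base> behaviour: every link resolves from the book root."""
    return posixpath.normpath(link.lstrip("/"))


a = resolve("format/summary.html", "cli/init.md")
b = resolve("/format/summary.html", "cli/init.md")
c = resolve_with_base("format/summary.html")
d = resolve("../../std/prelude/index.html", "cli/init.md")
print(a, b, c, d)
```

Under the first convention the link in `cli/init.md` points at `cli/format/summary.html`, while under the assumed `<base>` convention it points at `format/summary.html`, which is why a checker using one convention can flag links that work under the other. A result that still begins with `..`, as in the last call, has left the book entirely. That is the cross-site case mentioned earlier, which a checker that only sees the book's source cannot verify.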

@Michael-F-Bryan
Contributor

@ehuss, the mdbook-linkcheck backend already exists and does a pretty good job.

Are we happy with letting the ecosystem provide a linkchecker instead of making it part of mdbook?

@ehuss
Contributor

ehuss commented May 21, 2020

I think that's up to you! It seems unlikely that anyone is going to develop a new link checker. If you're asking whether to migrate mdbook-linkcheck into mdbook as a built-in, I think that's also up to you. Or if you're asking whether this issue should just be closed, I'm fine with that, too.

@Michael-F-Bryan
Contributor

I think we can close the issue. The mdbook-linkcheck crate should fill this niche well enough.

@ehuss
Contributor

ehuss commented May 26, 2020

Sounds reasonable!
