[INFRA] Use linkchecker to verify URLs #79

Closed
wants to merge 15 commits into from

Conversation

yarikoptic
Collaborator

@yarikoptic yarikoptic commented Oct 31, 2018

More TODOs outside the scope of this PR:

@yarikoptic
Collaborator Author

hm, did I screw up circle-ci, or is it not enabled at all? I see only the Travis build, which doesn't run mkdocs, so I was enhancing the circle-ci configuration

@chrisgorgo
Contributor

I enabled it on forks. Push something new; maybe that will trigger it.

@yarikoptic
Collaborator Author

ok, quick and dirty fixes for the linkchecker findings are now proposed, and the fresh finding is below (edits from maintainers are allowed, so you are welcome to push the proper fix):

URL        `#heading=h.5u721tt1h9pe'
Name       `8.3.1. Common MR metadata fields'
Parent URL file:///home/yoh/proj/bids/bids-specification/site/04-modality-specific-files/01-magnetic-resonance-imaging-data.html, line 1300, col 8
Real URL   file:///home/yoh/proj/bids/bids-specification/site/04-modality-specific-files/01-magnetic-resonance-imaging-data.html
Check time 0.280 seconds
D/L time   0.001 seconds
Size       65.02KB
Modified   2018-10-31 05:58:49.047498Z
Warning    [None] Anchor `heading=h.5u721tt1h9pe' not found.
           Available anchors: `__drawer', `__search', `__toc',
           `a-real-fieldmap-image', `anatomical-landmarks',
           `anatomy-imaging-data',
           `case-4-multiple-phase-encoded-directions-pepolar',
           `common-metadata-fields', `diffusion-imaging-data',
           `fieldmap-data', `fmri-task-information',
           `in-plane-spatial-encoding',
           `institution-information',
           `magnetic-resonance-imaging-data', `nav-1', `nav-1-4',
           `other-recommended-metadata',
           `phase-difference-image-and-at-least-one-magnitude-image',
           `required-fields', `rf-contrast', `scanner-hardware',
           `sequence-specifics', `slice-acceleration',
           `task-including-resting-state-imaging-data',
           `timing-parameters', `timing-parameters_1',
           `two-phase-images-and-two-magnitude-images'.
Result     Valid

Statistics:
Downloaded: 3.40MB.
Content types: 5 image, 96 text, 0 video, 0 audio, 55 application, 0 mail and 118 other.
URL lengths: min=8, max=130, avg=72.

That's it. 274 links in 274 URLs checked. 1 warning found. 0 errors found.
Stopped checking at 2018-10-31 01:59:01-004 (2 seconds)
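
The kind of check behind the "Anchor not found" warning above can be sketched in a few lines of Python. This is only an illustration using the standard library, not how linkchecker itself is implemented; the names `AnchorCollector` and `missing_anchors` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect anchor targets (id/name) and same-page fragment links."""

    def __init__(self):
        super().__init__()
        self.anchors = set()   # values of id= attributes and <a name=...>
        self.fragments = []    # href="#..." targets, without the leading '#'

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if value is None:
                continue
            if key == "id" or (tag == "a" and key == "name"):
                self.anchors.add(value)
            elif tag == "a" and key == "href" and value.startswith("#"):
                self.fragments.append(value[1:])


def missing_anchors(html_text):
    """Return same-page fragments with no matching id/name in the page."""
    collector = AnchorCollector()
    collector.feed(html_text)
    return [f for f in collector.fragments if f and f not in collector.anchors]


# Hypothetical miniature of the page from the report above.
page = """
<h2 id="common-metadata-fields">Common metadata fields</h2>
<a href="#common-metadata-fields">ok</a>
<a href="#heading=h.5u721tt1h9pe">8.3.1. Common MR metadata fields</a>
"""
print(missing_anchors(page))
```

Note that this only looks at links within a single built HTML page; links that point at an anchor in another page would need the target page parsed as well.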

ATM it reveals way too many problems to deal with at once, e.g.:

URL        'http://www.cognitiveatlas.org/term/id/trm_54e69c642d89b'
Name       'http://www.cognitiveatlas.org/term/id/trm_54e69c642d89b'
Parent URL file:///home/yoh/proj/bids/bids-specification/site/04-modality-specific-files/02-magnetoencephalography.html, line 639, col 184
Real URL   http://www.cognitiveatlas.org/term/id/trm_54e69c642d89b
Check time 1.214 seconds
Result     Error: 404 Not Found

I am not sure whether there is any way other than this config file; I could not find one.

@yarikoptic
Collaborator Author

does anyone have a clue what is up with circle CI? I hoped the current run would show the correctly detected anchor, but instead got a failure that is IMHO unrelated to my changes:

Virtualenv location: /home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q
Pipfile.lock (b67896) out of date, updating to (ab1abf)...
Locking [dev-packages] dependencies...
Locking [packages] dependencies...
env/utils.py", line 402, in resolve_deps
    req_dir=req_dir
  File "/usr/local/lib/python3.6/site-packages/pipenv/utils.py", line 250, in actually_resolve_deps
    req = Requirement.from_line(dep)
  File "/usr/local/lib/python3.6/site-packages/pipenv/vendor/requirementslib/models/requirements.py", line 704, in from_line
    line, extras = _strip_extras(line)
TypeError: 'module' object is not callable

Exited with code 1

now I will push the fix for the detected wrong anchor

@chrisgorgo
Contributor

Not sure what is going on. This error should not have affected pip 18.0 and master is not failing. Try committing an updated Pipfile.lock.

@yarikoptic
Collaborator Author

the time will come when I lose my superpowers of breaking things:

$> pipenv lock 
Locking [dev-packages] dependencies…
Locking [packages] dependencies…
Traceback (most recent call last):
  File "/usr/bin/pipenv", line 11, in <module>
    load_entry_point('pipenv==11.9.0', 'console_scripts', 'pipenv')()
  File "/usr/lib/python3/dist-packages/pipenv/vendor/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/pipenv/vendor/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/pipenv/vendor/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/pipenv/vendor/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/pipenv/vendor/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/pipenv/cli.py", line 512, in lock
    verbose=verbose, clear=clear, pre=pre, keep_outdated=keep_outdated
  File "/usr/lib/python3/dist-packages/pipenv/core.py", line 1140, in do_lock
    vcs_deps = convert_deps_to_pip(project.vcs_packages, project, r=False)
  File "/usr/lib/python3/dist-packages/pipenv/utils.py", line 672, in convert_deps_to_pip
    extra = '{0}+{1}'.format(vcs, deps[dep][vcs])
TypeError: string indices must be integers

mild wild guess -- probably due to the git+https URL I introduced
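
One way the `TypeError` above can arise is sketched below, assuming (as the guess suggests) that the dependency entry reached `convert_deps_to_pip` as a plain string rather than as a table keyed by VCS type. The helper `vcs_requirement` and the example URLs are hypothetical; only the formatting expression is taken from the traceback.

```python
def vcs_requirement(deps, dep, vcs="git"):
    """Mimic the failing line from the traceback: build '<vcs>+<url>'."""
    return "{0}+{1}".format(vcs, deps[dep][vcs])


# Hypothetical entries: a table keyed by VCS type vs. a bare URL string.
table_style = {"linkchecker": {"git": "https://example.invalid/linkchecker.git"}}
string_style = {"linkchecker": "git+https://example.invalid/linkchecker.git"}

print(vcs_requirement(table_style, "linkchecker"))

try:
    # Indexing a str with the key "git" raises the TypeError seen above.
    vcs_requirement(string_style, "linkchecker")
except TypeError as exc:
    print("TypeError:", exc)
```

If that is indeed the cause, writing the dependency in table form in the Pipfile may avoid the crash, but this has not been verified against pipenv 11.9.0.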

@yarikoptic
Collaborator Author

I almost won! but the damn thing doesn't give up ;)

ImportError: /home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python3.6/site-packages/linkcheck/HtmlParser/htmlsax.cpython-36m-x86_64-linux-gnu.so: undefined symbol: PyString_FromStringAndSize

;-)

@yarikoptic
Collaborator Author

oh, python3 is not supported by linkchecker yet... overall it is starting to look more ugly than beautiful

@sappelhoff
Member

I think that this PR would be a valuable addition. What is holding you back from trying to make it work @yarikoptic ?

@yarikoptic
Collaborator Author

hiccup:

singledispatch was not installed
#!/bin/bash -eo pipefail
pipenv run mkdocs build --clean --strict --verbose
DEBUG   -  Loading configuration file: /home/circleci/project/mkdocs.yml 
Traceback (most recent call last):
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/bin/mkdocs", line 11, in <module>
    sys.exit(cli())
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/mkdocs/__main__.py", line 162, in build_command
    site_dir=site_dir
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/mkdocs/config/base.py", line 197, in load_config
    errors, warnings = cfg.validate()
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/mkdocs/config/base.py", line 107, in validate
    run_failed, run_warnings = self._validate()
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/mkdocs/config/base.py", line 62, in _validate
    self[key] = config_option.validate(value)
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/mkdocs/config/config_options.py", line 132, in validate
    return self.run_validation(value)
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/mkdocs/config/config_options.py", line 572, in run_validation
    plgins[item] = self.load_plugin(item, cfg)
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/mkdocs/config/config_options.py", line 580, in load_plugin
    Plugin = self.installed_plugins[name].load()
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2345, in load
    self.require(*args, **kwargs)
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2368, in require
    items = working_set.resolve(reqs, env, installer, extras=self.extras)
  File "/home/circleci/.local/share/virtualenvs/project-zxI9dQ-Q/lib/python2.7/site-packages/pkg_resources/__init__.py", line 784, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'singledispatch' distribution was not found and is required by tornado
Exited with code 1

probably could be easily mitigated by adjusting pipenv setup to include it explicitly (for py2 only). But I already forgot how to use that pipenv beast... help would be welcome! ;)
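
For context on why this would be a py2-only dependency: `singledispatch` has been part of `functools` since Python 3.4, so only older interpreters need the separate backport package that the traceback says tornado requires. A minimal sketch of the usual fallback import follows; `needs_backport` is a hypothetical helper added only for illustration.

```python
import sys

try:
    # Python 3.4+ ships singledispatch in the standard library.
    from functools import singledispatch
except ImportError:
    # Older interpreters (e.g. Python 2.7) need the 'singledispatch'
    # backport package installed separately.
    from singledispatch import singledispatch


def needs_backport(version_info=sys.version_info):
    """Return True if this interpreter lacks functools.singledispatch."""
    return tuple(version_info[:2]) < (3, 4)


print(needs_backport((2, 7)), needs_backport((3, 6)))
```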

@sappelhoff
Member

Okay thanks for the summary. Not sure whether linkchecker will support python 3 anytime soon though --> linkchecker/linkchecker#40

@yarikoptic
Collaborator Author

yeap, but bids-standard afaik can be built under python2 right?
moreover linkchecker is just a tool, so in principle could be installed in an independent environment. I was just trying to piggyback on the existing one, but it is not strictly required

@sappelhoff
Member

yeap, but bids-standard afaik can be built under python2 right?

I'd rather keep it under Python3, but that might be my irrational dislike for python2.

in principle could be installed in an independent environment

right, ... we could perhaps have a second "environment" or "workflow", however one would call it.
I think that using travis for this may be a bit more straightforward, because there is not so much going on in our travis flow (compared to the busy circleci config.yml)

@yarikoptic
Collaborator Author

although true - it would entail setting up the full website building env on travis too then.
Anyways - it would "work for me", just that I would not be able to take a crack at it soon

@sappelhoff
Member

although true - it would entail setting up the full website building env on travis too then.

ah, I had hoped that the linkchecker could work on the source data (markdown). Okay, then circle ci might work just as well in a separate py2 workflow.

it would "work for me", just that I would not be able to take a crack at it soon

okay. It'll have to wait a bit longer then.
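
As an aside on checking the markdown source directly: links can be extracted from the source, but whether a `#fragment` resolves depends on the heading ids that the site build generates, which is one reason a check against the built HTML is more reliable. A rough sketch of the extraction step, assuming only inline `[text](target)` links (reference-style links are ignored, and image links are matched too); `classify_links` and the sample text are hypothetical.

```python
import re

# Inline markdown links of the form [text](target "optional title").
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def classify_links(markdown_text):
    """Split link targets into external URLs, same-page fragments, and the rest."""
    result = {"external": [], "fragment": [], "relative": []}
    for _text, target in LINK_RE.findall(markdown_text):
        if target.startswith(("http://", "https://")):
            result["external"].append(target)
        elif target.startswith("#"):
            result["fragment"].append(target)
        else:
            result["relative"].append(target)
    return result


sample = (
    "See [Cognitive Atlas](http://www.cognitiveatlas.org/term/id/trm_54e69c642d89b), "
    "[common fields](#common-metadata-fields) and "
    "[MRI](../04-modality-specific-files/01-magnetic-resonance-imaging-data.md)."
)
print(classify_links(sample))
```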

@sappelhoff changed the title from "Use linkchecker to verify that URLs are Ok" to "[INFRA] Use linkchecker to verify URLs" on Apr 12, 2019

@yarikoptic
Collaborator Author

Small update: There is an ongoing active effort to make linkchecker py3 compatible (linkchecker/linkchecker#210) so I expect to come back to this one as soon as there is a version to try.

@yarikoptic
Collaborator Author

FWIW note -- not yet, e.g. here is the most recent encounter linkchecker/linkchecker#230 (comment)

@sappelhoff
Member

is this MkDocs plugin perhaps already sufficient? --> https://github.com/manuzhang/mkdocs-htmlproofer-plugin

@yarikoptic
Collaborator Author

who knows? might as well be and shouldn't hurt to be enabled regardless if it makes build less buggy!

@sappelhoff
Member

who knows? might as well be and shouldn't hurt to be enabled regardless if it makes build less buggy!

I am just surprised that the plugin has a class with three methods (fewer than 100 lines of code) ... whereas the linkchecker is a huge software project ...

but both should serve the same purpose (identify urls / links that lead to bad pages). Makes me suspicious 🤔

@sappelhoff
Member

sappelhoff commented Aug 1, 2019

I just stumbled over this: https://github.com/davidtheclark/remark-lint-no-dead-urls

it would probably be easy to implement, because we are using remark already:

language: node_js
node_js:
  - "10"
cache:
  directories:
    - node_modules # NPM packages
before_script:
  - npm install remark-cli@5.0.0 remark-lint@6.0.2 remark-preset-lint-recommended@3.0.2 remark-preset-lint-markdown-style-guide@2.1.2
script:
  - remark src/*.md src/*/*.md --frail

{
  "plugins": [
    "preset-lint-markdown-style-guide",
    ["lint-no-duplicate-headings", false],
    ["lint-list-item-indent", "tab-size"],
    ["lint-emphasis-marker", "consistent"],
    ["lint-maximum-line-length", false]
  ]
}

@yarikoptic
Collaborator Author

Shouldn't hurt, but it seems to only care about external URLs. I wanted to check all the #id ones, since they are prone to break. Since the Python 3 port is going too slowly, I will try to find time to provide a singularity-based run

@yarikoptic
Collaborator Author

I have worked out a docker-based recipe locally (docker should already be present on travis/circle), only to realize that circle-ci environments are already docker environments! d'oh! Learning how to chain the jobs now... A replacement PR might come shortly, or I will comment here that I give up again ;)

3 participants