This Python script scrapes all the license files and automates the task of detecting broken links, timeout errors, and other link issues.
- Pre-requisite
- Installation
- Usage
- Integrating with CI
- Unit Testing
- Troubleshooting
- Code of Conduct
- Contributing
- License
- Python3
- UTF-8 supported console
There are two suggested ways to install. Use User if you only want to run the script. Use Development if you want to work on the script itself.
- Clone the repo
git clone https://github.com/creativecommons/cc-link-checker.git
- Install dependencies
Using Pipfile (requires pipenv):
pipenv install
We recommend using pipenv to create a virtual environment and install dependencies.
- Clone the repo
git clone https://github.com/creativecommons/cc-link-checker.git
- Create virtual environment and install all dependencies
- Normal
pipenv install --dev
- Use sync to install the last successful environment. For example:
pipenv sync --dev
- Run the script:
pipenv run link_checker
pipenv run link_checker -h
usage: link_checker [-h] {deeds,legalcode,rdf,index,combined,canonical} ...
Check for broken links in Creative Commons license deeds, legalcode, and rdf
optional arguments:
-h, --help show this help message and exit
subcommands (a single subcomamnd is required):
{deeds,legalcode,rdf,index,combined,canonical}
deeds check the links for each license's deed
legalcode check the links for each license's legalcode
rdf check the links for each license's RDF
index check the links within index.rdf
combined Combined check (deeds, legalcode, rdf, and index)
canonical print canonical license URLs
Also see the help output for each subcommand:
pipenv run link_checker deeds -h
usage: link_checker deeds [-h] [-q] [--root-url ROOT_URL] [--limit LIMIT] [-v]
[--local] [--output-errors [output_file]]
optional arguments:
-h, --help show this help message and exit
-q, --quiet decrease verbosity (can be specified multiple times)
--root-url ROOT_URL set root URL (default: 'https://creativecommons.org')
--limit LIMIT Limit check lists to specified integer (default: 10)
-v, --verbose increase verbosity (can be specified multiple times)
--local process local filesystem legalcode files to determine
valid license paths (uses LICENSE_LOCAL_PATH environment
variable and falls back to default:
'../creativecommons.org/docroot/legalcode')
--output-errors [output_file]
output all link errors to file (default: errorlog.txt) and
create junit-xml type summary (test-summary/junit-xml-
report.xml)
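The option descriptions above (flags that "can be specified multiple times", a `--limit` default of 10, and an optional `output_file` value) match common `argparse` patterns. The following is a minimal sketch of how such options could be declared, not the script's actual parser; the program name `link_checker_sketch` and the function `build_parser` are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that mimics the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="link_checker_sketch")
    # action="count" lets -q / -v be repeated, e.g. -vv gives verbose == 2
    parser.add_argument("-q", "--quiet", action="count", default=0,
                        help="decrease verbosity (can be specified multiple times)")
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="increase verbosity (can be specified multiple times)")
    parser.add_argument("--limit", type=int, default=10,
                        help="limit check lists to specified integer")
    # nargs="?" makes the file name optional; const is used when the flag
    # is given without a value, default when the flag is absent
    parser.add_argument("--output-errors", nargs="?", const="errorlog.txt",
                        default=None, metavar="output_file",
                        help="output all link errors to file")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["-vv", "--output-errors"]))
```

With this layout, passing `--output-errors` alone selects the documented default file name, while omitting the flag leaves the value as `None`.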
pipenv run link_checker legalcode -h
usage: link_checker legalcode [-h] [-q] [--root-url ROOT_URL] [--limit LIMIT] [-v]
[--local] [--output-errors [output_file]]
optional arguments:
-h, --help show this help message and exit
-q, --quiet decrease verbosity (can be specified multiple times)
--root-url ROOT_URL set root URL (default: 'https://creativecommons.org')
--limit LIMIT Limit check lists to specified integer (default: 10)
-v, --verbose increase verbosity (can be specified multiple times)
--local process local filesystem legalcode files to determine
valid license paths (uses LICENSE_LOCAL_PATH environment
variable and falls back to default:
'../creativecommons.org/docroot/legalcode')
--output-errors [output_file]
output all link errors to file (default: errorlog.txt) and
create junit-xml type summary (test-summary/junit-xml-
report.xml)
pipenv run link_checker rdf -h
usage: link_checker rdf [-h] [-q] [--root-url ROOT_URL] [--limit LIMIT] [-v]
[--local] [--local-index] [--output-errors [output_file]]
optional arguments:
-h, --help show this help message and exit
-q, --quiet decrease verbosity (can be specified multiple times)
--root-url ROOT_URL set root URL (default: 'https://creativecommons.org')
--limit LIMIT Limit check lists to specified integer (default: 10)
-v, --verbose increase verbosity (can be specified multiple times)
--local process local filesystem legalcode files to determine
valid license paths (uses LICENSE_LOCAL_PATH environment
variable and falls back to default:
'../creativecommons.org/docroot/legalcode')
--local-index process local filesystem index.rdf (uses
INDEX_RDF_LOCAL_PATH environment variable and falls back
to default: './index.rdf')
--output-errors [output_file]
output all link errors to file (default: errorlog.txt) and
create junit-xml type summary (test-summary/junit-xml-
report.xml)
pipenv run link_checker index -h
usage: link_checker index [-h] [-q] [--root-url ROOT_URL] [--limit LIMIT] [-v]
[--local-index] [--output-errors [output_file]]
optional arguments:
-h, --help show this help message and exit
-q, --quiet decrease verbosity (can be specified multiple times)
--root-url ROOT_URL set root URL (default: 'https://creativecommons.org')
--limit LIMIT Limit check lists to specified integer (default: 10)
-v, --verbose increase verbosity (can be specified multiple times)
--local-index process local filesystem index.rdf (uses
INDEX_RDF_LOCAL_PATH environment variable and falls back
to default: './index.rdf')
--output-errors [output_file]
output all link errors to file (default: errorlog.txt) and
create junit-xml type summary (test-summary/junit-xml-
report.xml)
pipenv run link_checker combined -h
usage: link_checker combined [-h] [-q] [--root-url ROOT_URL] [--limit LIMIT] [-v]
[--local] [--local-index]
[--output-errors [output_file]]
optional arguments:
-h, --help show this help message and exit
-q, --quiet decrease verbosity (can be specified multiple times)
--root-url ROOT_URL set root URL (default: 'https://creativecommons.org')
--limit LIMIT Limit check lists to specified integer (default: 10)
-v, --verbose increase verbosity (can be specified multiple times)
--local process local filesystem legalcode files to determine
valid license paths (uses LICENSE_LOCAL_PATH environment
variable and falls back to default:
'../creativecommons.org/docroot/legalcode')
--local-index process local filesystem index.rdf (uses
INDEX_RDF_LOCAL_PATH environment variable and falls back
to default: './index.rdf')
--output-errors [output_file]
output all link errors to file (default: errorlog.txt) and
create junit-xml type summary (test-summary/junit-xml-
report.xml)
pipenv run link_checker canonical -h
usage: link_checker canonical [-h] [-q] [--root-url ROOT_URL] [--limit LIMIT] [-v]
[--local] [--include-gnu]
optional arguments:
-h, --help show this help message and exit
-q, --quiet decrease verbosity (can be specified multiple times)
--root-url ROOT_URL set root URL (default: 'https://creativecommons.org')
--limit LIMIT Limit check lists to specified integer
-v, --verbose increase verbosity (can be specified multiple times)
--local process local filesystem legalcode files to determine valid
license paths (uses LICENSE_LOCAL_PATH environment variable
and falls back to default:
'../creativecommons.org/docroot/legalcode')
--include-gnu include GNU licenses in addition to Creative Commons
licenses
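The `--local` and `--local-index` descriptions above say that each path comes from an environment variable with a fallback default. A minimal sketch of that lookup is shown below, assuming a plain `os.environ` read; the helper name `resolve_local_paths` is hypothetical and the script's real implementation may differ.

```python
import os

# Defaults quoted from the help output above
DEFAULT_LICENSE_PATH = "../creativecommons.org/docroot/legalcode"
DEFAULT_INDEX_RDF_PATH = "./index.rdf"


def resolve_local_paths(environ=None):
    """Return (legalcode_dir, index_rdf_file) from env vars with fallbacks.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if environ is None:
        environ = os.environ
    license_path = environ.get("LICENSE_LOCAL_PATH", DEFAULT_LICENSE_PATH)
    index_path = environ.get("INDEX_RDF_LOCAL_PATH", DEFAULT_INDEX_RDF_PATH)
    return license_path, index_path


if __name__ == "__main__":
    print(resolve_local_paths())
```

Setting `LICENSE_LOCAL_PATH` before running the script is therefore enough to point `--local` at a different checkout of the legalcode files.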
Because the script can scrape licenses from local storage, it can be used in CI in two easy steps:
- Clone this repo in the CI container
git clone https://github.com/creativecommons/cc-link-checker.git ~/cc-link-checker
- Run link_checker.py in local (--local) and output error (--output-errors) mode
python link_checker.py --local --output-errors
The configuration for GitHub Actions, for example, is present here.
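A CI job may want to fail when the junit-xml summary produced by `--output-errors` reports problems. The sketch below counts `<failure>` and `<error>` elements in such a report. It assumes the file follows the common JUnit XML layout, which this document does not confirm; the function `count_failures` is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# Report location taken from the --output-errors help text
REPORT_PATH = "test-summary/junit-xml-report.xml"


def count_failures(xml_text):
    """Count <failure> and <error> elements in a JUnit-style XML string.

    Assumes the conventional testsuite/testcase/failure layout.
    """
    root = ET.fromstring(xml_text)
    failures = sum(1 for _ in root.iter("failure"))
    errors = sum(1 for _ in root.iter("error"))
    return failures + errors


if __name__ == "__main__":
    report = Path(REPORT_PATH)
    # Exit non-zero only if a report exists and records at least one problem
    if report.exists() and count_failures(report.read_text(encoding="utf-8")) > 0:
        sys.exit(1)
```

Running this as a follow-up CI step turns recorded link errors into a failing build without re-running the link checker.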
Unit tests are written using the pytest framework. The tests can be run as follows:
- Install dev dependencies
- macOS with Homebrew
pipenv install --dev --python /usr/local/opt/python@3.7/libexec/bin/python
- General
pipenv install --dev
- Run unit tests
pipenv run pytest -v
- Python Guidelines — Creative Commons Open Source
- Black: the uncompromising Python code formatter
- flake8: a python tool that glues together pep8, pyflakes, mccabe, and third-party plugins to check the style and quality of some python code.
- isort: A Python utility / library to sort imports.
- UnicodeEncodeError: This error is thrown when the console does not support UTF-8.
- Failing Lint build: Ensure style/syntax is correct:
pipenv run black .
pipenv run isort .
pipenv run flake8 .
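For the UnicodeEncodeError case above, it can help to confirm what encoding Python sees for the console before running the script. This is a small diagnostic sketch, not part of the project; the helper `is_utf8` is hypothetical.

```python
import codecs
import sys


def is_utf8(encoding):
    """Return True if the encoding name is UTF-8 under any alias."""
    if not encoding:
        return False
    try:
        # codecs.lookup normalizes aliases such as "UTF8" or "utf_8"
        return codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        return False


if __name__ == "__main__":
    enc = sys.stdout.encoding
    if is_utf8(enc):
        print("Console encoding is UTF-8")
    else:
        print(f"Console encoding is {enc!r}; consider setting PYTHONIOENCODING=utf-8")
```

If the check reports a non-UTF-8 encoding, setting the `PYTHONIOENCODING` environment variable to `utf-8` is one common way to change what Python uses for standard output.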
The Creative Commons team is committed to fostering a welcoming community. This project and all other Creative Commons open source projects are governed by our Code of Conduct. Please report unacceptable behavior to conduct@creativecommons.org per our reporting guidelines.
We welcome contributions for bug fixes, enhancements, and documentation. Please see CONTRIBUTING.md while contributing.