urlfix
aims to find all outdated URLs in a given file and fix them.
Features List
-
Commandline and programmer-friendly modes.
-
Replace outdated URLs/links in a single file
-
Replace outdated URLs/links in a directory
-
Replace outdated URLs/links in the same file or in the same files in a directory i.e. inplace.
-
Replace outdated links in files in nested directories
-
Replace outdated links in files in sub-nested directories
Supported file formats
urlfix
fixes URLs given a file of the following types:
-
MarkDown (.md)
-
Plain Text files (.txt)
-
RMarkdown (.rmd)
-
ReStructured Text (.rst)
-
PDF (.pdf)
-
Word (.docx)
-
ODF (.odf)
Installation
The simplest way to install the latest release is as follows:
pip install urlfix
To install the development version:
Open the Terminal/CMD/Git bash/shell and enter
pip install git+https://github.com/Nelson-Gon/urlfix.git
# or for the less stable dev version
pip install git+https://github.com/Nelson-Gon/urlfix.git@dev
Otherwise:
# clone the repo
git clone git@github.com:Nelson-Gon/urlfix.git
cd urlfix
pip install -e .
Sample usage
Script Mode
To use at the commandline, please use:
python -m urlfix --mode "f" --verbose 1 --inplace 1 --inpath myfile.md
If not replacing within the same file, then:
python -m urlfix --mode "f" --verbose 1 --inplace 0 --inpath myfile.md --output-file myoutputfile.md
To get help:
python -m urlfix -h
#usage: main.py [-h] -m MODE -in INPUT_FILE [-o OUTPUT_FILE] -v {False,false,0,True,true,1} -i {False,false,0,True,true,1}
#
#optional arguments:
# -h, --help show this help message and exit
# -m MODE, --mode MODE Mode to use. One of f for file or d for directory
# -in INPUT_FILE, --input-file INPUT_FILE
# Input file for which link updates are required.
# -o OUTPUT_FILE, --output-file OUTPUT_FILE
# Output file to write to. Optional, only necessary if not replacing inplace
# -v {False,false,0,True,true,1}, --verbose {False,false,0,True,true,1}
# String to control verbosity. Defaults to True.
# -i {False,false,0,True,true,1}, --inplace {False,false,0,True,true,1}
# Should links be replaced inplace? This should be safe but to be sure, test with an output file first.
Programmer-Friendly Mode
from urlfix.urlfix import URLFix
from urlfix.dirurlfix import DirURLFix
Create an object of class URLFix
urlfix_object = URLFix("testfiles/testurls.txt", output_file="replacement.txt")
Replacing URLs
After creating our object, we can replace outdated URLs as follows:
urlfix_object.replace_urls(verbose=1)
The above uses default arguments and will not replace a file inplace. This is a safety mechanism to ensure one does not damage their files.
Since we set verbose
to True
, we get the following output:
urlfix_object.replace_urls()
To replace silently, simply set verbose to False
(which is the default).
urlfix_object.replace_urls()
If there are URLs known to be valid, pass these to the correct_urls
argument to save some time.
urlfix_object.replace_urls(correct_urls=[urls_here]) # Use a Sequence eg tuple, list, etc
Replacing several files in a directory
To replace several files in a directory, we can use DirURLFix
as follows.
- Instantiate an object of class
DirURLFix
replace_in_dir = DirURLFix("path_to_dir")
- Call
replace_urls
replace_in_dir.replace_urls()
Recursively replacing links in nested directories
To replace outdated links in several files located in several directories, we set recursive
to True
.
Currently, replacing links in directories nested within nested directories is not (yet) supported.
recursive_object = DirURLFix("path_to_root_directory", recursive=True)
We can then proceed as above
recursive_object.replace_urls() # provide other arguments as you may wish.
To report any issues, suggestions or improvement, please do so at issues.
If you would like to cite this work, please use:
Nelson Gonzabato (2021) urlfix: Check and Fix Outdated URLs https://github.com/Nelson-Gon/urlfix
Thank you very much.
“Before software can be reusable it first has to be usable.” – Ralph Johnson