Count the words on a URL or a list of URLs.
By default it checks the body tag, but it can also check p, article, or any other element that more accurately reflects the page's actual content.
Words are separated using a regex search, so both spaces and punctuation are handled.
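As a rough illustration of regex-based word separation, here is a minimal sketch. The pattern and the `count_words` helper are assumptions made for this example, not necessarily what the script uses internally.

```python
import re

# Assumption: a "word" is a run of letters, digits or underscores, optionally
# joined by internal apostrophes (so "It's" is one word). The pattern the
# script really uses may differ.
WORD_RE = re.compile(r"\w+(?:'\w+)*")


def count_words(text: str) -> int:
    """Return the number of regex matches (words) found in text."""
    return len(WORD_RE.findall(text))


print(count_words("Hello, world! It's 2 words... or more?"))  # prints 7
```

Because the pattern matches the words themselves rather than splitting on spaces, punctuation such as commas and ellipses never ends up counted as a word.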
git clone
cd bulk-word-count-on-url
pip3 install -r requirements.txt
python3 [url|file-of-urls.txt] [-t|--tag body|p|p,h1,li|<other valid list of elements>]
python3 example.txt --tags p
"", 26
"", 26
The result is 26 because there are 26 words within the paragraphs of the page.
Includes caching. Delete the file wc-url-cache.sqlite to clear the cache.
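If you would rather clear the cache from Python than by hand, something like the following sketch should work. The `clear_cache` helper is hypothetical and not part of the script; it assumes the cache file sits in the current working directory.

```python
from pathlib import Path


def clear_cache(cache_file: str = "wc-url-cache.sqlite") -> bool:
    """Delete the cache file if it exists; return True if a file was removed.

    Hypothetical helper: assumes the cache lives in the current directory.
    """
    path = Path(cache_file)
    if path.exists():
        path.unlink()
        return True
    return False


if __name__ == "__main__":
    print("cache cleared" if clear_cache() else "no cache file found")
```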
pip3 install --upgrade certifi
pip3 install --upgrade urllib3[secure]
pip3 install --upgrade requests
export PYTHONWARNINGS="ignore:Unverified HTTPS request"
It will double count if you choose tags that can be nested inside one another, e.g. --tags body,p
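To see why nested tags inflate the total, consider this small sketch built on the standard library's `html.parser`. It assumes the count is the sum of the words inside every selected element, which matches the double counting described above. The `TagWordCounter` class and `count_words_in_tags` function are illustrative names, not the script's own code.

```python
import re
from html.parser import HTMLParser


class TagWordCounter(HTMLParser):
    """Sum the words inside every selected tag (illustrative sketch only)."""

    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.open_selected = 0  # number of selected tags enclosing the current text
        self.total = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.open_selected += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.open_selected > 0:
            self.open_selected -= 1

    def handle_data(self, data):
        # Text nested inside N selected tags is counted N times.
        self.total += self.open_selected * len(re.findall(r"\w+", data))


def count_words_in_tags(html, tags):
    parser = TagWordCounter(tags)
    parser.feed(html)
    parser.close()
    return parser.total


page = "<html><body><h1>Title here</h1><p>Three short words</p></body></html>"
print(count_words_in_tags(page, ["p"]))          # prints 3
print(count_words_in_tags(page, ["body"]))       # prints 5
print(count_words_in_tags(page, ["body", "p"]))  # prints 8
```

The three words in the paragraph are counted once for p and again for body, so the combined total is 8 rather than 5. Choosing only the outermost tag, or only tags that never contain one another, avoids this.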