Wikimedia Commons Bulk Downloader By Category

Why this script

Wikimedia Commons hosts a large number of files under various Creative Commons licenses, including images, audio, and video. All files there are organized into categories.

We have an audio upload tool called Spell4Wiki that uploads audio files to Commons. The tool also categorizes the uploaded audio files by language code.

For Tamil: Category:Files uploaded by spell4wiki in ta

For English: Category:Files uploaded by spell4wiki in en

More details: Category:Files uploaded by spell4wiki

These uploaded audio files can be reused in other FOSS projects, so this script provides an easy way to download all the files in a specific category.

Note: This script is not limited to audio files; the same script works for other file formats as well.

How this script works

This script takes a category name and a maximum record count.

  1. REQUIRED: category is the name of a Wikimedia Commons category that contains a list of files, for example "Category:Files uploaded by spell4wiki in ta"
  2. OPTIONAL: max_records is the maximum number of records you want to download.

The script downloads items from the most recently uploaded to the oldest, so max_records can be used to download only the latest items.

Ref:

Here, Category:Files uploaded by spell4wiki in ta is the category name.
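
The two inputs above can be illustrated with a minimal sketch. It assumes the public MediaWiki API (list=categorymembers) rather than whatever the script does internally, and the helper names build_query_params and take_latest are hypothetical, not taken from this repo. It also assumes that a negative max_records means "no cap".

```python
from typing import Optional

# Public MediaWiki API endpoint for Wikimedia Commons.
API_URL = "https://commons.wikimedia.org/w/api.php"


def build_query_params(category: str, limit: int = 500,
                       cmcontinue: Optional[str] = None) -> dict:
    """Build the parameters for one page of a list=categorymembers query."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "file",       # only files, not subcategories or pages
        "cmsort": "timestamp",  # order by when the file was added to the category
        "cmdir": "desc",        # newest first
        "cmlimit": str(limit),
    }
    if cmcontinue:
        # Continuation token returned by the previous page, if any.
        params["cmcontinue"] = cmcontinue
    return params


def take_latest(titles: list, max_records: int = -1) -> list:
    """Keep the first max_records titles; a negative value keeps everything (assumption)."""
    if max_records < 0:
        return list(titles)
    return list(titles[:max_records])
```

Because the list is requested newest first, trimming it to max_records keeps only the latest items, which matches the behaviour described above.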

How to Run

  1. Download/Clone this Repo
git clone https://github.com/manimaran96/Wiki-Commons-Bulk-Downloader-By-Category.git 
  2. Open the config.py file in an editor and change category, max_records, and limit

Note: max_records and limit are optional

category = "Category:Files uploaded by spell4wiki in CHECK"
max_records = -1
limit = 500

For more details, check config.py.
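
A small sanity check on these values could look like the sketch below. It is not part of the repo: validate_config is a hypothetical helper, and it assumes that max_records = -1 means "download everything" and that limit is a per-request page size between 1 and 500. Check config.py for the actual meaning before relying on it.

```python
def validate_config(category: str, max_records: int = -1, limit: int = 500) -> dict:
    """Return the settings as a dict, or raise ValueError if one looks wrong."""
    if not category.startswith("Category:"):
        raise ValueError("category should start with 'Category:'")
    # Assumption: -1 means no cap; any other value must be a positive count.
    if max_records != -1 and max_records < 1:
        raise ValueError("max_records should be -1 or a positive number")
    # Assumption: limit is a page size, bounded at 500 per request.
    if not 1 <= limit <= 500:
        raise ValueError("limit should be between 1 and 500")
    return {"category": category, "max_records": max_records, "limit": limit}
```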

  3. Install the following libraries
sudo apt update
sudo apt install python3
sudo apt install python3-pip
pip install beautifulsoup4
pip install aiohttp
pip install aiofiles

Note: asyncio is part of the Python 3 standard library and does not need to be installed with pip.
  4. Once everything is installed, run the script.
python3 wikimedia-commons-bulk-downloader-by-category.py
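
If the script fails at import time, a quick check like the following can show which libraries are still missing. This is only a sketch: missing_modules is a hypothetical helper, and the module list assumes the usual import names (beautifulsoup4 is imported as bs4).

```python
import importlib.util

# Import names of the third-party libraries installed in step 3.
REQUIRED_MODULES = ["bs4", "aiohttp", "aiofiles"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_modules(REQUIRED_MODULES) returns an empty list when everything is installed.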

For Contributors

If you are willing to contribute to this code, please read the todo list below. Before you start, create an issue and assign it to yourself; this helps avoid duplicated work.

Todo

  1. Add a requirements.txt listing the packages that need to be installed.
  2. Fix: downloading more than 3000 files, or large files, may fail because of concurrent download/scraping calls.
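
One possible direction for item 2 is to cap how many downloads run at the same time. The sketch below uses asyncio.Semaphore for that; run_bounded and max_concurrent are hypothetical names, and this is not how the script currently works. It only shows the general technique, not a tested fix for the failure.

```python
import asyncio


async def run_bounded(jobs, max_concurrent=10):
    """Run zero-argument coroutine functions with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        # Wait for a free slot before starting the job, and release it when done.
        async with semaphore:
            return await job()

    # gather() returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Each job would wrap one download call, so a category with thousands of files never opens more than max_concurrent connections at once.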

Optional

  1. Compress the downloaded files into a .zip archive.
  2. Build a web portal for this.
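
For the first optional item, a minimal sketch using the standard library zipfile module could look like this. zip_directory is a hypothetical helper and the download folder layout is an assumption; the repo does not define either.

```python
import zipfile
from pathlib import Path


def zip_directory(source_dir, zip_path):
    """Compress every file under source_dir into zip_path; return the file count."""
    source = Path(source_dir)
    target = Path(zip_path).resolve()
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            # Skip directories and the archive itself if it lives inside source_dir.
            if not path.is_file() or path.resolve() == target:
                continue
            # Store paths relative to source_dir so the archive has no absolute paths.
            archive.write(path, arcname=path.relative_to(source).as_posix())
            count += 1
    return count
```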

Contact

  • If you want to get in touch with the developer, you can send an email to manimarankumar96@gmail.com or message @manimarank on Telegram.
  • Feel free to post suggestions, changes, ideas etc. on GitHub or Telegram!