Scraping news from the inshorts website using Python and SeleniumBase.
- Install pipenv using
pip install --user pipenv
- Enter the virtualenv using
pipenv shell
- Install the project requirements by running
pipenv install -r requirements.txt
- Install pip dependencies inside the virtualenv using
pipenv install dep==
- SeleniumBase is used for web scraping.
- Visit for more info
- Collect news from inshorts
- Collect the news heading, content, and author for different categories such as World, Sports, Science, and Politics
- Open config.yml and set the following keys:
  - file_name: CSV filename used to store the news data # e.g. news_with_category.csv
  - url_file_name: CSV filename used to store the news URLs # e.g. url_file.csv
  - inshort_url: the inshorts URL # https://inshorts.com/en/read
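As a rough illustration of how the three config keys fit together, here is a minimal sketch that parses flat `key: value` lines using only the standard library. The `parse_flat_config` helper and the `SAMPLE_CONFIG` text are hypothetical and written for this example; the project itself most likely reads config.yml with a YAML library.

```python
# Minimal sketch: parse flat "key: value" lines like those in config.yml.
# A real project would more likely use a YAML library; this stdlib-only
# parser is an illustrative assumption, not the repository's code.

SAMPLE_CONFIG = """\
file_name: news_with_category.csv  # CSV file for the scraped news
url_file_name: url_file.csv        # CSV file for the collected news URLs
inshort_url: https://inshorts.com/en/read
"""


def parse_flat_config(text):
    """Return a dict from 'key: value' lines, ignoring blanks and comments."""
    config = {}
    for raw_line in text.splitlines():
        # Drop trailing comments introduced by " #" and surrounding whitespace.
        line = raw_line.split(" #", 1)[0].strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs keep their "https://" part.
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config


config = parse_flat_config(SAMPLE_CONFIG)
print(config["file_name"])
print(config["inshort_url"])
```

Splitting on the first colon only is what keeps the URL value intact.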
- Run the scraper using
pytest test_news_scrapper.py -s --headless
- The CSV file will have the following columns:
  - [title, content, author, url, category]
- CSV files will be stored inside the dataset folder.
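The output layout described above can be sketched with Python's standard `csv` module. This is only an assumption about how such a file might be written, not the repository's actual code; `write_news_csv` is a hypothetical helper, and the sample row is placeholder text rather than scraped data.

```python
import csv
from pathlib import Path

# Column order taken from the list above.
COLUMNS = ["title", "content", "author", "url", "category"]


def write_news_csv(rows, file_name, folder="dataset"):
    """Write news rows (dicts keyed by COLUMNS) to folder/file_name."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / file_name
    # newline="" lets the csv module control line endings itself.
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return out_path


# Placeholder row for illustration only; not real scraped data.
sample_rows = [
    {
        "title": "Example headline",
        "content": "Example summary text.",
        "author": "Example Author",
        "url": "https://inshorts.com/en/read",
        "category": "world",
    }
]
path = write_news_csv(sample_rows, "news_with_category.csv")
print(path)
```

Using `DictWriter` with a fixed `fieldnames` list keeps the column order stable even if the row dictionaries are built in a different order.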