Learn and practice web scraping: hands-on with BeautifulSoup, Selenium and Scrapy.
To see the web version of this project click => HERE
In this project we are going to learn how to use three of the most widely used libraries for web scraping:
- BeautifulSoup
- Selenium
- Scrapy
To learn how to use these libraries, we are first going to extract information from the website of the Madrid stock exchange, and then extract economic information from the website of the newspaper El País (English version).
Requirements:
To run this notebook you need the following libraries installed:
- beautifulsoup4==4.11.1
- itemadapter==0.6.0
- matplotlib==3.5.1
- numpy==1.22.3
- pandas==1.4.1
- requests==2.27.1
- Scrapy==2.6.1
- selenium==4.2.0 (the browser I use with this library in this project is Firefox)
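If you want to confirm that your environment matches these pins before opening the notebook, a small check like the one below may help. It is only a sketch, not part of the project: the `PINNED` dictionary and the `check_versions` helper are names made up for illustration, and it relies only on the standard library's `importlib.metadata` (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements list above.
PINNED = {
    "beautifulsoup4": "4.11.1",
    "itemadapter": "0.6.0",
    "matplotlib": "3.5.1",
    "numpy": "1.22.3",
    "pandas": "1.4.1",
    "requests": "2.27.1",
    "Scrapy": "2.6.1",
    "selenium": "4.2.0",
}


def check_versions(pinned):
    """Hypothetical helper: map each package to (required, installed, matches).

    `installed` is None when the package is not installed at all.
    """
    report = {}
    for name, required in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (required, installed, installed == required)
    return report


if __name__ == "__main__":
    # Print one line per package so mismatches are easy to spot.
    for name, (required, installed, ok) in check_versions(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: required {required}, installed {installed} [{status}]")
```

A mismatch does not necessarily mean the notebook will fail; the pinned versions are simply the ones this project was written with.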
I have left the extracted articles in this repository (data.csv); they may be useful for an NLP project. (If you find this project helpful, please star it.)