despicableGruu/webscrapping-rs

Python 3.6 Web Scraping

Practice 1: Web Scraping the Lima Stock Exchange

Overview

This Python 3.6 code retrieves data from the Lima Stock Exchange website using web scraping techniques. The output is a CSV file containing the extracted information.

This practice is part of the "Data Typology and Lifecycle" course within the Master's in Data Science program at the Universitat Oberta de Catalunya.

The repository includes a license file, this README.md, and the following directories:

/src: Contains the Python code for web scraping.
/doc: Includes the PDF version of this documentation.
/csv: Contains example CSV files generated by the script (daily and historical stock quotes).

Team Members

Patricia Reyes Silva

José Pérez Sánchez

Dataset Characteristics

The dataset captures daily stock and index quotes from the Lima Stock Exchange.

The practice follows this structure:

1. Title: "Traded Values at the Lima Stock Exchange"

2. Subtitle: "Daily Quotes"

3. Context and Justification

Modern technology allows us to access global stock markets simply by visiting their websites, from the comfort of our desks. But what if we need to gather daily quotes from multiple stock exchanges for investment purposes? Manually, we'd have to visit each website daily and collect the data. However, automated data extraction using scripts offers a more efficient solution. In the era of Big Data, where data is generated at increasing speeds, any tool that accelerates data collection and processing is valuable.

This practice aims to develop a Python script to extract daily stock quote data from the Lima Stock Exchange website.

We chose Python for this task due to its suitability for web scraping and its powerful libraries for handling large datasets and developing analytical models.

4. Content

The script captures data for each traded stock on the Lima Stock Exchange, including:

• Stock Details

a.	Name

b.	Ticker Symbol (C1 for common shares, I1 for investment shares, no suffix for stocks also traded on other exchanges)
    
c.	Sector (Various, Agriculture, Industrials, Banks – Financials, etc.)        
            
d.	Segment (Indicates stock liquidity)

• Stock Quote Currency (Sol/Dollar)

• Quotes

a.	Previous Day's Closing Price    

b.	Previous Trading Date (not necessarily the previous day)    

c.	Current Opening Price.    

d.	Day's Last Price.    

e.	Price Change (%) compared to the previous day 

• Offers

a.	Highest Buy Price of the Day    

b.	Lowest Sell Price of the Day

• Trading Activity

a.	Trading Volume

b.	Number of Transactions    

c.	Trading Amount in the quote currency
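The price change field can be illustrated with a small calculation. This is a minimal sketch that assumes the percentage is computed from the previous closing price and the day's last price; the exchange may compute or round it differently, and the function name `percent_change` is hypothetical.

```python
def percent_change(previous_close, last_price):
    """Percentage change of the last price relative to the previous close.

    Assumption: this is the conventional definition, not necessarily
    the exact formula used by the Lima Stock Exchange.
    """
    if previous_close == 0:
        raise ValueError("previous close must be non-zero")
    return (last_price - previous_close) / previous_close * 100.0


# Illustrative values only, not real quotes.
print(round(percent_change(2.00, 2.10), 2))
```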

Acknowledgements

To the Lima Stock Exchange and its IT infrastructure.

Inspiration

The rise of FinTech companies and algorithmic trading, along with the availability of data.

License

The Python code in the scripts is released under the GPLv3 license, which permits modification, distribution, and commercial use, but provides no warranty and accepts no liability for its use.

The data generated by this script is released under the CC BY-SA 4.0 license, with the following legal safeguard: the data obtained and stored in the CSV files is not necessarily real-time and may not be completely accurate. Accordingly, the authors of the data loading script will not be held liable for any loss that may result from the use of this data.

No responsibility is accepted for any loss or damage resulting from reliance on the information contained in this data, including quotes.

Code

The script is detailed in the Python file CotizacionesBVL.py.

The code has two primary functionalities:

(1) Retrieving daily stock quotes for companies listed on the Lima Stock Exchange. Simply run the script without parameters:

python CotizacionesBVL.py 

The script generates a CSV file with the latest company quotes, named in the format: CotizacionesDiarias_YYYYMMDD.csv.
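As a rough illustration of the naming scheme, the filename could be built from a date as follows. The helper name `daily_filename` is hypothetical and is not taken from CotizacionesBVL.py.

```python
from datetime import date


def daily_filename(quote_date):
    """Build the daily CSV name from a quote date (hypothetical helper)."""
    # %Y%m%d produces the YYYYMMDD stamp used in the filename.
    return "CotizacionesDiarias_{}.csv".format(quote_date.strftime("%Y%m%d"))


print(daily_filename(date(2018, 1, 2)))  # CotizacionesDiarias_20180102.csv
```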

(2) Retrieving quotes for a specific company by providing its ticker symbol and, optionally, a date range. For example:

python CotizacionesBVL.py --nemonic BVN --endDate 20180101 --startDate 20140501

This retrieves all quotes for BVN (Minera Buenaventura) between May 1, 2014 and January 1, 2018. The end points are included if the market was open on those days.

This option generates a file named CotizacionesDiarias_nemonico.csv, where nemonico is replaced by the company's ticker symbol.

The code includes a common section that interprets any input arguments.
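The argument handling might look something like the sketch below. It uses the three flags shown in the example command; how CotizacionesBVL.py actually parses them is not stated here, so the use of `argparse`, the `yyyymmdd` converter, and `build_parser` are assumptions made for illustration.

```python
import argparse
from datetime import datetime


def yyyymmdd(text):
    """Convert a YYYYMMDD string into a date, rejecting malformed input."""
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError(
            "expected YYYYMMDD, got {!r}".format(text))


def build_parser():
    """Hypothetical parser mirroring the flags in the example command."""
    parser = argparse.ArgumentParser(description="Lima Stock Exchange quotes")
    parser.add_argument("--nemonic", help="ticker symbol, e.g. BVN")
    parser.add_argument("--startDate", type=yyyymmdd, help="first day, YYYYMMDD")
    parser.add_argument("--endDate", type=yyyymmdd, help="last day, YYYYMMDD")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--nemonic", "BVN", "--endDate", "20180101", "--startDate", "20140501"])

# No ticker means the daily download; a ticker means the company download.
mode = "company" if args.nemonic else "daily"
print(mode, args.nemonic, args.startDate, args.endDate)
```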

If no company ticker symbol is provided, the readDailyStockPrizes() function is called to retrieve the latest published quote data. This function, in turn, calls getLastMarketDate() to get the last trading day (since this date is not provided by default in the initial quote page and markets are closed on certain days). Finally, it generates the corresponding CSV file, including the quote date in the filename (CotizacionesDiarias_YYYYMMDD.csv).

If a company ticker symbol is given, the readCompanyStockPrizes(nemonic, startDate, endDate) function is called. This function uses the auxiliary function getCompanyData(nemonic) to retrieve additional company data (name, sector, segment, and currency) from a different URL than the initial one. It then interprets the date range arguments and requests the quote URL. Finally, it combines the company and quote data into the output CSV file (CotizacionesDiarias_nemonico.csv).
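The final step, combining company data with the historical quotes, could be sketched as below. The function `combine_company_and_quotes` and all sample values are hypothetical placeholders; the actual structure used inside readCompanyStockPrizes is not described in this document.

```python
def combine_company_and_quotes(company, quotes):
    """Attach the company fields to every historical quote row."""
    rows = []
    for quote in quotes:
        row = dict(company)  # copy so each output row is independent
        row.update(quote)    # add the per-day quote fields
        rows.append(row)
    return rows


# Placeholder values for illustration only, not real market data.
company = {"Nombre": "Example Company", "Nemónico": "XYZ", "Moneda": "Sol"}
quotes = [
    {"Fecha Cotización": "20180102", "Apertura": "1.00", "Última": "1.05"},
    {"Fecha Cotización": "20180103", "Apertura": "1.05", "Última": "1.02"},
]

for row in combine_company_and_quotes(company, quotes):
    print(row["Nemónico"], row["Fecha Cotización"], row["Última"])
```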

Dataset

The generated CSV files share the same columns, whether they hold daily quotes for all values traded on the Lima Stock Exchange or one company's quotes over a date range. However, fields marked with an asterisk (*) are only populated in the first case and are empty in historical company quotes.

• Quote Data

Fecha-Hora: Date and time the file was generated

Fecha Cotización: Quote date of the row.

Imagen: Blue icon for price increase, red for decrease *

Estado: Text indicating upward or downward trend of the day *

• Traded Value

Nombre: (Company, product, fund, index)

Nemónico: Representative symbol of the value

Sector: (Various, Agriculture, Industrials, Banks – Financials, etc.)    

Segmento: (Value classification mainly by liquidity)

• Quotes

Moneda: Sol/Dollar

Precio Anterior: Previous day's closing price   

Fecha Anterior: Previous trading date (not necessarily the previous day)  

Apertura: Current opening price. 

Última: Day's last price. 

Variación: Price change (%) compared to the previous day *

• Offers

Compra: Highest buy price of the day *

Venta: Lowest sell price of the day *

• Trading Activity

Número Acciones: Trading volume

Núm. Operaciones: Number of transactions for this value during the day

Monto Negocio: Amount traded in the quote currency
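When reading a generated file, the asterisk-marked fields need some care because they are empty in historical company downloads. The sketch below assumes a comma-delimited file with a header row that uses the column names listed above; the delimiter, the header, and the helper `missing_daily_fields` are assumptions, and the sample row is made up.

```python
import csv
import io

# Columns marked with * above: populated only in daily downloads.
DAILY_ONLY = ["Imagen", "Estado", "Variación", "Compra", "Venta"]


def missing_daily_fields(row):
    """Return the daily-only columns that are absent or empty in this row."""
    return [name for name in DAILY_ONLY if not row.get(name)]


# Made-up sample standing in for a generated file (placeholder values).
sample = io.StringIO(
    "Nemónico,Última,Variación,Compra,Venta\n"
    "XYZ,1.50,,,\n"
)

for row in csv.DictReader(sample):
    print(row["Nemónico"], missing_daily_fields(row))
```

For a real file, `io.StringIO(...)` would be replaced by `open(path, newline="", encoding="utf-8")`, assuming the script writes UTF-8.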
