Data Engineering and Quality Assurance on the Airbnb Dataset in Rio de Janeiro

This project uses dbt, Great Expectations, Python, and Pandas to explore, transform, and validate the "Inside Airbnb" dataset. The dataset, obtained from http://insideairbnb.com/, contains accommodation listings, guest reviews, and calendar availability for cities around the world, including Rio de Janeiro.

The central goal of this project is to transform and model this data for deeper analysis, ensuring its quality and integrity. With the help of dbt, we perform data transformations and modeling, making it ready for analysis and reporting. Pandas and Python are used for preliminary manipulation and cleaning.

To ensure data quality, we have integrated Great Expectations, which lets us define and run data quality tests. Together, these tools help keep the data consistent and reliable for analysis of the accommodation market in Rio de Janeiro.
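As a rough illustration of the kind of cleaning and checks described above, the sketch below normalizes a price column with Pandas and evaluates a few simple rules. It deliberately does not use the Great Expectations API; the column names (`id`, `price`), the price format, and the sample rows are assumptions made for illustration, not the project's actual schema or data.

```python
import pandas as pd


def clean_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Convert price strings such as "$1,200.00" to floats (assumed format)."""
    out = df.copy()
    out["price"] = (
        out["price"]
        .astype(str)
        .str.replace(r"[$,]", "", regex=True)  # drop currency sign and separators
        .astype(float)
    )
    return out


def quality_report(df: pd.DataFrame) -> dict:
    """Return pass/fail results for a few basic data quality rules."""
    return {
        "id_not_null": bool(df["id"].notna().all()),
        "id_unique": bool(df["id"].is_unique),
        "price_non_negative": bool((df["price"] >= 0).all()),
    }


# Hypothetical sample rows, only to make the sketch runnable.
sample = pd.DataFrame({"id": [1, 2, 3], "price": ["$150.00", "$1,200.00", "$80.00"]})
report = quality_report(clean_prices(sample))
print(report)
```

In the project itself, rules like these would be expressed as Great Expectations expectations rather than hand-written checks.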

Postgres

Installation

Download and install Docker from https://www.docker.com/get-started/.

To start the container, you need the docker-compose.yml file. From the directory that contains it, run:

%sh
docker compose up
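Once the container is starting, it can be handy to wait until the database port accepts connections before running anything against it. The helper below is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical name, and the host and port you pass (for example `localhost` and Postgres's default port 5432) are assumptions that depend on how docker-compose.yml publishes the service.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not accepting connections yet; retry shortly
    return False
```

A call such as `wait_for_port("localhost", 5432)` only confirms that something is listening on the port; it does not prove that Postgres has finished initializing.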

You also need to create a .env file in the root of the project containing the credentials the container needs to work correctly. NOTE: There is an example file called EXAMPLE_ENV; rename it to .env and fill in your own values.

%sh
DATABASE_USER=YOUR_DATABASE_USER
DATABASE_PASSWORD=YOUR_DATABASE_PASSWORD
POSTGRES_USER=YOUR_POSTGRES_USER
POSTGRES_PASSWORD=YOUR_POSTGRES_PASSWORD
POSTGRES_DB=YOUR_POSTGRES_DB
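To show how these variables might be consumed from Python, here is a minimal sketch that assembles a PostgreSQL connection URL from them. The function name `build_postgres_url` and the default host and port (`localhost`, 5432) are assumptions; adjust them to match your docker-compose.yml.

```python
import os
from urllib.parse import quote


def build_postgres_url(env=os.environ, host="localhost", port=5432) -> str:
    """Assemble a PostgreSQL connection URL from the POSTGRES_* variables."""
    user = env["POSTGRES_USER"]
    password = env["POSTGRES_PASSWORD"]
    database = env["POSTGRES_DB"]
    # Percent-encode credentials so characters such as "@" or "/" do not
    # break the URL structure.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
```

Accessing the keys with `env[...]` raises a `KeyError` when a variable is missing, which surfaces an incomplete .env file early.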

Creating environment variables - ENV

Using .env files in Python is a common practice to store sensitive information or settings that should not be hard-coded into your source code. In this project we are using the python-dotenv library to load environment variables from a .env file.

API_KEY=YOUR_API_KEY_HERE
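A minimal sketch of reading this value with python-dotenv might look like the following. The helper name `get_api_key` is hypothetical, and the import is guarded only so the sketch still runs where python-dotenv is not installed; the project itself may load its variables differently.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:
    load_dotenv = None  # fall back to the process environment only


def get_api_key(env=None) -> str:
    """Return API_KEY, raising a clear error if it is missing or empty."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # loads variables from a .env file, if one is found
        env = os.environ
    value = env.get("API_KEY", "").strip()
    if not value:
        raise RuntimeError("API_KEY is not set; add it to your .env file")
    return value
```

Passing a plain dictionary as `env` makes the function easy to exercise in tests without touching the real environment.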

Execution with Virtual Environment

Linux and macOS

Install virtualenv

To install virtualenv, open the terminal and run the following command:

pip install virtualenv

Creating and Activating a Virtual Environment

Open the terminal, navigate to the root directory of the project, and create the environment with the following command:

virtualenv venv

Now activate your virtual environment:

source venv/bin/activate
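To confirm that the activation worked, you can ask the interpreter itself. This is a small sketch, not part of the project; `in_virtual_environment` is a hypothetical helper name.

```python
import sys


def in_virtual_environment() -> bool:
    """True when the interpreter runs inside a venv or virtualenv environment."""
    # Inside a virtual environment, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation. Older
    # virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base or hasattr(sys, "real_prefix")


print("virtual environment active:", in_virtual_environment())
```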

Installing the necessary tools:

While still in the root folder, install the tools needed to run the application from the requirements.txt file:

pip install -r requirements.txt
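After installing, a quick way to verify that everything landed in the active environment is to compare requirements.txt against the installed packages. The sketch below uses only the standard library; the function names are hypothetical, and the parsing is deliberately simple (it does not handle URL or editable requirements).

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract package names from requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_packages(requirement_names(open("requirements.txt")))` returns an empty list when every listed package is installed.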

Deactivating the virtual environment:

To deactivate your virtual environment, simply run the following command:

deactivate

Windows

Install virtualenv

To install virtualenv, open the Command Prompt or PowerShell as administrator and run the following command:

pip install virtualenv

Creating and Activating a Virtual Environment

Open the Command Prompt or PowerShell, navigate to the root directory of the project, and create the environment with the following command:

virtualenv venv

Now activate your virtual environment:

venv\Scripts\activate

Installing the tools required:

While still in the root folder, install the tools needed to run the application from the requirements.txt file:

pip install -r requirements.txt

Deactivating the virtual environment:

To deactivate your virtual environment, simply run the following command:

deactivate