In this project, dbt, Great Expectations, Python and Pandas were used to transform and validate the "Inside Airbnb" dataset. The tools ensure quality data, ready for analysis.

JuanCampbsi/analytics_engineering_airbnb

Data Engineering and Quality Assurance on the Airbnb Dataset in Rio de Janeiro

This project uses dbt, Great Expectations, Python, and Pandas to explore, transform, and validate the "Inside Airbnb" dataset. The dataset, extracted from "http://insideairbnb.com/", provides detailed information on accommodation listings, guest reviews, and calendar availability in several cities around the world, including Rio de Janeiro.

The central goal of this project is to transform and model this data for deeper analysis while ensuring its quality and integrity. dbt handles the data transformations and modeling, making the data ready for analysis and reporting. Pandas and Python are used for preliminary manipulation and cleaning.

To ensure data quality, we have integrated Great Expectations, which lets us create and run data quality tests. Together, these tools help keep the information consistent, reliable, and ready to yield insights into the accommodation market in the featured cities.
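As a rough illustration of what a data quality test checks, the plain-Python sketch below validates a few rows for missing values, duplicate identifiers, and out-of-range numbers. It deliberately does not use the Great Expectations API, and the function name `check_rows` and the column names `id` and `price` are assumptions made for this example, not names taken from the project.

```python
def check_rows(rows):
    """Return a list of human-readable problems found in `rows`.

    `rows` is an iterable of dicts. The `id` and `price` keys are
    hypothetical column names chosen for this illustration.
    """
    problems = []
    seen_ids = set()
    for index, row in enumerate(rows):
        # Check 1: the identifier must be present and unique.
        listing_id = row.get("id")
        if listing_id in (None, ""):
            problems.append(f"row {index}: id is missing")
        elif listing_id in seen_ids:
            problems.append(f"row {index}: duplicate id {listing_id}")
        else:
            seen_ids.add(listing_id)

        # Check 2: the price must be a non-negative number.
        price = row.get("price")
        if not isinstance(price, (int, float)) or price < 0:
            problems.append(
                f"row {index}: price must be a non-negative number, got {price!r}"
            )
    return problems
```

An empty list means every row passed. In the project itself, checks of this kind would be expressed as Great Expectations tests rather than hand-written loops.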

Postgres

Installation

Download and install Docker from https://www.docker.com/get-started/.

To start the container, you need the docker-compose.yml file. From the directory that contains it, run:

docker compose up

After that, create a .env file in the root of the project and add the credentials the container needs to work correctly. NOTE: There is an example file called EXAMPLE_ENV; rename it to .env and replace the placeholders with your own values:

DATABASE_USER=YOUR_DATABASE_USER
DATABASE_PASSWORD=YOUR_DATABASE_PASSWORD
POSTGRES_USER=YOUR_POSTGRES_USER
POSTGRES_PASSWORD=YOUR_POSTGRES_PASSWORD
POSTGRES_DB=YOUR_POSTGRES_DB

Creating environment variables - ENV

Using .env files in Python is a common practice to store sensitive information or settings that should not be hard-coded into your source code. In this project we are using the python-dotenv library to load environment variables from a .env file.

API_KEY=YOUR_API_KEY_HERE
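A minimal sketch of how these variables might be read in Python is shown below. It assumes the .env file sits in the project root and reuses the variable names from the container setup; the `read_settings` helper and `REQUIRED_KEYS` tuple are hypothetical and not part of the project. The fallback for a missing python-dotenv installation exists only to keep the sketch runnable on its own.

```python
import os

try:
    # python-dotenv, the library this project uses to load .env files
    from dotenv import load_dotenv
except ImportError:
    # Fallback so the sketch still runs without the dependency installed.
    def load_dotenv(*args, **kwargs):
        return False

# Variable names taken from the container setup section.
REQUIRED_KEYS = (
    "DATABASE_USER",
    "DATABASE_PASSWORD",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
)


def read_settings(environ=None):
    """Return the required settings as a dict, failing early if any is missing.

    When `environ` is None, the .env file is loaded first and the values
    are read from the process environment.
    """
    if environ is None:
        load_dotenv()  # variables already set in the environment are kept
        environ = os.environ
    missing = [key for key in REQUIRED_KEYS if not environ.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[key] for key in REQUIRED_KEYS}
```

Calling `read_settings()` with no arguments at application start-up surfaces a missing or misspelled key immediately, instead of as a connection error later.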

Execution with Virtual Environment

Linux and macOS

Install virtualenv

To install virtualenv, open the terminal and run the following command:

pip install virtualenv

Creating and Activating a Virtual Environment

Open the terminal, navigate to the root directory of the project, and create the environment with the following command:

virtualenv venv

Now activate your virtual environment:

source venv/bin/activate

Installing the necessary tools:

Still in the root folder, install the tools required to run the application using the requirements.txt file:

pip install -r requirements.txt

Deactivating the virtual environment:

To deactivate your virtual environment, simply run the following command:

deactivate

Windows

Install virtualenv

To install virtualenv, open the Command Prompt or PowerShell as administrator and run the following command:

pip install virtualenv

Creating and Activating a Virtual Environment

Open the Command Prompt or PowerShell, navigate to the root directory of the project, and create the environment with the following command:

virtualenv venv

Now activate your virtual environment:

venv\Scripts\activate

Installing the tools required:

Now you can, still in the root folder, install the tools required to run the application using the requirements.txt file:

pip install -r requirements.txt

Deactivating the virtual environment:

To deactivate your virtual environment, simply run the following command:

deactivate
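On either platform, a quick way to confirm that the virtual environment is active before installing dependencies is to ask the interpreter itself. The snippet below is a minimal sketch using only the standard library; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    # venv and current virtualenv releases make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this reports False after activation, the `python` on your PATH is probably not the one inside the venv folder.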
