Data-Filtering-Pipeline-ETL

Problem Statement

I was given a CSV file with 180k rows. First, I had to find the top 3 countries by record count in the Order Country column. Using pandas in Python, I found that EstadiosUnidos, Francia, and Mexico had the most records. My task was then to extract the data from the data lake, filter it by those three countries, and write it back to the data lake as 3 separate files, one per country.

The problem I faced while building the pipeline was that the raw data contained special characters that the pipeline was not filtering correctly. Changing the encoding from UTF-8 to ISO-8859-1 in Azure Data Factory resolved the problem.
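The analysis step described above can be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample rows, the `Order Id` and `Customer Name` columns, and the helper names `load_orders`, `top_countries`, and `split_by_country` are all assumptions made for the example. Only the `Order Country` column, the three country values, and the ISO-8859-1 encoding come from the description above.

```python
import io

import pandas as pd

# Hypothetical miniature sample standing in for the real 180k-row file.
# "Order Id" and "Customer Name" are assumed columns for illustration.
SAMPLE_ROWS = [
    (1, "José", "EstadiosUnidos"),
    (2, "Ana", "EstadiosUnidos"),
    (3, "Luc", "EstadiosUnidos"),
    (4, "Mia", "EstadiosUnidos"),
    (5, "Zoé", "Francia"),
    (6, "Paul", "Francia"),
    (7, "Léa", "Francia"),
    (8, "Iván", "Mexico"),
    (9, "Rosa", "Mexico"),
    (10, "Karl", "Alemania"),
]

CSV_TEXT = "Order Id,Customer Name,Order Country\n" + "".join(
    f"{order_id},{name},{country}\n" for order_id, name, country in SAMPLE_ROWS
)

# Simulate a raw file stored as ISO-8859-1 rather than UTF-8.
RAW_BYTES = CSV_TEXT.encode("ISO-8859-1")


def load_orders(raw: bytes) -> pd.DataFrame:
    """Read the CSV bytes, decoding them as ISO-8859-1."""
    return pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")


def top_countries(df: pd.DataFrame, column: str = "Order Country", n: int = 3) -> list:
    """Return the n most frequent values of the given column."""
    return df[column].value_counts().head(n).index.tolist()


def split_by_country(df: pd.DataFrame, countries: list, column: str = "Order Country") -> dict:
    """Map each country to the subset of rows belonging to it."""
    return {country: df[df[column] == country] for country in countries}


if __name__ == "__main__":
    orders = load_orders(RAW_BYTES)
    top3 = top_countries(orders)
    for country, subset in split_by_country(orders, top3).items():
        print(country, len(subset))
```

Each per-country frame could then be written out with `DataFrame.to_csv`. In the actual project the filtering and file output are done by the Azure Data Factory pipeline rather than by pandas.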

The files

The Second Portfolio Project.json file contains information about the ADF pipeline, including the pipeline name, description, and the resources that make up the pipeline. The manifest.json file contains information about the dependencies and structure of the pipeline's ARM template in Azure Data Factory. The Jupyter notebook file contains the data analysis in Python.

