To query Avro datasets from the Spark shell (adapted from the second answer at https://stackoverflow.com/questions/46434255/how-to-query-datasets-in-avro-format):

- wget http://central.maven.org/maven2/com/databricks/spark-avro_2.11/4.0.0/spark-avro_2.11-4.0.0.jar
- spark-shell --jars spark-avro_2.11-4.0.0.jar
- spark.read.format("com.databricks.spark.avro").load("s3://MYAVROLOCATION.avro")
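The same read works from PySpark, which is what the rest of this project uses. A minimal sketch, assuming the jar downloaded above is passed to pyspark via --jars; the S3 path is the placeholder from the steps above:

```python
# Launch with: pyspark --jars spark-avro_2.11-4.0.0.jar
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-avro").getOrCreate()

# Read an Avro dataset through the Databricks spark-avro package
df = spark.read.format("com.databricks.spark.avro").load("s3://MYAVROLOCATION.avro")
df.printSchema()
df.show(5)
```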
Workshop reference: https://github.com/NFLX-WIBD/WIBD-Workshops-2018/tree/master/Data%20Engineering
In this project, we will migrate the existing Retail project to a new architecture built on Spark, Airflow, and Kafka. Requirements:
- Find the total promotion sales generated on weekdays and weekends for each region, year, and month.
- Find the most popular promotion, i.e. the one that generated the highest sales, in each region.
- Create PySpark scripts for the initial and incremental loads. The scripts will read the sales and promotion tables from MySQL based on the last_update_date column and store them in Avro format in S3 buckets. You might need to add a last_update_date column to the tables. (A sketch of this load appears after this list.)
- A second PySpark script will read the Avro files, filter out all non-promotion records from the input, join the promotion and sales tables, and save the data in Parquet format in S3 buckets (see the join sketch below).
- The Parquet data is aggregated by regionID, promotionID, sales_year, and sales_month to compute total StoreSales for weekdays and weekends, and the output is saved as a CSV file in S3 buckets (see the aggregation sketch below).
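Below is a minimal sketch of the initial/incremental load script. It assumes a MySQL JDBC driver on the classpath; the host, credentials, bucket name, and checkpoint handling are hypothetical placeholders (the source only specifies that reads are driven by last_update_date):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("retail-incremental-load").getOrCreate()

# Hypothetical connection details; replace with real values.
jdbc_url = "jdbc:mysql://mysql-host:3306/retail_db"
props = {"user": "retail_user", "password": "***", "driver": "com.mysql.jdbc.Driver"}

# For the initial load, use a date earlier than any row; for incremental runs
# this would be the checkpoint recorded by the previous run (e.g. via Airflow).
last_checkpoint = "1900-01-01"

for table in ["sales", "promotion"]:
    # Push the last_update_date filter down to MySQL as a subquery.
    src = "(SELECT * FROM {t} WHERE last_update_date > '{c}') AS src".format(
        t=table, c=last_checkpoint)
    df = spark.read.jdbc(url=jdbc_url, table=src, properties=props)
    (df.write.format("com.databricks.spark.avro")
       .mode("append")
       .save("s3://my-retail-bucket/avro/{t}".format(t=table)))
```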
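The second script could look like the join sketch below. It assumes sales rows carry a nullable promotionID key (rows without one are the non-promotion records to drop); the bucket paths are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("retail-promo-join").getOrCreate()

sales = spark.read.format("com.databricks.spark.avro") \
             .load("s3://my-retail-bucket/avro/sales")
promos = spark.read.format("com.databricks.spark.avro") \
              .load("s3://my-retail-bucket/avro/promotion")

# Drop non-promotion sales, then attach promotion details.
promo_sales = (sales.filter(sales.promotionID.isNotNull())
                    .join(promos, on="promotionID", how="inner"))

promo_sales.write.mode("overwrite") \
           .parquet("s3://my-retail-bucket/parquet/promo_sales")
```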
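Finally, a sketch of the aggregation that answers the two questions above. It assumes Spark 2.3+ (for dayofweek) and a sales_date date column in the joined data; column names other than regionID, promotionID, sales_year, sales_month, and StoreSales are assumptions:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("retail-promo-agg").getOrCreate()
promo_sales = spark.read.parquet("s3://my-retail-bucket/parquet/promo_sales")

# In Spark SQL, dayofweek() returns 1 for Sunday and 7 for Saturday.
with_day = promo_sales.withColumn(
    "day_type",
    F.when(F.dayofweek("sales_date").isin(1, 7), "weekend").otherwise("weekday"))

# Total StoreSales for weekdays vs. weekends per region, promotion, year, month.
agg = (with_day.groupBy("regionID", "promotionID",
                        F.year("sales_date").alias("sales_year"),
                        F.month("sales_date").alias("sales_month"),
                        "day_type")
               .agg(F.sum("StoreSales").alias("total_store_sales")))

agg.write.mode("overwrite").option("header", "true") \
   .csv("s3://my-retail-bucket/csv/promo_sales_agg")

# Most popular promotion (highest total sales) in each region.
w = Window.partitionBy("regionID").orderBy(F.desc("region_promo_sales"))
top_promo = (promo_sales.groupBy("regionID", "promotionID")
                        .agg(F.sum("StoreSales").alias("region_promo_sales"))
                        .withColumn("rn", F.row_number().over(w))
                        .filter(F.col("rn") == 1))
top_promo.show()
```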