This repository serves as a temporary portfolio showcasing SQL projects and Python scripts related to data engineering, highlighting key accomplishments and implementations.

christianebacani/Roadmap

Roadmap

Welcome to my Roadmap repository! This repository showcases a comprehensive collection of projects that document my learning journey in data engineering. Each folder represents a specific area of study, featuring a variety of project types, including mini-projects, guided projects, hobby projects, and industry projects. This roadmap serves as both a learning tracker and a portfolio to highlight my growing skills and expertise.

Table of Contents

  1. Overview

  2. Roadmap Files

  3. Contact

  4. Conclusion

Overview

This repository is structured to reflect my learning path in data engineering. Each project demonstrates practical applications of the concepts I have learned, organized into dedicated files for easy navigation. By showcasing these projects, I aim to provide a clear and structured overview of my technical skills and development.

Roadmap Files

Understanding Data Engineering

The Understanding Data Engineering.md file contains key theoretical concepts and definitions from the DataCamp "Understanding Data Engineering" course. It serves as a reference guide for important topics and terminologies in the field of data engineering.

Key Concepts Covered

  • Airflow: Open-source workflow management for scheduling data engineering tasks.
  • AWS (Amazon Web Services): Amazon's cloud computing services.
  • Azure: Microsoft's cloud services.
  • Big Data: Management of large and complex datasets characterized by volume, variety, velocity, veracity, and value.
  • Cloud Computing: Utilizing remote servers hosted on the internet for data management and processing.
  • Database Schema: The logical structure of a database, including its data organization and relationships.
  • Data Engineering: The process of designing, constructing, and managing data systems to facilitate analysis.
  • Data Ingestion: The process of importing data into a system or database.
  • Data Lake: A storage repository that holds large amounts of raw data.
  • Data Pipelines: A set of processes for moving and transforming data.
  • Data Warehousing: Centralized storage of data from multiple sources for analysis.
  • ETL (Extract, Transform, Load): A process that extracts data from one or more sources, transforms it, and loads it into a target system.
  • Google Cloud: Cloud services provided by Google.
  • NoSQL: Non-relational databases for storing structured, semi-structured, and unstructured data.
  • Parallel Processing: The simultaneous use of multiple compute resources to process data.
  • Redshift: Amazon's cloud data warehouse service.
  • S3: Amazon’s cloud object storage service.
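
To make the ETL and data pipeline entries above concrete, here is a minimal sketch in Python using only the standard library. The sample CSV text, the `sales` table, and all function names are illustrative assumptions and do not come from the course notes.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real source file.
RAW_CSV = """city,amount
Manila,100.50
Cebu,
Davao,75.25
"""


def extract(raw_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast amounts to float."""
    return [
        (row["city"].strip(), float(row["amount"]))
        for row in rows
        if row["amount"].strip()
    ]


def load(records, conn):
    """Load: write the cleaned records into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (city TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales (city, amount) VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(raw_text, conn):
    """Run the three stages in order and return the number of rows loaded."""
    records = transform(extract(raw_text))
    load(records, conn)
    return len(records)
```

Calling `run_pipeline(RAW_CSV, sqlite3.connect(":memory:"))` loads the two complete rows and skips the row with no amount. In a real pipeline the source would typically be a file, API, or database, and the target a warehouse such as Redshift.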

Introduction to SQL

The files in this section (Stored Procedure.sql, Student Tables and Views.sql) include projects from my Introduction to SQL coursework, focusing on concepts like:

  • Stored Procedures: Demonstrated in the Stored Procedure.sql file.
  • Creating Views: Showcased in the Student Tables and Views.sql file.
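
As a rough illustration of what a view does, the sketch below creates one with Python's built-in `sqlite3` module. The `students` table, its columns, and the passing threshold of 75 are assumptions made for this example and are not taken from Student Tables and Views.sql. SQLite does not support stored procedures, so only the view is shown.

```python
import sqlite3


def build_student_view(conn):
    """Create a hypothetical students table and a view of passing students."""
    conn.executescript(
        """
        CREATE TABLE students (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            grade REAL NOT NULL
        );
        INSERT INTO students (name, grade) VALUES
            ('Ana', 91.0), ('Ben', 68.5), ('Cara', 80.0);
        -- A view stores the query, not the rows, so it always reflects the table.
        CREATE VIEW passing_students AS
            SELECT name, grade FROM students WHERE grade >= 75;
        """
    )


def passing_students(conn):
    """Query the view like an ordinary table."""
    return conn.execute(
        "SELECT name, grade FROM passing_students ORDER BY name"
    ).fetchall()
```

Because the view is only a saved query, a student inserted later with a grade of 75 or higher appears in `passing_students(conn)` without recreating the view.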

Intermediate SQL

This section contains five mini-projects and one guided project that apply various intermediate SQL concepts, including:

  • GROUP BY, ORDER BY, aggregate functions, joins, and more.

Notable Projects

  • Analyzing Student's Mental Health: This guided project groups student data with GROUP BY and summarizes it with aggregate functions such as AVG and COUNT.
  • Analyze International Debt's Statistics: Summarizes and analyzes debt statistics using GROUP BY, SUM, and other aggregate functions.
  • Exploring London’s Travel Network: A guided project that demonstrates aggregation, grouping, and row limiting (SUM, GROUP BY, LIMIT).
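
The query pattern these projects share (group rows, aggregate each group, sort, then limit) can be sketched as follows. The `survey` table, its columns, and the sample rows are hypothetical and are not the datasets used in the projects.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory database with a small, made-up survey table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE survey (student_type TEXT, score REAL)")
    conn.executemany(
        "INSERT INTO survey VALUES (?, ?)",
        [
            ("domestic", 6.0),
            ("domestic", 8.0),
            ("international", 9.0),
            ("international", 10.0),
            ("international", 8.0),
        ],
    )
    return conn


def average_score_by_group(conn):
    """Return (group, respondent count, average score), highest average first."""
    return conn.execute(
        """
        SELECT student_type,
               COUNT(*) AS respondents,
               ROUND(AVG(score), 2) AS avg_score
        FROM survey
        GROUP BY student_type
        ORDER BY avg_score DESC
        LIMIT 5
        """
    ).fetchall()
```

With the sample rows above, `average_score_by_group(make_sample_db())` returns one row per student type, with the international group first because its average is higher.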

Joining Data in SQL

Projects in this section demonstrate practical applications of SQL joins, including:

  • Inner Joins, Left Joins, Right Joins, Full Joins, and Cross Joins.

Additional projects cover Set Theory operations (UNION, INTERSECT, EXCEPT) and Subqueries.
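
The difference between an inner join, a left join, and a set operation is easiest to see on a tiny dataset. The sketch below uses hypothetical `countries` and `capitals` tables. It leaves out RIGHT and FULL joins because older SQLite versions do not support them.

```python
import sqlite3


def make_sample_db():
    """Build two small, made-up tables; one country has no capital row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE capitals (country_code TEXT, city TEXT);
        INSERT INTO countries VALUES
            ('PH', 'Philippines'), ('JP', 'Japan'), ('XX', 'Nowhere');
        INSERT INTO capitals VALUES ('PH', 'Manila'), ('JP', 'Tokyo');
        """
    )
    return conn


def inner_join(conn):
    """Only countries that have a matching capital row."""
    return conn.execute(
        """
        SELECT c.name, k.city
        FROM countries AS c
        INNER JOIN capitals AS k ON k.country_code = c.code
        ORDER BY c.name
        """
    ).fetchall()


def left_join(conn):
    """Every country; city is NULL (None in Python) where nothing matches."""
    return conn.execute(
        """
        SELECT c.name, k.city
        FROM countries AS c
        LEFT JOIN capitals AS k ON k.country_code = c.code
        ORDER BY c.name
        """
    ).fetchall()


def codes_without_capital(conn):
    """Set difference: country codes that never appear in capitals."""
    return conn.execute(
        "SELECT code FROM countries EXCEPT SELECT country_code FROM capitals"
    ).fetchall()
```

Here the inner join drops the unmatched country, the left join keeps it with a `None` city, and EXCEPT isolates its code.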

Relational Databases in SQL

These projects focus on relational database concepts, including:

  • Data Migration: A project that demonstrates migrating data using INSERT INTO and CREATE TABLE.
  • Attribute Constraints: Managing data integrity through constraints like NOT NULL, UNIQUE, and foreign keys.
  • Many-to-Many Relationships: Demonstrating relational schema designs using surrogate keys and junction tables.
  • Referential Integrity: Managing referential integrity with ON UPDATE and ON DELETE behaviors.
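
The constraint, junction-table, and referential-integrity ideas above can be combined in one small schema. This is a sketch under assumed table and column names (`students`, `courses`, `enrollments`), not the schema used in the projects. It shows only ON DELETE behavior, and ON UPDATE clauses follow the same syntax.

```python
import sqlite3


def make_enrollment_db():
    """Build a many-to-many schema with constraints and foreign key actions."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE students (
            student_id INTEGER PRIMARY KEY,  -- surrogate key
            email TEXT NOT NULL UNIQUE       -- attribute constraints
        );
        CREATE TABLE courses (
            course_id INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        -- Junction table: one row per (student, course) pair.
        CREATE TABLE enrollments (
            student_id INTEGER NOT NULL
                REFERENCES students (student_id) ON DELETE CASCADE,
            course_id INTEGER NOT NULL
                REFERENCES courses (course_id) ON DELETE RESTRICT,
            PRIMARY KEY (student_id, course_id)
        );
        INSERT INTO students VALUES (1, 'ana@example.com'), (2, 'ben@example.com');
        INSERT INTO courses VALUES (10, 'SQL'), (20, 'Python');
        INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
        """
    )
    return conn


def enrollments(conn):
    """Return all (student_id, course_id) pairs in a stable order."""
    return conn.execute(
        "SELECT student_id, course_id FROM enrollments ORDER BY student_id, course_id"
    ).fetchall()
```

Deleting a student removes that student's enrollment rows through ON DELETE CASCADE, while deleting a course that still has enrollments fails with `sqlite3.IntegrityError` because of ON DELETE RESTRICT.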

Database Design

This section covers advanced database design principles, including normalization, schema design, and best practices for creating scalable data systems.

8 Week SQL Challenge

This section contains my solutions to the 8 Week SQL Challenge, showcasing real-world SQL problem-solving skills through various case studies.

Data Scraping

This directory contains Python scripts that scrape web resources such as news sites, articles, wikis, and YouTube videos, and save the results as text or CSV files.

Other scripts that use the same scraping approach, but are not included in this repository, have been integrated into an AI agent as part of my contributions to a startup's open-source project.
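
The scrape-then-save flow described in this section can be sketched with the standard library alone. The HTML snippet, the `HeadlineParser` class, and the choice of `<h2>` links as the target are assumptions for illustration; the actual scripts may use different libraries and page structures.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content standing in for a downloaded news page.
SAMPLE_HTML = """
<html><body>
  <h2><a href="/news/1">First headline</a></h2>
  <h2><a href="/news/2">Second headline</a></h2>
  <p><a href="/about">About</a></p>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collect (title, link) pairs for anchors nested inside <h2> tags."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.current_href = None
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
        elif tag == "a" and self.in_h2:
            self.current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False
        elif tag == "a":
            self.current_href = None

    def handle_data(self, data):
        # Record text only while inside a headline link.
        if self.current_href and data.strip():
            self.rows.append((data.strip(), self.current_href))


def headlines_to_csv(html_text):
    """Parse the HTML and return the headlines as CSV text with a header row."""
    parser = HeadlineParser()
    parser.feed(html_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "link"])
    writer.writerows(parser.rows)
    return buffer.getvalue()
```

`headlines_to_csv(SAMPLE_HTML)` returns a header line followed by one line per headline, and the "About" link is skipped because it is not inside an `<h2>`. Writing the returned text to a `.csv` file completes the flow.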

Bash Scripts

This directory contains my mini and major projects in Bash (Bourne Again Shell) scripting. I am eager to deepen my knowledge of scripting, orchestration, automation, command-line interfaces (CLI), and Linux. This directory serves as both a guide and a compilation of the practical Bash skills I have acquired and how I have applied them.

Automates Boring Stuff using Python

This directory contains all the Python scripts I've developed based on lessons from the book Automate the Boring Stuff with Python. I'm passionate about automation and scripting, and these projects reflect both my interests and career aspirations. I believe in building what I learn, and as part of that commitment, I consistently ship my work to GitHub to share and document my progress.

HackerRank

This directory includes various folders such as Python (Basics), serving as a collection of Python exercises I've completed on the HackerRank platform. These exercises complement my learning approach by reinforcing key concepts in Software Engineering and Data Engineering. In addition to building projects, solving diverse Python problems helps strengthen my foundational skills, ensuring a lasting impact on my career development.

Contact

Feel free to reach out to me with any questions or opportunities.

Conclusion

This repository serves as a reflection of my learning journey in data engineering. As I continue to learn and grow, I will update this repository with new projects and insights. Thank you for visiting!
