hmrguez/Hermes

Hermes

Introduction

Hermes is the graduate thesis project of Hector Miguel Rodriguez Sosa, Computer Science class of 2025 at the University of Havana.

Project Overview

Hermes is a data engineering project focused on a modular architecture for data pipelines. The project is divided into two main parts:

  1. Research and Architecture Documentation: All the research and architecture design documents are located in the docs folder. They are written in Typst, so typst must be installed on your system to compile them. Run make to compile all the files, make clean to delete the generated .pdf files, and make watch <filepath> to watch a specific file.

  2. Implementation: The implementation of the architecture is located in the src directory. It applies the architecture to a concrete use case, a ride data pipeline application, using the following technologies:

    • Java for the core application
    • Scala for the Kafka and MongoDB connector, and for future Spark jobs
    • Flink for stream processing
    • Kafka for event streaming
    • MongoDB for data storage

The current use case is a cab-share (ride) data pipeline: Flink processes the streaming data, Kafka streams the events, and MongoDB stores the data.
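
To give a feel for how the modular stages fit together, here is a minimal plain-Java sketch of the ingestion, processing, and storage flow described above. It is not taken from the src directory and does not use Flink, Kafka, or MongoDB: in-memory lists stand in for all three, and the RideEvent type, the CSV-like input format, and the per-zone fare aggregation are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class PipelineSketch {

    // Hypothetical event type; the real Hermes schema is not shown in this README.
    public record RideEvent(String rideId, String zone, double fare) {}

    // Ingestion stand-in: parses "rideId,zone,fare" lines, roughly where a
    // Kafka consumer would deserialize incoming records.
    public static List<RideEvent> ingest(List<String> rawLines) {
        List<RideEvent> events = new ArrayList<>();
        for (String line : rawLines) {
            String[] parts = line.split(",");
            if (parts.length != 3) {
                continue; // only the field count is validated in this sketch
            }
            events.add(new RideEvent(parts[0].trim(), parts[1].trim(),
                    Double.parseDouble(parts[2].trim())));
        }
        return events;
    }

    // Processing stand-in: sums fares per zone, roughly where a Flink job
    // would run a keyed aggregation. TreeMap keeps zones in sorted order.
    public static Map<String, Double> totalFareByZone(List<RideEvent> events) {
        Map<String, Double> totals = new TreeMap<>();
        for (RideEvent event : events) {
            totals.merge(event.zone(), event.fare(), Double::sum);
        }
        return totals;
    }

    // Storage stand-in: renders each aggregate as a JSON-like string, roughly
    // where a MongoDB sink would write one document per zone.
    public static List<String> toDocuments(Map<String, Double> totals) {
        List<String> documents = new ArrayList<>();
        for (Map.Entry<String, Double> entry : totals.entrySet()) {
            documents.add(String.format(Locale.ROOT,
                    "{\"zone\": \"%s\", \"totalFare\": %.2f}",
                    entry.getKey(), entry.getValue()));
        }
        return documents;
    }

    // Each stage is an independent function, so a stage can be swapped
    // without touching the others; that is the modularity being illustrated.
    public static Function<List<String>, List<String>> buildPipeline() {
        Function<List<String>, List<RideEvent>> ingestion = PipelineSketch::ingest;
        Function<List<RideEvent>, Map<String, Double>> processing = PipelineSketch::totalFareByZone;
        Function<Map<String, Double>, List<String>> storage = PipelineSketch::toDocuments;
        return ingestion.andThen(processing).andThen(storage);
    }

    public static void main(String[] args) {
        List<String> raw = List.of("r1,vedado,7.25", "r2,centro,10.5", "r3,centro,4.5");
        buildPipeline().apply(raw).forEach(System.out::println);
    }
}
```

In a real deployment each stand-in would presumably be replaced by the corresponding technology listed above, while the composition in buildPipeline stays the same shape.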

Future Work

Future enhancements to this project include:

  • Adding a Monitoring Layer
  • Adding a Presentation Layer
  • Adding a Batch Layer using Spark