Hermes

Introduction

This repository contains the graduate thesis work of Hector Miguel Rodriguez Sosa, of the Computer Science class of 2025 at the University of Havana.

Project Overview

Hermes is a data engineering project focused on a modular architecture for data pipelines. The project is divided into two main parts:

Research and Architecture Documentation: The research and architecture design documents are located in the docs folder. They are written in Typst, so typst must be installed on your system to compile them. Run make to compile all the files, make clean to delete the generated .pdf files, and make watch <filepath> to watch a specific file.
Implementation: The implementation of the architecture is located in the src directory. It applies the architecture to a concrete use case, a ride data pipeline application, and uses the following technologies:
- Java for the core application
- Scala for the Kafka and MongoDB connector and for future Spark jobs
- Flink for stream processing
- Kafka for event streaming
- MongoDB for data storage

The current use case is a cab-sharing data pipeline: Kafka streams the ride events, Flink processes the stream, and MongoDB stores the data.
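To make the modular idea more concrete, the flow described above can be sketched in plain Java with three swappable layers. This is only an illustration under stated assumptions: the names RidePipelineSketch, RideEvent, RideSummary, Source, Processor, Sink, and the fare-per-kilometre calculation are hypothetical and are not taken from the src directory. In-memory lists stand in for Kafka, Flink, and MongoDB so the sketch runs without any external dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: every name below is illustrative, not part of the Hermes source.
class RidePipelineSketch {

    // A ride event as it might arrive from the event stream (assumed fields).
    static final class RideEvent {
        final String rideId;
        final double distanceKm;
        final double fare;

        RideEvent(String rideId, double distanceKm, double fare) {
            this.rideId = rideId;
            this.distanceKm = distanceKm;
            this.fare = fare;
        }
    }

    // A processed record ready for storage (assumed shape).
    static final class RideSummary {
        final String rideId;
        final double farePerKm;

        RideSummary(String rideId, double farePerKm) {
            this.rideId = rideId;
            this.farePerKm = farePerKm;
        }
    }

    // Ingestion layer: stands in for a Kafka consumer.
    interface Source<T> { List<T> poll(); }

    // Processing layer: stands in for a Flink operator.
    interface Processor<I, O> { List<O> process(List<I> input); }

    // Storage layer: stands in for a MongoDB writer.
    interface Sink<T> { void write(List<T> records); }

    // Wires the three layers together; each one can be replaced independently.
    static <I, O> void run(Source<I> source, Processor<I, O> processor, Sink<O> sink) {
        sink.write(processor.process(source.poll()));
    }

    // Example processing step: skip rides with no distance, compute fare per km.
    static List<RideSummary> summarize(List<RideEvent> events) {
        List<RideSummary> out = new ArrayList<>();
        for (RideEvent e : events) {
            if (e.distanceKm > 0) {
                out.add(new RideSummary(e.rideId, e.fare / e.distanceKm));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<RideEvent> events = Arrays.asList(
            new RideEvent("r1", 4.0, 10.0),
            new RideEvent("r2", 0.0, 3.0),
            new RideEvent("r3", 2.0, 7.0));
        List<RideSummary> store = new ArrayList<>();

        Source<RideEvent> source = () -> events;
        Processor<RideEvent, RideSummary> processor = RidePipelineSketch::summarize;
        Sink<RideSummary> sink = store::addAll;

        run(source, processor, sink);
        for (RideSummary s : store) {
            System.out.println(s.rideId + " " + s.farePerKm);
        }
    }
}
```

Because run depends only on the three interfaces, a real Kafka consumer, Flink job, or MongoDB writer could replace the in-memory stand-ins without touching the other layers, which is the kind of separation a modular pipeline architecture aims for.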

Future Work

Future enhancements to this project include:

- Adding a Monitoring Layer
- Adding a Presentation Layer
- Adding a Batch Layer using Spark
