supreme-pancake

Repo for Big Data Management project

This project consists of three components: a producer / data collector (Kafka), a consumer / data processor (Spark), and a distributed database (Cassandra).
The project simulates data collection from a network of sensors. The data then has to be processed and stored in a distributed and efficient way: data collected (or generated) by Kafka is processed by Spark and saved to Cassandra for long-term archiving.
ZeroTier makes the connection between the PCs simple and scalable.
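
The data flow described above can be sketched in a few lines of plain Python. This is only an illustrative, in-memory stand-in and not the project's actual code: it uses no Kafka, Spark, or Cassandra libraries, and every name in it (`simulate_readings`, `process`, `store`, the `sensor_id` / `tick` / `value` fields, the value range) is a hypothetical choice made for the example.

```python
import json
import random
import statistics
from collections import defaultdict


def simulate_readings(n_sensors, n_per_sensor, seed=0):
    """Producer stage (stand-in for Kafka): yield JSON-encoded sensor readings."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    for tick in range(n_per_sensor):
        for sensor_id in range(n_sensors):
            reading = {
                "sensor_id": f"sensor-{sensor_id}",
                "tick": tick,
                # Hypothetical value range, e.g. a temperature in degrees Celsius
                "value": round(rng.uniform(15.0, 30.0), 2),
            }
            yield json.dumps(reading)


def process(messages):
    """Processor stage (stand-in for Spark): group by sensor and aggregate."""
    grouped = defaultdict(list)
    for raw in messages:
        reading = json.loads(raw)
        grouped[reading["sensor_id"]].append(reading["value"])
    return [
        {
            "sensor_id": sensor_id,
            "count": len(values),
            "mean": round(statistics.fmean(values), 2),
            "min": min(values),
            "max": max(values),
        }
        for sensor_id, values in sorted(grouped.items())
    ]


def store(rows, table):
    """Storage stage (stand-in for Cassandra): upsert each row by its key."""
    for row in rows:
        table[row["sensor_id"]] = row  # writing the same key again overwrites
    return table


if __name__ == "__main__":
    table = store(process(simulate_readings(n_sensors=3, n_per_sensor=5)), {})
    for key, row in table.items():
        print(key, row)
```

The dictionary keyed by `sensor_id` loosely mimics how a write to an existing primary key in a key-based store replaces the previous row, so re-running the processing step does not create duplicates.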

Leave a star ⭐ if you like this project 🙂 Thank you.

What's inside

Kafka module
Cassandra DB module
Spark module
Data cleaning scripts
Distributed job start and stop scripts
Project runme script
Project document with details

Folders and files

cassandra-db
kafka-producer
spark-mapreduce
spark-processor
.gitignore
Documento di progetto - Big data.pdf
README.md
datacleaner.py
eraseDatasetFromHDFS.sh
hdfs.txt
loadDatasetToHDFS.sh
pom.xml
run.sh
start.sh
stop.sh
