GitHub - federico-fiorini/storage-systems-comparison: Using Spark to perform comparison between MySQL, MongoDB and Neo4j with a 160M+ record dataset

#STORAGE SYSTEMS COMPARISON

In this project we present a comparison between different storage systems:

- a traditional relational database: MySQL
- a NoSQL database: MongoDB
- a graph database: Neo4j

To do so, we use a large dataset containing more than 160M records of commercial flight information in the USA between 1987 and 2015. The raw dataset was pre-aggregated with Spark and formatted for each of the databases. In two cases, Spark was also used to import the data into the database. The evaluation compares implementation complexity, integration with Spark, and the time performance of several queries run on each of the databases.

You can find a complete report (report.pdf) and a poster (poster.pdf) explaining the implementation and results in detail.

###SOME RESULTS

###HOW TO USE

Define the path to your Spark installation folder as an environment variable named SPARK_HOME.

$ export SPARK_HOME=/path-to-folder/spark-1.5.2
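Before running the import scripts, it can help to confirm that the variable points at a usable Spark installation. The following is a minimal sketch, not part of this repository: the function name `check_spark_home` is hypothetical, and it assumes the scripts rely on the `bin/spark-submit` launcher that ships with Spark.

```shell
# Hypothetical helper (not part of this repository): verifies that SPARK_HOME
# is set and looks like a Spark installation.
# Assumption: the process_*.sh scripts need $SPARK_HOME/bin/spark-submit.
check_spark_home() {
  # Fail early if the variable is unset or empty.
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  # Fail if the Spark launcher is missing or not executable.
  if [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "spark-submit not found under $SPARK_HOME/bin" >&2
    return 1
  fi
  echo "Using Spark at $SPARK_HOME"
}
```

Calling `check_spark_home` after the `export` line reports the folder in use, or returns a non-zero status with a message on stderr if the path is wrong.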

Run the corresponding script to process the data and import it into MySQL, MongoDB or Neo4j.

$ ./process_mysql.sh
$ ./process_mongo.sh
$ ./process_neo4j.sh

If you modify the code, you need to rebuild the jar with sbt (which must be installed first).

$ sbt clean package

###RELATED WORK

We used the aggregated and cleaned data imported into Neo4j to apply clustering algorithms to the resulting graph and detect communities in the network. Using the frequency of each route as its edge weight, we could find communities of airports that are well connected to each other.
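To illustrate what "frequency of a route as edge weight" means, here is a minimal sketch that counts how often each origin/destination pair occurs. It is not the code used in the related project: the function name `route_weights` and the two-column `origin,destination` CSV layout (no header) are assumptions made for this example.

```shell
# Hypothetical helper: reads "origin,destination" lines from stdin and prints
# "origin,destination,count", one line per distinct route, sorted by route.
# Assumption: input is a header-less, two-column CSV; real data may differ.
route_weights() {
  awk -F, '
    { count[$1 "," $2]++ }                    # one more flight on this route
    END {
      for (route in count)                    # emit each route with its weight
        print route "," count[route]
    }
  ' | sort
}
```

For example, piping the three lines `JFK,LAX`, `JFK,LAX` and `SFO,SEA` into `route_weights` yields a weight of 2 for the JFK to LAX edge and 1 for the SFO to SEA edge.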

You can find this related project here.

