saharsamr/BigDataTools

BigDataTools

Spark

This directory contains analyses that use Spark to implement several algorithms: TF-IDF, n-gram counting, logistic regression, graph analysis (to find characteristics such as connectivity degree and graph diameter), and dimensionality reduction used in implementing an MLP network.
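As a rough illustration of two of the algorithms named above, here is a minimal plain-Python sketch of n-gram counting and TF-IDF. It is not the Spark code in this repository: the function names (`ngrams`, `count_ngrams`, `tf_idf`), the whitespace tokenization, and the unsmoothed `log(N / df)` idf formula are all assumptions made for the example, and Spark's own feature transformers may use a different idf variant.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(docs, n):
    """Count n-grams across all documents (assumed whitespace tokenization)."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc.lower().split(), n))
    return counts


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumes tf = count / document length and idf = log(N / df),
    where N is the number of documents and df is the number of
    documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat down"]
    print(count_ngrams(docs, 2).most_common(3))
    print(tf_idf(docs)[0])
```

In a Spark job the same steps would typically be expressed as map and reduce stages over a distributed collection of documents; the sketch keeps everything in memory so the arithmetic is easy to follow.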

Airflow

Used to manage the flow of data, making scheduling much easier and more flexible.

Kafka

A simple yet effective strategy for managing large amounts of incoming data: the data is placed on different channels and processed so that it is usable by further applications.

Elasticsearch

Used as an endpoint for the Kafka channels and as a tool for answering text-based queries. It is also used to preprocess text data.

Presto

Presto is used to query Kafka's data, with code added so that Presto can understand the data and query it.
