apache-hadoop

This repository provides a guide to preprocessing and analyzing the network intrusion dataset with NumPy, pandas, and matplotlib, and to implementing a random forest classifier with scikit-learn.

numpy scikit-learn pandas matplotlib scapy npcap libcap apache-hadoop

Updated May 8, 2024
Jupyter Notebook

RBC-DSAI-IITM / DCEIL

A fast, scalable, and distributed community detection algorithm based on the CEIL scoring function.

apache-spark community-detection apache-hadoop

Updated Jan 1, 2019
Scala

Coursal / Hadoop-Examples

Some simple, introductory projects based on Apache Hadoop, meant as guides to make the MapReduce model look less weird or boring.

java hadoop examples mapreduce hadoop-mapreduce mapreduce-java hadoop-example apache-hadoop

Updated May 22, 2024
Java

nghoanglongde / spark-cluster-with-docker

An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker.

docker apache-spark apache-hadoop

Updated May 10, 2024
Shell

bdoepf / aws-emr-prometheus

emr aws apache-spark prometheus apache-flink emr-cluster apache-hadoop

Updated Jan 5, 2021
HCL

whoami-anoint / EasyHadoop

Simplified Hadoop Setup and Configuration Automation

data-science big-data hdfs ec2-instance big-data-analytics apache-hadoop big-data-projects hdfs-cluster big-data-essentials

Updated Sep 2, 2023
Shell

haodemon / HadoopStreaming

Set of Input Formats for Hadoop Streaming

hadoop inputformat apache-hadoop

Updated Sep 25, 2024
Java

chriskery / hadoop-operator

A Kubernetes operator for managing the lifecycle of Apache Hadoop YARN tasks on Kubernetes.

kubernetes hadoop k8s hadoop-cluster kubernetes-operator apache-hadoop

Updated Jan 19, 2024
Go

felidsche / mail-spam-filter

An email spam filter using Apache Spark’s ML library

apache-spark spark-ml apache-hadoop

Updated Apr 14, 2021
Python

Jordan396 / Giraph-1.2.0-Installation

Instructions for Installing Giraph-1.2.0

virtual-machine google-cloud giraph apache-hadoop ubuntu1804

Updated May 1, 2019

sawadogosalif / Big-Data-Technologies

Big Data Technologies can be defined as software tools for analyzing, processing, and extracting data from extremely large and complex datasets that traditional data management tools cannot handle.

apache-spark apache-kafka apache-hive apache-hadoop apache-hbase pysark

Updated Apr 30, 2022
Python

Here are 82 public repositories matching this topic...

mahmoudparsian / data-algorithms-book

mahmoudparsian / big-data-mapreduce-course

tencentyun / hadoop-cos

s911415 / apache-hadoop-3.1.0-winutils

PBWebMedia / yarn-prometheus-exporter

realtimedatalake / hive-metastore-docker

Guru107 / hadoop-small-files-merger

mohammadtavakoli78 / Cloud-Computing

kowaalczyk / spark-minimal-algorithms

jagdish4501 / Network-intrusion-Detection

RBC-DSAI-IITM / DCEIL

Coursal / Hadoop-Examples

nghoanglongde / spark-cluster-with-docker

bdoepf / aws-emr-prometheus

whoami-anoint / EasyHadoop

haodemon / HadoopStreaming

chriskery / hadoop-operator

felidsche / mail-spam-filter

Jordan396 / Giraph-1.2.0-Installation

sawadogosalif / Big-Data-Technologies