Near real-time streaming using Apache Spark and Apache Kafka

Synopsis

A basic Apache Spark Streaming application

Motivation

This project aims to help developers and researchers use Apache Spark Streaming with Apache Kafka.

Execution

Prerequisites:
1) Ensure Hadoop is set up, i.e. ${HADOOP_CONF_DIR} and ${HADOOP_HOME} are set.
2) On Windows, ${HADOOP_HOME}/bin/winutils.exe must exist; otherwise you will get the error _Failed to locate the winutils binary in the hadoop binary path_.
3) Ensure Kafka is set up (https://kafka.apache.org/quickstart).
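
The prerequisites above can be checked before launching anything. The following is a minimal sketch, not part of the project: the function name `check_prereqs` is hypothetical, and it assumes a POSIX shell (on Windows, for example Git Bash or Cygwin). It only tests that the two variables are set and that the winutils binary is present.

```shell
# check_prereqs (hypothetical helper): report missing prerequisites.
# Prints one line per problem and returns 1 if any problem was found.
check_prereqs() {
    missing=0
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set"
        missing=1
    elif [ ! -f "${HADOOP_HOME}/bin/winutils.exe" ]; then
        # Listed as required in the prerequisites; in practice this
        # binary matters when running on Windows.
        echo "winutils.exe not found in ${HADOOP_HOME}/bin"
        missing=1
    fi
    if [ -z "${HADOOP_CONF_DIR:-}" ]; then
        echo "HADOOP_CONF_DIR is not set"
        missing=1
    fi
    return "$missing"
}
```

Calling `check_prereqs` in the shell you intend to use prints nothing and returns 0 when everything is in place. It does not verify the Kafka installation.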

Instructions to run:
1) Kafka requires ZooKeeper, so start an instance of it first: ${kafka_dir}/bin/zookeeper-server-start.sh ${kafka_dir}/config/zookeeper.properties
2) Start Kafka. In this case we run a single node: ${kafka_dir}/bin/kafka-server-start.sh ${kafka_dir}/config/server.properties (to run multiple brokers/nodes, start each with its own server.properties, i.e. a unique broker.id and log.dirs and, on the same machine, a unique listener port)
3) Run the spark-kafka-streaming application, either from an IDE or in a new shell.
4) In a new shell, open the Kafka console producer: ${kafka_dir}/bin/kafka-console-producer.sh --broker-list [ip/localhost]:[port-default_is_9092] --topic [topic_name]
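
The console producer in step 4 reads messages from standard input, one per line, so test data can also be piped in rather than typed. A minimal sketch, assuming a POSIX shell; the function name `send_test_messages` and the topic name `demo` used below are hypothetical and not part of the project:

```shell
# send_test_messages KAFKA_DIR BROKER TOPIC   (hypothetical helper)
# Forwards standard input, one message per line, to the Kafka console
# producer using the same flags as step 4.
send_test_messages() {
    kafka_dir=$1
    broker=$2
    topic=$3
    "${kafka_dir}/bin/kafka-console-producer.sh" \
        --broker-list "$broker" \
        --topic "$topic"
}
```

For example, `printf 'hello\nworld\n' | send_test_messages "${kafka_dir}" localhost:9092 demo` should publish two messages to the `demo` topic, provided the broker from step 2 is running. Recent Kafka releases prefer `--bootstrap-server` over `--broker-list`, so the flag may need adjusting for your version.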

