
Near real-time streaming using Apache Spark and Apache Kafka


martinywwan/spark-kafka-streaming



Synopsis


A basic Apache Spark Streaming application that integrates with Apache Kafka.

Motivation


This project aims to help developers and researchers get started with Apache Spark Streaming and Apache Kafka.

Execution


Prerequisites:
1) Ensure Hadoop is set up, i.e. ${HADOOP_CONF_DIR} and ${HADOOP_HOME} are set.
2) On Windows, ${HADOOP_HOME}/bin/winutils.exe must exist; otherwise you will get the error _Failed to locate the winutils binary in the hadoop binary path_.
3) Ensure Kafka is set up (https://kafka.apache.org/quickstart).
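
The Hadoop prerequisites above can be checked before launching anything. The following is a minimal sketch, not part of the project: `check_prereqs` is a hypothetical helper name, and it covers only prerequisites 1 and 2 (it does not verify the Kafka installation).

```shell
# Hypothetical helper: report which Hadoop prerequisites are missing.
# Returns 0 when everything is in place, 1 otherwise.
check_prereqs() {
  prereq_missing=0

  # Prerequisite 1: both Hadoop variables must be set and non-empty.
  if [ -z "${HADOOP_CONF_DIR:-}" ]; then
    echo "missing: HADOOP_CONF_DIR is not set"
    prereq_missing=1
  fi
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "missing: HADOOP_HOME is not set"
    prereq_missing=1
  fi

  # Prerequisite 2: winutils.exe is expected under ${HADOOP_HOME}/bin.
  if [ -n "${HADOOP_HOME:-}" ] && [ ! -f "${HADOOP_HOME}/bin/winutils.exe" ]; then
    echo "missing: ${HADOOP_HOME}/bin/winutils.exe"
    prereq_missing=1
  fi

  return "${prereq_missing}"
}
```

After defining the function in a shell, `check_prereqs && echo "prerequisites OK"` prints the confirmation only when both variables are set and the binary is present.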

Instructions to run:
1) Kafka requires ZooKeeper, so start an instance of it first: ${kafka_dir}/bin/zookeeper-server-start.sh ${kafka_dir}/config/zookeeper.properties
2) Start Kafka. In this case we run a single broker: ${kafka_dir}/bin/kafka-server-start.sh ${kafka_dir}/config/server.properties (to run multiple brokers, start each one with its own server.properties, i.e. a unique broker.id and log.dirs)
3) Run the spark-kafka-streaming application, either from an IDE or from a new shell.
4) In a new shell, open the Kafka console producer: ${kafka_dir}/bin/kafka-console-producer.sh --broker-list [ip/localhost]:[port-default_is_9092] --topic [topic_name] (each line typed into the producer is sent to the topic as a message; recent Kafka releases deprecate --broker-list in favor of --bootstrap-server)
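
Since steps 1, 2 and 4 each need their own shell, it can help to print the exact commands for a given installation before running them. The sketch below only echoes the commands from the steps above; `print_kafka_commands` is a hypothetical helper, and the broker address defaults to `localhost:9092` as an assumption based on the default port mentioned in step 4.

```shell
# Hypothetical helper: print the ZooKeeper, Kafka and console-producer
# commands in the order they should be started. Nothing is executed.
# Usage: print_kafka_commands <kafka_dir> <topic_name> [broker_host:port]
print_kafka_commands() {
  kafka_dir="$1"
  topic="${2:-}"
  broker="${3:-localhost:9092}"

  # Step 1: ZooKeeper must be running before Kafka starts.
  echo "${kafka_dir}/bin/zookeeper-server-start.sh ${kafka_dir}/config/zookeeper.properties"
  # Step 2: a single Kafka broker using the default server.properties.
  echo "${kafka_dir}/bin/kafka-server-start.sh ${kafka_dir}/config/server.properties"
  # Step 4: console producer for sending test messages to the topic.
  echo "${kafka_dir}/bin/kafka-console-producer.sh --broker-list ${broker} --topic ${topic}"
}
```

For example, `print_kafka_commands /opt/kafka my-topic` (both values are placeholders) prints three lines, one per shell to open. Step 3, starting the spark-kafka-streaming application itself, is left out because how it is launched depends on the IDE or build setup.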
