martinywwan/spark-kafka-streaming
A basic Apache Spark Streaming application
This project is meant to help developers and researchers get started with Apache Spark Streaming and Apache Kafka.
Prerequisites:
1) Ensure Hadoop is set up, i.e. ${HADOOP_CONF_DIR} and ${HADOOP_HOME} are set.
2) On Windows, ${HADOOP_HOME}/bin/winutils.exe must exist; otherwise you will get the error _Failed to locate the winutils binary in the hadoop binary path_.
3) Ensure Kafka is set up (https://kafka.apache.org/quickstart).
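The Hadoop prerequisites above can be checked before launching anything. The following is a minimal sketch, not part of this repository: the function name `check_prereqs` is hypothetical, and it checks for winutils.exe unconditionally, which only makes sense when running on Windows (for example from Git Bash).

```shell
# check_prereqs (hypothetical helper): prints one line per missing
# prerequisite and returns non-zero if anything is missing.
check_prereqs() {
  missing=0

  # Prerequisite 1: both Hadoop environment variables must be set.
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set"
    missing=1
  fi
  if [ -z "${HADOOP_CONF_DIR:-}" ]; then
    echo "HADOOP_CONF_DIR is not set"
    missing=1
  fi

  # Prerequisite 2: winutils.exe must exist under ${HADOOP_HOME}/bin.
  # Assumption: this check is only relevant on Windows.
  if [ -n "${HADOOP_HOME:-}" ] && [ ! -f "${HADOOP_HOME}/bin/winutils.exe" ]; then
    echo "missing ${HADOOP_HOME}/bin/winutils.exe"
    missing=1
  fi

  return "$missing"
}
```

Calling `check_prereqs && echo "Hadoop prerequisites look fine"` in the shell you plan to launch the application from should surface a missing variable before Spark does.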
Instructions to run:
1) Kafka requires ZooKeeper, so start an instance of it first: ${kafka_dir}/bin/zookeeper-server-start.sh ${kafka_dir}/config/zookeeper.properties
2) Start Kafka. Here we run a single node: ${kafka_dir}/bin/kafka-server-start.sh ${kafka_dir}/config/server.properties (to run multiple brokers/nodes, start each one with its own server.properties, i.e. a unique broker.id and log.dirs)
3) Run the spark-kafka-streaming application, either from an IDE or from a new shell.
4) In a new shell, open the Kafka console producer: ${kafka_dir}/bin/kafka-console-producer.sh --broker-list [ip/localhost]:[port-default_is_9092] --topic [topic_name]
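For the multi-broker case mentioned in step 2, each broker needs its own copy of server.properties. The sketch below shows one way to generate such copies. The function name `make_broker_config`, the port scheme (9092 plus the broker id) and the log directory pattern /tmp/kafka-logs-N are all assumptions made for illustration; the listener override is included because brokers on the same host also need distinct ports.

```shell
# make_broker_config BASE_FILE BROKER_ID OUT_FILE   (hypothetical helper)
# Copies BASE_FILE to OUT_FILE, replacing broker.id, listeners and log.dirs
# with values derived from BROKER_ID.
make_broker_config() {
  base="$1"
  id="$2"
  out="$3"

  # Assumed port scheme: broker 0 -> 9092, broker 1 -> 9093, and so on.
  port=$((9092 + id))

  # Keep every line except active settings for the keys we override.
  grep -v -E '^(broker\.id|log\.dirs|listeners)=' "$base" > "$out"

  # Append the per-broker values.
  {
    echo "broker.id=${id}"
    echo "listeners=PLAINTEXT://:${port}"
    echo "log.dirs=/tmp/kafka-logs-${id}"
  } >> "$out"
}
```

A second broker could then be prepared with `make_broker_config ${kafka_dir}/config/server.properties 1 ${kafka_dir}/config/server-1.properties` and started by passing the generated file to kafka-server-start.sh in place of server.properties.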
Near real-time streaming using Apache Spark and Apache Kafka