hello-samza

Hello Samza is a starter project for Samza jobs.

Please see Hello Samza to get started.

Using Vagrant

If you'd like to use Vagrant to get up and running, follow these instructions.

Install Vagrant http://www.vagrantup.com/
Install Virtual Box https://www.virtualbox.org/

Then once that is done (or if done already) clone this repository and boot the virtual machine up.

cd hello-samza
vagrant up

This will take ~ 10-15 minutes to install Kafka, Hadoop/YARN, Samza, configure everything together and launch the jobs.

Once the VM is launched and you are back at a command prompt go into the virtual machine and see whats running.

vagrant ssh
cd /vagrant

The wikipedia-feed Samza job that is running is consuming a feed of real-time edits from Wikipedia, and producing them to a Kafka topic called "wikipedia-raw". You can view this in real-time by by using the Kafka console consumer to view the topic.

deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-raw

The wikipedia-parser Samza job is then parsing the messages in wikipedia-raw, and extracting information about the size of the edit, who made the change, etc. It outputs these counts to the wikipedia-edits topic.

deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-edits

The wikipedia-stats Samza job reads messages from the wikipedia-edits topic, and calculates counts, every ten seconds, for all edits that were made during that window. It outputs these counts to the wikipedia-stats topic.

deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-stats

You can view the Samza jobs running in the YARN UI http://192.168.80.20:8088/cluster/apps too.

To see how this was setup and works look at vagrant/bootstrap.sh and Hello Samza.

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
bin		bin
conf		conf
lib		lib
samza-job-package		samza-job-package
samza-wikipedia		samza-wikipedia
vagrant		vagrant
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Vagrantfile		Vagrantfile
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

hello-samza

Using Vagrant

About

Releases

Packages

Languages

License

romannekhor/hello-samza

Folders and files

Latest commit

History

Repository files navigation

hello-samza

Using Vagrant

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages