A demo showing how to calculate web sessions on Yarn, taken from the hello-samza repository.
Prerequisite: a running grid in a Docker container.
- Deploy the app. The deployment creates a distribution file with all the shell scripts needed to run the app against Yarn.
gradlew clean deploy
- The deploy directory should now contain the packaged application.
- Connect to the container with the docker-samza-bash script
# from DOS
docker-samza-bash.bat
# from Git Bash
docker-samza-bash.sh
- Create the source topic
kafka-topics.sh --zookeeper localhost:2181 --create --topic pageview-session-input --partitions 2 --replication-factor 1
- Start the Yarn job
./deploy/samza/bin/run-app.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/yarn-session-window-example.properties
2020-04-15 15:06:49.277 [main] ClientHelper [INFO] submitting application request for application_1586961594832_0001
2020-04-15 15:06:49.852 [main] YarnClientImpl [INFO] Submitted application application_1586961594832_0001
2020-04-15 15:06:49.855 [main] JobRunner [INFO] Job submitted. Check status to determine when it is running.
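The application id printed in the submission log can be captured for later status checks. The following is a minimal sketch and not part of the original demo: `extract_app_id` is a hypothetical helper name, and the commented `yarn application -status` call assumes the `yarn` CLI is available inside the container.

```shell
# Extract the first YARN application id found in log output read on stdin.
# extract_app_id is a hypothetical helper, not shipped with the demo.
extract_app_id() {
  grep -o 'application_[0-9]*_[0-9]*' | head -n 1
}

# Example using one of the log lines shown above.
app_id=$(echo '2020-04-15 15:06:49.852 [main] YarnClientImpl [INFO] Submitted application application_1586961594832_0001' | extract_app_id)
echo "$app_id"   # prints application_1586961594832_0001

# Possible follow-up, assuming the yarn CLI is on the PATH in the container:
# yarn application -status "$app_id"
```

In practice the output of run-app.sh could be piped through `tee` into a file and that file fed to `extract_app_id`.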
- Check that the app is in the RUNNING state at http://localhost:8088/cluster/apps
- Click on the Application Master link. This should open the job's Application Master web page.
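The state can also be checked from the command line instead of the browser. This is a sketch under assumptions: it presumes the ResourceManager REST API is served on the same localhost:8088 address as the web UI, that the Hadoop version in the container exposes the application state endpoint, and that `curl` is installed. `parse_state` and `app_state` are hypothetical helper names.

```shell
# Extract the value of the "state" field from a JSON response read on stdin.
# parse_state is a hypothetical helper, not part of the demo.
parse_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p'
}

# Query the ResourceManager for the state of a single application.
# Assumes the REST API is reachable on localhost:8088, like the web UI.
app_state() {
  curl -s "http://localhost:8088/ws/v1/cluster/apps/$1/state" | parse_state
}

# Usage, with the application id taken from the submission log:
# app_state application_1586961594832_0001
```

The job is ready to receive messages once the reported state is RUNNING.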
- Produce some messages to the "pageview-session-input" topic
kafka-console-producer.sh --topic pageview-session-input --broker-list localhost:9092 < ./data/pageview-session-input.jsonl
- Consume messages from the "pageview-session-output" topic
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic pageview-session-output --property print.key=true --from-beginning