Mastodon-Dynamo-App

(For better viewing, you can visit: https://github.com/aitanagoca/Mastodon-Dynamo-App)

Group Information

👥 Group: P102, group 05

Aitana González (U186651)

Jordi Alfonso (U111792)

Arnau Royo (U172499)

(For group mates) - How to execute

⚠️ If you have trouble running the application locally, with errors similar to "cannot find method methodName()", it might be due to jar conflicts between Spark and your application's dependencies. Find your Spark installation and go to its jar directory (jars in a downloaded Spark; libexec/jars in a Homebrew Spark, etc.), then remove the following files (or their equivalent versions): gson-2.2.4.jar, okhttp-3.12.12.jar and okio-1.14.0.jar.
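Before deleting anything, it can help to confirm which jar a suspect class is really loaded from. The small diagnostic below is a sketch, not part of the project: the `JarLocator` class is hypothetical, and `com.google.gson.Gson`, `okhttp3.OkHttpClient` and `okio.Okio` are assumed to be representative classes from the three jars listed above.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic (not part of the project): reports which jar a class is loaded from.
final class JarLocator {

    // Returns the jar/directory URL that provided the class, "bootstrap" for JDK core
    // classes (which have no code source), or "not found" if the class is not on the classpath.
    static String locate(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "bootstrap";
            }
            URL location = source.getLocation();
            return location.toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        // Assumed representative classes from the gson, okhttp and okio jars.
        String[] suspects = {"com.google.gson.Gson", "okhttp3.OkHttpClient", "okio.Okio"};
        for (String name : suspects) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Launched through spark-submit, a location pointing into Spark's jar directory rather than your application jar would suggest the conflict described above.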

(PART 2) Running the example application locally

1️⃣ Mvn: mvn clean

2️⃣ Mvn: mvn validate

3️⃣ Mvn: mvn compile

4️⃣ Mvn: mvn package

5️⃣ Spark: spark-submit --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j.properties --class edu.upf.MastodonStreamingExample target/lab3-mastodon-1.0-SNAPSHOT.jar src/main/resources/map.tsv

(PART 3) Stateless: joining a static RDD with a real-time stream

1️⃣ Mvn: mvn clean

2️⃣ Mvn: mvn validate

3️⃣ Mvn: mvn compile

4️⃣ Mvn: mvn package

5️⃣ Spark: spark-submit --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j.properties --class edu.upf.MastodonStateless target/lab3-mastodon-1.0-SNAPSHOT.jar src/main/resources/map.tsv

(PART 4) Spark Stateful transformations with windows

1️⃣ Mvn: mvn clean

2️⃣ Mvn: mvn validate

3️⃣ Mvn: mvn compile

4️⃣ Mvn: mvn package

5️⃣ Spark: spark-submit --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j.properties --class edu.upf.MastodonWindows target/lab3-mastodon-1.0-SNAPSHOT.jar src/main/resources/map.tsv

(PART 5) Spark Stateful transformations with state variables

1️⃣ Mvn: mvn clean

2️⃣ Mvn: mvn validate

3️⃣ Mvn: mvn compile

4️⃣ Mvn: mvn package

5️⃣ Spark: spark-submit --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j.properties --class edu.upf.MastodonWithState target/lab3-mastodon-1.0-SNAPSHOT.jar en

(PART 6) DynamoDB

(PART 6.1) Writing to DynamoDB

⚠️ Before following these steps, remember to configure the AWS CLI! (1️⃣ aws configure; 2️⃣ aws configure set aws_session_token <your_aws_session_token>)

⚠️ Before following these steps, remember to create the DynamoDB table manually! (Table name: LsdsTwitterHashtags; primary key: hashtag)

1️⃣ Mvn: mvn clean

2️⃣ Mvn: mvn validate

3️⃣ Mvn: mvn compile

4️⃣ Mvn: mvn package

5️⃣ Spark: spark-submit --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j.properties --class edu.upf.MastodonHashtags target/lab3-mastodon-1.0-SNAPSHOT.jar en
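Because `hashtag` is the table's primary key, each hashtag corresponds to exactly one item, so a hashtag seen again in a later batch has to be merged into its existing item rather than written as a new one. In DynamoDB, `UpdateItem` with an `ADD` update expression behaves this way: it creates the item if the key does not exist and treats a missing numeric attribute as 0. The sketch below only mimics that upsert in memory so the idea can be tested without AWS; the `count` attribute, the `HashtagUpsertSketch` class and the per-batch pre-aggregation are assumptions, not the project's code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the LsdsTwitterHashtags table (primary key: hashtag).
final class HashtagUpsertSketch {

    // hashtag -> item attributes; "count" is an assumed attribute name.
    private final Map<String, Map<String, Long>> table = new HashMap<>();

    // Mimics UpdateItem with "ADD count :inc": creates the item if the key is new,
    // otherwise adds inc to the existing numeric attribute (a missing attribute counts as 0).
    void addToCount(String hashtag, long inc) {
        table.computeIfAbsent(hashtag, key -> new HashMap<>())
             .merge("count", inc, Long::sum);
    }

    // Writes one micro-batch: aggregates occurrences first, then issues one ADD per distinct hashtag.
    void writeBatch(List<String> hashtags) {
        Map<String, Long> perBatch = new HashMap<>();
        for (String tag : hashtags) {
            perBatch.merge(tag, 1L, Long::sum);
        }
        perBatch.forEach((tag, n) -> addToCount(tag, n));
    }

    // Reads the stored count for a hashtag, 0 if the item does not exist.
    long countOf(String hashtag) {
        Map<String, Long> item = table.get(hashtag);
        return item == null ? 0L : item.getOrDefault("count", 0L);
    }
}
```

Pre-aggregating per batch keeps the number of write requests proportional to the number of distinct hashtags rather than to the number of toots.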

(PART 6.2) Reading from DynamoDB

⚠️ Before following these steps, remember to configure the AWS CLI! (1️⃣ aws configure; 2️⃣ aws configure set aws_session_token <your_aws_session_token>)

1️⃣ Mvn: mvn clean

2️⃣ Mvn: mvn validate

3️⃣ Mvn: mvn compile

4️⃣ Mvn: mvn package

5️⃣ Spark: spark-submit --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j.properties --class edu.upf.MastodonHashtagsReader target/lab3-mastodon-1.0-SNAPSHOT.jar en
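Since `hashtag` is the table's only key, DynamoDB cannot return items ordered by their counts, so one option for a reader that wants the most frequent hashtags is to fetch the items (for example with a Scan) and rank them client-side. The sketch below shows only that ranking step, on a hashtag-to-count map assumed to come from such a scan; the `TopHashtagsSketch` name and the top-N cutoff are hypothetical, and how `MastodonHashtagsReader` uses the `en` argument is not covered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical client-side ranking of hashtag counts already fetched from the table.
final class TopHashtagsSketch {

    // scannedCounts: hashtag -> count, in no particular order (as a scan would return them).
    static List<String> topHashtags(Map<String, Long> scannedCounts, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(scannedCounts.entrySet());
        // Highest count first; ties broken alphabetically so the result is deterministic.
        entries.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```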

Output

(PART 2) Running the example application locally

From the output, we can conclude that the application works correctly: it captures toots in real time together with their data, namely the toot's content, the user who posted it and any hashtags used.

[Screenshot 2024-03-09 at 17:22:14]

(PART 3) Stateless: joining a static RDD with a real-time stream

From the output, we can conclude that the application works correctly: it captures toots in real time and counts them per language, showing each language together with its number of toots. English has the highest count in both time intervals shown.

[Screenshot 2024-03-14 at 14:28:20]

[Screenshot 2024-03-14 at 14:28:30]
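The stateless computation behind this output can be sketched in plain Java, with collections standing in for Spark: a static language map plays the role of the RDD loaded from map.tsv, and a list of language codes plays the role of one micro-batch. Nothing is carried over between batches, which is what makes it stateless. The names `StatelessJoinSketch`, `loadLanguageMap` and `countBatch`, and the assumed `code<TAB>name` layout of map.tsv, are hypothetical; the real job works on Spark RDDs and DStreams instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch of one stateless micro-batch: join with a static map, then count.
final class StatelessJoinSketch {

    // Parses lines assumed to look like "code<TAB>name" into a code -> name map.
    static Map<String, String> loadLanguageMap(List<String> lines) {
        Map<String, String> map = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            if (parts.length >= 2) {
                map.put(parts[0].trim(), parts[1].trim());
            }
        }
        return map;
    }

    // Inner-joins the batch's language codes with the static map and counts toots per language.
    // Codes missing from the map are dropped, as an inner join would drop them.
    // Returns entries sorted by count, highest first.
    static List<Map.Entry<String, Long>> countBatch(List<String> batchCodes, Map<String, String> languages) {
        Map<String, Long> counts = new HashMap<>();
        for (String code : batchCodes) {
            String name = languages.get(code);
            if (name != null) {
                counts.merge(name, 1L, Long::sum);
            }
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, String> languages = loadLanguageMap(List.of("en\tEnglish", "es\tSpanish", "ca\tCatalan"));
        // One made-up micro-batch of toot language codes ("xx" is not in the map).
        List<String> batch = List.of("en", "es", "en", "xx", "ca", "en");
        for (Map.Entry<String, Long> e : countBatch(batch, languages)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```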

(PART 4) Spark Stateful transformations with windows

From the output, we can conclude that the application works correctly: it captures toots in real time and counts them per language, showing each language together with its number of toots. English has the highest count both in the micro-batch and in the 60-second window.

[Screenshot 2024-03-14 at 15:22:48]

[Screenshot 2024-03-14 at 15:23:04]

[Screenshot 2024-03-14 at 15:23:37]

[Screenshot 2024-03-14 at 15:23:50]
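The difference between the micro-batch counts and the 60-second window counts can be illustrated with a plain-Java sliding window over per-batch counts. In Spark Streaming this kind of aggregation is what `reduceByKeyAndWindow` computes on a `JavaPairDStream`; the sketch below only simulates it. The `SlidingWindowSketch` name and the number of batches per window (for example 3, if the batch interval were 20 seconds) are assumptions, since the batch interval is not stated here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of a sliding window over micro-batches (not the project's code).
final class SlidingWindowSketch {

    private final int batchesPerWindow;   // e.g. 60 s window / 20 s batch = 3 (assumed)
    private final Deque<Map<String, Long>> recentBatches = new ArrayDeque<>();

    SlidingWindowSketch(int batchesPerWindow) {
        this.batchesPerWindow = batchesPerWindow;
    }

    // Adds one micro-batch of per-language counts and returns the counts over the current window.
    Map<String, Long> addBatch(Map<String, Long> batchCounts) {
        recentBatches.addLast(batchCounts);
        if (recentBatches.size() > batchesPerWindow) {
            recentBatches.removeFirst();   // the oldest batch slides out of the window
        }
        Map<String, Long> windowCounts = new HashMap<>();
        for (Map<String, Long> batch : recentBatches) {
            batch.forEach((lang, n) -> windowCounts.merge(lang, n, Long::sum));
        }
        return windowCounts;
    }

    public static void main(String[] args) {
        SlidingWindowSketch window = new SlidingWindowSketch(3);
        System.out.println(window.addBatch(Map.of("English", 5L, "Spanish", 2L)));
        System.out.println(window.addBatch(Map.of("English", 4L)));
        System.out.println(window.addBatch(Map.of("Spanish", 1L)));
        System.out.println(window.addBatch(Map.of("English", 1L)));   // the first batch has left the window
    }
}
```

This sketch re-sums the whole window on every batch for clarity. Spark also offers an overload of `reduceByKeyAndWindow` that takes an inverse reduce function, subtracting the batch that leaves the window instead of recomputing; that variant requires checkpointing.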

(PART 5) Spark Stateful transformations with state variables

From the output, we can conclude that the application works correctly: it keeps a running count of toots per user in real time. Each line shows a user's name and the number of toots they have posted, sorted in descending order so that the most active user is listed first.

[Screenshot 2024-03-14 at 14:33:09]

[Screenshot 2024-03-14 at 14:33:19]
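The running per-user count can be pictured as a state map that survives across micro-batches. In Spark Streaming this is typically done with `updateStateByKey` (which requires a checkpoint directory); the plain-Java sketch below only mimics the update rule and the descending sort. The `UserStateSketch` name and the user names in the example are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java imitation of an updateStateByKey-style running count per user.
final class UserStateSketch {

    private final Map<String, Long> tootsPerUser = new HashMap<>();   // state kept across batches

    // Folds one micro-batch (one entry per toot, holding its author) into the running state.
    void update(List<String> batchAuthors) {
        for (String user : batchAuthors) {
            tootsPerUser.merge(user, 1L, Long::sum);
        }
    }

    // Returns a snapshot of the top n users, most toots first.
    List<Map.Entry<String, Long>> top(int n) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>();
        tootsPerUser.forEach((user, count) -> sorted.add(Map.entry(user, count)));   // immutable copies
        sorted.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        UserStateSketch state = new UserStateSketch();
        state.update(List.of("alice", "bob", "alice"));   // micro-batch 1 (made-up users)
        state.update(List.of("bob", "bob", "carol"));     // micro-batch 2
        System.out.println(state.top(10));                // prints [bob=3, alice=2, carol=1]
    }
}
```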