Add Spark extension example #272

xyang16 · 2022-11-01T17:15:25Z

Spark extension POC:

Users can call val outputDf = transformer.transform(df) to run inference.

lanking520 · 2022-11-01T22:34:22Z

apache-spark/spark3.0/image-classification/src/main/scala/com/examples/SparkModel.scala

+  private lazy val criteria = Criteria.builder
+    .setTypes(classOf[Row], classOf[Classifications])
+    .optModelUrls(url)
+    .optTranslator(new SparkImageClassificationTranslator())


Is this translator serializable? If not, how do we make sure SparkImageClassificationTranslator is available on each executor?

Yes, it should be serializable.
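For context on the serialization question, here is a minimal sketch of a pattern often used when a Serializable wrapper holds a resource that cannot itself be serialized. It is not the code in this PR: FakeModel, ModelHolder, and the URL are hypothetical stand-ins, and plain Java serialization stands in for what Spark does when it ships objects to executors. Marking the field @transient lazy keeps it out of the serialized form and rebuilds it on first use in the receiving JVM.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a heavyweight resource such as a loaded model.
// It is deliberately NOT Serializable.
class FakeModel(val url: String)

// Hypothetical wrapper, loosely shaped like SparkModel in the diff above.
class ModelHolder(val url: String) extends Serializable {
  // @transient keeps the field out of the serialized form;
  // lazy defers construction until first access in whichever JVM uses it.
  @transient private lazy val model: FakeModel = new FakeModel(url)

  def modelUrl: String = model.url
}

object LazySerializationDemo {
  // Round-trips a value through Java serialization.
  def roundTrip[T <: java.io.Serializable](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val holder = new ModelHolder("file:///tmp/model") // hypothetical URL
    holder.modelUrl              // force initialization before serializing
    val copy = roundTrip(holder) // only url is written; model is skipped
    println(copy.modelUrl)       // model is rebuilt lazily on this access
  }
}
```

If model were a plain val here, writeObject would throw NotSerializableException because FakeModel is not Serializable. Whether the real translator and model need this treatment depends on their own implementations.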

lanking520 · 2022-11-01T22:37:42Z

apache-spark/spark3.0/image-classification/src/main/scala/com/examples/SparkModel.scala

+@SerialVersionUID(123456789L)
+class SparkModel(val url : String) extends Serializable {
+
+  private lazy val criteria = Criteria.builder


Given that we are passing this through SparkPredictor, this doesn't need to be lazy, right?

lanking520 · 2022-11-01T22:37:54Z

apache-spark/spark3.0/image-classification/src/main/scala/com/examples/SparkModel.scala

+    .optTranslator(new SparkImageClassificationTranslator())
+    .optProgress(new ProgressBar)
+    .build()
+  private lazy val model = ModelZoo.loadModel(criteria)


The same applies here.

lanking520 · 2022-11-01T22:38:28Z

apache-spark/spark3.0/image-classification/src/main/scala/com/examples/SparkPredictor.scala

+  final val outputCol = new Param[String](this, "outputCol", "The output column")
+  final val modelUrl = new Param[String](this, "modelUrl", "The model URL")
+
+  def setInputCol(value: String): this.type = set(inputCol, value)


Why use setters? Can we use a builder pattern to create it?

Or, in Scala, we could use a case class.

This is a common pattern for classes that extend Transformer; see https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala#L550-L567
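To illustrate why setters in this style return this.type, here is a small Spark-free sketch. HasColumns is a hypothetical trait that only imitates the shape of Spark's parameter setters; it is not the Spark API, and DemoPredictor and the URL are likewise made up for the example.

```scala
// Hypothetical base trait standing in for a parameter holder; not the Spark API.
trait HasColumns {
  private var values = Map.empty[String, String]

  // Stores a value and returns the receiver with its precise static type.
  protected def set(name: String, value: String): this.type = {
    values += (name -> value)
    this
  }

  def get(name: String): Option[String] = values.get(name)

  def setInputCol(value: String): this.type = set("inputCol", value)
  def setOutputCol(value: String): this.type = set("outputCol", value)
}

// Hypothetical subclass that adds its own setter.
class DemoPredictor extends HasColumns {
  def setModelUrl(value: String): this.type = set("modelUrl", value)
}

object SetterChainDemo {
  def main(args: Array[String]): Unit = {
    // Each setter returns this.type, so the chain keeps the static type
    // DemoPredictor and a subclass setter can follow an inherited one.
    val predictor = new DemoPredictor()
      .setInputCol("image")
      .setModelUrl("file:///tmp/model") // hypothetical URL
      .setOutputCol("value")
    println(predictor.get("modelUrl"))
  }
}
```

Had setInputCol returned HasColumns instead of this.type, the call to setModelUrl after it would not compile. A builder or case class would also work for construction, but chained setters match what pipeline code written against Transformer subclasses usually expects.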

lanking520 · 2022-11-01T22:39:31Z

...k/spark3.0/image-classification/src/main/scala/com/examples/ImageClassificationExample.scala

-    println(result.collect().mkString("\n"))
+    println(df.select("image.origin", "image.width", "image.height").show(truncate = false))
+
+    val predictor = new SparkPredictor()


How does a customer pass their own translator?

frankfliu · 2022-11-25T18:02:14Z

apache-spark/notebook/Image_Classification_Spark.ipynb

-    "      image = image.flip(2)\n",
-    "      pipeline.transform(new NDList(image))\n",
-    "    }\n",
+    "// Translator: a class used to do preprocessing and post processing\n",


Should we use the Spark extension in the Jupyter notebook?

frankfliu · 2022-11-25T18:03:47Z

apache-spark/spark3.0/image-classification/build.gradle

-
-    runtimeOnly "ai.djl.pytorch:pytorch-model-zoo"
-    runtimeOnly "ai.djl.pytorch:pytorch-native-auto"
+    implementation "org.apache.spark:spark-core_2.12:${spark_version}"


The Spark extension should already cover the Spark dependencies.

frankfliu · 2022-11-25T18:03:56Z

apache-spark/spark3.0/image-classification/build.gradle

+    implementation "org.apache.spark:spark-core_2.12:${spark_version}"
+    implementation "org.apache.spark:spark-sql_2.12:${spark_version}"
+    implementation "org.apache.spark:spark-mllib_2.12:${spark_version}"
+    implementation "ai.djl:api:${djl_version}"


We should use the BOM.

frankfliu · 2022-11-25T18:04:12Z

apache-spark/spark3.0/image-classification/build.gradle

 }

 compileScala {
    scalaCompileOptions.setAdditionalParameters(["-target:jvm-1.8"])
 }

 application {
+    sourceCompatibility = JavaVersion.VERSION_1_8


Why not use JDK 11?

Because the Java version on EMR is currently 8.

frankfliu · 2022-11-25T18:05:20Z

apache-spark/spark3.0/image-classification/build.gradle

@@ -1,32 +1,46 @@
 plugins {
    id 'scala'
    id 'application'
+    id 'com.github.johnrengelman.shadow' version '7.0.0'


Suggested change

-    id 'com.github.johnrengelman.shadow' version '7.0.0'
+    id 'com.github.johnrengelman.shadow' version '7.1.2'

frankfliu · 2022-11-25T18:06:52Z

apache-spark/spark3.0/image-classification/build.sbt

@@ -7,6 +7,7 @@ scalacOptions += "-target:jvm-1.8"

 resolvers += Resolver.jcenterRepo

+libraryDependencies += "org.apache.spark" %% "spark-core" % "3.0.1"
 libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.1"
 libraryDependencies += "org.apache.spark" %% "spark-mllib" % "3.0.1"
 libraryDependencies += "ai.djl" % "api" % "0.12.0"


Should we upgrade to the latest version?

frankfliu · 2022-11-25T18:10:13Z

gradle.properties

@@ -14,3 +14,5 @@ systemProp.org.gradle.internal.publish.checksums.insecure=true
 commons_cli_version=1.5.0
 log4j_slf4j_version=2.18.0
 rapis_version=22.04.0
+spark_version=3.2.2
+djl_version=0.19.0


This should be 0.20.0-SNAPSHOT.

xyang16 requested review from zachgk and frankfliu as code owners November 1, 2022 17:15

xyang16 changed the title ~~Spark extension POC~~ [WIP] Spark extension POC Nov 1, 2022

xyang16 force-pushed the spark branch 3 times, most recently from 7eb652f to 3509c8b Compare November 1, 2022 18:08

xyang16 requested a review from lanking520 November 1, 2022 18:15

lanking520 reviewed Nov 1, 2022

xyang16 force-pushed the spark branch 8 times, most recently from df82490 to 39ac624 Compare November 3, 2022 01:40

xyang16 force-pushed the spark branch from 39ac624 to 2ae9f97 Compare November 18, 2022 21:05

xyang16 changed the title ~~[WIP] Spark extension POC~~ Add Spark extension example Nov 18, 2022

xyang16 force-pushed the spark branch 6 times, most recently from 0d148fd to 9beeac7 Compare November 24, 2022 02:25

frankfliu reviewed Nov 25, 2022

xyang16 force-pushed the spark branch 4 times, most recently from cb1f42a to 3e0fd3f Compare December 21, 2022 22:19

lanking520 approved these changes Dec 22, 2022

Add Spark extension example

08cfd74

xyang16 force-pushed the spark branch from 3e0fd3f to 08cfd74 Compare January 6, 2023 19:17

xyang16 merged commit c8df57e into deepjavalibrary:master Jan 7, 2023
