Add support to spark 3.4 #730
Comments
We're not 3.4 compatible yet 😉
@nightscape just found the upgrading guide here: https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-33-to-34 ... it seems the new abstract class org.apache.spark.sql.catalyst.FileSourceOptions is now required through an abstract options() method.
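To illustrate what the new contract looks like, here is a minimal sketch using simplified stand-ins (assumptions, not Spark's real signatures) for `FileSourceOptions` and `FilePartitionReaderFactory`, showing the kind of override a reader factory now has to provide:

```scala
// Simplified stand-ins for org.apache.spark.sql.catalyst.FileSourceOptions and
// org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory
// (assumption: reduced to the one member relevant here).
abstract class FileSourceOptions(parameters: Map[String, String]) extends Serializable {
  val ignoreCorruptFiles: Boolean =
    parameters.getOrElse("ignoreCorruptFiles", "false").toBoolean
}

abstract class FilePartitionReaderFactory {
  // Abstract member added in Spark 3.4; a factory compiled against an older
  // Spark and run on 3.4 hits AbstractMethodError because this is missing.
  def options: FileSourceOptions
}

// Hypothetical analogs of spark-excel's ExcelOptions / ExcelPartitionReaderFactory
class ExcelOptions(params: Map[String, String]) extends FileSourceOptions(params)

class ExcelPartitionReaderFactory(params: Map[String, String])
    extends FilePartitionReaderFactory {
  // Supplying the override satisfies the new abstract method
  override def options: FileSourceOptions = new ExcelOptions(params)
}
```

This is only a sketch of the mechanism; in real Spark the member is `protected` and the options class wraps a case-insensitive map.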
Ok. Can you check if there is a corresponding change, e.g. in the CSV source?
@nightscape seems like this Spark PR is the root cause: apache/spark#36069
Happy to help with code contributions btw 🚀
Ok. We might backport the FileSourceOptions class for the previous versions.
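A backport along those lines could look roughly like this minimal shim, compiled only for Spark versions before 3.4 so that shared code can reference the class on every version (field names follow the Spark 3.4 original; everything else here is a simplified assumption, e.g. the real class takes a CaseInsensitiveMap and would live in the org.apache.spark.sql.catalyst package):

```scala
// Hypothetical backport shim for Spark < 3.4; in a real build this file
// would declare `package org.apache.spark.sql.catalyst` and sit in a
// version-specific source folder excluded from the Spark 3.4 build.
class FileSourceOptions(parameters: Map[String, String]) extends Serializable {
  // Mirrors the two flags the Spark 3.4 class exposes
  val ignoreCorruptFiles: Boolean =
    parameters.getOrElse("ignoreCorruptFiles", "false").toBoolean
  val ignoreMissingFiles: Boolean =
    parameters.getOrElse("ignoreMissingFiles", "false").toBoolean
}
```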
@josecsotomorales a PR would be awesome!!
Sure! That would be great! Looking at the code, I can see some overrides depending on the Spark version.
Ok, let's consider the options (thinking out loud here):
@josecsotomorales do you see any further options? Would you mind giving 3. a try? From my point of view that would be the preferred option, if the overriding works. It might be that you have to juggle the order of the directories in Mill a bit.
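The directory juggling could be sketched as a Mill module that adds a version-specific source folder on top of the shared one (the directory names and version values below are illustrative assumptions, not the project's actual layout):

```scala
// build.sc — illustrative Mill sketch only: shared sources plus a
// per-Spark-version folder, so 3.4-specific overrides live apart from the
// shims used for earlier 3.x versions.
import mill._, scalalib._

object excel extends ScalaModule {
  def scalaVersion = "2.12.17"
  def sparkVersion = T { "3.4.0" }

  def sources = T.sources {
    // Pick the folder matching the Spark version being built against
    val versionDir =
      if (sparkVersion().startsWith("3.4")) "scala-spark34" else "scala-spark3x"
    Seq(
      PathRef(millSourcePath / "src" / "main" / "scala"),
      PathRef(millSourcePath / "src" / "main" / versionDir)
    )
  }
}
```

Since each class may only be defined once per compilation, which folder ends up on the source path (rather than their order) is what actually selects the implementation.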
Hey everyone, is there any update on this? We are starting to use Spark 3.4 and are looking forward to this feature :) That's the only blocker for the migration right now. Thanks a lot, everyone!
Hi, based on the discussion above I just added a draft PR #754 for this. Compilation and tests seem fine, but the whole file structure still needs some cleanup.
Please try the newly released version 0.19.0 which contains the PR from @christianknoepfle that introduces Spark 3.4 compatibility. |
Is there an existing issue for this?
Current Behavior
java.lang.AbstractMethodError: Receiver class com.crealytics.spark.excel.v2.ExcelPartitionReaderFactory does not define or inherit an implementation of the resolved method 'abstract org.apache.spark.sql.catalyst.FileSourceOptions options()' of abstract class org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.
at org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.createReader(FilePartitionReaderFactory.scala:35)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithoutKey_1$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithoutKey_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Expected Behavior
No response
Steps To Reproduce
Upgrade to Spark 3.4 and attempt to load a DataFrame.
Environment
Anything else?
No response