Unable to Load Geojson File using Sedona Context in Databricks #1617

Closed
Kunal-Mishra10 opened this issue Oct 9, 2024 · 5 comments · Fixed by #1662

Comments

@Kunal-Mishra10

Expected behavior

The GeoJSON file should be loaded into the DataFrame. I am running the following code in Databricks, taken from the official Sedona documentation:

from pyspark.sql import functions as f

df = (
    sedona.read.format("geojson").option("multiLine", "true").load("PATH/TO/MYFILE.json")
    .selectExpr("explode(features) as features")  # Explode the envelope to get one feature per row.
    .select("features.*")  # Unpack the features struct.
    .withColumn("prop0", f.expr("properties['prop0']")).drop("properties").drop("type")
)

df.show()
df.printSchema()

Ref: https://sedona.apache.org/latest-snapshot/tutorial/sql/#__tabbed_14_3

Actual behavior

The code fails with the following error:

Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.json.JsonDataSource.readFile(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/spark/sql/execution/datasources/PartitionedFile;Lorg/apache/spark/sql/catalyst/json/JacksonParser;Lorg/apache/spark/sql/types/StructType;)Lscala/collection/Iterator;
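
The method descriptor in the NoSuchMethodError above is written in the JVM's internal notation, which is hard to read at a glance. As a reading aid, here is a minimal sketch of a decoder for such descriptors. The helper names `parse_method_descriptor` and `_read_type` are made up for this illustration and are not part of Sedona or Spark.

```python
# Minimal decoder for JVM method descriptors such as the one in the error.
# Helper names are illustrative only; they do not belong to any library.

PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float", "I": "int",
    "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _read_type(desc, i):
    """Read one type starting at index i; return (readable_name, next_index)."""
    dims = 0
    while desc[i] == "[":  # each leading '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":  # object type: L<internal/class/Name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:  # single-letter primitive (or void)
        name = PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def parse_method_descriptor(desc):
    """Split '(<params>)<return>' into (list_of_param_types, return_type)."""
    if not desc.startswith("("):
        raise ValueError("a method descriptor must start with '('")
    i = 1
    params = []
    while desc[i] != ")":
        type_name, i = _read_type(desc, i)
        params.append(type_name)
    return_type, _ = _read_type(desc, i + 1)
    return params, return_type


# Descriptor copied from the stack trace in this issue.
DESCRIPTOR = (
    "(Lorg/apache/hadoop/conf/Configuration;"
    "Lorg/apache/spark/sql/execution/datasources/PartitionedFile;"
    "Lorg/apache/spark/sql/catalyst/json/JacksonParser;"
    "Lorg/apache/spark/sql/types/StructType;)"
    "Lscala/collection/Iterator;"
)

params, return_type = parse_method_descriptor(DESCRIPTOR)
for position, type_name in enumerate(params, start=1):
    print(f"arg {position}: {type_name}")
print(f"returns: {return_type}")
```

Decoded this way, the caller expects a `readFile` overload that takes exactly four arguments and returns a Scala `Iterator`. The JVM raises NoSuchMethodError when no method with that exact descriptor exists on the class at runtime.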

Steps to reproduce the problem

I have installed the following jar files

I have installed the following libraries

  • apache-sedona==1.6.1
  • geopandas==0.11.1
  • keplergl==0.3.2
  • pydeck==0.8.0

Settings

Sedona version = 1.6.1

Apache Spark version = 3.5.0 (also fails with Spark 3.4)

Apache Flink version = N/A

API type = Python

Scala version = 2.12

JRE version = 1.8

Python version = ?

Environment = Databricks
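
The Python version is not given in the settings above. A helper along the following lines could collect the Python-side details for a report like this one. It is a sketch that uses only the standard library; `collect_versions` and its default package list are illustrative choices, not something Sedona provides.

```python
import platform
from importlib import metadata


def collect_versions(packages=("apache-sedona", "pyspark", "geopandas")):
    """Return the Python version plus the installed version of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package has no installed distribution metadata here.
            report[name] = "not installed"
    return report


for name, version in collect_versions().items():
    print(f"{name} = {version}")
```

On a running cluster, the Spark version itself can be read from the active session as `spark.version`. Note that a managed runtime may ship Spark without pip metadata, in which case `pyspark` would show up as "not installed" in this report.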

github-actions bot commented Oct 9, 2024

Thank you for your interest in Apache Sedona! We appreciate you opening your first issue. Contributions like yours help make Apache Sedona better.

@james-willis
Contributor

Are you using a shared access cluster in Databricks?

Copying something Jia said in another thread:

the Shared Access cluster on Databricks does not allow Spark DataSourceV2. This will prevent you from using Sedona GeoJSON reader/writer, GeoParquet reader/writer. Until Databricks fixes this limitation, you won't be able to use these data sources on Databricks Shared access cluster.

@Kunal-Mishra10
Author

Hi James,
I am using a single-user cluster in Databricks.

@james-willis
Contributor

I can confirm that the function exists in open-source Spark with that signature. I wonder if the Databricks JSON DataSource is implemented differently. @Kontinuation, have you been able to test the GeoJSON reader in Databricks?

@Kontinuation
Member

I've reproduced this on DBR 15.4 LTS. The JsonDataSource.readFile method on DBR takes one extra argument, which makes it binary-incompatible with open-source Spark. I've submitted a patch to work around this: #1662
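
To illustrate the kind of incompatibility described here, the following is a Python analogy only. It is not the code in #1662, and the thread does not say how that patch is implemented. The sketch shows one general way to tolerate two variants of the same function that differ by one trailing argument: inspect the target at runtime and choose the matching call shape. All function names and the `extra` argument are hypothetical.

```python
import inspect


def read_file_open_source(conf, file, parser, schema):
    """Stand-in for a four-argument variant (hypothetical)."""
    return ("open-source", conf, file, parser, schema)


def read_file_vendor(conf, file, parser, schema, extra):
    """Stand-in for a variant that takes one extra argument (hypothetical)."""
    return ("vendor", conf, file, parser, schema, extra)


def call_read_file(read_file, conf, file, parser, schema, extra=None):
    """Call read_file with whichever argument count it actually declares."""
    arity = len(inspect.signature(read_file).parameters)
    if arity == 4:
        return read_file(conf, file, parser, schema)
    if arity == 5:
        return read_file(conf, file, parser, schema, extra)
    raise TypeError(f"unsupported readFile variant with {arity} parameters")


# The same call site works against either variant.
print(call_read_file(read_file_open_source, "conf", "file", "parser", "schema")[0])
print(call_read_file(read_file_vendor, "conf", "file", "parser", "schema")[0])
```

On the JVM, a call compiled against the four-argument descriptor is bound to that exact descriptor. That is why a runtime exposing only a five-argument method fails with NoSuchMethodError rather than a compile-time error.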

jiayuasu linked a pull request on Oct 30, 2024 that will close this issue