Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46875][SQL] When the mode is null, a NullPointException should not be thrown #44900

Closed
wants to merge 2 commits into from

Conversation

panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Jan 26, 2024

What changes were proposed in this pull request?

The pr aims to provide better prompts when option's mode is null.

Why are the changes needed?

In the original logic, if the mode is null, Spark will throw a NullPointerException, which is obviously unfriendly to the user.

val cars = spark.read
      .format("csv")
      .options(Map("header" -> "true", "mode" -> null))
      .load(testFile(carsFile))
cars.show(false)

Before:

Cannot invoke "String.toUpperCase(java.util.Locale)" because "mode" is null
java.lang.NullPointerException: Cannot invoke "String.toUpperCase(java.util.Locale)" because "mode" is null
	at org.apache.spark.sql.catalyst.util.ParseMode$.fromString(ParseMode.scala:50)
	at org.apache.spark.sql.catalyst.csv.CSVOptions.$anonfun$parseMode$1(CSVOptions.scala:105)
	at scala.Option.map(Option.scala:242)
	at org.apache.spark.sql.catalyst.csv.CSVOptions.<init>(CSVOptions.scala:105)
	at org.apache.spark.sql.catalyst.csv.CSVOptions.<init>(CSVOptions.scala:49)
	at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:60)

After:
It will fall back to PermissiveMode mode and then display the data normally, as shown below:

18:54:06.727 WARN org.apache.spark.sql.catalyst.util.ParseMode: mode is null and not a valid parse mode. Using PERMISSIVE.

+----+-----+-----+----------------------------------+-----+
|year|make |model|comment                           |blank|
+----+-----+-----+----------------------------------+-----+
|2012|Tesla|S    |No comment                        |NULL |
|1997|Ford |E350 |Go get one now they are going fast|NULL |
|2015|Chevy|Volt |NULL                              |NULL |
+----+-----+-----+----------------------------------+-----+

Does this PR introduce any user-facing change?

Yes, When mode is null, it fallback to PermissiveMode instead of throwing a NullPointerException.

How was this patch tested?

  • Add new UT.
  • Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jan 26, 2024
Comment on lines +55 to +61
case _ =>
logWarning(s"$v is not a valid parse mode. Using ${PermissiveMode.name}.")
PermissiveMode
}
}.getOrElse {
logWarning(s"mode is null and not a valid parse mode. Using ${PermissiveMode.name}.")
PermissiveMode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we show an error on the wrong error instead of silently fallback to the PERMISSIVE mode? cc @HyukjinKwon

I mean both case - null and a wrong mode.

Copy link
Contributor Author

@panbingkun panbingkun Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxGekk
In the current logic, when mode is null, the following error is displayed:
image

Copy link
Contributor Author

@panbingkun panbingkun Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we have two options as follows:
1.If the mode is null, it will fall back to PermissiveMode, just like setting a non-null error value for the mode.
2.If mode is null, we throw a user-friendly error prompt instead of a NullPointerException.

The fix change for this PR adopts the first option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, the fallback to the PERMISSIVE mode looks much more consistent with current approach of not-recognised value. We could stick on it in your PR, but I think it is better to change behaviour and throw appropriate error in both cases.

@panbingkun panbingkun marked this pull request as ready for review January 26, 2024 11:01
@panbingkun panbingkun changed the title [SPARK-46875][SQL] Provide better prompts when option's mode is null [SPARK-46875][SQL] When the mode is null, a NullPointException should not be thrown Jan 26, 2024
@MaxGekk
Copy link
Member

MaxGekk commented Jan 27, 2024

+1, LGTM. Merging to master.
Thank you, @panbingkun.

@MaxGekk MaxGekk closed this in 6d29c72 Jan 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants