Add support for timeParserPolicy=LEGACY #2875
Conversation
docs/compatibility.md
Outdated
@@ -353,7 +353,12 @@ the specified format string will fall into one of three categories:
- Supported on GPU but may produce different results to Spark
- Unsupported on GPU

-The formats which are supported on GPU and 100% compatible with Spark are:
+The formats which are supported on GPU and 100% compatible with Spark vary depending on the setting
But from reading below, none of the formats are 100% compatible with Spark, because even for CORRECTED and EXCEPTION we do not detect trailing characters.
This was badly worded. We do claim that some of the formats are 100% compatible. I have made some changes to try and make this clearer.
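For illustration only, here is a minimal sketch of the kind of input being discussed. The data, column name, and session setup are hypothetical, and no specific GPU output is asserted; the point is simply that the second value carries trailing text after the pattern, which (per the comment above) Spark on CPU rejects under CORRECTED/EXCEPTION while the GPU parser may not detect.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

// Hypothetical data to illustrate the trailing-character concern discussed above.
val spark = SparkSession.builder().appName("trailing-chars-example").getOrCreate()
import spark.implicits._

val df = Seq("2021-01-01", "2021-01-01junk").toDF("s")
  .withColumn("d", to_date($"s", "yyyy-MM-dd"))
df.show(false)
```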
if (!conf.incompatDateFormats) {
  willNotWorkOnGpu(s"LEGACY format '$sparkFormat' on the GPU is not guaranteed " +
    s"to produce the same results as Spark on CPU. Set " +
    s"spark.rapids.sql.incompatibleDateFormats.enabled=true to force onto GPU.")
nit. Instead of hard-coding the config here, can we use the KEY from RapidsConf, just to avoid any possible misspellings?
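A minimal sketch of the suggested change, assuming the config entry is exposed as RapidsConf.INCOMPATIBLE_DATE_FORMATS (that constant name is an assumption, not quoted from the PR):

```scala
// Sketch only: reference the config entry's key rather than a hard-coded string,
// so the message cannot drift from the actual config name.
if (!conf.incompatDateFormats) {
  willNotWorkOnGpu(s"LEGACY format '$sparkFormat' on the GPU is not guaranteed " +
    s"to produce the same results as Spark on CPU. Set " +
    s"${RapidsConf.INCOMPATIBLE_DATE_FORMATS.key}=true to force onto GPU.")
}
```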
During manual performance testing, I ran into some behavior that I don't understand yet, so I am changing this to a draft / WIP for now.
The issue was that I had ANSI mode enabled when manually testing, and hadn't implemented ANSI support as part of this PR. It is now updated to fall back to CPU if LEGACY + ANSI are both enabled. If we do want to support LEGACY + ANSI together then we can do that as a follow-on issue.
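A hedged sketch of the kind of fallback check described here; it is not the PR's actual code, and willNotWorkOnGpu is taken from the diff excerpt above while the config lookup is an assumption:

```scala
import org.apache.spark.sql.internal.SQLConf

// Sketch only: fall back to the CPU when LEGACY parsing and ANSI mode are both
// enabled, deferring LEGACY + ANSI support to a follow-on issue.
val policy = SQLConf.get.getConfString("spark.sql.legacy.timeParserPolicy", "EXCEPTION")
if (policy.equalsIgnoreCase("LEGACY") && SQLConf.get.ansiEnabled) {
  willNotWorkOnGpu("LEGACY timeParserPolicy is not supported when ANSI mode is enabled")
}
```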
Force-pushed from 93d6b76 to 3b6bc16:
…und during manual fuzzing
build
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/datetimeExpressions.scala (three outdated review threads, resolved)
// We are compatible with Spark for these formats when the timeParserPolicy is LEGACY. It
// is possible that other formats may be supported but these are the only ones that we have
// tests for.
val LEGACY_COMPATIBLE_FORMATS = Seq(
Since we are doing lookups in this Seq, can we make it a Set[LegacyParseFormat], or even better a Map[String, LegacyParseFormat]?
Thanks. Fixed.
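For illustration, a hedged sketch of what the lookup structure might look like as a Map keyed by format string. The fields of LegacyParseFormat are assumptions; the format strings are those listed in the PR description.

```scala
// Sketch only: keyed by format string so membership checks and lookups are O(1).
// The shape of LegacyParseFormat here is assumed for illustration.
case class LegacyParseFormat(separatorChar: Char, isTimestamp: Boolean)

val LEGACY_COMPATIBLE_FORMATS: Map[String, LegacyParseFormat] = Map(
  "dd-MM-yyyy" -> LegacyParseFormat('-', isTimestamp = false),
  "dd/MM/yyyy" -> LegacyParseFormat('/', isTimestamp = false),
  "yyyy-MM-dd" -> LegacyParseFormat('-', isTimestamp = false),
  "yyyy/MM/dd" -> LegacyParseFormat('/', isTimestamp = false),
  "yyyy-MM-dd HH:mm:ss" -> LegacyParseFormat('-', isTimestamp = true),
  "yyyy/MM/dd HH:mm:ss" -> LegacyParseFormat('/', isTimestamp = true))

// Usage sketch: a quick membership check before attempting GPU parsing.
def isLegacyCompatible(sparkFormat: String): Boolean =
  LEGACY_COMPATIBLE_FORMATS.contains(sparkFormat)
```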
build
🚀
Closes #2860
This PR adds support for parsing strings to date/timestamp when spark.sql.legacy.timeParserPolicy=LEGACY for the following formats:

dd-MM-yyyy
dd/MM/yyyy
yyyy/MM/dd
yyyy-MM-dd
yyyy/MM/dd HH:mm:ss
yyyy-MM-dd HH:mm:ss
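As a usage illustration only (the session setup, column name, and data are hypothetical, and enabling the incompat flag described below is assumed), parsing one of these formats under the LEGACY policy might look like:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

// Hypothetical example: enable LEGACY parsing and the incompat flag, then parse
// strings using one of the formats listed above.
val spark = SparkSession.builder().appName("legacy-parse-example").getOrCreate()
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
spark.conf.set("spark.rapids.sql.incompatibleDateFormats.enabled", "true")

import spark.implicits._
val df = Seq("31-01-2021", "15-06-2021").toDF("s")
  .withColumn("d", to_date($"s", "dd-MM-yyyy"))
df.show()
```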
We are not 100% compatible with Spark on CPU in all cases, so this support is only enabled when spark.rapids.sql.incompatibleDateFormats.enabled is also set to true. We have the following limitations when running on the GPU: