Support ANSI intervals to/from Parquet #4810
Conversation
Signed-off-by: Chong Gao <res_life@163.com>
build
sql-plugin/src/main/320+/scala/com/nvidia/spark/rapids/shims/v2/Spark320PlusShims.scala
sql-plugin/src/main/330+/scala/com/nvidia/spark/rapids/shims/v2/GpuTypeShims.scala
sql-plugin/src/main/330+/scala/com/nvidia/spark/rapids/shims/v2/GpuTypeShims.scala
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
tests/src/test/330+/scala/com/nvidia/spark/rapids/ParquetWriterIntervalSuite.scala
Is there no special metadata for these types? Are they really just stored as ints and longs, and Spark can understand that?
build
Spark uses int32 to represent YearMonthIntervalType and int64 to represent DayTimeIntervalType.
See the Spark TimeAdd(Timestamp, DayTimeIntervalType) code:
The RAPIDS plugin also uses intvlS.getValue.asInstanceOf[Long] to get the interval's Scala value:
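To illustrate the encoding discussed above, here is a small plain-Scala sketch (using java.time, not the plugin's actual code; the object and method names are hypothetical): a DayTimeIntervalType value is a count of microseconds held in a 64-bit long, and a YearMonthIntervalType value is a count of months held in a 32-bit int.

```scala
import java.time.{Duration, Period}

object IntervalEncoding {
  // DayTimeIntervalType: Spark stores the interval as microseconds in a Long.
  def dayTimeToMicros(d: Duration): Long = d.toNanos / 1000L

  // YearMonthIntervalType: Spark stores the interval as total months in an Int.
  def yearMonthToMonths(p: Period): Int = p.toTotalMonths.toInt

  def main(args: Array[String]): Unit = {
    // INTERVAL '1 02:03:04' DAY TO SECOND
    // = (86400 + 7200 + 180 + 4) seconds = 93784000000 microseconds
    val micros = dayTimeToMicros(
      Duration.ofDays(1).plusHours(2).plusMinutes(3).plusSeconds(4))
    println(micros) // 93784000000

    // INTERVAL '1-2' YEAR TO MONTH = 14 months
    val months = yearMonthToMonths(Period.ofYears(1).plusMonths(2))
    println(months) // 14
  }
}
```

This is why the plugin can read the value back with a plain `intvlS.getValue.asInstanceOf[Long]`: the stored representation is just the primitive count.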
Okay, Spark is storing them as ints and longs. I would still like to see documentation on this, and where possible we should use an interval type if one is available, because I expect we will have to bit-cast the column into the corresponding interval type more often than not.
build
build
sql-plugin/src/main/330+/scala/com/nvidia/spark/rapids/shims/v2/Spark33XShims.scala
sql-plugin/src/main/330+/scala/com/nvidia/spark/rapids/shims/v2/Spark33XShims.scala
sql-plugin/src/main/330+/scala/com/nvidia/spark/rapids/shims/v2/Spark33XShims.scala
build
build
build
Looks really good.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
tests/src/test/330+/scala/com/nvidia/spark/rapids/TimeAddSuite.scala
tests/src/test/330+/scala/com/nvidia/spark/rapids/TimeAddSuite.scala
tests/src/test/330+/scala/com/nvidia/spark/rapids/ParquetWriterIntervalSuite.scala
sql-plugin/src/main/java/com/nvidia/spark/rapids/GpuColumnVector.java
sql-plugin/src/main/java/com/nvidia/spark/rapids/GpuColumnVector.java
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuRowToColumnarExec.scala
build
build
sql-plugin/src/main/301until330-all/scala/com/nvidia/spark/rapids/shims/v2/GpuTypeShims.scala
sql-plugin/src/main/330+/scala/com/nvidia/spark/rapids/shims/v2/GpuTypeShims.scala
build
Some nits, which I can live without; otherwise LGTM.
sql-plugin/src/main/301until330-all/scala/com/nvidia/spark/rapids/shims/v2/GpuTypeShims.scala
sql-plugin/src/main/301until330-all/scala/com/nvidia/spark/rapids/shims/v2/GpuTypeShims.scala
sql-plugin/src/main/330+/scala/com/nvidia/spark/rapids/shims/v2/GpuTypeShims.scala
sql-plugin/src/main/330+/scala/com/nvidia/spark/rapids/shims/v2/Spark33XShims.scala
build
build
@revans2 Please review again.
build
Closes #4145
SPARK-36825 added the read and write functions for the interval data types (YearMonthIntervalType and DayTimeIntervalType).
Support ANSI intervals to/from Parquet.
PySpark 3.3.0 only contains DayTimeIntervalType in `pyspark.sql.types` and does not have YearMonthIntervalType, so the Python tests only cover DayTimeIntervalType.
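For context, a minimal usage sketch of the feature this PR enables, assuming Spark 3.3.0+ in a spark-shell session (the output path is hypothetical; this snippet is not taken from the PR itself):

```scala
// Requires Spark 3.3.0+, where ANSI interval columns can be written to Parquet.
// With the RAPIDS plugin enabled, the write and read can be offloaded to the GPU.
import org.apache.spark.sql.functions.expr

val df = spark.range(3).select(
  expr("make_dt_interval(id, 2, 3, 4.5)").as("dt"), // DayTimeIntervalType
  expr("make_ym_interval(1, id)").as("ym"))         // YearMonthIntervalType

df.write.mode("overwrite").parquet("/tmp/ansi_intervals")
// The interval types round-trip: the schema read back still reports
// "interval day to second" and "interval year to month" columns.
spark.read.parquet("/tmp/ansi_intervals").printSchema()
```

Under the hood these columns are stored in Parquet as int64 (microseconds) and int32 (months), matching the Spark internal representation discussed above.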
Filed #4811 for the following features.
Signed-off-by: Chong Gao res_life@163.com