Update GpuFileFormatWriter to stay in sync with recent Spark changes, but still do not support writing Hive bucketed tables on GPU. #4484
Conversation
throw an exception when trying to do hive hash partition on GPU Signed-off-by: remzi <13716567376yh@gmail.com>
build |
Headline implies that this adds support for writing Hive bucketed tables, but it is not supported even after this PR is merged. We weren't accidentally trying to support this before the PR either, so maybe headline should just state we're updating GpuWriteJobDescription to stay in sync with recent Spark changes.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuFileFormatWriter.scala
Signed-off-by: remzi <13716567376yh@gmail.com>
build |
Because Spark330 and Spark301 behave differently on insertInto Signed-off-by: remzi <13716567376yh@gmail.com>
build |
Looks better, but tests are failing with:

> The format of the existing table default.tmp_table_574308_0 is `HiveFileFormat`. It doesn't match the specified format `ParquetDataSourceV2`.

We may need a `.format("hive")` when writing the dataframe, but I'm not an expert on Spark's Hive support.
Should we update the title of this PR, as we are not adding support for Hive bucketed table writes? |
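As a hedged illustration of the reviewer's `.format("hive")` suggestion (this sketch is not from the PR; the session setup and table name `default.tmp_table` are made-up placeholders): when a test writes into a table that was created with `HiveFileFormat`, asking the writer to use the Hive serde path may avoid the format-mismatch error quoted above.

```scala
// Sketch only: write a dataframe into an existing Hive-format table.
// Without .format("hive"), Spark may resolve a data-source format such as
// ParquetDataSourceV2, which does not match the table's HiveFileFormat.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-write-sketch")
  .enableHiveSupport() // Hive support must be enabled for the Hive serde path
  .getOrCreate()

val df = spark.range(10).toDF("id")

// "default.tmp_table" is a hypothetical table name for illustration.
df.write
  .format("hive")   // the suggested fix: force the Hive serde write path
  .mode("append")
  .saveAsTable("default.tmp_table")
```

This would only affect how the test writes the dataframe; it does not change whether the plugin supports Hive bucketed table writes on the GPU.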
Signed-off-by: remzi <13716567376yh@gmail.com>
build |
The title has been updated to tell users that writing Hive bucketed tables is still not supported on the GPU. |
Signed-off-by: remzi 13716567376yh@gmail.com
Closes #3949
In this PR, we: