-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-3425] Create not existing Hdfs folder in gluten side when writing hdfs file. #3428
Conversation
Run Gluten Clickhouse CI |
9113611
to
268e26e
Compare
@rui-mo Can you help to review? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test to ensure the functionality?
268e26e
to
89bf2e1
Compare
...c/main/scala/org/apache/spark/sql/execution/datasources/velox/VeloxFormatWriterInjects.scala
Outdated
Show resolved
Hide resolved
...c/main/scala/org/apache/spark/sql/execution/datasources/velox/VeloxFormatWriterInjects.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just few trivial comment.
...c/main/scala/org/apache/spark/sql/execution/datasources/velox/VeloxFormatWriterInjects.scala
Outdated
Show resolved
Hide resolved
...c/main/scala/org/apache/spark/sql/execution/datasources/velox/VeloxFormatWriterInjects.scala
Outdated
Show resolved
Hide resolved
f15a0ab
to
25a701c
Compare
c24aefc
to
d1bc829
Compare
...c/main/scala/org/apache/spark/sql/execution/datasources/velox/VeloxFormatWriterInjects.scala
Outdated
Show resolved
Hide resolved
Offline discussion with @rui-mo . It is hard to add the unit test to test. |
cbd2765
to
2fc6c37
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
@JkSelf
|
Looks we should make parent directory extracted from |
It is my mistake to use the wrong test scripts and caused this error not found previously. Yes. It should be the parent directory. I use the parent directory and got the following exception in velox. It seems we can't create the path in gluten. I may need some time to investigate the root cause. And will revert this PR firstly.
|
…FS file (apache#3428)" This reverts commit a713d5e.
What changes were proposed in this pull request?
When Gluten calls the Velox Parquet writer to write a Parquet file, the temporary path obtained from Spark may not have been created yet. While writing a local file, Velox automatically creates the necessary file path if it does not exist. However, this is not the case for HDFS paths. We attempted to create the HDFS path on the Velox side, but the community was not receptive to this approach. As a result, we decided to create the HDFS path on the Gluten side.
How was this patch tested?
manual local test