[BUG] "java.io.InterruptedIOException: getFileStatus on s3a://xxx" for ORC reading in Databricks 8.2 runtime #2850
Comments
This doesn't occur without your PR #2828, correct? The per-file reader should handle one file at a time, unless you are saying it isn't being closed; but that close should be handled by Spark in PartitionReaderWithBytesRead, just as Spark does for any PartitionReader. Maybe we missed something there.
This issue should always happen in upstream Spark even without #2828. I can reproduce it 100% of the time with more ORC files in the Databricks environment. Also, the ORC reader is already closed in the close function of each GpuOrcPartitionReader.
That PR specifically says:
If that is not true, we need to update the PR, and please detail how it actually fixes this.
Hi @tgravescs, sorry for my earlier wrong triage. Today I debugged this issue and found that the GpuOrcPartitionReaders are indeed created one at a time. The root cause is that the close function of GpuOrcPartitionReader is not called even after the reader has finished reading. Instead, close is only called on Spark task completion (see https://github.com/NVIDIA/spark-rapids/blob/branch-21.08/sql-plugin/src/main/scala/com/nvidia/spark/rapids/PartitionReaderIterator.scala#L31), which means every GpuOrcPartitionReader holds its OrcReader until the task completes.
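For illustration only, here is a minimal sketch (not the plugin's actual code; `EagerCloseReaderIterator` is a hypothetical name) of how an iterator wrapper could close its PartitionReader as soon as the reader is exhausted, rather than deferring the close to the task-completion callback:

```scala
import org.apache.spark.sql.connector.read.PartitionReader

// Hypothetical sketch: wrap a PartitionReader in an Iterator that closes it
// as soon as hasNext first returns false, instead of waiting for the Spark
// task-completion listener to run close().
class EagerCloseReaderIterator[T](reader: PartitionReader[T]) extends Iterator[T] {
  private var closed = false
  private var havePair = false

  override def hasNext: Boolean = {
    if (!closed && !havePair) {
      havePair = reader.next()
      if (!havePair) {
        // Release the underlying file/S3 connection right away.
        reader.close()
        closed = true
      }
    }
    havePair
  }

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("End of stream")
    havePair = false
    reader.get()
  }
}
```

Closing eagerly like this would presumably release the underlying S3 connection as soon as each file is fully read.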
Describe the bug
I have 400+ ORC files located in dbfs:/XXXX, and I ran a simple query over them, but some tasks failed with "java.io.InterruptedIOException: getFileStatus on s3a://xxx" exceptions.
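The exact query and stack traces are not reproduced here; as a hypothetical sketch, the reproduction amounts to a plain ORC scan of that directory:

```scala
// Hypothetical reproduction only; the actual query from the report is not
// shown above. Assumes a plain ORC scan of the directory plus a simple action.
val df = spark.read.orc("dbfs:/XXXX")
df.count()  // any action that forces all 400+ files to be read
```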
After checking the code at https://github.com/NVIDIA/spark-rapids/blob/branch-21.08/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOrcScan.scala#L169, it looks like GpuDataSourceRDD creates GpuOrcPartitionReaders for all PartitionedFiles up front (one GpuOrcPartitionReader per PartitionedFile), and each GpuOrcPartitionReader holds an OrcReader. If the OrcReader is not closed, it keeps occupying a slot in the connection pool, so other readers waiting on the pool hit the connection timeout.
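As a possible mitigation (it does not fix the missing close), the S3A connection pool could be enlarged so that many concurrently held readers are less likely to starve it; `fs.s3a.connection.maximum` is the standard Hadoop S3A setting, and the value below is only an example:

```scala
import org.apache.spark.sql.SparkSession

// Possible mitigation only, not a fix for the underlying leak: enlarge the
// S3A connection pool so that many concurrently open (not yet closed) ORC
// readers do not exhaust it. The value 500 is just an example.
val spark = SparkSession.builder()
  .appName("orc-scan")
  .config("spark.hadoop.fs.s3a.connection.maximum", "500")
  .getOrCreate()
```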