[BUG] "java.io.InterruptedIOException: getFileStatus on s3a://xxx" for ORC reading in Databricks 8.2 runtime #2850

Closed
wbo4958 opened this issue Jun 30, 2021 · 4 comments · Fixed by #2828 or #2881
Labels: bug (Something isn't working), P0 (Must have for release)

Comments

wbo4958 (Collaborator) commented on Jun 30, 2021

Describe the bug
I have 400+ ORC files located in dbfs:/XXXX, and I ran a simple query like:

spark.conf.set("spark.sql.files.maxPartitionBytes", "1g")
spark.conf.set("spark.sql.shuffle.partitions", "100")
spark.conf.set("spark.rapids.sql.enabled", "true")
spark.conf.set("spark.rapids.sql.exec.CollectLimitExec", "false")

val dfPerfile = spark.read.orc("dbfs:/XXXX")
spark.time(dfPerfile.foreach(_ => ()))

But some tasks fail with the exception below:

21/06/30 08:19:52 ERROR Executor: Exception in task 8.1 in stage 0.0 (TID 19)
java.io.InterruptedIOException: getFileStatus on s3a://rapidsai-databricks/XXXXX/partition_id_1=3/partition_id_2=2/part-00005-tid-5832222349885074749-82f13795-1dcc-4695-a487-4521b5830bd5-88-21.c000.snappy.orc: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException(S3AUtils.java:365)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:196)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:154)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3000)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2967)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2906)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1018)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.$anonfun$open$2(DatabricksFileSystemV2.scala:505)
	at com.databricks.s3a.S3AExeceptionUtils$.convertAWSExceptionToJavaIOException(DatabricksStreamUtils.scala:66)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.$anonfun$open$1(DatabricksFileSystemV2.scala:503)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperation$4(UsageLogging.scala:434)
	at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:240)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:235)
	at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:232)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionContext(DatabricksFileSystemV2.scala:455)
	at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:279)
	at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:271)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionTags(DatabricksFileSystemV2.scala:455)
	at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:415)
	at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:341)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperation(DatabricksFileSystemV2.scala:455)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.open(DatabricksFileSystemV2.scala:503)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.open(DatabricksFileSystemV2.scala:510)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystem.open(DatabricksFileSystem.scala:88)
	at com.nvidia.shaded.spark.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:546)
	at com.nvidia.shaded.spark.orc.impl.ReaderImpl.<init>(ReaderImpl.java:369)
	at com.nvidia.shaded.spark.orc.OrcFile.createReader(OrcFile.java:343)
	at com.nvidia.spark.rapids.GpuOrcPartitionReaderFactory.buildColumnarReader(GpuOrcScan.scala:148)
	at com.nvidia.spark.rapids.PartitionReaderIterator$.$anonfun$buildReader$1(PartitionReaderIterator.scala:58)
	at org.apache.spark.sql.rapids.shims.spark311db.GpuFileScanRDD$$anon$1.org$apache$spark$sql$rapids$shims$spark311db$GpuFileScanRDD$$anon$$readCurrentFile(GpuFileScanRDD.scala:96)
	at org.apache.spark.sql.rapids.shims.spark311db.GpuFileScanRDD$$anon$1.nextIterator(GpuFileScanRDD.scala:149)
	at org.apache.spark.sql.rapids.shims.spark311db.GpuFileScanRDD$$anon$1.hasNext(GpuFileScanRDD.scala:73)
	at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.hasNext(GpuFileSourceScanExec.scala:385)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$2(GpuColumnarToRowExec.scala:222)
	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.withResource(GpuColumnarToRowExec.scala:177)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:221)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:198)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:238)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1017)
	at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1017)
	at org.apache.spark.SparkContext.$anonfun$runJob$2(SparkContext.scala:2498)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:68)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:148)
	at org.apache.spark.scheduler.Task.run(Task.scala:117)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$10(Executor.scala:732)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1643)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:735)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1175)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1121)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4926)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4872)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1321)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$4(S3AFileSystem.java:1623)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:285)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1620)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2988)

After checking the code at https://github.com/NVIDIA/spark-rapids/blob/branch-21.08/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOrcScan.scala#L169:

It looks like GpuDataSourceRDD creates a GpuOrcPartitionReader for every PartitionedFile right at the start (one GpuOrcPartitionReader per PartitionedFile). Each GpuOrcPartitionReader holds an OrcReader. If an OrcReader is not closed, it keeps occupying a connection in the pool, so other readers that request a connection eventually time out.
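
For illustration only, here is a minimal sketch (not the actual plugin code) of the pattern described above: eagerly creating an ORC Reader per file. Each createReader call reads the file tail over S3A and ties up a pooled HTTP connection until that Reader is closed.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.orc.{OrcFile, Reader}

// Sketch only: opening a Reader for every file up front means every file holds
// an S3 connection until its Reader is closed, which can exhaust the bounded
// S3A connection pool when there are hundreds of files.
def openAllReaders(paths: Seq[Path], conf: Configuration): Seq[Reader] =
  paths.map { p =>
    // createReader fetches the ORC file tail, taking a connection from the pool
    OrcFile.createReader(p, OrcFile.readerOptions(conf))
  }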

wbo4958 added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Jun 30, 2021
tgravescs (Collaborator) commented on Jun 30, 2021

This doesn't occur without your PR #2828, correct? The per-file reader should handle one file at a time, unless you are saying it isn't being closed; but that should be handled by Spark in PartitionReaderWithBytesRead, just like it does for any PartitionReader. Maybe we missed something there.
If this is only associated with #2828, then it shouldn't be a separate bug and should ideally be fixed in that PR.

wbo4958 (Collaborator, Author) commented on Jun 30, 2021

This issue should also happen in UPSTREAM even without #2828. I can reproduce it 100% of the time with more ORC files in the Databricks environment.

Actually, the OrcReader is closed in the close function of each GpuOrcPartitionReader.

What I wanted to describe is that we open OrcReaders for all PartitionedFiles at the very beginning. For example, if we have 400 files, we create 400 OrcReaders, and each OrcReader is only released after its file has been fully read. So if the connection pool is not big enough, a timeout waiting for a connection from the pool is expected.
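
If that is the case, one possible mitigation (an assumption on my part, not a fix for the reader lifecycle itself) is to enlarge the S3A connection pool so it covers the number of readers an executor holds open at once. fs.s3a.connection.maximum is the standard Hadoop S3A setting for this; it normally has to be set before the S3A FileSystem is first used, e.g. in the Databricks cluster Spark config or when building the session:

// Possible mitigation sketch: a bigger S3A HTTP connection pool. The value 500
// is illustrative; changing this after the S3A FileSystem has been initialized
// may have no effect.
val sparkWithBiggerPool = org.apache.spark.sql.SparkSession.builder()
  .config("spark.hadoop.fs.s3a.connection.maximum", "500")
  .getOrCreate()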

Salonijain27 removed the ? - Needs Triage (Need team to review and classify) label on Jul 6, 2021
Salonijain27 added the P0 (Must have for release) label on Jul 6, 2021
tgravescs (Collaborator) commented on Jul 6, 2021

That PR specifically says:

This PR didn't fix #2850, I will fix in another PR.

If that is not true, we need to update that PR and detail how it fixes this issue.

tgravescs reopened this on Jul 6, 2021
wbo4958 (Collaborator, Author) commented on Jul 7, 2021

Hi @tgravescs, sorry for my previous incorrect triage.

Today I debugged this issue and found that the GpuOrcPartitionReaders are actually created one by one. The root cause is that the close function of GpuOrcPartitionReader is not called even after the reader has finished reading its file. Instead, close is only called on Spark task completion (see https://github.com/NVIDIA/spark-rapids/blob/branch-21.08/sql-plugin/src/main/scala/com/nvidia/spark/rapids/PartitionReaderIterator.scala#L31), which means every GpuOrcPartitionReader holds its OrcReader until the task completes.
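
A rough sketch of the direction a fix could take (my assumption, not the actual patch): close the underlying PartitionReader as soon as it reports no more data, instead of waiting for the task-completion callback, so its OrcReader and pooled connection are released early. A real implementation would still keep the task-completion hook for error paths.

import org.apache.spark.sql.connector.read.PartitionReader

// Sketch only: wraps a PartitionReader and closes it eagerly once it is
// exhausted, rather than relying solely on the task-completion callback.
class EagerlyClosingIterator[T](reader: PartitionReader[T]) extends Iterator[T] {
  private var batchLoaded = false
  private var hasMore = false
  private var closed = false

  override def hasNext: Boolean = {
    if (!batchLoaded && !closed) {
      hasMore = reader.next()
      batchLoaded = true
      if (!hasMore) {
        // no more data: release the reader (and its S3 connection) immediately
        reader.close()
        closed = true
      }
    }
    hasMore
  }

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("no more batches")
    batchLoaded = false
    reader.get()
  }
}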
