[BUG] "java.io.InterruptedIOException: getFileStatus on s3a://xxx" for ORC reading in Databricks 8.2 runtime #2850

Closed
wbo4958 opened this issue Jun 30, 2021 · 4 comments · Fixed by #2828 or #2881
Labels: bug (Something isn't working), P0 (Must have for release)

Comments

wbo4958 (Collaborator) commented on Jun 30, 2021

Describe the bug
I have 400+ ORC files located in dbfs:/XXXX, and I ran a simple query like:

spark.conf.set("spark.sql.files.maxPartitionBytes", "1g")
spark.conf.set("spark.sql.shuffle.partitions", "100")
spark.conf.set("spark.rapids.sql.enabled", "true")
spark.conf.set("spark.rapids.sql.exec.CollectLimitExec", "false")

val dfPerfile = spark.read.orc("dbfs:/XXXX")
spark.time(dfPerfile.foreach(_ => ()))

But some tasks fail with the exception below:

21/06/30 08:19:52 ERROR Executor: Exception in task 8.1 in stage 0.0 (TID 19)
java.io.InterruptedIOException: getFileStatus on s3a://rapidsai-databricks/XXXXX/partition_id_1=3/partition_id_2=2/part-00005-tid-5832222349885074749-82f13795-1dcc-4695-a487-4521b5830bd5-88-21.c000.snappy.orc: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException(S3AUtils.java:365)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:196)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:154)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3000)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2967)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2906)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1018)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.$anonfun$open$2(DatabricksFileSystemV2.scala:505)
	at com.databricks.s3a.S3AExeceptionUtils$.convertAWSExceptionToJavaIOException(DatabricksStreamUtils.scala:66)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.$anonfun$open$1(DatabricksFileSystemV2.scala:503)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperation$4(UsageLogging.scala:434)
	at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:240)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:235)
	at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:232)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionContext(DatabricksFileSystemV2.scala:455)
	at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:279)
	at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:271)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionTags(DatabricksFileSystemV2.scala:455)
	at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:415)
	at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:341)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperation(DatabricksFileSystemV2.scala:455)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.open(DatabricksFileSystemV2.scala:503)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.open(DatabricksFileSystemV2.scala:510)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystem.open(DatabricksFileSystem.scala:88)
	at com.nvidia.shaded.spark.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:546)
	at com.nvidia.shaded.spark.orc.impl.ReaderImpl.<init>(ReaderImpl.java:369)
	at com.nvidia.shaded.spark.orc.OrcFile.createReader(OrcFile.java:343)
	at com.nvidia.spark.rapids.GpuOrcPartitionReaderFactory.buildColumnarReader(GpuOrcScan.scala:148)
	at com.nvidia.spark.rapids.PartitionReaderIterator$.$anonfun$buildReader$1(PartitionReaderIterator.scala:58)
	at org.apache.spark.sql.rapids.shims.spark311db.GpuFileScanRDD$$anon$1.org$apache$spark$sql$rapids$shims$spark311db$GpuFileScanRDD$$anon$$readCurrentFile(GpuFileScanRDD.scala:96)
	at org.apache.spark.sql.rapids.shims.spark311db.GpuFileScanRDD$$anon$1.nextIterator(GpuFileScanRDD.scala:149)
	at org.apache.spark.sql.rapids.shims.spark311db.GpuFileScanRDD$$anon$1.hasNext(GpuFileScanRDD.scala:73)
	at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.hasNext(GpuFileSourceScanExec.scala:385)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$2(GpuColumnarToRowExec.scala:222)
	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.withResource(GpuColumnarToRowExec.scala:177)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:221)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:198)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:238)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1017)
	at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1017)
	at org.apache.spark.SparkContext.$anonfun$runJob$2(SparkContext.scala:2498)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:68)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:148)
	at org.apache.spark.scheduler.Task.run(Task.scala:117)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$10(Executor.scala:732)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1643)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:735)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1175)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1121)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4926)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4872)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1321)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$4(S3AFileSystem.java:1623)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:285)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1620)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2988)

After checking the code at https://github.com/NVIDIA/spark-rapids/blob/branch-21.08/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOrcScan.scala#L169:

It looks like GpuDataSourceRDD creates a GpuOrcPartitionReader for every PartitionedFile right at the start (one GpuOrcPartitionReader per PartitionedFile). Each GpuOrcPartitionReader holds an OrcReader. If an OrcReader is not closed, it keeps occupying a connection in the pool, so other readers that request a connection eventually time out.
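
For illustration only, here is a minimal sketch (not the actual plugin code) of the pattern described above: eagerly creating an ORC Reader per file. Each createReader call reads the file tail over S3A and ties up a pooled HTTP connection until that Reader is closed.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.orc.{OrcFile, Reader}

// Sketch only: opening a Reader for every file up front means every file holds
// an S3 connection until its Reader is closed, which can exhaust the bounded
// S3A connection pool when there are hundreds of files.
def openAllReaders(paths: Seq[Path], conf: Configuration): Seq[Reader] =
  paths.map { p =>
    // createReader fetches the ORC file tail, taking a connection from the pool
    OrcFile.createReader(p, OrcFile.readerOptions(conf))
  }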

wbo4958 added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Jun 30, 2021
tgravescs (Collaborator) commented on Jun 30, 2021

This doesn't occur without your PR #2828, correct? The per-file reader should handle one file at a time, unless you are saying it isn't being closed; but that should be handled by Spark in PartitionReaderWithBytesRead, just like it does for any PartitionReader. Maybe we missed something there.
If this is only associated with #2828, then it shouldn't be a separate bug and should ideally be fixed in that PR.

wbo4958 (Collaborator, Author) commented on Jun 30, 2021

This issue should also happen in UPSTREAM even without #2828. I can reproduce it 100% of the time with more ORC files in the Databricks environment.

Actually, the OrcReader is closed in the close function of each GpuOrcPartitionReader.

What I wanted to describe is that we open OrcReaders for all PartitionedFiles at the very beginning. For example, if we have 400 files, we create 400 OrcReaders, and each OrcReader is only released after its file has been fully read. So if the connection pool is not big enough, a timeout waiting for a connection from the pool is expected.
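
If that is the case, one possible mitigation (an assumption on my part, not a fix for the reader lifecycle itself) is to enlarge the S3A connection pool so it covers the number of readers an executor holds open at once. fs.s3a.connection.maximum is the standard Hadoop S3A setting for this; it normally has to be set before the S3A FileSystem is first used, e.g. in the Databricks cluster Spark config or when building the session:

// Possible mitigation sketch: a bigger S3A HTTP connection pool. The value 500
// is illustrative; changing this after the S3A FileSystem has been initialized
// may have no effect.
val sparkWithBiggerPool = org.apache.spark.sql.SparkSession.builder()
  .config("spark.hadoop.fs.s3a.connection.maximum", "500")
  .getOrCreate()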

Salonijain27 removed the ? - Needs Triage (Need team to review and classify) label on Jul 6, 2021
Salonijain27 added the P0 (Must have for release) label on Jul 6, 2021
tgravescs (Collaborator) commented on Jul 6, 2021

That PR specifically says:

This PR didn't fix #2850, I will fix in another PR.

If that is not true, we need to update that PR and detail how it fixes this issue.

tgravescs reopened this on Jul 6, 2021
wbo4958 (Collaborator, Author) commented on Jul 7, 2021

Hi @tgravescs, sorry for my previous incorrect triage.

Today I debugged this issue and found that the GpuOrcPartitionReaders are actually created one by one. The root cause is that the close function of GpuOrcPartitionReader is not called even after the reader has finished reading its file. Instead, close is only called on Spark task completion (see https://github.com/NVIDIA/spark-rapids/blob/branch-21.08/sql-plugin/src/main/scala/com/nvidia/spark/rapids/PartitionReaderIterator.scala#L31), which means every GpuOrcPartitionReader holds its OrcReader until the task completes.
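
A rough sketch of the direction a fix could take (my assumption, not the actual patch): close the underlying PartitionReader as soon as it reports no more data, instead of waiting for the task-completion callback, so its OrcReader and pooled connection are released early. A real implementation would still keep the task-completion hook for error paths.

import org.apache.spark.sql.connector.read.PartitionReader

// Sketch only: wraps a PartitionReader and closes it eagerly once it is
// exhausted, rather than relying solely on the task-completion callback.
class EagerlyClosingIterator[T](reader: PartitionReader[T]) extends Iterator[T] {
  private var batchLoaded = false
  private var hasMore = false
  private var closed = false

  override def hasNext: Boolean = {
    if (!batchLoaded && !closed) {
      hasMore = reader.next()
      batchLoaded = true
      if (!hasMore) {
        // no more data: release the reader (and its S3 connection) immediately
        reader.close()
        closed = true
      }
    }
    hasMore
  }

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("no more batches")
    batchLoaded = false
    reader.get()
  }
}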
