[BUG] Coalescing reading is not working for v2 parquet/orc datasource #5215

wbo4958 · 2022-04-12T09:29:32Z

Describe the bug

Even when the input files are not cloud files, or spark.rapids.sql.format.orc/parquet.reader.type is set to COALESCING, coalescing reading is never used. The multithreaded reader is used instead.

The root cause is the default value of queryUsesInputFile, which is always true, so the condition canUseCoalesceFilesReader always evaluates to false.

override val canUseCoalesceFilesReader: Boolean =
    rapidsConf.isParquetCoalesceFileReadEnabled && !(queryUsesInputFile || ignoreCorruptFiles)
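
To make the failure mode concrete, here is a minimal standalone sketch of the boolean logic above. It is not the plugin's code: `ReaderChoice`, its `coalesceEnabled` field, and `ReaderChoiceDemo` are hypothetical names introduced only for illustration, and only the shape of the condition is taken from the snippet quoted in this issue.

```scala
// Hypothetical stand-in for the reader factory; not part of spark-rapids.
final case class ReaderChoice(
    coalesceEnabled: Boolean,     // stands in for rapidsConf.isParquetCoalesceFileReadEnabled
    queryUsesInputFile: Boolean,  // always true on the affected path, per this issue
    ignoreCorruptFiles: Boolean) {

  // Same shape as the condition quoted above.
  val canUseCoalesceFilesReader: Boolean =
    coalesceEnabled && !(queryUsesInputFile || ignoreCorruptFiles)
}

object ReaderChoiceDemo {
  def main(args: Array[String]): Unit = {
    // Coalescing is requested, but queryUsesInputFile is stuck at true.
    val stuckAtTrue = ReaderChoice(
      coalesceEnabled = true, queryUsesInputFile = true, ignoreCorruptFiles = false)

    // Same settings, but the flag reflects a query that does not use input file info.
    val flagFalse = stuckAtTrue.copy(queryUsesInputFile = false)

    println(s"queryUsesInputFile = true  -> ${stuckAtTrue.canUseCoalesceFilesReader}")
    println(s"queryUsesInputFile = false -> ${flagFalse.canUseCoalesceFilesReader}")
  }
}
```

With the flag stuck at true, the negated disjunction is false regardless of the reader-type setting, which matches the observed fallback to the multithreaded reader.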

Steps/Code to reproduce bug

Read local files.
Check the log. The expected message is "Using the coalesce multi-file", but the actual message is "Using the multi-threaded multi-file".

wbo4958 added the bug and ? - Needs Triage labels Apr 12, 2022

wbo4958 mentioned this issue Apr 12, 2022

Fix the bug COALESCING reading does not work for v2 parquet/orc datasource #5171

Merged

sameerz removed the ? - Needs Triage label Apr 12, 2022

sameerz added this to the Apr 4 - Apr 15 milestone Apr 12, 2022

wbo4958 closed this as completed in #5171 Apr 13, 2022
