[BUG] Coalescing reading is not working for v2 parquet/orc datasource #5215

Closed
wbo4958 opened this issue Apr 12, 2022 · 0 comments · Fixed by #5171
Labels
bug Something isn't working

Comments


wbo4958 commented Apr 12, 2022

Describe the bug

Even when the input files are not stored on a cloud file system, or when spark.rapids.sql.format.parquet.reader.type / spark.rapids.sql.format.orc.reader.type is explicitly set to COALESCING, the coalescing reader is not used. The multithreaded reader is used instead.

The root cause is that queryUsesInputFile defaults to true, so canUseCoalesceFilesReader always evaluates to false:

override val canUseCoalesceFilesReader: Boolean =
    rapidsConf.isParquetCoalesceFileReadEnabled && !(queryUsesInputFile || ignoreCorruptFiles)
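
To make the effect of that default concrete, here is a minimal standalone sketch. `ReaderFlags` and `CoalesceConditionDemo` are hypothetical names invented for illustration and are not plugin classes; only the shape of the boolean expression is taken from the snippet above, and `coalesceEnabled` merely stands in for `rapidsConf.isParquetCoalesceFileReadEnabled`.

```scala
// Hypothetical stand-in for the reader factory's flags; not part of the plugin.
final case class ReaderFlags(
    coalesceEnabled: Boolean,            // stands in for rapidsConf.isParquetCoalesceFileReadEnabled
    queryUsesInputFile: Boolean = true,  // the default this issue identifies as the root cause
    ignoreCorruptFiles: Boolean = false) {

  // Same boolean shape as the condition quoted in the issue.
  val canUseCoalesceFilesReader: Boolean =
    coalesceEnabled && !(queryUsesInputFile || ignoreCorruptFiles)
}

object CoalesceConditionDemo {
  def main(args: Array[String]): Unit = {
    // Coalescing is enabled, but the default queryUsesInputFile = true wins.
    val withDefault = ReaderFlags(coalesceEnabled = true)
    // Only when the flag is explicitly false can the coalescing reader be chosen.
    val withOverride = ReaderFlags(coalesceEnabled = true, queryUsesInputFile = false)

    println(s"default flag    -> ${withDefault.canUseCoalesceFilesReader}")  // false
    println(s"flag overridden -> ${withOverride.canUseCoalesceFilesReader}") // true
  }
}
```

With the default left in place, enabling coalescing has no effect, which matches the behavior reported here: the reader selection falls through to the multithreaded path.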

Steps/Code to reproduce bug

  1. Read local files.
  2. Check the log. The expected message is "Using the coalesce multi-file", but the actual message is "Using the multi-threaded multi-file".

wbo4958 added the "bug" and "? - Needs Triage" labels on Apr 12, 2022
sameerz removed the "? - Needs Triage" label on Apr 12, 2022
sameerz added this to the Apr 4 - Apr 15 milestone on Apr 12, 2022