-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't acquire the semaphore for empty input while scanning #4476
Don't acquire the semaphore for empty input while scanning #4476
Conversation
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
hmm looks like the build is failing on not being able to create the docker container. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me other than copyright nits.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuMultiFileReader.scala
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScanBase.scala
Show resolved
Hide resolved
build |
I ran another iteration of NDS, with similar results. Most queries are slightly faster, with some more significant gains in those single digit queries like q2 q53 q55 where we are gaining up to 3 seconds. Overall I see ~42 seconds over all of NDS at 2TB. |
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuMultiFileReader.scala
Outdated
Show resolved
Hide resolved
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 this looks good to me
Signed-off-by: Alessandro Bellina abellina@nvidia.com
This PR removes instances where we acquired the GpuSemaphore within the scan code, specifically those scans that are
PartitionReader
s. The change is to stop acquiring the semaphore here, because these cases will not be consumed downstream, given how thePartitionReader
is used.PartitionReader
provides anext()
method that returns false when there isn't work left. ThePartitionReaderIterator
wraps these readers, as far as I can tell, and it exposes the iterator interface (hasNext()
returns false if thePartitionReader
returned a false innext()
). This allows us to stop requiring us to acquire the semaphore in thenext()
method, when there isn't any work left in the reader.The PR also adds an acquire in the aggregate where we are producing rows out of nothing. This should be a noop, and likely just a miss that it wasn't there before (protected by upstream nodes doing the acquire for the agg).
The change has modest gains in NDS so far. I want to run a few more times, but so far I see: 10 queries that are 1.20x+ faster, 39 queries that are 1.10x+, and most queries (92) are above 1. I don't see a lot of regression here, except for q94 with 2 seconds, but q94 is falling back to the CPU and can be unpredictable. In other words, it seems to be mostly good, but I want to run a few more tests.
@revans2 wanted comments explaining why we are not acquiring the semaphore. I wasn't sure where to put them, and they are the same each time. I could see us moving them to the
PartitionReaderIterator
or somewhere else more appropriate.