forked from apache/orc
ORC-614: Implement efficient seek() in decompression streams
The current implementations of ZlibDecompressionStream::seek and BlockDecompressionStream::seek reset the state of the decompressor and the underlying file reader and throw away their buffers.

This commit introduces two optimizations that reuse buffers still containing useful data, which reduces the time spent reading and decompressing those buffers again:

- The seek target has already been read and decompressed into the output stream.
- The seek target has already been read from the input stream but has not been decompressed yet, i.e. it is not in the output stream.

Tests:
- Ran the ORC tests and the Impala tests that work on ORC tables.
- The regression that apache#476 would cause is no longer present.
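The two cases can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: `SeekPath`, `SeekTarget`, `StreamState`, and `classifySeek` are hypothetical names, and the sketch assumes a seek target is expressed as the file offset of a compressed chunk plus a byte offset inside that chunk's decompressed data. It only shows how a seek might be classified before deciding which buffers can be kept.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model for illustration only; none of these names are ORC APIs.
enum class SeekPath {
  ReuseOutput,  // case 1: target is already decompressed in the output buffer
  ReuseInput,   // case 2: target chunk is in the raw input buffer, not yet decompressed
  FullReset     // neither buffer helps: reset decompressor and file reader
};

// Assumed shape of a seek target: compressed chunk offset + offset inside
// the decompressed contents of that chunk.
struct SeekTarget {
  uint64_t chunkOffset;
  uint64_t offsetInChunk;
};

// Assumed bookkeeping kept by the stream between reads.
struct StreamState {
  uint64_t currentChunkOffset;  // file offset of the chunk held in the output buffer
  uint64_t decompressedSize;    // bytes of that chunk available in the output buffer
  uint64_t inputStart;          // file offset of the first byte in the input buffer
  uint64_t inputLength;         // number of raw bytes held in the input buffer
};

SeekPath classifySeek(const StreamState& state, const SeekTarget& target) {
  // Case 1: same chunk, and the requested byte is already decompressed.
  if (target.chunkOffset == state.currentChunkOffset &&
      target.offsetInChunk <= state.decompressedSize) {
    return SeekPath::ReuseOutput;
  }
  // Case 2: the chunk's compressed bytes are still in the input buffer.
  if (target.chunkOffset >= state.inputStart &&
      target.chunkOffset - state.inputStart < state.inputLength) {
    return SeekPath::ReuseInput;
  }
  // Otherwise fall back to the original behaviour.
  return SeekPath::FullReset;
}

// Usage example: one seek for each path.
void exampleUsage() {
  StreamState state{100, 4096, 0, 1000};
  assert(classifySeek(state, SeekTarget{100, 10}) == SeekPath::ReuseOutput);
  assert(classifySeek(state, SeekTarget{500, 0}) == SeekPath::ReuseInput);
  assert(classifySeek(state, SeekTarget{5000, 0}) == SeekPath::FullReset);
}
```

In this sketch only the `FullReset` path would need to discard buffers and re-read from the file; the other two paths would adjust pointers into data that is already in memory.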
Showing 3 changed files with 147 additions and 25 deletions.