
Align output to buffer size in rcfile writer #9001

Merged: 1 commit into prestodb:master on Sep 19, 2017

Conversation

@highker (Contributor) commented Sep 19, 2017

Flushing data in arbitrary sizes can cause memory fragmentation. This is generally fine unless the output stream allocates native memory (e.g., gzip streams). A native memory allocator may not be able to compact fragmented allocations, which can cause a native memory OOM.

Resolves #8993
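The idea behind the fix can be sketched as follows: hand the (possibly native) output stream only writes of exactly one chunk size, and keep the sub-chunk remainder buffered. The class, the `CHUNK_SIZE` constant value, and the method names below are illustrative, not the actual rcfile writer API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch of chunk-aligned flushing: write full CHUNK_SIZE blocks straight to
// the stream, and leave only the remainder for the caller to buffer.
public class AlignedFlushSketch
{
    private static final int CHUNK_SIZE = 4096;

    public static int flushAligned(OutputStream out, byte[] source, int sourceIndex, int length)
            throws IOException
    {
        // flush whole chunks directly; every write handed to the underlying
        // stream is exactly CHUNK_SIZE bytes, so allocation sizes are uniform
        while (length >= CHUNK_SIZE) {
            out.write(source, sourceIndex, CHUNK_SIZE);
            sourceIndex += CHUNK_SIZE;
            length -= CHUNK_SIZE;
        }
        // the remaining (length < CHUNK_SIZE) bytes stay in the write buffer
        return length;
    }

    public static void main(String[] args)
            throws IOException
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int remainder = flushAligned(out, new byte[10000], 0, 10000);
        // 10000 = 2 * 4096 + 1808, so 8192 bytes are flushed and 1808 remain
        System.out.println(out.size() + " flushed, " + remainder + " buffered");
    }
}
```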

length -= flushLength;
}

// line up the chuck to buffer size and flush directly to OutputStream

Typo

@highker highker force-pushed the rcfile branch 2 times, most recently from a188bec to 887e3a9 Compare September 19, 2017 18:35

// line up the chunk to chunk size and flush directly to OutputStream
int flushLength = length - length % CHUNK_SIZE;
for (int i = 0; i < flushLength / CHUNK_SIZE; i++) {

I think it would be simpler to do:

while (length >= CHUNK_SIZE) {
    writeToOutputStream(source, sourceIndex, CHUNK_SIZE);
    sourceIndex += CHUNK_SIZE;
    length -= CHUNK_SIZE;
}

then after the loop you don't need to touch length or bufferOffset.
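The two forms compute the same number of flushed bytes. A small self-contained sketch (hypothetical class; the loop condition written as `length >= CHUNK_SIZE` so it terminates) comparing the modulo arithmetic from the diff with the loop form suggested above:

```java
public class LoopEquivalenceSketch
{
    private static final int CHUNK_SIZE = 4096;

    // arithmetic form used in the diff: round length down to a chunk multiple
    public static int flushLengthByModulo(int length)
    {
        return length - length % CHUNK_SIZE;
    }

    // loop form from the review suggestion: count whole chunks one at a time
    public static int flushLengthByLoop(int length)
    {
        int flushed = 0;
        while (length >= CHUNK_SIZE) {
            flushed += CHUNK_SIZE;
            length -= CHUNK_SIZE;
        }
        return flushed;
    }

    public static void main(String[] args)
    {
        for (int length : new int[] {0, 100, 4096, 10000, 65537}) {
            System.out.println(length + " -> "
                    + flushLengthByModulo(length) + " / " + flushLengthByLoop(length));
        }
    }
}
```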

}

// line up the chunk to chunk size and flush directly to OutputStream
int flushLength = length - length % CHUNK_SIZE;

same here

// test some different buffer sizes
for (int bufferSize : new int[] {4096, 4345, 65535, 65536, 65537, 100000}) {
// check byte array version
ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream(length);

Make a subclass that captures the flush sizes and then validate you don't get flushes > 4k
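The capturing-stream idea from this comment could look something like the following sketch (class name and structure hypothetical; the actual test added to the PR may differ):

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Test helper in the spirit of the review comment: a stream that records the
// size of every bulk write, so a test can assert no flush exceeds 4k.
public class FlushCapturingOutputStream
        extends ByteArrayOutputStream
{
    private final List<Integer> flushSizes = new ArrayList<>();

    @Override
    public synchronized void write(byte[] buffer, int offset, int length)
    {
        flushSizes.add(length);
        super.write(buffer, offset, length);
    }

    public List<Integer> getFlushSizes()
    {
        return flushSizes;
    }

    public static void main(String[] args)
    {
        FlushCapturingOutputStream out = new FlushCapturingOutputStream();
        out.write(new byte[4096], 0, 4096);
        out.write(new byte[100], 0, 100);
        // validate that no single flush was larger than 4096 bytes
        boolean allSmall = out.getFlushSizes().stream().allMatch(size -> size <= 4096);
        System.out.println(allSmall);
    }
}
```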


@dain dain left a comment


One comment, but otherwise looks good

}

// buffer the remaining data

wrap this in if (length != 0)

}

// buffer the remaining data

same

@highker highker merged commit 7e0c47d into prestodb:master Sep 19, 2017
@highker highker deleted the rcfile branch September 22, 2017 01:20