readInputStream(in, chunkSize) allocates a full chunkSize for each read operation #3106
Comments
The stack trace in the screenshot seems to be referring to a stream of bytes that's compiled to an …
No. The only time we …
@kalanzai In addition to a heap dump, it would help us if you could also run the program while monitoring with Azul Mission Control (https://www.azul.com/products/components/azul-mission-control/#downloads), or any other tool that can record memory allocations per object and locate where in the code those rather huge objects are being allocated, whether that be within …
The large … (see fs2/core/shared/src/main/scala/fs2/Collector.scala, lines 101 to 106 at 7777fe3). There are other ways to hit that code path besides … As a next step, I'd try to find the byte stream that's being compiled and see why the output is growing so large.
@diesalbla Thanks for the input. I'll have a look at the tool you linked :) @mpilquist Further investigation might indicate that the error doesn't actually happen during the upload part of our test but rather during re-download of the data.

```scala
// Method which returns the data from S3 as a java.io.InputStream
def getInputStreamFromS3Client(): IO[InputStream] = ???

def readBytes(): IO[Array[Byte]] =
  fs2.io.readInputStream(getInputStreamFromS3Client(), 10 * 1024).compile.to(Array)
```
@kalanzai try … And if not, try …
@armanbilge Thanks! Looking at it now, we are fairly certain that the issue arises when the … Then, when we try to download 10MB of data, we end up keeping references to much larger byte vectors (which will be mostly empty). We suspect that the reason we didn't have the problem in 3.2.7 is that the array copy removed by PR #2892 only copied the slice of the array which had actual data in it. Thanks to that copy, the large unused space could then be garbage collected.
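To make the suspected difference concrete, here is a simplified sketch of one read step (not the actual fs2 internals; both helper names are ours):

```scala
import java.io.InputStream
import fs2.Chunk

// Simplified sketch, NOT the actual fs2 code: one read step of a
// readInputStream-style loop. A fresh chunkSize buffer is allocated per read.
def readChunkView(in: InputStream, chunkSize: Int): Option[Chunk[Byte]] = {
  val buf = new Array[Byte](chunkSize) // full chunkSize allocated every time
  val n = in.read(buf, 0, chunkSize)   // may fill only a tiny prefix
  if (n < 0) None
  else Some(Chunk.array(buf, 0, n))    // 3.2.8-style view: keeps all of buf alive
}

def readChunkCopy(in: InputStream, chunkSize: Int): Option[Chunk[Byte]] = {
  val buf = new Array[Byte](chunkSize)
  val n = in.read(buf, 0, chunkSize)
  if (n < 0) None
  else Some(Chunk.array(java.util.Arrays.copyOf(buf, n))) // 3.2.7-style copy: buf can be GC'd
}
```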
Yup! I came to the same conclusion. Edit: actually, not sure it was that PR; I thought it was this one: …
Oh, it does, where is this?
Well, it allocates the size we specify (which is 1MB).
I was trying to communicate that when we specify a 1MB chunkSize but only read up to 16KB per …
Makes sense! Sorry, I thought 1 MB was hard-coded somewhere; thanks for clarifying :) So is this something we should fix in FS2? There is a trade-off here between avoiding a copy and not keeping references to mostly-unused arrays, and it sounds like in your case the best solution is to tune the chunk size.
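Besides tuning the chunk size, a workaround along these lines should also drop the oversized buffers (a sketch, assuming `Chunk#compact` re-allocates when the chunk is a view over a larger array):

```scala
import java.io.InputStream
import cats.effect.IO
import fs2.Stream

// Workaround sketch: copy every chunk onto a right-sized backing array so
// the mostly-empty chunkSize buffers can be garbage collected.
def readCompacted(in: IO[InputStream], chunkSize: Int): Stream[IO, Byte] =
  fs2.io.readInputStream(in, chunkSize)
    .chunks         // Stream[IO, Chunk[Byte]]
    .map(_.compact) // re-allocate each chunk at its actual size
    .unchunks       // back to Stream[IO, Byte]
```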
Great find, folks! I think we should do something in FS2, but not sure what exactly.
We will definitely tune our chunk size.
Two ideas: … (see fs2/io/jvm-native/src/main/scala/fs2/io/net/SocketPlatform.scala, lines 66 to 67 at 7777fe3)
Not sure if this would be relevant in this context: #2863
Perhaps you can use smaller read sizes, and later on, if you want to preserve that chunk size for the outputs, you can use the methods … As another possibility, you can use the …
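For instance, something like this sketch (the combinator names referenced above were lost in rendering; `chunkN` and `unchunks` are one way to express the idea):

```scala
import java.io.InputStream
import cats.effect.IO
import fs2.Stream

// Sketch: read with a small buffer to match what the stream actually
// delivers, then regroup into larger chunks for downstream consumers.
def readSmallThenRegroup(in: IO[InputStream]): Stream[IO, Byte] =
  fs2.io.readInputStream(in, chunkSize = 16 * 1024) // small 16KB reads
    .chunkN(1024 * 1024)                            // regroup into ~1MB chunks
    .unchunks
```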
@diesalbla We're lowering our read sizes to match what we observed at runtime, namely that it seemed the underlying …
This may be outside the scope of fs2, but would a PR implementing an auto-sizing variation of … ? I'm thinking it would start at some relatively small size (e.g. 1KB) and grow by some factor until the underlying …
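Roughly along these lines (a hypothetical sketch; none of these names are an existing fs2 API):

```scala
import java.io.InputStream
import cats.effect.{IO, Resource}
import fs2.{Chunk, Stream}

// Hypothetical auto-sizing read loop: start small and double the buffer
// whenever a read fills it completely, capped at maxChunkSize.
def readInputStreamAutoSized(
    in: IO[InputStream],
    initialChunkSize: Int = 1024,
    maxChunkSize: Int = 1024 * 1024
): Stream[IO, Byte] = {
  def go(is: InputStream, size: Int): Stream[IO, Byte] =
    Stream.eval(IO.blocking {
      val buf = new Array[Byte](size)
      (buf, is.read(buf, 0, size))
    }).flatMap {
      case (_, n) if n < 0 => Stream.empty
      case (buf, n) =>
        // Grow only if the stream kept up with the current buffer size.
        val next = if (n == size) math.min(size * 2, maxChunkSize) else size
        Stream.chunk(Chunk.array(buf, 0, n)) ++ go(is, next)
    }
  Stream
    .resource(Resource.make(in)(is => IO.blocking(is.close())))
    .flatMap(is => go(is, initialChunkSize))
}
```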
@christianharrington Definitely interested in a PR like that, especially if we can use it in …
I have updated the Scastie so it now better represents the issue. The more interesting part for potential tests is the following somewhat hacky implementation of an `InputStream`:

```scala
import java.io.InputStream

/** An InputStream which simulates a slow stream by always reading only 1 byte
  * at a time.
  *
  * @param bytes
  *   The bytes that the simulated stream consists of
  */
class SlowTestInputStream(bytes: Array[Byte]) extends InputStream {
  var readBytes: Int = 0

  override def read(): Int =
    if (readBytes >= bytes.length) {
      -1
    } else {
      val result = bytes(readBytes)
      readBytes += 1
      result & 0xff // read() must return the byte as an unsigned Int (0-255)
    }

  override def available(): Int = bytes.length - readBytes

  override def markSupported(): Boolean = false

  // Always reads at most one byte per call, no matter how large `len` is,
  // to simulate a slow underlying stream.
  override def read(buffer: Array[Byte], off: Int, len: Int): Int =
    if (len == 0) {
      0
    } else if (readBytes >= bytes.length) {
      -1
    } else {
      buffer(off) = bytes(readBytes)
      readBytes += 1
      1
    }
}
```
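A usage sketch (our own snippet) that exercises it:

```scala
import cats.effect.IO
import cats.effect.unsafe.implicits.global

// Each read(buffer, off, len) on SlowTestInputStream fills a single byte of
// a fresh 1MB buffer, so the stream emits tiny chunks that can each retain a
// 1MB backing array (the behavior discussed above).
val data: Array[Byte] = Array.fill[Byte](16)(42)
val result: Array[Byte] =
  fs2.io
    .readInputStream[IO](IO(new SlowTestInputStream(data)), chunkSize = 1024 * 1024)
    .compile
    .to(Array)
    .unsafeRunSync()
```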
I think #3318 closes this issue, unless I'm mistaken?
When attempting to upgrade from version 3.2.7 to 3.2.8, we encountered an `OutOfMemoryError` in one of our tests. Unfortunately, we haven't been able to reproduce the error using a minimal example. An attempt can be seen here: https://scastie.scala-lang.org/NflW2ZCIRzSnw7YJ8p5HCQ

Edit: The Scastie has been updated to reflect the findings in the comments that the issue likely arises from calling `fs2.io.readInputStream` with a `chunkSize` that is (much) larger than the number of bytes read from the underlying stream when its `read(buffer, offset, length)` method is invoked.

Edit: The below is not really relevant for the issue, but is kept for historic purposes.

Our code takes a `java.io.InputStream` and uploads it to AWS S3 using the multipart upload functionality from their SDK. Our test generates a random kilobyte of data and then uploads varying sizes of repeated kilobytes to S3. While obviously not exactly the same, the Scastie example reproduces the fs2 parts of our implementation as faithfully as possible.

As mentioned, the example doesn't actually exhibit the same behavior as our tests. See below for a screenshot of a heap dump captured by setting `javaOptions += "-XX:+HeapDumpOnOutOfMemoryError"`. It looks like the `accB` field of the `OuterRun` class for whatever reason builds up a 500MB `scodec.bits.ByteVector.Buffer`. It might be of interest that our tests don't fail if the test data is only kilobytes large (10-13 KB).