Streaming very large string contents #221
Replies: 5 comments 11 replies
-
I would recommend reading https://www.baeldung.com/jackson-streaming-api (the section about JsonParser). Jackson's streaming parser API is very low-level, akin to using SAX or StAX for parsing XML. ReaderBasedJsonParser is not meant to buffer the input. Do you have proof, such as a reproducible test case? You can use StreamReadConstraints to raise the limits (including the maximum document size).
-
We are using the Streaming API. The Using |
-
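Example code:
```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

import org.junit.jupiter.api.Test; // assumption: JUnit 5; with JUnit 4 use org.junit.Test

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParseException;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

public class JacksonStreamTest {
    @Test
    public void jacksonLimitTest() throws JsonParseException, IOException {
        // Build a document whose "data" value is about 110 million characters after unescaping
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\":\"foo\",\"data\":\"");
        for (int i = 0; i < 10_000_000; i++) {
            sb.append("1234567890\\n");
        }
        sb.append("\"}");
        StringWriter writer = new StringWriter();
        streamData(new StringReader(sb.toString()), writer);
        System.out.println(writer.toString());
    }

    void streamData(Reader reader, Writer writer) throws JsonParseException, IOException {
        JsonFactory factory = new JsonFactory();
        JsonParser parser = factory.createParser(reader);
        // Skip to the "data" field; stop on a null token (end of input) to avoid looping forever
        JsonToken currentToken;
        while ((currentToken = parser.nextToken()) != null
                && (currentToken != JsonToken.FIELD_NAME || !"data".equals(parser.getText()))) {
            // Skip
        }
        parser.nextToken();
        // Write the string value to the writer
        parser.getText(writer);
        parser.close();
    }
}
```
Fails with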
-
Ok, here's some more proof:
```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

import org.apache.commons.io.output.NullWriter; // assumption: NullWriter comes from Apache Commons IO
import org.junit.jupiter.api.Test; // assumption: JUnit 5; with JUnit 4 use org.junit.Test

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParseException;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.core.StreamReadConstraints;

public class JacksonStreamTest {
    @Test
    public void jacksonLimitTestCyclic() throws JsonParseException, IOException {
        // About 2 billion characters of string content, generated on the fly
        Reader reader = new CyclicReader("{\"name\":\"foo\",\"data\":\"", "1234567890\\n", "\"}", 2_000_000_000L);
        streamData(reader, NullWriter.INSTANCE);
    }

    void streamData(Reader reader, Writer writer) throws JsonParseException, IOException {
        JsonFactory factory = new JsonFactory();
        // Raise the string length limit so the constraint check itself is not what fails
        StreamReadConstraints src = StreamReadConstraints.builder().maxStringLength(2_000_000_000).build();
        factory.setStreamReadConstraints(src);
        JsonParser parser = factory.createParser(reader);
        // Skip to the "data" field; stop on a null token (end of input) to avoid looping forever
        JsonToken currentToken;
        while ((currentToken = parser.nextToken()) != null
                && (currentToken != JsonToken.FIELD_NAME || !"data".equals(parser.getText()))) {
            // Skip
        }
        parser.nextToken();
        parser.getText(writer);
        parser.close();
    }

    /** Emits start, then the cycle pattern repeated up to cycleLimit characters, then end. */
    private static class CyclicReader extends Reader {
        private final char[] start;
        private final char[] cycle;
        private final char[] end;
        private final long cycleLimit;
        private int phase = 0;    // 0 = start, 1 = cycle, 2 = end, 3 = exhausted
        private long current = 0; // offset within the current phase

        public CyclicReader(String start, String cycle, String end, long cycleLimit) {
            // Pre-expand the pattern to roughly 8 KiB so bulk reads copy larger blocks
            char[] cycleChars = cycle.toCharArray();
            int l = cycleChars.length;
            int repeat = Math.max(1, 8192 / l);
            this.cycle = new char[repeat * l];
            for (int i = 0; i < repeat; i++) {
                System.arraycopy(cycleChars, 0, this.cycle, i * l, l);
            }
            this.cycleLimit = cycleLimit;
            this.start = start.toCharArray();
            this.end = end.toCharArray();
        }

        @Override
        public int read(char[] cbuf, int off, int len) {
            if (len == 0) {
                return 0; // a zero-length read must not advance the phase
            }
            int count = 0;
            if (phase == 0) {
                count = readFixed(start, cbuf, off, len);
                if (count == 0) {
                    current = 0;
                    phase++;
                } else {
                    return count;
                }
            }
            if (phase == 1) {
                count = readCycle(cbuf, off, len);
                if (count == 0) {
                    current = 0;
                    phase++;
                } else {
                    return count;
                }
            }
            if (phase == 2) {
                count = readFixed(end, cbuf, off, len);
                if (count == 0) {
                    current = 0;
                    phase++;
                } else {
                    return count;
                }
            }
            return -1;
        }

        private int readFixed(char[] source, char[] cbuf, int off, int len) {
            int left = source.length - (int) current;
            if (left <= 0)
                return 0;
            if (left > len)
                left = len;
            System.arraycopy(source, (int) current, cbuf, off, left);
            current += left;
            return left;
        }

        public int readCycle(char[] cbuf, int off, int len) {
            int copied = 0;
            while (copied < len && current < cycleLimit) {
                int startIdx = (int) (current % cycle.length);
                int step = Math.min(startIdx + (len - copied), cycle.length) - startIdx;
                if (current + step > cycleLimit) {
                    step = (int) (cycleLimit - current);
                }
                System.arraycopy(cycle, startIdx, cbuf, off + copied, step);
                copied += step;
                current += step;
            }
            return copied;
        }

        @Override
        public int read() {
            if (phase == 0) {
                if (current < start.length)
                    return start[(int) current++];
                current = 0; // reset the offset when moving to the next phase
                phase++;
            }
            if (phase == 1) {
                if (current < cycleLimit)
                    return cycle[(int) (current++ % cycle.length)];
                current = 0;
                phase++;
            }
            if (phase == 2) {
                if (current < end.length)
                    return end[(int) current++];
                current = 0;
                phase++;
            }
            return -1;
        }

        @Override
        public void close() {
            // No-op
        }
    }
}
```
Executing this with JVM arguments
-
Ok, so two separate things:
But there is a caveat to (2): There is method:
in So that can be used if incremental access is really needed. It may be worth considering, though, what design choice led to the creation of such humongous JSON String values: if incremental processing is desired, such content should probably be split into chunks when it is created.
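As a rough illustration of the chunking idea, the producing side could cut the large text into bounded pieces before writing them out (for example as elements of a JSON array instead of one giant string). The sketch below uses only the JDK; `ChunkedProducer`, `forEachChunk` and the chunk size are hypothetical names chosen for this example, not part of Jackson.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

/** Hypothetical helper: hands a large text to a sink in bounded chunks. */
final class ChunkedProducer {
    private ChunkedProducer() {}

    /** Reads the whole reader and passes chunks of at most chunkSize chars to the sink; returns the chunk count. */
    static int forEachChunk(Reader in, int chunkSize, Consumer<String> sink) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        char[] buf = new char[chunkSize];
        int chunks = 0;
        int filled;
        while ((filled = fill(in, buf)) > 0) {
            sink.accept(new String(buf, 0, filled));
            chunks++;
        }
        return chunks;
    }

    /** Fills the buffer as far as possible, because a single read may return fewer chars than requested. */
    private static int fill(Reader in, char[] buf) throws IOException {
        int filled = 0;
        while (filled < buf.length) {
            int n = in.read(buf, filled, buf.length - filled);
            if (n == -1) {
                break;
            }
            filled += n;
        }
        return filled;
    }
}
```

A consumer can then process one array element at a time and never sees a single value larger than the chunk size. Note that this simple splitter may cut a surrogate pair in two, which a real implementation would need to avoid.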
-
We're communicating with an API that essentially returns the following response (it's a little more complex, but for the purposes of this discussion it will do):
The data in this case can be 10s or even 100s of MBs large. We're trying to output that data to a `Writer` (or `OutputStream`). We're trying to do this in a way to avoid loading the entire data block in memory first.
We're currently doing:
The `getText(Writer)` currently fails with
Of course we could increase the maxStringLength, but that defeats the point. It looks like `ReaderBasedJsonParser` buffers everything in memory first before streaming it. And if the buffer exceeds the maxStringLength it throws this exception.
Is there a better way to do it? Or should I file a feature request?
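For illustration only, the behaviour being asked for (copying one string value to a `Writer` with a small fixed buffer) can be sketched by hand with plain JDK classes. This is not how Jackson implements `getText(Writer)`; `JsonStringStreamer` and `copyStringValue` are hypothetical names, and the sketch assumes the reader is buffered and already positioned just after the opening quote of the value. It does no validation beyond the escape sequences.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

/** Hypothetical helper, not part of Jackson: streams one JSON string value to a Writer. */
final class JsonStringStreamer {
    private JsonStringStreamer() {}

    /**
     * Copies the string value up to (and consuming) its closing quote.
     * Memory use is bounded by the 4096-char buffer regardless of the value's length.
     * Returns the number of decoded chars written.
     */
    static long copyStringValue(Reader in, Writer out) throws IOException {
        char[] buf = new char[4096];
        int n = 0;
        long total = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '"') {
                out.write(buf, 0, n); // flush the remainder
                return total + n;
            }
            buf[n++] = (c == '\\') ? readEscape(in) : (char) c;
            if (n == buf.length) {
                out.write(buf, 0, n); // flush a full buffer and reuse it
                total += n;
                n = 0;
            }
        }
        throw new IOException("Unexpected end of input inside string value");
    }

    /** Decodes the escape that follows a backslash, including four-hex-digit unicode escapes. */
    private static char readEscape(Reader in) throws IOException {
        int e = in.read();
        switch (e) {
            case '"': return '"';
            case '\\': return '\\';
            case '/': return '/';
            case 'b': return '\b';
            case 'f': return '\f';
            case 'n': return '\n';
            case 'r': return '\r';
            case 't': return '\t';
            case 'u': {
                int value = 0;
                for (int i = 0; i < 4; i++) {
                    int digit = Character.digit(in.read(), 16);
                    if (digit < 0) {
                        throw new IOException("Invalid unicode escape");
                    }
                    value = (value << 4) | digit;
                }
                return (char) value;
            }
            default:
                throw new IOException("Invalid escape sequence");
        }
    }
}
```

The point of the sketch is only that the decoded value never has to exist in memory as a whole, so no string length limit would need to be raised for this kind of copy.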
I've experimented a little and if we can force the other side to return the data in Base64 format we could use `jsonParser.readBinaryValue(OutputStream)` instead, which doesn't seem to buffer everything. However, this means about 33% more data over the network, and it requires a modification on the sender side. It seems to me the streaming version of Jackson was meant to avoid having large blocks of data in memory?
We're using Jackson 2.16.1
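The 33% figure follows from Base64 mapping every 3 raw bytes to 4 characters. A small JDK-only sketch (the class and method names are made up for this example, and it uses `java.util.Base64` rather than Jackson's `readBinaryValue`) shows both the size arithmetic and a decode loop that never holds more than one buffer of data:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

/** Hypothetical helper illustrating Base64 size overhead and incremental decoding. */
final class Base64Overhead {
    private Base64Overhead() {}

    /** Length of the padded Base64 text for rawBytes bytes: 4 characters per started group of 3 bytes. */
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    /** Decodes Base64 text from the input and writes the raw bytes to out, 8 KiB at a time. */
    static long decodeTo(InputStream base64Text, OutputStream out) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream decoded = Base64.getDecoder().wrap(base64Text)) {
            int n;
            while ((n = decoded.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

For a 100 MB payload, `encodedLength` gives roughly 133 MB of text on the wire, which is the trade-off described above.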
Edit: fixed typos