Improve performance of writing raw UTF-8 encoded byte arrays #1349
Conversation
Thanks @JoostK. Would you be able to add the benchmark to https://github.com/FasterXML/jackson-core/tree/2.19/src/test/java/perf ?
Sure I can add some; while looking at the existing ones I wonder what the desired testing strategy is:
What is the most valuable thing to have here? 1. is meaningful for comparing this particular change across machines/JVMs, but 2. is more valuable for measuring and comparing the write perf of […]
Both sound useful - could you add both benchmarks?
I'll come up with something, probably over the coming days.
Accidentally rebased onto […]. Here are the results on my MBP w/ M1 Max:
It's not as big of a difference as I was seeing in the original benchmarks I had, although it's noticeable that larger strings appear to benefit quite a bit. There is potentially another question of whether […]
@JoostK First of all: thank you for contributing this! At a high level this makes sense, but I do need to dig a bit deeper into this when I have time -- and right now I am a bit overloaded/overspread, so apologies for any delay there may be. Having said that: one thing we will eventually need (if not done; apologies if it has) is to get a CLA, from here: https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf (needs to be sent just once for all Jackson contributions). The usual way is to print, fill & sign, scan/photo, and email to "cla" at fasterxml dot com. Thank you again; looking forward to getting this merged!
}

final int[] extended = new int[0xFF];
System.arraycopy(escapes, 0, extended, 0, escapes.length);
I think there is `Arrays.copyOf()` that combines these 2 operations.
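For what it's worth, a minimal sketch of that suggestion (assuming `java.util.Arrays` is imported, and keeping the `0xFF` length from the diff above):

```java
// Allocates the larger table and copies the old contents in one call;
// entries beyond escapes.length are zero-filled, i.e. "no escaping needed".
final int[] extended = Arrays.copyOf(escapes, 0xFF);
```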
// When writing raw UTF-8 encoded bytes, it is beneficial if the escaping table can directly be indexed into
// using the byte value.
final int[] extendedOutputEscapes = _extendOutputEscapesTo8Bits();
Ok... this is the (only) part I find problematic. Having to dynamically change `_outputEscapes` seems problematic, although I understand why it is being done. Since it is something that may be changed by a call to `JsonGenerator.setCharacterEscapes()`, modifications cannot be done in the constructor.

But: I have an idea for a bit bigger change that would make it possible to eagerly ensure `_outputEscapes` is 256 elements long. Will add a separate comment.
Actually, scratch that. Only now realized this is limited to `writeUTF8String`, not all escaping. So dynamically copying + changing is actually reasonable since it's not always needed etc.
@@ -1914,6 +1916,18 @@ private final void _writeUTF8Segment2(final byte[] utf8, int offset, int len)
    _outputTail = outputPtr;
}

private int[] _extendOutputEscapesTo8Bits() {
    final int[] escapes = _outputEscapes;
Ok: I am fine with the idea, but will propose one change: instead of overwriting `_outputEscapes`, let's introduce a secondary field (`_outputEscapes8Bit` or whatever), left as `null` and constructed the first time it is needed. That just changes it to a null check.

The reason is just that ideally `_outputEscapes` is left as-is for other use cases, to minimize any risk of regression.
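A rough sketch of what that suggestion could look like (field and method names are illustrative, not actual jackson-core code; assumes `java.util.Arrays` is imported):

```java
// Lazily built 256-entry variant of _outputEscapes, used only by writeUTF8String().
protected int[] _outputEscapes8Bit;

private int[] _outputEscapes8BitTable() {
    int[] escapes = _outputEscapes8Bit;
    if (escapes == null) {
        // Extend the 7-bit table; entries 0x80..0xFF stay zero ("no escaping needed").
        escapes = Arrays.copyOf(_outputEscapes, 256);
        _outputEscapes8Bit = escapes;
    }
    return escapes;
}
```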
The benefit of overwriting the field is that it will automatically be recreated if it is reset to a 7-bit wide LUT. Storing the 8-bit table in a separate field might get out of date, especially since `_outputEscapes` is a `protected` field that can be assigned through subclasses (that is evil, of course, but still a possibility).
@JoostK Ok, I now understand the scope and added some suggestions. Since we are not using the new encoding table for all output, I think it's better to avoid overriding `_outputEscapes`. But one thing I am wondering is whether this optimization came about from actual observations of usage -- that is, if this was merged, would it help with a use case you have? This is because, as you say, this method seems less commonly used, and if so, benefits might be limited. But if it addresses something that at least you use, it is more reasonable to merge it.
Thanks for the comments, I haven't gotten around to addressing them yet, nor to signing the CLA; that won't be a problem but I don't typically have a printer at hand 😄. My use-case is for UTF-8 encoded bytes that are read from raw blobs and sent as a JSON-encoded string (the data is known to be UTF-8 encoded) in a web response; this can be in the order of ~100MB of data across ~30k strings, so […]. Since the app will likely be deployed on x86_64, I intend to run the performance test on amd64 to gauge what the impact is there, as I suspect this may depend on the ISA (and possibly how effective the branch predictor is, so it may be different across CPU vendors/generations). If the meaningful improvements are only applicable to arm64/M1 then this may not be as beneficial as I observed on macOS.
No worries @JoostK. FWIW, printing is optional; modifying the PDF with info & a fake signature works perfectly fine too. And thx for sharing the use case: sounds legit.
Finally found some time to run this on Intel x64 (specifically a Core i7-1270P) but the results are all over the place, so I can't draw conclusions from them for now. Interestingly, the microbenchmark consistently shows that the 7-bit approach performs better on this CPU, which is the opposite of what I was measuring on arm64 (M1 Max). On the actual […]
Not sure what this means for this PR, really. I'd really like to get stable results to make an informed decision on whether this is worth it. Maybe somebody else is able to run the test suite on amd64 CPUs?
@JoostK Ok, that is... interesting, given that the code seems like it should out-perform the existing implementation. I assume you tried with longer test run times but without seeing more stable results?
The output escape table covers just 7-bits, meaning that a raw UTF-8 byte cannot be used to index into the table without a branch test for negative bytes (i.e. bytes larger than 0x7F). This extra check occurs in a tight loop and can be avoided if the lookup table were to cover all 8-bit indices.
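To illustrate the shape of that per-byte check (a sketch only, not the actual `UTF8JsonGenerator` code):

```java
// With a 128-entry table, negative bytes (0x80..0xFF unsigned) cannot be used as an
// index, so the copy loop pays for an extra sign check on every byte:
static int countEscapes7Bit(byte[] utf8, int[] escapes7Bit) {
    int count = 0;
    for (byte b : utf8) {
        if (b >= 0 && escapes7Bit[b] != 0) {
            count++;
        }
    }
    return count;
}

// With a 256-entry table, any byte value can index the table directly:
static int countEscapes8Bit(byte[] utf8, int[] escapes8Bit) {
    int count = 0;
    for (byte b : utf8) {
        if (escapes8Bit[b & 0xFF] != 0) {
            count++;
        }
    }
    return count;
}
```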
This commit introduces ad-hoc logic in `UTF8JsonGenerator#writeUTF8String` to create an extended copy of `_outputEscapes` if necessary, writing the copy back into the field to avoid having to compute it again within the same generator instance (unless it is changed). This ad-hoc strategy was chosen as it is the least disruptive to existing code, as a larger-scale change around `CharacterEscapes` would impact the public API or otherwise introduce subtle chances of breakage.

Some quick and dirty JMH tests on an M1 Max (arm64 ⚠) with Azul Zulu JDK 21 show the following numbers:
This is writing buffers of length `(length)` that contain `'a'` in all positions, where `(needEscape)` being `'first'` has the first byte overwritten with `"`, `'last'` its last byte, and `'none'` leaves the buffer as a sequence in which no escapes need to be inserted.
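For reference, a hypothetical JMH sketch of roughly this setup (illustrative only; not the actual benchmark discussed above for `src/test/java/perf`, and the parameter values are made up):

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;

@State(Scope.Benchmark)
public class WriteUTF8StringBench {

    @Param({"10", "100", "1000", "10000"})
    public int length;

    @Param({"none", "first", "last"})
    public String needEscape;

    private final JsonFactory factory = new JsonFactory();
    private byte[] buffer;

    @Setup
    public void setup() {
        // Buffer of 'a' bytes, optionally forcing one escape at the first or last position.
        buffer = new byte[length];
        Arrays.fill(buffer, (byte) 'a');
        if ("first".equals(needEscape)) {
            buffer[0] = (byte) '"';
        } else if ("last".equals(needEscape)) {
            buffer[length - 1] = (byte) '"';
        }
    }

    @Benchmark
    public int writeUTF8String() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream(length + 16);
        try (JsonGenerator gen = factory.createGenerator(out)) {
            gen.writeUTF8String(buffer, 0, length);
        }
        // Return something derived from the output so the work isn't dead-code eliminated.
        return out.size();
    }
}
```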
Overall the numbers show improvements in the range of 11%–33%. I wonder if this extends to other CPU architectures; I'm opening this PR to gauge interest in such a change. Note that this only affects `UTF8JsonGenerator#writeUTF8String`, which isn't typically used, as it's more common to process from `char[]` or `String` buffers. In my use-case I already have a UTF-8 encoded `byte[]`, which prompted me to look into this.

This logic can probably be vectorized quite nicely; that is also being done in dotnet's JSON writer infrastructure.