[Heartbeat] Fix excessive memory usage when parsing bodies #15639
Merged: andrewvc merged 11 commits into elastic:master from andrewvc:fix-excessive-memory-usage on Jan 22, 2020
Commits (11)
f5b2816 [Heartbeat] Fix excessive memory usage when parsing bodies (andrewvc)
50e4218 More precise byte accounting (andrewvc)
7f044df Stop checking for UTF-8 in body (andrewvc)
cae4335 Switch string builder (andrewvc)
03c5110 gofmt (andrewvc)
16af47d Use io.Copy (andrewvc)
2ee596b Properly handle EOF (andrewvc)
eb65da3 Add changelog entry (andrewvc)
a820f1d Fix changelog (andrewvc)
2de86c8 Merge remote-tracking branch 'origin/master' into fix-excessive-memor… (andrewvc)
0b6a5af Fix changelog merge (andrewvc)
Conversations
I wonder if the whole loop can be replaced using the io package only, e.g. the sketch below, which reads up to maxPrefixSize bytes from the body and writes every single byte to hash and prefixBuf.
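One way to express that with only the io package is a hedged sketch like the following; maxPrefixSize, hash, and prefixBuf come from the comment, while body and the surrounding scaffolding are hypothetical stand-ins, not the PR's actual code:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"io"
	"strings"
)

func main() {
	// Hypothetical stand-ins: in the PR, body is the HTTP response body
	// and maxPrefixSize bounds how many bytes are kept in memory.
	body := strings.NewReader("response body bytes ...")
	const maxPrefixSize = 1024

	hash := sha256.New() // hash.Hash is an io.Writer
	var prefixBuf bytes.Buffer

	// Read at most maxPrefixSize bytes; io.MultiWriter sends every byte
	// to both the hash and the prefix buffer. io.EOF here only means the
	// body was shorter than the limit.
	if _, err := io.CopyN(io.MultiWriter(hash, &prefixBuf), body, maxPrefixSize); err != nil && err != io.EOF {
		fmt.Println("read error:", err)
		return
	}

	fmt.Printf("prefix=%q sha256=%x\n", prefixBuf.String(), hash.Sum(nil))
}
```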
Ooooooh, that's nice, will give it a shot.
I'm actually wondering now: the only reason we do the prefix reading in the first place is to give something for the response checkers to read. The problem before was that if you used a JSON checker and a grep checker, they'd share the same IO, so one would use up the stream before the other could get to work.
Maybe we should just use the tee reader in the first place to split the stream between all the users, then we can always read the full body and not worry about reading some fixed quantity.
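A minimal sketch of that tee idea, assuming hypothetical checkJSON and checkGrep checkers that each pull from their own io.Reader; io.TeeReader plus a pipe lets both see the full stream, at the cost of running the consumers concurrently:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Hypothetical checkers standing in for Heartbeat's JSON and grep
// response checkers; each one pulls from its own io.Reader.
func checkJSON(r io.Reader) error { _, err := io.Copy(io.Discard, r); return err }
func checkGrep(r io.Reader) error { _, err := io.Copy(io.Discard, r); return err }

func main() {
	body := strings.NewReader(`{"status":"ok"}`)

	// Everything the first checker reads through the tee is duplicated
	// into the pipe, so the second checker sees the same bytes.
	pr, pw := io.Pipe()
	tee := io.TeeReader(body, pw)

	done := make(chan error, 1)
	go func() {
		err := checkJSON(tee)
		// Drain whatever the first checker left unread so the second
		// checker still receives the full body, then close the pipe.
		io.Copy(io.Discard, tee)
		pw.Close()
		done <- err
	}()

	grepErr := checkGrep(pr)
	fmt.Println(<-done, grepErr)
}
```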
So, I think that's a bit out of scope here. I'd like to keep the code as-is. I tried to get things cleaner with the TeeReader, but that really requires us to have more of a limitWriter than a limitReader. The code sample you posted unfortunately only hashes the prefix, whereas the current code hashes the full body.
For the hash it should be enough to add another io.Copy(hash, body) if the first io.Copy did not reach EOF. It's still the same reader, and readers are stateful. The second copy should continue appending to the hash from the last known position. Like the first sketch below.

Alternatively we could introduce a LimitedWriter and a TeeWriter (as you've already noticed). They don't exist in our code base, so we would need to implement these ourselves :(. Then the operation would become something like the second sketch below.
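A sketch of that two-copy continuation, using a hypothetical readPrefixAndHash helper (the names are illustrative, not from the PR):

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"hash"
	"io"
	"strings"
)

// readPrefixAndHash is a hypothetical helper: it buffers at most
// maxPrefixSize bytes of body while hashing the entire body.
func readPrefixAndHash(body io.Reader, maxPrefixSize int64, h hash.Hash, prefixBuf *bytes.Buffer) error {
	_, err := io.CopyN(io.MultiWriter(h, prefixBuf), body, maxPrefixSize)
	if err == io.EOF {
		return nil // the body fit entirely within the prefix
	}
	if err != nil {
		return err
	}
	// The reader is stateful: this second copy resumes where CopyN
	// stopped, so the hash ends up covering the full body.
	_, err = io.Copy(h, body)
	return err
}

func main() {
	h := sha256.New()
	var prefix bytes.Buffer
	body := strings.NewReader(strings.Repeat("x", 100))
	if err := readPrefixAndHash(body, 16, h, &prefix); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("prefix len=%d full-body sha256=%x\n", prefix.Len(), h.Sum(nil))
}
```

And a sketch of the LimitedWriter variant. io.MultiWriter can already play the TeeWriter role; the limit writer below is an illustrative implementation, since (as the comment notes) no such type exists in the standard library or the Beats code base:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"io"
	"strings"
)

// limitWriter is a hypothetical counterpart to io.LimitReader: it stores
// at most n bytes, then silently discards the rest while still reporting
// full writes so an enclosing io.MultiWriter keeps going.
type limitWriter struct {
	w io.Writer
	n int64
}

func (lw *limitWriter) Write(p []byte) (int, error) {
	if lw.n > 0 {
		keep := p
		if int64(len(keep)) > lw.n {
			keep = keep[:lw.n]
		}
		written, err := lw.w.Write(keep)
		lw.n -= int64(written)
		if err != nil {
			return written, err
		}
	}
	return len(p), nil
}

func main() {
	body := strings.NewReader(strings.Repeat("y", 100))
	h := sha256.New()
	var prefixBuf bytes.Buffer

	// The whole operation becomes a single copy: every byte reaches the
	// hash, but only the first 16 bytes land in prefixBuf.
	if _, err := io.Copy(io.MultiWriter(h, &limitWriter{w: &prefixBuf, n: 16}), body); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("prefix len=%d sha256=%x\n", prefixBuf.Len(), h.Sum(nil))
}
```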
The idea is similar to my first sample code. Let's try to reduce the code handling buffers, indices, offsets, and size limits in loops by replacing it with more declarative (reusable?) reader/writer constructors.
Unfortunately parsers like JSON or regexp support are pull-based, meaning they require an io.Reader in order to "pull" new bytes. Constructing pipelines using a mix of pull and push (reader and writer) can still be somewhat tricky.

At least for JSON I have an alternative. In Beats we use go-structform for spooling to disk and in our outputs. Interestingly, its JSON parser has a push-based interface: https://github.com/elastic/go-structform/blob/master/json/parse.go#L154
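For contrast, a minimal illustration of the pull side using the standard library's json.Decoder (not go-structform's push API): the decoder owns the reader and pulls bytes on demand, which is exactly why two checkers can't naively share one body.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

func main() {
	// A pull-based parser: the Decoder consumes the reader as it needs
	// bytes, so a second consumer handed the same reader would find it
	// partially or fully drained.
	dec := json.NewDecoder(strings.NewReader(`{"status":"ok"}`))

	var v map[string]interface{}
	if err := dec.Decode(&v); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(v["status"])
}
```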
I've fixed the code to use io.Copy, much cleaner, thanks!

Thinking about it, I think I'm fine with the buffer approach for now. Chances are that users dealing with large files don't want to use either JSON or regexp matching. I think I'm fine waiting for that feature request to come in.