Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use JSON decoder tokenizer to parse packages from storage indexer #881

Merged
merged 5 commits into from
Sep 21, 2022

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Sep 21, 2022

Use JSON decoder tokenizer to parse packages from storage indexer, and add a benchmark for the "Init" process.

The benchmark allows to compare results: slightly more allocations are needed now, but about 30% less memory is used, and it is slightly faster.

This can probably be further improved by directly applying the transformations on the parsed packages. It could be also nice to benchmark just this function and not the whole Init process, though this is the heaviest part of this process.

Before:

goos: linux
goarch: amd64
pkg: github.com/elastic/package-registry/storage
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkInit-8                3         398302468 ns/op        235217858 B/op    648922 allocs/op

After:

goos: linux
goarch: amd64
pkg: github.com/elastic/package-registry/storage
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkInit-8                3         379386685 ns/op        168450578 B/op    650022 allocs/op

@jsoriano jsoriano requested a review from a team September 21, 2022 08:41
@jsoriano jsoriano self-assigned this Sep 21, 2022
@elasticmachine
Copy link

elasticmachine commented Sep 21, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-09-21T09:31:09.988+0000

  • Duration: 5 min 39 sec

Test stats 🧪

Test Results
Failed 0
Passed 213
Skipped 0
Total 213

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably be further improved by directly applying the transformations on the parsed packages. It could be also nice to benchmark just this function and not the whole Init process, though this is the heaviest part of this process.

I wouldn't preoptimize it. This is already a decent result. Let's deploy it in the cloud and see how it goes.

storage/index.go Show resolved Hide resolved
@jsoriano jsoriano marked this pull request as ready for review September 21, 2022 09:06
@jsoriano
Copy link
Member Author

I wouldn't preoptimize it. This is already a decent result. Let's deploy it in the cloud and see how it goes.

Ok, but I am leaving the comment, I think this is something that we should try, this will also save some memory.

@mtojek
Copy link
Contributor

mtojek commented Sep 21, 2022

Ok, but I am leaving the comment, I think this is something that we should try, this will also save some memory.

Frankly speaking, I'm wondering if that isn't supported by jsoniter. What you're doing here is iterating over next objects.

@jsoriano
Copy link
Member Author

Ok, I have tried to apply the transforms in place and surprisingly (to me 🙂) the results don't vary a lot, so let's keep it like this.

goos: linux
goarch: amd64
pkg: github.com/elastic/package-registry/storage
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkInit-8                3         394299581 ns/op        166942168 B/op    650017 allocs/op

Frankly speaking, I'm wondering if that isn't supported by jsoniter. What you're doing here is iterating over next objects.

We could also give it a try, I will wait before merging this.

@jsoriano
Copy link
Member Author

Ok, did a quick test with jsoniter, decoding the whole document (jsoniter doesn't expose Token()), and it is slightly faster for our use case, but uses more memory and allocations:

goos: linux
goarch: amd64
pkg: github.com/elastic/package-registry/storage
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkInit-8                3         339263116 ns/op        275712688 B/op   1165652 allocs/op

@jsoriano jsoriano merged commit e9c611b into elastic:main Sep 21, 2022
@jsoriano jsoriano deleted the json-decoder-tokenizer branch September 21, 2022 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants