Use JSON decoder tokenizer to parse packages from storage indexer #881

jsoriano · 2022-09-21T08:41:00Z

Use the JSON decoder tokenizer to parse packages in the storage indexer, and add a benchmark for the "Init" process.

The benchmark makes it possible to compare results: the new code needs slightly more allocations (650022 vs. 648922 allocs/op), but it uses about 28% less memory (168450578 vs. 235217858 B/op) and is slightly faster (about 5% fewer ns/op).

This can probably be improved further by applying the transformations directly to the parsed packages. It would also be nice to benchmark just this function rather than the whole Init process, although this function is the heaviest part of it.

Before:

goos: linux
goarch: amd64
pkg: github.com/elastic/package-registry/storage
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkInit-8                3         398302468 ns/op        235217858 B/op    648922 allocs/op

After:

goos: linux
goarch: amd64
pkg: github.com/elastic/package-registry/storage
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkInit-8                3         379386685 ns/op        168450578 B/op    650022 allocs/op

elasticmachine · 2022-09-21T08:47:20Z

💚 Build Succeeded

Build stats

Start Time: 2022-09-21T09:31:09.988+0000
Duration: 5 min 39 sec

Test stats 🧪

Test	Results
Failed	0
Passed	213
Skipped	0
Total	213

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.

mtojek

> This can probably be improved further by applying the transformations directly to the parsed packages. It would also be nice to benchmark just this function rather than the whole Init process, although this function is the heaviest part of it.

I wouldn't preoptimize it. This is already a decent result. Let's deploy it in the cloud and see how it goes.

storage/index.go

jsoriano · 2022-09-21T09:07:18Z

> I wouldn't preoptimize it. This is already a decent result. Let's deploy it in the cloud and see how it goes.

Ok, but I am leaving the comment, I think this is something that we should try, this will also save some memory.

mtojek · 2022-09-21T09:15:58Z

> Ok, but I am leaving the comment, I think this is something that we should try, this will also save some memory.

Frankly speaking, I'm wondering if that isn't supported by jsoniter. What you're doing here is iterating over next objects.

jsoriano · 2022-09-21T09:21:59Z

Ok, I have tried to apply the transforms in place and surprisingly (to me 🙂) the results don't vary a lot, so let's keep it like this.

goos: linux
goarch: amd64
pkg: github.com/elastic/package-registry/storage
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkInit-8                3         394299581 ns/op        166942168 B/op    650017 allocs/op
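
To make the two variants being compared concrete, here is a minimal sketch. It is not the code from this PR: the `entry` and `resolved` types, the `transform` function, and the `BasePath` naming are hypothetical stand-ins, and the input is simplified to a bare JSON array.

```go
package main

import (
	"encoding/json"
	"strings"
)

// entry is a hypothetical package entry as stored in the index.
type entry struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

// resolved is a hypothetical transformed form of an entry.
type resolved struct {
	Name, Version, BasePath string
}

// transform stands in for whatever per-package transformation is needed.
func transform(e entry) resolved {
	return resolved{Name: e.Name, Version: e.Version, BasePath: e.Name + "-" + e.Version + ".zip"}
}

// eachEntry decodes a JSON array of entries one element at a time.
func eachEntry(doc string, visit func(entry)) error {
	dec := json.NewDecoder(strings.NewReader(doc))
	if _, err := dec.Token(); err != nil { // opening '['
		return err
	}
	for dec.More() {
		var e entry
		if err := dec.Decode(&e); err != nil {
			return err
		}
		visit(e)
	}
	_, err := dec.Token() // closing ']'
	return err
}

// collectThenTransform keeps every parsed entry, then transforms them
// in a second pass.
func collectThenTransform(doc string) ([]resolved, error) {
	var parsed []entry
	if err := eachEntry(doc, func(e entry) { parsed = append(parsed, e) }); err != nil {
		return nil, err
	}
	out := make([]resolved, 0, len(parsed))
	for _, e := range parsed {
		out = append(out, transform(e))
	}
	return out, nil
}

// transformWhileDecoding transforms each entry as soon as it is parsed,
// so no intermediate slice of raw entries is kept.
func transformWhileDecoding(doc string) ([]resolved, error) {
	var out []resolved
	err := eachEntry(doc, func(e entry) { out = append(out, transform(e)) })
	return out, err
}
```

Both functions return the same result. The second only avoids the intermediate slice, which is small compared to the decoded data itself, so a modest difference in B/op is plausible.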

> Frankly speaking, I'm wondering if that isn't supported by jsoniter. What you're doing here is iterating over next objects.

We could also give it a try, I will wait before merging this.

jsoriano · 2022-09-21T09:27:35Z

Ok, did a quick test with jsoniter, decoding the whole document (jsoniter doesn't expose Token()), and it is slightly faster for our use case, but uses more memory and allocations:

goos: linux
goarch: amd64
pkg: github.com/elastic/package-registry/storage
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkInit-8                3         339263116 ns/op        275712688 B/op   1165652 allocs/op

jsoriano added 3 commits September 21, 2022 10:23

Use tokenizer to read packages index (5e1a193)

Benchmark init (97fcf38)

Add TODO (4de26a4)

jsoriano requested a review from a team September 21, 2022 08:41

jsoriano self-assigned this Sep 21, 2022

mtojek approved these changes Sep 21, 2022

Comments (889b38c)

jsoriano marked this pull request as ready for review September 21, 2022 09:06

Add changelog entry (c89fcf7)

mtojek approved these changes Sep 21, 2022

jsoriano merged commit e9c611b into elastic:main Sep 21, 2022

jsoriano deleted the json-decoder-tokenizer branch September 21, 2022 12:47
