Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmark for libbeat diskqueue implementation #31594

Merged
merged 4 commits into from
May 17, 2022

Conversation

leehinman
Copy link
Contributor

What does this PR do?

Adds a benchmark test to libbeat disqueue implementation. This benchmark produces, consumes & acks a million events to the queue.

Why is it important?

This benchmark will allow us to see how changes to queue implementation affect performance.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
    - [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

go test -bench=. -benchtime 10x -timeout 60m

Related issues

@leehinman leehinman added enhancement Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels May 11, 2022
@leehinman leehinman requested a review from a team as a code owner May 11, 2022 13:33
@leehinman leehinman requested review from belimawr and fearful-symmetry and removed request for a team May 11, 2022 13:33
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels May 11, 2022
@elasticmachine
Copy link
Collaborator

elasticmachine commented May 11, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-05-12T18:09:39.009+0000

  • Duration: 68 min 29 sec

Test stats 🧪

Test Results
Failed 0
Passed 22558
Skipped 1938
Total 24496

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@leehinman leehinman force-pushed the benchmark_disk_queue branch from 2d49f09 to 7e9643f Compare May 11, 2022 15:05
@cmacknz cmacknz requested a review from faec May 11, 2022 17:16
Content: beat.Event{
Timestamp: eventTime,
Fields: mapstr.M{
"message": msgs[rand.Intn(len(msgs))],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does using a random event make the benchmark results less reproducible? Is there a notable improvement vs just using a fixed sequence, or maybe the compromise of using the same seed in the random number generator for every run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A static event vs a random event doesn't make much difference with the results. I get about a 200msec variance on -benchtime 10x runs either way.

I'll add the seed, that way order of benchmarks won't matter.

@cmacknz
Copy link
Member

cmacknz commented May 11, 2022

Can you summarize the results you get from this?

Also, does it run on every commit? It would be nice to get the output as report attached to each CI run, but that is maybe out of scope for this change.

}
}

//setup creates the disk queue, you must remove the directory it makes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we can't remove it after setup? If not, we might at least want to print the dir path so someone can manually clean it up without having to hunt around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated comments

@leehinman leehinman force-pushed the benchmark_disk_queue branch from e28b105 to 985bb52 Compare May 12, 2022 00:15
Copy link
Contributor

@faec faec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only question is similar to @cmacknz, how does this interact with CI? (Does it take long to run? Are the results surfaced anywhere when not running it manually?) If it doesn't have any expected "failure" outcome then should it be disabled on a default test run with a +build directive or something?

Also FYI, depending on which goes in first this may conflict with my pending PR that removes queue.Consumer from the queue API #31502 -- it should be an easy merge though, the only difference is that Get is now a method on the queue itself so there's no need to create a consumer explicitly.

@leehinman
Copy link
Contributor Author

For results we get output like:

go test -bench=Benchmark1M_1k -benchtime 10x -timeout 60m
goos: darwin
goarch: amd64
pkg: github.com/elastic/beats/v7/libbeat/publisher/queue/diskqueue
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark1M_1k-16    	      10	38365574332 ns/op
PASS
ok  	github.com/elastic/beats/v7/libbeat/publisher/queue/diskqueue	424.300s

which is the average of 10 runs, that produced & consumed 1 million events each. or 26k events/sec.

go benchmark tests do NOT run during CI, and I don't think we want them to. For this to work it would be a fair amount of I/O and I'm pretty confident that CI schedulers would limit the I/O, making any kind of comparison useless.

@leehinman
Copy link
Contributor Author

My only question is similar to @cmacknz, how does this interact with CI? (Does it take long to run? Are the results surfaced anywhere when not running it manually?) If it doesn't have any expected "failure" outcome then should it be disabled on a default test run with a +build directive or something?

gotestsum which runs all the go tests, does not run benchmarks. So you need to run this by hand. But I think that is reasonable. It took ~7 minutes to run the benchmark 10 times on my Mac, and really you need to run 50 or 100 times to get enough data to start doing comparisons. I don't think we can really justify that kind of energy consumption on every commit.

Also FYI, depending on which goes in first this may conflict with my pending PR that removes queue.Consumer from the queue API #31502 -- it should be an easy merge though, the only difference is that Get is now a method on the queue itself so there's no need to create a consumer explicitly.

Thanks, should be an easy fix.

@leehinman leehinman force-pushed the benchmark_disk_queue branch from 1743618 to 4a6cc54 Compare May 12, 2022 18:09
@leehinman leehinman merged commit aafa2d4 into elastic:main May 17, 2022
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
* Add benchmark for libbeat diskqueue implementation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants