-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE-7803] Audit Log Manager: Refactoring - use client::produce_record_batch
and reduce retries.
#23775
[CORE-7803] Audit Log Manager: Refactoring - use client::produce_record_batch
and reduce retries.
#23775
Conversation
client::produce_record_batch
and reduce retries.client::produce_record_batch
and reduce retries.
scale test result (passed): It's not immediately clear to me a) how the stress test error bounds were derived or b) whether we might have regressed performance at all. |
22dfb96
to
bb95d56
Compare
Retry command for Build#56495please wait until all jobs are finished before running the slash command
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks really good! At least one must change (boolean logic check)
bb95d56
to
0d67052
Compare
force push contents:
|
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56560#01929272-4461-4115-a479-ce0760154bc3:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56770#01929c71-7831-4cb2-ae4b-04f0913a7970:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56770#01929c84-b709-443f-b75d-02f64b77d224:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56770#01929c84-b711-4676-b0a3-9591c7568fa8:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56770#01929c84-b70c-455a-a8e1-a1789f798894:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56770#01929c84-b70f-4105-8f55-7b8694146d6a:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56770#01929c71-7832-4173-b9e2-c9e947c53f77:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56770#01929c71-7833-4f97-8979-482cc2f693ff:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56770#01929c71-7830-41ca-bb3c-e8d5271b2f03:
|
Retry command for Build#56560please wait until all jobs are finished before running the slash command
|
/ci-repeat 1 |
CI Failures: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, ❤️
/cdt |
CDT failure looks like infra https://buildkite.com/redpanda/redpanda/builds/56658#019296c5-6c31-4c3d-9a2b-caec09836335 |
/cdt |
/ci-repeat 1 |
CI Failures:
|
Retry command for Build#56675please wait until all jobs are finished before running the slash command
|
size_t batch_max_bytes, std::optional<ss::logger*> log) | ||
: _batch_max_bytes(batch_max_bytes) | ||
, _log(log.value_or(&kclog)) { | ||
vassert(_log != nullptr, "Injected logger must not be nullptr"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: Do we have a null logger? I.e., explicit nullptr means don't log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No clue, but awesome idea
0d67052
to
6d39098
Compare
Useful in audit_log_manager as well as transform logging Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
And adds make_batch_of_one Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
We need these for sorting out which partitions are locally led Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Previous implementation used a very high value for retries on the internal kafka client, which prevents the client from recovering certain types of errors. Instead, we batch up drained records on the manager side, allowing us to hold a copy of each batch in memory and retry failed produce calls from "scratch". This also allows us to be _much_ more aggressive about batching. The internal kafka client will calculate a destination partition for each record, round robin style over the number of partitions. In the new scheme, we shoot for a maximally sized batch first, then select a destination, still round-robin style, but biasing heavily toward locally led partitions. In this way, given the default audit per-shard queue limit and default max batch size (both 1MiB), the most common drain operation should result in exactly one produce request. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
6d39098
to
e5fd326
Compare
force push contents:
|
Retry command for Build#56770please wait until all jobs are finished before running the slash command
|
CI Failures:
|
/ci-repeat 1 |
CI Failure:
|
/backport v24.2.x |
/backport v24.1.x |
Failed to create a backport PR to v24.1.x branch. I tried:
|
Failed to create a backport PR to v24.2.x branch. I tried:
|
TODO:
Backports Required
Release Notes
Bug Fixes