[v24.2.x] CORE-8394 cluster: consider shard0 reserve in check_cluster_limits #24462
the below tests from https://buildkite.com/redpanda/redpanda/builds/59322#019398af-4577-4ca0-898b-9406fa159cf7 have failed and will be retried
the below tests from https://buildkite.com/redpanda/redpanda/builds/59322#019398af-457c-4781-aec1-b1e977d9f5df have failed and will be retried
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/59322#019398f7-bc6f-4017-9bba-18c8712abed3
I've cherry-picked the changes of PR #24409 as well, since the changes don't pass CI without it and it makes sense to backport both.
Improve the user error feedback when the `topic_partitions_reserve_shard0` cluster config is used and a user tries to allocate a topic that is above the partition limits.

Previously this check was only considered as part of the `max_final_capacity` hard constraint, which meant that the Kafka error message was vaguer ("No nodes are available to perform allocation after hard constraints were solved") and there were no clear broker logs to indicate the cause.

Now the reservation is also considered inside `check_cluster_limits`, which leads to more specific error messages both on the Kafka API ("unable to create topic with 20 partitions due to hardware constraints") and in the broker logs:

```
WARN 2024-11-29 13:18:13,907 [shard 0:main] cluster - partition_allocator.cc:183 - Refusing to create 20 partitions as total partition count 20 would exceed the core-based limit 18 (per-shard limit: 20, shard0 reservation: 2)
```

(cherry picked from commit b632190)
(cherry picked from commit cccb53d)
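The arithmetic behind the log line above can be sketched roughly as follows. This is an illustrative Python model, not the code in `partition_allocator.cc`: the class, the field names, and the formula (per-shard limit times core count, minus one shard0 reservation per broker) are assumptions chosen so that a single-core broker reproduces the logged values (20 - 2 = 18).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClusterCapacity:
    # All names here are hypothetical and exist only for this sketch.
    total_cores: int                # cores summed over all brokers
    partitions_per_shard: int       # per-shard partition limit
    partitions_reserve_shard0: int  # partitions held back on shard 0
    broker_count: int = 1


def core_based_limit(cap: ClusterCapacity) -> int:
    # Assumed formula: each core contributes the per-shard limit and
    # each broker gives up its shard0 reservation once.
    return (cap.total_cores * cap.partitions_per_shard
            - cap.broker_count * cap.partitions_reserve_shard0)


def check_cluster_limits(cap: ClusterCapacity,
                         existing_partitions: int,
                         new_partitions: int) -> Optional[str]:
    """Return None if the request fits, otherwise a refusal message."""
    limit = core_based_limit(cap)
    proposed = existing_partitions + new_partitions
    if proposed > limit:
        return (
            f"Refusing to create {new_partitions} partitions as total "
            f"partition count {proposed} would exceed the core-based "
            f"limit {limit} (per-shard limit: {cap.partitions_per_shard}, "
            f"shard0 reservation: {cap.partitions_reserve_shard0})"
        )
    return None
```

With one core, a per-shard limit of 20 and a reservation of 2, a request for 20 partitions is refused while a request for 18 is accepted, which matches the numbers in the warning.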
Pure refactor. Extract for reuse in the next commit. (cherry picked from commit 4b4f6a2)
Internal topics are excluded from checks to prevent allocation failures when creating them. This is to ensure that lazy-allocated internal topics (e.g. the transactions topic) can always be created. This excludes them from the global `check_cluster_limits`.

There was already a fixture test intended to verify that internal topics are excluded from the limit checks. However, it erroneously relied on the fact that the shard0 reservations were not considered in `check_cluster_limits` to allow the test to pass. (See `allocation_over_capacity` and the previous commit.)

This adds a new test to validate that internal topics can be created even with partitions that are above the global shard0 reservation.

(cherry picked from commit 19bc4f2)
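The exemption described in this commit can be pictured with a small Python sketch. The function name, its parameters, and the boolean `is_internal` flag are hypothetical and are not taken from the Redpanda code base; the only behaviour modelled is that internal topics bypass the global limit check while user topics do not.

```python
def exceeds_cluster_limits(is_internal: bool,
                           existing_partitions: int,
                           requested_partitions: int,
                           core_based_limit: int) -> bool:
    """Return True when the allocation request should be refused.

    Hypothetical helper: internal topics (for example a lazily created
    transactions topic) skip the global limit check entirely.
    """
    if is_internal:
        # Internal topics must always be creatable, so never refuse them here.
        return False
    return existing_partitions + requested_partitions > core_based_limit


# A cluster already at its limit of 18 partitions:
user_refused = exceeds_cluster_limits(False, 18, 1, 18)      # user topic
internal_refused = exceeds_cluster_limits(True, 18, 50, 18)  # internal topic
```

A test in the spirit of the one added here would fill the cluster up to the limit, then assert that the user topic request is refused and the internal topic request is not.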
725dfe7 to 5fe9620
Backport of PR #24378 and #24409