QQs: periodically apply policies if there's a discrepancy between the current and desired policy-driven state #12640

michaelklishin · 2024-11-04T05:38:59Z

This is #12412 by @LoisSotoLopez rebased on top of main.

Instead of checking the values for current configuration, represented in `rabbit_quorum_queue:handle_tick` by the `Overview` variable, against the effective policy, just regenerate the configuration and compare with the current configuration.
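The idea can be sketched as follows. This is a minimal Python illustration, not the Erlang code in `rabbit_quorum_queue`; the key names in `POLICY_DRIVEN_KEYS` and all three helper functions are hypothetical and exist only for this example.

```python
# Hypothetical subset of configuration keys that a policy can change.
POLICY_DRIVEN_KEYS = ("max_length", "overflow_strategy", "delivery_limit")


def regenerate_config(base_config, effective_policy):
    """Build the desired configuration from queue defaults plus the effective policy."""
    desired = dict(base_config)
    desired.update(effective_policy)
    return desired


def needs_repair(current_config, desired_config, keys=POLICY_DRIVEN_KEYS):
    """Return True if any policy-driven key differs between the two configurations."""
    return any(current_config.get(k) != desired_config.get(k) for k in keys)


def on_tick(current_config, base_config, effective_policy):
    """Periodic check: return the configuration to apply, or None if nothing differs."""
    desired = regenerate_config(base_config, effective_policy)
    return desired if needs_repair(current_config, desired) else None
```

Limiting the comparison to a fixed set of keys means that differences in unrelated fields do not trigger a repair.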

As described in #12413 (comment) test case queue_topology flaked in CI with the following error:

```
rabbitmq_amqp_client > management_SUITE > cluster_size_3 > queue_topology
#1. {error,{test_case_failed,{824, <<"rmq-ct-cluster_size_3-1-21000@localhost">>}}}
```

This flake could not be reproduced locally (neither with Mnesia nor with Khepri).

in order to troubleshoot the flake described in #12413 (comment)

```
Node: rabbit_shard2@localhost
Case: amqp_system_SUITE:access_failure
Reason: {error,{{badmatch,{error,134,
"Unhandled exception. System.Exception: expected exception not received\n at Program.Test.accessFailure(String uri) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 477\n at Program.main(String[] argv) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 509\n"}},
[{amqp_system_SUITE,run_dotnet_test,2,
[{file,"amqp_system_SUITE.erl"},
{line,257}]},
```

Support x-cc message annotation

Support an `x-cc` message annotation in AMQP 1.0 similar to the [CC](https://www.rabbitmq.com/docs/sender-selected) header in AMQP 0.9.1. The value of the `x-cc` message annotation must be a list of strings. A message annotation is used since application properties allow only simple types.
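The constraint on the annotation value can be illustrated with a short Python sketch. The function name `routing_keys` and the choice to return the primary routing key followed by the `x-cc` entries are assumptions made for this example; they do not describe how the broker implements it.

```python
def routing_keys(routing_key, message_annotations):
    """Return the primary routing key followed by any keys listed in `x-cc`."""
    cc = message_annotations.get("x-cc", [])
    # The annotation value must be a list of strings; anything else is rejected.
    if not isinstance(cc, list) or not all(isinstance(k, str) for k in cc):
        raise ValueError("x-cc must be a list of strings")
    return [routing_key] + cc
```

A single string such as `"x-cc": "b"` is rejected here because the value must be a list, even when it holds one key.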

Prior to this commit, tests
* leader_transfer_quorum_queue_credit_single
* leader_transfer_quorum_queue_credit_batches

flaked in CI during 4.1 (main) and 4.0 mixed version testing.

The following error occurred on node 0:

```
[error] <0.1950.0> Timed out waiting for credit reply from quorum queue 'leader_transfer_quorum_queue_credit_batches' in vhost '/'. Hint: Enable feature flag rabbitmq_4.0.0
[warning] <0.1950.0> Closing session for connection <0.1945.0>: {'v1_0.error',
[warning] <0.1950.0> {symbol,<<"amqp:internal-error">>},
[warning] <0.1950.0> {utf8,
[warning] <0.1950.0> <<"Timed out waiting for credit reply from quorum queue 'leader_transfer_quorum_queue_credit_batches' in vhost '/'. Hint: Enable feature flag rabbitmq_4.0.0">>},
[warning] <0.1950.0> undefined}
```

Therefore we enable this feature flag for both tests.

This commit also simplifies some test setups that were necessary for 4.0/3.13 mixed version testing but are no longer necessary for 4.1/4.0 mixed version testing.

This test flakes in CI as described in #12413 (comment)

The test case fails with

```
Node: rabbit_shard2@localhost
Case: amqp_system_SUITE:access_failure
Reason: {error,{{badmatch,{error,134,
"Unhandled exception. System.Exception: expected exception not received at Program.Test.accessFailure(String uri) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 477 at Program.main(String[] argv) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 509\n"}},
[{amqp_system_SUITE,run_dotnet_test,2,
[{file,"amqp_system_SUITE.erl"},
{line,257}]},
```

However, RabbitMQ closes the session as expected due to the missing read permissions to the queue as shown in the RabbitMQ logs:

```
[debug] <0.1321.0> Asked to create a new user 'access_failure', password length in bytes: 24
[info] <0.1321.0> Created user 'access_failure'
[debug] <0.1324.0> Asked to set permissions for user 'access_failure' in virtual host '/' to '.*', '^banana.*', '^banana.*'
[info] <0.1324.0> Successfully set permissions for user 'access_failure' in virtual host '/' to '.*', '^banana.*', '^banana.*'
[info] <0.1333.0> accepting AMQP connection 127.0.0.1:36248 -> 127.0.0.1:25000
[debug] <0.1333.0> User 'access_failure' authenticated successfully by backend rabbit_auth_backend_internal
[info] <0.1333.0> Connection from AMQP 1.0 container 'AMQPNetLite-101d7d51': user 'access_failure' authenticated using SASL mechanism PLAIN and granted access to vhost '/'
[debug] <0.1333.0> AMQP 1.0 connection.open frame: hostname = 127.0.0.1, extracted vhost = /, idle-time-out = undefined
[debug] <0.1333.0> AMQP 1.0 created session process <0.1338.0> for channel number 0
[warning] <0.1338.0> Closing session for connection <0.1333.0>: {'v1_0.error',
[warning] <0.1338.0> {symbol,
[warning] <0.1338.0> <<"amqp:unauthorized-access">>},
[warning] <0.1338.0> {utf8,
[warning] <0.1338.0> <<"read access to queue 'test' in vhost '/' refused for user 'access_failure'">>},
[warning] <0.1338.0> undefined}
[debug] <0.1333.0> AMQP 1.0 closed session process <0.1338.0> with channel number 0
[warning] <0.1333.0> closing AMQP connection <0.1333.0> (127.0.0.1:36248 -> 127.0.0.1:25000, duration: '269ms'):
[warning] <0.1333.0> client unexpectedly closed TCP connection
```

```
let receiver = ReceiverLink(ac.Session, "test-receiver", src)
```

uses a null constructor for the onAttached callback. ReceiverLink doesn't seem to block.

Given that the exact same authorization error is already tested in test case attach_source_queue of amqp_auth_SUITE, it's safe to delete this F# test.

Bumps [org.springframework.boot:spring-boot-starter-parent](https://github.com/spring-projects/spring-boot) from 3.3.4 to 3.3.5.
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](spring-projects/spring-boot@v3.3.4...v3.3.5)

---
updated-dependencies:
- dependency-name: org.springframework.boot:spring-boot-starter-parent
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Closes #9259.

## What?
Allow an AMQP 1.0 client to renew an OAuth 2.0 token before it expires.

## Why?
This allows clients to keep the AMQP connection open instead of having to create a new connection whenever the token expires.

## How?
As explained in #9259 (comment) the client can `PUT` a new token on HTTP API v2 path `/auth/tokens`. RabbitMQ will then:
1. Store the new token on the given connection.
2. Recheck access to the connection's vhost.
3. Clear all permission caches in the AMQP sessions.
4. Recheck write permissions to exchanges for links publishing to RabbitMQ, and recheck read permissions from queues for links consuming from RabbitMQ. The latter complies with the user expectation in #11364.
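The order of the four steps can be sketched in Python. Everything here is hypothetical and for illustration only: the `Connection` class, the token represented as a dict with `vhosts` and `permissions` sets, and the use of `PermissionError` to signal a refused check. None of it reflects the broker's actual data structures.

```python
class Connection:
    """Hypothetical stand-in for an AMQP 1.0 connection and its sessions."""

    def __init__(self, vhost, token, links):
        self.vhost = vhost
        self.token = token
        self.links = list(links)  # (action, resource) pairs, e.g. ("write", "orders")
        self.permission_cache = set(self.links)


def renew_token(conn, new_token):
    """Apply the four steps in order; raise PermissionError if a recheck fails."""
    # 1. Store the new token on the connection.
    conn.token = new_token
    # 2. Recheck access to the connection's vhost.
    if conn.vhost not in new_token["vhosts"]:
        raise PermissionError(f"access to vhost {conn.vhost} refused")
    # 3. Clear cached permission decisions so they are re-evaluated.
    conn.permission_cache.clear()
    # 4. Recheck write permissions for publishing links and read permissions
    #    for consuming links against the new token.
    for action, resource in conn.links:
        if (action, resource) not in new_token["permissions"]:
            raise PermissionError(f"{action} access to {resource} refused")
```

Clearing the caches before the link recheck ensures that no decision made under the old token survives the renewal.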

By running it
* On push, when relevant code paths change
* Every Monday morning

The peer discovery subsystem does not change particularly often, and this plugin in particular does not. Nonetheless, we currently run it for every push unconditionally.

It is possible for a slow-running follower with local consumers to crash after a snapshot installation as it tries to read an entry from its log that is no longer there (as it has been consumed and completed by another node but still refers to prior consumers on the current node).

This commit makes the log effect callback function more defensive by checking that the number of commands returned by the log effect matches what was requested. If it differs, we consider this a stale read request and return no further effects.

Conflicts: deps/rabbit/test/quorum_queue_SUITE.erl
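The defensive check amounts to a length comparison before any further work. The Python sketch below illustrates that idea only; `handle_log_effect`, its parameters, and the `make_effects` callback are hypothetical names, not the Erlang callback changed by the commit.

```python
def handle_log_effect(requested_indexes, returned_commands, make_effects):
    """Return follow-up effects, or none if the log read looks stale."""
    if len(returned_commands) != len(requested_indexes):
        # Some requested entries are no longer in the log (for example after a
        # snapshot installation), so treat this as a stale read request.
        return []
    return make_effects(returned_commands)
```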

LoisSotoLopez and others added 30 commits October 24, 2024 07:23

Add QQ periodic policy repair

f9179d1

Add test for QQ policy repair feature

b408351

Use ra_machine_config but limit keys to check

ec87ef1

Refactoring suggestion

ccd8548

(some of this is just reverting to the original format to reduce the diff against main)

Move tests to main qq SUITE & refactor a bit

dc9ab1d

Consider QQs may let pass 1st overflowing msg

51abb5c

Use local function for ensuring qq proc dead

df14b4a

Use wait_for_messages_ready

42b58c7

Simplify publish_confirm_many

3b5069f

Remove ShouldLog & limit deliv. limit not set logg

9dc9f97

Removes the ShouldLog parameter from several functions and limits logging of the warning about the delivery_limit not being set to queue declaration time.

Remove extra keys from gather_policy_config out

2577b7e

Make CI: Add mixed version testing

2235492

This is enabled on main and for pull requests. Bazel remains used in previous branches.

Make CI: Enable khepri mixed clusters testing

3dbfcaa

Fix metrics_SUITE connection_metrics flake

ef06f80

Use fmt_string in this error message

0a557f7

4.0.3 release notes

88df855

Fix a typo in 4.0.3 release notes

0a59746

rabbitmq-run.mk: Use a 60 seconds timeout for rabbitmqctl wait

7f1d161

... not 60 milliseconds.

rabbitmq-run.mk: Restart nodes in a cluster sequentially

2d61fac

... not in parallel.

Add AMQP 1.0 event exchange test

ea7bc81

queue_SUITE: use a different upstream for each queue on multi-federat…

624b72b

…ion tests

rabbit_prometheus_http_SUITE: Start broker once in special_chars group

b5b598c

`init_per_group/3`, which starts the broker, was already called earlier in the function. This fixes a bug where the node can't be stopped in `end_per_group/2`, affecting the next group's ability to start one.

MarcialRosales and others added 20 commits November 4, 2024 00:34

Use the correct variable name

c0ef442

Verify non-zero DNS and email SAN

c8e1593

Abort restart-cluster if something goes wrong

df8f6d1

For example, if the first restarted node doesn't start, don't try to restart the other nodes. This mimics what orchestrators such as Kubernetes or BOSH would do (although they perform this check differently)
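A minimal Python sketch of this behaviour, assuming the nodes are restarted one at a time. The function and its two callbacks (`restart_node`, `wait_for_node`) are hypothetical; the actual change lives in the project's Make-based tooling.

```python
def restart_cluster(nodes, restart_node, wait_for_node):
    """Restart nodes one at a time; abort at the first node that fails to come back."""
    restarted = []
    for node in nodes:
        restart_node(node)
        if not wait_for_node(node):
            # Do not touch the remaining nodes once one restart has failed.
            raise RuntimeError(f"{node} did not start; aborting cluster restart")
        restarted.append(node)
    return restarted
```

Raising on the first failure leaves the remaining nodes running on their previous state.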

Tests: wait for connection closed in metrics_SUITE

ab9d225

Test: wait for metrics

7ac5b17

Test: metrics_SUITE queue_idemp wait for queue metrics

ff44f4d

Actions deps: manually apply #12630 #12631

a6adf74

Use log macros for AMQP

af876ed

Using a log macro has the benefit that location data is added as explained in https://www.erlang.org/doc/apps/kernel/logger.html#t:metadata/0

bazel run gazelle

654bd04

Update SECURITY.md

84e65cc

New workflow for triggering alpha releases

da615ad

in rabbitmq/server-packages, an Actions-only repo dedicated to open source RabbitMQ release automation.

Use a known repository_dispatch event type

7ddd9d8

Actions: try a using short commit SHA for alpha identifier

3cf326e

Actions: trigger alpha build workflow run when workflow itself changes

aaebcc1

Actions, alpha build: try passing in a different prerelease_identifier

a1a555e

Revert "Actions, alpha build: try passing in a different prerelease_i…

49ad8ea

…dentifier" This reverts commit de0d8cf.

Actions/alpha build: cosmetics

d6a9db0

mergify bot added bazel make labels Nov 4, 2024

Merge branch 'main' into rabbitmq-server-12412

734f685

michaelklishin merged commit fe587ae into main Nov 4, 2024
273 checks passed

michaelklishin deleted the rabbitmq-server-12412 branch November 4, 2024 06:20

michaelklishin mentioned this pull request Nov 4, 2024

Backport of #12640 to v4.0.x #12641

Merged

michaelklishin added a commit that referenced this pull request Nov 4, 2024

Merge pull request #12641 from rabbitmq/rabbitmq-server-12640-for-v4.0.x

215f218

Backport of #12640 to v4.0.x

This was referenced Nov 4, 2024

Periodically check for unapplied policies on QQs #12412

Merged

By @LoisSotoLopez: Exclude policy_repair QQ test on mixed versions (backport #12666) #12667

Merged

By @LoisSotoLopez: Exclude policy_repair QQ test on mixed versions #12666

Merged
