Test datalake with recovery mode and disabled partitions #24446

Merged
ztlpn merged 3 commits into redpanda-data:dev from iceberg-test-recovery-mode
Dec 11, 2024

ztlpn
Contributor

@ztlpn ztlpn commented Dec 5, 2024

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • Disable datalake services in recovery mode

@dotnwat
Member

dotnwat commented Dec 5, 2024

/ci-repeat 5
skip-units
dt-repeat=100
tests/rptest/tests/datalake/recovery_mode_test.py

@vbotbuildovich
Collaborator

vbotbuildovich commented Dec 5, 2024

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59319#01939901-3ee1-4f93-b9c7-9a656977bebb:

"rptest.tests.datalake.recovery_mode_test.DatalakeRecoveryModeTest.test_disabled_partitions.cloud_storage_type=CloudStorageType.S3.filesystem_catalog_mode=False"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59319#01939901-3ee0-4d2d-a5fe-3b0edf16105b:

"rptest.tests.datalake.recovery_mode_test.DatalakeRecoveryModeTest.test_disabled_partitions.cloud_storage_type=CloudStorageType.S3.filesystem_catalog_mode=False"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59319#01939905-17d7-4520-a24f-0ba7df24d931:

"rptest.tests.datalake.recovery_mode_test.DatalakeRecoveryModeTest.test_disabled_partitions.cloud_storage_type=CloudStorageType.S3.filesystem_catalog_mode=True"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59319#01939901-3ee2-4b3d-b9a7-4e9a669e9249:

"rptest.tests.datalake.recovery_mode_test.DatalakeRecoveryModeTest.test_disabled_partitions.cloud_storage_type=CloudStorageType.S3.filesystem_catalog_mode=False"

@vbotbuildovich
Collaborator

vbotbuildovich commented Dec 5, 2024

Retry command for Build#59319

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/datalake/recovery_mode_test.py::DatalakeRecoveryModeTest.test_disabled_partitions@{"cloud_storage_type":1,"filesystem_catalog_mode":false}
tests/rptest/tests/datalake/recovery_mode_test.py::DatalakeRecoveryModeTest.test_disabled_partitions@{"cloud_storage_type":1,"filesystem_catalog_mode":true}

@ztlpn
Contributor Author

ztlpn commented Dec 6, 2024

Some BadLogLines failures:

rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-2(1) example="ERROR 2024-12-05 23:16:14,283 [shard 1:data] s3 - util.cc:108 - Unexpected error seastar::nested_exception: std::__1::system_error (error system:32, sendmsg: Broken pipe) (while cleaning up after std::__1::system_error (error system:32, sendmsg: Broken pipe))">

Otherwise no errors.

@dotnwat
Member

dotnwat commented Dec 6, 2024

looks like a merge conflict in application.cc

@ztlpn ztlpn force-pushed the iceberg-test-recovery-mode branch from 2b2a6de to 80d3ac2 Compare December 9, 2024 12:30
self.redpanda.restart_nodes(
    random.sample(self.redpanda.nodes, 1),
    override_cfg_params={"recovery_mode_enabled": True})

Contributor

nit: maybe produce to the topic to make it more likely that there's more data that would be scheduled if not for recovery on the given node? And maybe again to "foo" and "bar" after enabling recovery mode for the cluster? At that point, maybe we could also check that the table doesn't grow before and after the sleep

Contributor

Same in the next test -- or does recovery mode prevent us from producing?

Contributor Author

> does recovery mode prevent us from producing?

Yes exactly. I thought about adding some checks that the table doesn't grow in recovery mode, but they would be either trivial or inherently flaky.
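
For reference, a minimal sketch of the kind of "table doesn't grow" check discussed here. Everything in it is hypothetical: `assert_no_growth` and the `count_rows` callable are illustrative names, not rptest or Redpanda APIs, and a real test would have to supply its own way of counting rows in the Iceberg table. As noted above, a check like this may turn out to be trivial or flaky in recovery mode, so this only shows the shape of the assertion.

```python
import time
from typing import Callable


def assert_no_growth(count_rows: Callable[[], int],
                     window_s: float = 5.0,
                     poll_s: float = 0.5) -> int:
    """Poll count_rows() for window_s seconds; fail if the value changes.

    count_rows is a hypothetical callable returning the current row count
    of the table under test. Returns the baseline count on success.
    """
    baseline = count_rows()
    deadline = time.monotonic() + window_s
    while time.monotonic() < deadline:
        current = count_rows()
        assert current == baseline, (
            f"table changed from {baseline} to {current} rows")
        time.sleep(poll_s)
    return baseline
```

The polling window is a plain timeout, so the result depends on how long the caller is willing to wait; a longer `window_s` gives more confidence at the cost of test runtime.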

@ztlpn ztlpn requested a review from andrwng December 11, 2024 12:22
@ztlpn ztlpn merged commit 942f964 into redpanda-data:dev Dec 11, 2024
17 checks passed
@ztlpn ztlpn deleted the iceberg-test-recovery-mode branch December 11, 2024 21:33
@vbotbuildovich
Collaborator

/backport v24.3.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v24.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-24446-v24.3.x-539 remotes/upstream/v24.3.x
git cherry-pick -x 7cfbcb05d2 159a2722d7 80d3ac2076

Workflow run logs.

4 participants