Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: fix condition in disk_log_impl::have_segments_to_evict() #22686

Merged
merged 2 commits into from
Aug 19, 2024

Conversation

WillemKauf
Copy link
Contributor

@WillemKauf WillemKauf commented Aug 1, 2024

Currently, we check that there is more than one segment in a disk_log_impl's segment_set for determining it as having segments for eviction.

This gets us into a situation in which we can evict all segments in a log if we were to have _segs.size() > 1, but none if _segs.size() == 1.

This is an overly restrictive condition, and is corrected here.

We openly admit that we may delete the active segment here in disk_log_impl::do_truncate_prefix(), so there is no reason to be so restrictive with the condition in request_eviction_until_offset()

Fixes #18014.

This change has been reverted in #22995. Previous release note:

  • Fixes a condition in disk_log_impl that resulted in the log_eviction_stm being unable to evict from a log with a single segment in it.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Aug 1, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/52371#01910fc0-895f-46d9-8a08-2024b51e6b6e:

"rptest.tests.retention_policy_test.BogusTimestampTest.test_bogus_timestamps.mixed_timestamps=True.use_broker_timestamps=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/52371#01910fc2-00b8-4d92-939d-9e990853c4c4:

"rptest.tests.retention_policy_test.BogusTimestampTest.test_bogus_timestamps.mixed_timestamps=True.use_broker_timestamps=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/52371#019110c9-00b0-44fb-b4b6-11e5687f3e03:

"rptest.tests.retention_policy_test.BogusTimestampTest.test_bogus_timestamps.mixed_timestamps=True.use_broker_timestamps=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/52371#019110c9-009c-480f-af66-4750ac50dfee:

"rptest.tests.retention_policy_test.BogusTimestampTest.test_bogus_timestamps.mixed_timestamps=True.use_broker_timestamps=False"

@vbotbuildovich
Copy link
Collaborator

@mmaslankaprv
Copy link
Member

@WillemKauf would it be possible to modify one of the e2e tests to trigger eviction while appending to the log, to test the race conditions ?

Currently, we check that there is more than one segment in a `disk_log_impl`'s
`segment_set` for determining it as having segments for eviction.

This gets us into a situation in which we can evict all segments in a log if
we were to have `_segs.size() > 1`, but none if `_segs.size() == 1`.

This is an overly restrictive condition, and is corrected here.

Also, add a new test to `log_retention_tests.cc`, `retention_test_size_with_one_segment`
to ensure test coverage for this new condition.

Also, adjust `retention_ms` in `test_bogus_timestamps`.
A low `retention_ms` value in this test case quickly evicts the
lone active segment with a valid timestamp, leading it to timeout when checking
for `"Adjusting retention timestamp.*to max valid record"` in the log
monitor in the case of `mixed_timestamps=True, use_broker_timestamps=False`.
Raise its value to `10,000` (`10s`) to avoid this issue.
To test that we do not encounter data races or other issues with concurrent
threads appending to and evicting from a `log`.
@WillemKauf WillemKauf force-pushed the evict_fix_disk_log_impl branch from 8213554 to fd55288 Compare August 9, 2024 01:03
@WillemKauf
Copy link
Contributor Author

@mmaslankaprv added a test covering concurrent eviction and appendings, similar to the one you have here.

I also checked that test in locally and ran it successfully a number of times with my change.

LMKWYT

@mmaslankaprv mmaslankaprv merged commit e2117d1 into redpanda-data:dev Aug 19, 2024
21 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v24.2.x

@vbotbuildovich
Copy link
Collaborator

/backport v24.1.x

WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Aug 22, 2024
…vict()`

Original PR: redpanda-data#22686.

The associated test with this is deadlocking in every ~1/30-40 runs.
Revert the change for now.
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Aug 22, 2024
…vict()`

Original PR: redpanda-data#22686.

The associated test with this is deadlocking in every ~1/30-40 runs.
Revert the change for now, as well as the expectations for unit test
`retention_test_size_with_one_segment`.
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this pull request Aug 22, 2024
…vict()`

Original PR: redpanda-data#22686.

The associated test with this is deadlocking in every ~1/30-40 runs.
Revert the change for now, as well as the expectations for unit test
`retention_test_size_with_one_segment`.

(cherry picked from commit 6fe0605)
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this pull request Aug 22, 2024
…vict()`

Original PR: redpanda-data#22686.

The associated test with this is deadlocking in every ~1/30-40 runs.
Revert the change for now, as well as the expectations for unit test
`retention_test_size_with_one_segment`.

(cherry picked from commit 6fe0605)
nvartolomei added a commit to nvartolomei/redpanda that referenced this pull request Sep 26, 2024
…ions

For reasons not well understood each partition needs to have at least 2
segments for eviction to be triggered. We have tried to change this
condition in redpanda-data#22686 but
had to revert it in
redpanda-data#22995.

For now, produce more data to increase the chance of each partition
having at least 2 segments worth of data.
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this pull request Sep 26, 2024
…ions

For reasons not well understood each partition needs to have at least 2
segments for eviction to be triggered. We have tried to change this
condition in redpanda-data#22686 but
had to revert it in
redpanda-data#22995.

For now, produce more data to increase the chance of each partition
having at least 2 segments worth of data.

(cherry picked from commit 79d706e)
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this pull request Sep 26, 2024
…ions

For reasons not well understood each partition needs to have at least 2
segments for eviction to be triggered. We have tried to change this
condition in redpanda-data#22686 but
had to revert it in
redpanda-data#22995.

For now, produce more data to increase the chance of each partition
having at least 2 segments worth of data.

(cherry picked from commit 79d706e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI Failure (Data never drops below 1MB) in FullDiskReclaimTest.test_full_disk_triggers_gc
4 participants