[1.3] backport failpoints(resizeFileError, lackOfDiskSpace) and dmflakey on XFS #816

Merged
ahrtr merged 4 commits into etcd-io:release-1.3 on Aug 9, 2024

Conversation

Member

fuweid commented Aug 9, 2024

ahrtr and others added 4 commits August 9, 2024 09:27
Signed-off-by: Benjamin Wang <wachao@vmware.com>
(cherry picked from commit 465077b)
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Signed-off-by: Marcondes Viana <marju10@gmail.com>
(cherry picked from commit 5ddbd0c)
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Update the case to combine the EXT4 filesystem's commit setting with an
unexpected exit event. The EXT4 commit setting makes the filesystem sync all
its data and metadata at the configured interval, in seconds. The kernel can
sync for us even if the process has been killed. With different commit
settings, we can simulate the case where the kernel has synced only part of
the dirty pages before a power failure. For the unexpected exit event, we can
kill the process randomly or panic at a failpoint instead of following a
fixed code path.

Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit 4c3a80b)
Signed-off-by: Wei Fu <fuweid89@gmail.com>
This also introduces mkfs options, in case we need to accommodate
non-default parameters here in the future.

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
(cherry picked from commit c27eedc)
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Member

ahrtr commented Aug 9, 2024

cc @tjungblu

Contributor

tjungblu commented Aug 9, 2024

/lgtm

thank you @fuweid

Member

ahrtr left a comment

LGTM

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ahrtr, fuweid

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ahrtr ahrtr merged commit 1b38fb3 into etcd-io:release-1.3 Aug 9, 2024
10 checks passed
Member

ahrtr commented Aug 9, 2024

@fuweid the robustness test failed right after merging this PR, can you take a look? thx
https://github.com/etcd-io/bbolt/actions/runs/10316542012/job/28558983342

cc @tjungblu

Member Author

fuweid commented Aug 9, 2024

@fuweid the robustness test failed right after merging this PR, can you take a look? thx https://github.com/etcd-io/bbolt/actions/runs/10316542012/job/28558983342

cc @tjungblu

=== RUN TestRestartFromPowerFailureExt4/fp_ext4_commit1s
powerfailure_test.go:168: start bbolt bench -work -path /tmp/TestRestartFromPowerFailureExt4fp_ext4_commit1s370634247/002/boltdb -count=1000000000 -batch-size=5
powerfailure_test.go:185: simulate power failure
powerfailure_test.go:190: random pick failpoint: mapError
powerfailure_test.go:199: bbolt should stop with panic in seconds

The bbolt command didn't panic within 10 seconds.

Member Author

fuweid commented Aug 9, 2024

I need to backport this one as well

49eb212

5 participants