Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated cherry pick of #11652 #11670 #11710 #11752

Conversation

tangcong
Copy link
Contributor

@tangcong tangcong commented Apr 6, 2020

Cherry pick of #11652 #11670 #11710 on release-3.4.

#11652: *: fix auth revision corruption bug
#11670: optimize auth/etcdserver logs to facilitate troubleshooting data inconsistency
#11710: auth: ensure RoleGrantPermission is compatible with older

@tangcong
Copy link
Contributor Author

tangcong commented Apr 6, 2020

@jingyih @gyuho After listening to your suggestions(Refactored code doesn't need be backported),i have submitted two cherry-pick pr #11752 (release-3.4), #11753 (release-3.3) and also updated changlog #11750. PTAL. thanks.

@jingyih
Copy link
Contributor

jingyih commented Apr 6, 2020

Could you help take a look at the failed e2e test?

=== RUN   TestCtlV3AuthRoleGet
--- FAIL: TestCtlV3AuthRoleGet (15.76s)
    testutil.go:55: goroutine 734 [running]:
        go.etcd.io/etcd/pkg/testutil.FatalStack(0xc00022c300, 0xc0001bb8c0, 0x18)
        	/go/src/go.etcd.io/etcd/pkg/testutil/testutil.go:54 +0x6d
        go.etcd.io/etcd/tests/e2e.testCtl(0xc00022c300, 0xd4d0f0, 0x0, 0x0, 0x0)
        	/go/src/go.etcd.io/etcd/tests/e2e/ctl_v3_test.go:184 +0x431
        go.etcd.io/etcd/tests/e2e.TestCtlV3AuthRoleGet(0xc00022c300)
        	/go/src/go.etcd.io/etcd/tests/e2e/ctl_v3_auth_test.go:55 +0x48
        testing.tRunner(0xc00022c300, 0xd4ca28)
        	/usr/local/go/src/testing/testing.go:865 +0xc0
        created by testing.(*T).Run

@tangcong
Copy link
Contributor Author

tangcong commented Apr 6, 2020

Could you help take a look at the failed e2e test?

=== RUN   TestCtlV3AuthRoleGet
--- FAIL: TestCtlV3AuthRoleGet (15.76s)
    testutil.go:55: goroutine 734 [running]:
        go.etcd.io/etcd/pkg/testutil.FatalStack(0xc00022c300, 0xc0001bb8c0, 0x18)
        	/go/src/go.etcd.io/etcd/pkg/testutil/testutil.go:54 +0x6d
        go.etcd.io/etcd/tests/e2e.testCtl(0xc00022c300, 0xd4d0f0, 0x0, 0x0, 0x0)
        	/go/src/go.etcd.io/etcd/tests/e2e/ctl_v3_test.go:184 +0x431
        go.etcd.io/etcd/tests/e2e.TestCtlV3AuthRoleGet(0xc00022c300)
        	/go/src/go.etcd.io/etcd/tests/e2e/ctl_v3_auth_test.go:55 +0x48
        testing.tRunner(0xc00022c300, 0xd4ca28)
        	/usr/local/go/src/testing/testing.go:865 +0xc0
        created by testing.(*T).Run

semaphoreci is flaky. travis ci seems ok. can it be retested? local test is ok.

@jingyih
Copy link
Contributor

jingyih commented Apr 6, 2020

semaphore ci and travis ci are actually testing different sets of tests. I have not noticed this particular test been flaky in the past. Plus, the test is auth related. Therefore I asked, just want to make sure.

@tangcong
Copy link
Contributor Author

tangcong commented Apr 6, 2020

semaphore ci and travis ci are actually testing different sets of tests. I have not noticed this particular test been flaky in the past. Plus, the test is auth related. Therefore I asked, just want to make sure.

got it thanks. semaphore ci failed to pass due to test timeout. I checked again without changing the auth role logic. I can't trigger it to retest.

@tangcong tangcong closed this Apr 6, 2020
@tangcong tangcong reopened this Apr 6, 2020
@tangcong tangcong force-pushed the automated-cherry-pick-of-#11652-#11670-#11710-origin-release-3.4 branch from 34aa9bf to 9bb4d5a Compare April 7, 2020 15:37
@jingyih
Copy link
Contributor

jingyih commented Apr 7, 2020

semaphore ci and travis ci are actually testing different sets of tests. I have not noticed this particular test been flaky in the past. Plus, the test is auth related. Therefore I asked, just want to make sure.

got it thanks. semaphore ci failed to pass due to test timeout. I checked again without changing the auth role logic. I can't trigger it to retest.

I do not know how to trigger retest neither. Did you try with the same test command as semaphore ci? e.g. sudo HOST_TMP_DIR=/tmp TEST_OPTS="GOARCH=386 PASSES='build e2e'" make docker-test

@tangcong
Copy link
Contributor Author

tangcong commented Apr 7, 2020

semaphore ci and travis ci are actually testing different sets of tests. I have not noticed this particular test been flaky in the past. Plus, the test is auth related. Therefore I asked, just want to make sure.

got it thanks. semaphore ci failed to pass due to test timeout. I checked again without changing the auth role logic. I can't trigger it to retest.

I do not know how to trigger retest neither. Did you try with the same test command as semaphore ci? e.g. sudo HOST_TMP_DIR=/tmp TEST_OPTS="GOARCH=386 PASSES='build e2e'" make docker-test

thanks. i will take a try. It seems that this PR merge 3.4 will sometimes cause auth operation timeout, and it looks weird, I need to investigate further.

@tangcong tangcong force-pushed the automated-cherry-pick-of-#11652-#11670-#11710-origin-release-3.4 branch 2 times, most recently from f8df6b6 to 2bc5116 Compare April 8, 2020 03:07
@tangcong
Copy link
Contributor Author

tangcong commented Apr 8, 2020

@jingyih zap logger may be nil in release-3.4 so it will cause etcd crash and command timeout. fixed it. your judgment is accurate, thanks.

@jingyih
Copy link
Contributor

jingyih commented Apr 8, 2020

@jingyih zap logger may be nil in release-3.4 so it will cause etcd crash and command timeout. fixed it. your judgment is accurate, thanks.

Good catch! Thanks! I am thinking if there is a way to auto check for this kind of bug in the future when we backport. Do you have any suggestion? Basically 3.5+ eliminated nil checking on logger, but 3.4 still needs it.

auth/store.go Outdated Show resolved Hide resolved
@tangcong tangcong force-pushed the automated-cherry-pick-of-#11652-#11670-#11710-origin-release-3.4 branch from 2bc5116 to b733b22 Compare April 9, 2020 01:34
@tangcong
Copy link
Contributor Author

tangcong commented Apr 9, 2020

Good catch! Thanks! I am thinking if there is a way to auto check for this kind of bug in the future when we backport. Do you have any suggestion? Basically 3.5+ eliminated nil checking on logger, but 3.4 still needs it.

it needs a tool such as error check. can we modify cherry-pick script to add some tips?

Copy link
Contributor

@jingyih jingyih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jingyih
Copy link
Contributor

jingyih commented Apr 9, 2020

@gyuho @mitake @spzala This cherrypick is not trivial. It would be great if any of you could also help take a look.

Copy link
Member

@spzala spzala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -0,0 +1,42 @@
// Copyright 2015 The etcd Authors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, this is a new file, we should have used year 2020 here (instead of 2015) for an easy future reference of when the file was created. Considering this is cherry pick, doing a separate PR in the master and then cherry pick it would be nice. Thanks!

@spzala
Copy link
Member

spzala commented Apr 9, 2020

@gyuho @mitake @spzala This cherrypick is not trivial. It would be great if any of you could also help take a look.

@jingyih awesome job, as usual :), reviewing the original changes and @tangcong great job with the PR. I had a quick look and a small inline comment that can be addressed separately. The build failure seems not related. From cherry pick perspective it looks good to me. Thanks both!

@jingyih jingyih merged commit f1eca4e into etcd-io:release-3.4 Apr 10, 2020
@tangcong tangcong deleted the automated-cherry-pick-of-#11652-#11670-#11710-origin-release-3.4 branch February 26, 2021 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants