Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revendor etcd/raft #1353

Merged
merged 1 commit into from
Aug 11, 2016
Merged

Revendor etcd/raft #1353

merged 1 commit into from
Aug 11, 2016

Conversation

aaronlehmann
Copy link
Collaborator

@aaronlehmann aaronlehmann commented Aug 9, 2016

Change Campaign logic to trigger after Advance, because now Campaign
won't do anything if there are uncommitted entries. Also, remove the
check on Progress, because now Progress is only filled in if the node is
already the leader.

Set a smaller WAL segment size to work around high I/O load on Mac when
running unit tests.

Fixes #1229

@codecov-io
Copy link

codecov-io commented Aug 9, 2016

Current coverage is 55.35% (diff: 100%)

Merging #1353 into master will increase coverage by 0.14%

@@             master      #1353   diff @@
==========================================
  Files            80         80          
  Lines         12637      12632     -5   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           6977       6993    +16   
+ Misses         4699       4679    -20   
+ Partials        961        960     -1   

Sunburst

Powered by Codecov. Last update 77a6894...62e73cd

@dperny
Copy link
Collaborator

dperny commented Aug 9, 2016

while running this branch, got this:

--- FAIL: TestRaftSnapshot (10.76s)
        Error Trace:    testutils.go:104
                        testutils.go:365
                        storage_test.go:67
        Error:          Received unexpected error polling failed: did not find leader in member list

FAIL
exit status 1

@aaronlehmann
Copy link
Collaborator Author

while running this branch, got this:

Was this on Mac? I also saw some issues with the tests due to I/O load, but I have an upstream PR pending that should solve the problem.

@dperny
Copy link
Collaborator

dperny commented Aug 9, 2016

Negative. Ubuntu 12.04 LTS.

@aaronlehmann aaronlehmann force-pushed the revendor-etcd branch 2 times, most recently from 2337dd9 to b03f377 Compare August 10, 2016 00:11
@aaronlehmann
Copy link
Collaborator Author

@dperny: I made a change to the test that may help. Can you check if the problem is still occurring?

@aaronlehmann aaronlehmann changed the title [WIP] Revendor etcd/raft Revendor etcd/raft Aug 10, 2016
@aaronlehmann
Copy link
Collaborator Author

My PR to coreos/etcd that allows tests to work around the preallocation issue was merged. Updated this PR and removed WIP.

@dperny
Copy link
Collaborator

dperny commented Aug 10, 2016

@aaronlehmann, sorry, I should have specified that the failure was intermittent. I got it in one run out of maybe 10.

@LK4D4
Copy link
Contributor

LK4D4 commented Aug 10, 2016

-22k LOC? I'm in.
LGTM

@aaronlehmann
Copy link
Collaborator Author

-22k LOC? I'm in.

Yeah, some of it is from removing dependencies like candiedyaml that we don't use any more. But also etcd switched to a better vendoring system where they aren't rewriting import paths. So we no longer vendor two copies of gogoprotobuf, for example.

Change Campaign logic to trigger after Advance, because now Campaign
won't do anything if there are uncommitted entries. Also, remove the
check on Progress, because now Progress is only filled in if the node is
already the leader.

Set a smaller WAL segment size to work around high I/O load on Mac when
running unit tests.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann
Copy link
Collaborator Author

Rebased after #1327 to adapt to new locking API in this upstream version, and adjust TestGCWAL for the smaller segment size. The test is now enabled in CI.

PTAL

@abronan
Copy link
Contributor

abronan commented Aug 11, 2016

Thanks @aaronlehmann, I'm going to test this with docker integration.

@abronan
Copy link
Contributor

abronan commented Aug 11, 2016

Haven't seen any issue running a cluster with docker integration, removing the progress check seems safe too since we now vendor v3.

On 3 Manager cluster, I tested:

  • promotion/demotion (with self-demotion)
  • stop whole cluster and restart
  • stop whole cluster and restart a subset
  • stop leader
  • some service operations and stress-test while doing all of the above

We should think about using the leadership transfer on self-demotion now that we vendor v3.

LGTM

@aaronlehmann
Copy link
Collaborator Author

Cool. Did you have a chance to run the Docker integration tests? I haven't done this yet.

@aaronlehmann
Copy link
Collaborator Author

Docker integration tests passed. Here goes nothing.

@aaronlehmann aaronlehmann merged commit d5f249e into moby:master Aug 11, 2016
@aaronlehmann aaronlehmann deleted the revendor-etcd branch August 11, 2016 23:56
@aaronlehmann
Copy link
Collaborator Author

@LK4D4: Actually it's -36k LOC - GitHub is not showing correct totals.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants