
progressutil: added mutexes to make go's race detector happy #61

Merged May 29, 2016

Conversation

@cgonyeo (Contributor) commented May 27, 2016

docker2aci has some race conditions, and go's race detector found some
race-y accesses in progressutil, so this commit wraps some variables
shared between goroutines in mutexes.

```
current  int64
total    int64
done     bool
doneLock sync.Mutex
```
Review comment:

Typically the lock is declared just above the member it protects.
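To illustrate the convention the reviewer is pointing at, here is a minimal sketch (the `copier` type and its methods are hypothetical stand-ins, not progressutil's actual API) with each mutex declared directly above the fields it guards, so the pairing is obvious to readers:

```go
package main

import (
	"fmt"
	"sync"
)

// copier is a hypothetical stand-in for progressutil's per-copy state.
// Each mutex sits just above the fields it protects.
type copier struct {
	countLock sync.Mutex
	current   int64 // guarded by countLock
	total     int64 // guarded by countLock

	doneLock sync.Mutex
	done     bool // guarded by doneLock
}

func (c *copier) markDone() {
	c.doneLock.Lock()
	defer c.doneLock.Unlock()
	c.done = true
}

func (c *copier) isDone() bool {
	c.doneLock.Lock()
	defer c.doneLock.Unlock()
	return c.done
}

func main() {
	c := &copier{}
	c.markDone()
	fmt.Println(c.isDone()) // true
}
```

Grouping each lock with its fields also makes it easy to see at a glance which accesses must hold which lock.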

@cgonyeo (Author) commented May 27, 2016

Let this run for an hour and a half with Go's race detector going:

$(exit 0); while [ $? -eq 0 ]; do ./bin/docker2aci docker://dgonyeo/test; done

Found another issue, fixed it. I think this is good now.
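The `$(exit 0)` at the front of that loop just primes `$?` so the `while [ $? -eq 0 ]` test passes on the first iteration; the loop then reruns the command until it first fails. A self-contained sketch of the idiom, with a counter function standing in for `./bin/docker2aci`:

```shell
#!/bin/sh
# Loop-until-first-failure idiom. run_once stands in for the real
# command; it succeeds twice and "fails" on the third run.
n=0
run_once() {
    n=$((n + 1))
    [ "$n" -lt 3 ]
}
$(exit 0)   # prime $? to 0 so the loop condition passes the first time
while [ $? -eq 0 ]; do run_once; done
echo "stopped after $n runs"
```

A plain `while ./bin/docker2aci docker://dgonyeo/test; do :; done` would behave the same without the priming step.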

@jonboulle (Contributor) commented:

@dgonyeo to be clear, did you reproduce the issue(s) we were experiencing
even without the race detector?


@jonboulle (Contributor) commented:

This doesn't address all the issues, but it's a step forward.

@jonboulle jonboulle merged commit 30a2735 into coreos:master May 29, 2016
@jonboulle (Contributor) commented:

#62

jonboulle added a commit to jonboulle/docker2aci that referenced this pull request May 29, 2016
The parallel pull refactoring in
47f2cb9 introduced an unfortunate and
subtle race condition. Because i) we were asynchronously starting the
fetching of each layer [1][1] (which implies that the fetch is added to
the progress bar asynchronously, via getLayerV2 [2][2] [3][3]), and ii)
the progressutil package has no guards against calling PrintAndWait and
AddCopy in indeterminate order [4][4], it was possible for us to enter
the PrintAndWait loop [5][5] [6][6] before all of the layer fetches had
actually been added to it. Then, in the event that the first layer was
particularly fast, the CopyProgressPrinter could actually think it was
done [7][7] before the other layer(s) had finished. In this situation,
docker2aci would happily continue forward and try to generate an ACI
from each layer [8][8], and for any layer(s) that had not actually
finished downloading, the GenerateACI->writeACI->tarball.Walk call
[9][9] would be operating on a partially written file and hence result
in the errors we've been seeing ("unexpected EOF", "corrupt input
before offset 45274", and so forth).
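The ordering problem can be sketched deterministically (this `progress` type is an illustrative stand-in, not the real CopyProgressPrinter): if a waiter checks for completion in the window after the fast layer finishes but before the slow layer is registered, it sees zero pending copies and wrongly concludes everything is done.

```go
package main

import (
	"fmt"
	"sync"
)

// progress is an illustrative stand-in for a progress printer that
// tracks in-flight copies and considers itself done when none remain.
type progress struct {
	mu     sync.Mutex
	active int
}

func (p *progress) add()    { p.mu.Lock(); p.active++; p.mu.Unlock() }
func (p *progress) finish() { p.mu.Lock(); p.active--; p.mu.Unlock() }

func (p *progress) done() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.active == 0
}

func main() {
	p := &progress{}
	// Racy ordering, replayed sequentially: the small layer is added
	// and finishes before the large layer is ever registered.
	p.add()           // small layer registered
	p.finish()        // small layer completes quickly
	early := p.done() // true -- but the large layer isn't registered yet!
	p.add()           // large layer registered too late
	fmt.Println(early, p.done()) // true false
}
```

This is why the fix needs either to register every copy before the wait loop starts, or to make the printer reject (or account for) registrations that arrive after waiting has begun.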

A great case to reproduce this is the `docker://busybox` image because of
its particular characteristics: two layers, one very small (32B) and one
relatively larger (676KB). Typical output looks like the following:

```
% $(exit 0); while [ $? -eq 0 ]; do ./bin/docker2aci docker://busybox; done

Downloading sha256:385e281300c [===============================] 676 KB / 676 KB
Downloading sha256:a3ed95caeb0 [===============================]     32 B / 32 B

Generated ACI(s):
library-busybox-latest.aci

Downloading sha256:a3ed95caeb0 [==============================]      32 B / 32 B
Downloading sha256:385e281300c [==============================]  676 KB / 676 KB

Generated ACI(s):
library-busybox-latest.aci

Downloading sha256:a3ed95caeb0 [===================================] 32 B / 32 B
Error: conversion error: error generating ACI: error writing ACI: Error reading tar entry: unexpected EOF
```

Note that i) the order in which the layers are registered with the
progress printer is indeterminate, and ii) every failure case (observed
so far) is when the small layer is retrieved first, and the stdout contains no
progress output at all from retrieving the other layer. This indeed
suggests that the progress printer returns before realising the second
layer is still being retrieved.

Tragically, @dgonyeo's test case [10][10] probably gives a false sense
of security (i.e. it cannot reproduce this issue), most likely
because the `dgonyeo/test` image contains a number of layers (5) of
varying sizes, and I suspect it's much less likely for this particular
race to occur.

Almost certainly fixes appc#166 - with this patch I'm unable to reproduce.

[1]: https://github.com/appc/docker2aci/blob/ba503aa9b84b6c1ffbab03ec0589415ef598e5e0/lib/internal/backend/repository/repository2.go#L89
[2]: https://github.com/appc/docker2aci/blob/ba503aa9b84b6c1ffbab03ec0589415ef598e5e0/lib/internal/backend/repository/repository2.go#L115
[3]: https://github.com/appc/docker2aci/blob/ba503aa9b84b6c1ffbab03ec0589415ef598e5e0/lib/internal/backend/repository/repository2.go#L335
[4]: coreos/pkg#63
[5]: https://github.com/appc/docker2aci/blob/ba503aa9b84b6c1ffbab03ec0589415ef598e5e0/lib/internal/backend/repository/repository2.go#L123
[6]: https://github.com/coreos/pkg/blob/master/progressutil/iocopy.go#L115
[7]: https://github.com/coreos/pkg/blob/master/progressutil/iocopy.go#L144
[8]: https://github.com/appc/docker2aci/blob/ba503aa9b84b6c1ffbab03ec0589415ef598e5e0/lib/internal/backend/repository/repository2.go#L149
[9]: https://github.com/appc/docker2aci/blob/4e051449c0079ba8df59a51c14b7d310de1830b8/lib/internal/internal.go#L427
[10]: coreos/pkg#61 (comment)