-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/aead: add explicit data race and locking to resolve #77
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a unit test we could construct that would repro the data race when run with go test -race
(and fail against HEAD
), which we could then verify is fixed with this patch?
internal/pkg/aead/aead.go
Outdated
@@ -61,6 +67,11 @@ func (c *MiscreantCipher) Encrypt(plaintext []byte) (joined []byte, err error) { | |||
|
|||
// Decrypt a value using AES-CMAC-SIV | |||
func (c *MiscreantCipher) Decrypt(joined []byte) ([]byte, error) { | |||
aead, err := miscreant.NewAEAD(algorithmType, c.secret, miscreantNonceSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth pointing out that this seems to be a somewhat nontrivial initializer, which performs some block cipher encryption to derive its initial state, on each call.
https://github.com/miscreant/miscreant-go/blob/master/cmac/cmac.go#L45
My assumption is that sso_proxy
is not particularly compute-bounded, but it could be interesting to throw this up in stage
, and using Datadog to verify that we don't see an undue latency cost from these initializations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose instead of instantiating new objects we could just lock around a single cipher?
I wonder which would be more performant.
@katzdm Do you mean something like https://circleci.com/gh/buzzfeed/sso/463 which shows the data race? |
internal/pkg/aead/aead.go
Outdated
@@ -25,17 +25,18 @@ type Cipher interface { | |||
// Using an AEAD is a cipher providing authenticated encryption with associated data. | |||
// For a description of the methodology, see https://en.wikipedia.org/wiki/Authenticated_encryption | |||
type MiscreantCipher struct { | |||
aead cipher.AEAD | |||
secret []byte | |||
} | |||
|
|||
// NewMiscreantCipher returns a new AES Cipher for encrypting values | |||
func NewMiscreantCipher(secret []byte) (*MiscreantCipher, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a somewhat minor point, but it feels surprising for this type to be named MiscreantCipher
, if the object in question (the Miscreant cipher) is only instantiated at Encrypt
/Decrypt
-time. Maybe it would be better if we keep the cipher.AEAD
object owned by the MiscreantCipher
class, and instead initialize a new MiscreantCipher
during Marshal
/Unmarsal
. Might make it easier to reason about the types, in the future.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with changing the type name, but I think whatever we want to expose, we want to be go-routine
safe. I would argue that this should include making both Encrypt
and Decrypt
go-routine safe which currently necessitates the instantiation of new objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just creating new objects in the Unmarshal
and Marshal
such that those are the only two methods that would be go-routine safe isn't sufficient for what we want to expose from this API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW the MiscreantCipher
is an implementation of the Cipher
interface (which all might be poorly named) and has Unmarshal
and Marshal
as well as Encrypt
and Decrypt
functions. This probably could be structured differently but we'd just need to reflect those changes in the MockCipher
502ca31
to
48d267d
Compare
@@ -89,3 +93,72 @@ func TestMarshalAndUnmarshalStruct(t *testing.T) { | |||
t.Fatalf("expected structs to be equal") | |||
} | |||
} | |||
|
|||
func TestCipherDataRace(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider annotating with a comment pointing to a GitHub issue describing the problem. Six months from now, we'll all forget why we added this test :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
internal/pkg/aead/aead_test.go
Outdated
// TestCipherDataRace exercises a simple concurrency test for the MicscreantCipher. | ||
// In https://github.com/buzzfeed/sso/pull/75 we investigated why, on random occasion, | ||
// unmarshalling session states would fail, triggering users to get kicked out of | ||
// authenticated states. We narrowed our suspicious to a data race that we uncovered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/suspicious/suspicions
Also, I don't think we should describe the miscreant library as "having a data race"; the library just doesn't make any attempt at thread-safety. It might be more accurate to phrase the situation as something like: "We eventually uncovered that sso
implicitly assumed the underlying miscreant APIs to be thread-safe; in truth, they are only thread-compatible and require external synchronization."
bbc1525
to
b1beac2
Compare
this LGTM, lets squash + rebase and merge? |
b1beac2
to
63af567
Compare
@shrayolacrayon squashed into two commits + rebased. |
This adds an explicit concurrency test case for aead/miscreant to test for potential data races in the underlying cipher implementation.