Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/aead: add explicit data race and locking to resolve #77

Merged
merged 2 commits into from
Oct 1, 2018

Conversation

jphines
Copy link
Contributor

@jphines jphines commented Oct 1, 2018

This adds an explicit concurrency test case for aead/miscreant to test for potential data races in the underlying cipher implementation.

Copy link
Contributor

@katzdm katzdm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a unit test we could construct that would repro the data race when run with go test -race (and fail against HEAD), which we could then verify is fixed with this patch?

@@ -61,6 +67,11 @@ func (c *MiscreantCipher) Encrypt(plaintext []byte) (joined []byte, err error) {

// Decrypt a value using AES-CMAC-SIV
func (c *MiscreantCipher) Decrypt(joined []byte) ([]byte, error) {
aead, err := miscreant.NewAEAD(algorithmType, c.secret, miscreantNonceSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth pointing out that this seems to be a somewhat nontrivial initializer, which performs some block cipher encryption to derive its initial state, on each call.
https://github.com/miscreant/miscreant-go/blob/master/cmac/cmac.go#L45

My assumption is that sso_proxy is not particularly compute-bounded, but it could be interesting to throw this up in stage, and using Datadog to verify that we don't see an undue latency cost from these initializations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose instead of instantiating new objects we could just lock around a single cipher?

I wonder which would be more performant.

@jphines
Copy link
Contributor Author

jphines commented Oct 1, 2018

@katzdm Do you mean something like https://circleci.com/gh/buzzfeed/sso/463 which shows the data race?

@@ -25,17 +25,18 @@ type Cipher interface {
// Using an AEAD is a cipher providing authenticated encryption with associated data.
// For a description of the methodology, see https://en.wikipedia.org/wiki/Authenticated_encryption
type MiscreantCipher struct {
aead cipher.AEAD
secret []byte
}

// NewMiscreantCipher returns a new AES Cipher for encrypting values
func NewMiscreantCipher(secret []byte) (*MiscreantCipher, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a somewhat minor point, but it feels surprising for this type to be named MiscreantCipher, if the object in question (the Miscreant cipher) is only instantiated at Encrypt/Decrypt-time. Maybe it would be better if we keep the cipher.AEAD object owned by the MiscreantCipher class, and instead initialize a new MiscreantCipher during Marshal/Unmarsal. Might make it easier to reason about the types, in the future.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with changing the type name, but I think whatever we want to expose, we want to be go-routine safe. I would argue that this should include making both Encrypt and Decrypt go-routine safe which currently necessitates the instantiation of new objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just creating new objects in the Unmarshal and Marshal such that those are the only two methods that would be go-routine safe isn't sufficient for what we want to expose from this API.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the MiscreantCipher is an implementation of the Cipher interface (which all might be poorly named) and has Unmarshal and Marshal as well as Encrypt and Decrypt functions. This probably could be structured differently but we'd just need to reflect those changes in the MockCipher

@@ -89,3 +93,72 @@ func TestMarshalAndUnmarshalStruct(t *testing.T) {
t.Fatalf("expected structs to be equal")
}
}

func TestCipherDataRace(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider annotating with a comment pointing to a GitHub issue describing the problem. Six months from now, we'll all forget why we added this test :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

katzdm
katzdm previously approved these changes Oct 1, 2018
@jphines jphines changed the title [wip] pkg/aead: add explicit data race test pkg/aead: add explicit data race and locking to resolve data race Oct 1, 2018
@jphines jphines changed the title pkg/aead: add explicit data race and locking to resolve data race pkg/aead: add explicit data race and locking to resolve Oct 1, 2018
// TestCipherDataRace exercises a simple concurrency test for the MicscreantCipher.
// In https://github.com/buzzfeed/sso/pull/75 we investigated why, on random occasion,
// unmarshalling session states would fail, triggering users to get kicked out of
// authenticated states. We narrowed our suspicious to a data race that we uncovered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/suspicious/suspicions

Also, I don't think we should describe the miscreant library as "having a data race"; the library just doesn't make any attempt at thread-safety. It might be more accurate to phrase the situation as something like: "We eventually uncovered that sso implicitly assumed the underlying miscreant APIs to be thread-safe; in truth, they are only thread-compatible and require external synchronization."

@jphines jphines force-pushed the jhines-data-race branch 2 times, most recently from bbc1525 to b1beac2 Compare October 1, 2018 18:16
@shrayolacrayon
Copy link

this LGTM, lets squash + rebase and merge?

@jphines
Copy link
Contributor Author

jphines commented Oct 1, 2018

@shrayolacrayon squashed into two commits + rebased.

@jphines jphines merged commit c2a68c1 into master Oct 1, 2018
@jphines jphines deleted the jhines-data-race branch October 1, 2018 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants