[cloudwatch] Batch requests to Cloudwatch to reduce cost #279
CloudWatch is billed per request, so batching requests to CloudWatch can significantly reduce the cost of running statsd with CloudWatch.
Also pulls out viper options to be constants.
Personally I found the thread logic difficult to follow. This should hopefully clarify what's going on. The logic is unchanged, but structured differently.
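To illustrate the batching idea, here is a minimal sketch of splitting queued metrics into fixed-size batches, one per request. The `datum` type, the `chunk` helper, and the batch size of 20 are assumptions for illustration only; they are not taken from this PR or from the AWS SDK.

```go
package main

import "fmt"

// datum is a hypothetical stand-in for a single CloudWatch metric datum.
type datum struct {
	Name  string
	Value float64
}

// chunk splits data into batches of at most size elements.
// Each batch would correspond to one request to the backend.
func chunk(data []datum, size int) [][]datum {
	if size <= 0 {
		return nil
	}
	batches := make([][]datum, 0, (len(data)+size-1)/size)
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		batches = append(batches, data[start:end])
	}
	return batches
}

func main() {
	data := make([]datum, 45)
	batches := chunk(data, 20) // 20 is an illustrative batch size, not a documented limit
	fmt.Println(len(batches), "requests instead of", len(data))
}
```

With per-request billing, the saving is roughly the ratio between the number of metrics and the number of batches.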
Thanks @tiedotguy, will have a look at this next week for you.
Hi all, hope you had a happy and productive (or non-productive!) break. Have you had an opportunity to give this a spin with a real CW backend?
start := 0
// This is safe to read outside the timerMutex, as only the
// goroutine that created the initial timer can ever be here
<-client.timer.C
But isn't there a race between clearFlushTimer() and this read of client.timer?
I don't believe so:
Only one goroutine (call it "A") can be on line 230 at a time, and it's the goroutine that wrote to the timer. Essentially the flush timer is a mutex - if it's present, then the thing that created it will reach here, otherwise it won't. At line 235 the "flush timer as a mutex" is released.
I've called out in my comment above clearFlushTimer that it's possible for another goroutine (call it "B") to come along and take the data before "A" does, however that is protected by queueMutex, and just results in "A" having nothing to do.
That being said - I refactored this area specifically to make it more clear. Even if it's safe, if it's still not clear, then I need to make it clearer. A reader of the code shouldn't have to think it through in that detail.
If you can confirm or refute my logic, I'll see about making it easier to follow / fixing it.
I didn't have enough time to properly check the code here. It's probably fine if you say so :)
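For readers following the "flush timer as a mutex" argument, here is a simplified sketch of that pattern. It is not the code from this PR: the `batcher` type, its fields, and `runDemo` are hypothetical, and the sketch keeps the timer in a local variable `t` so the waiting goroutine never reads the shared field outside `timerMutex`, which sidesteps the question raised above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// batcher is a hypothetical, simplified model of the client discussed above.
type batcher struct {
	queueMutex sync.Mutex
	queue      []int

	timerMutex sync.Mutex
	timer      *time.Timer // non-nil while some goroutine owns the pending flush

	flushed chan []int
}

// add enqueues v. If no flush is pending, it creates the timer and starts
// the single goroutine that will wait on it and flush.
func (b *batcher) add(v int, delay time.Duration) {
	b.queueMutex.Lock()
	b.queue = append(b.queue, v)
	b.queueMutex.Unlock()

	b.timerMutex.Lock()
	if b.timer != nil {
		// Another goroutine owns the timer and will flush our item.
		b.timerMutex.Unlock()
		return
	}
	t := time.NewTimer(delay)
	b.timer = t
	b.timerMutex.Unlock()

	go func() {
		<-t.C // local copy: no unguarded read of b.timer

		b.timerMutex.Lock()
		b.timer = nil // release the "timer as mutex"
		b.timerMutex.Unlock()

		b.queueMutex.Lock()
		items := b.queue
		b.queue = nil
		b.queueMutex.Unlock()

		// Another flusher may already have taken the data; then there is nothing to do.
		if len(items) > 0 {
			b.flushed <- items
		}
	}()
}

// runDemo adds three items in quick succession and returns how many
// arrived in the first flush.
func runDemo() int {
	b := &batcher{flushed: make(chan []int, 1)}
	for i := 0; i < 3; i++ {
		b.add(i, 50*time.Millisecond)
	}
	return len(<-b.flushed)
}

func main() {
	fmt.Println("items in first flush:", runDemo())
}
```

Whether the unguarded read in the PR is actually safe depends on the real code paths; this sketch only shows one way to make the ownership explicit.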
client.queueMutex.Unlock()

// Run actual function
metricData := []*cloudwatch.MetricDatum{}
It may be worth preallocating this slice by counting metrics in the queue. Just an idea.
Good idea
- preallocate queue
It also may or may not be beneficial to have a slice of values rather than pointers. Of course if they are already pointers when they reach this code then it may not make any difference.
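A rough sketch of the preallocation suggestion: count the metrics across the queue first, then allocate the slice once with that capacity. The `metric` and `queued` types are hypothetical stand-ins, and the sketch uses a slice of values rather than pointers purely for illustration.

```go
package main

import "fmt"

// metric and queued are hypothetical stand-ins for the real queue entries.
type metric struct {
	Name  string
	Value float64
}

type queued struct {
	metrics []metric
}

// flatten counts the metrics first so the result slice is allocated once,
// instead of growing repeatedly during append.
func flatten(queue []queued) []metric {
	total := 0
	for _, q := range queue {
		total += len(q.metrics)
	}
	out := make([]metric, 0, total)
	for _, q := range queue {
		out = append(out, q.metrics...)
	}
	return out
}

func main() {
	queue := []queued{
		{metrics: []metric{{"a", 1}, {"b", 2}}},
		{metrics: []metric{{"c", 3}}},
	}
	out := flatten(queue)
	fmt.Println(len(out), cap(out))
}
```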
// TODO: Metrics about how many metrics were sent, and how many requests were required. Maybe others.

for _, q := range queue {
	go q.callback(errors)
If q wasn't a pointer it would have been a data race. See the bottom of the page https://github.com/golang/go/wiki/CommonMistakes#using-goroutines-on-loop-iterator-variables
It's probably worth refactor-proofing it by putting q.callback into a variable and then go-calling it.
Hm, if this is a field and not a method then a race is not possible: the pointer receiver is not captured because, well, it's not a method.
Better to be explicit and make sure it's ok.
- protect q
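A small sketch of the "put the callback into a variable" suggestion. The `item` type, `dispatch`, and `runAll` are hypothetical; the point is only that each goroutine uses its own copy `cb` rather than referring to the loop variable.

```go
package main

import (
	"fmt"
	"sync"
)

// item is a hypothetical queue entry whose callback is a field, not a method.
type item struct {
	callback func(errs []error)
}

// dispatch runs every callback in its own goroutine and waits for all of them.
func dispatch(queue []item, errs []error) {
	var wg sync.WaitGroup
	for _, q := range queue {
		cb := q.callback // copy before starting the goroutine, so q is never captured
		wg.Add(1)
		go func() {
			defer wg.Done()
			cb(errs)
		}()
	}
	wg.Wait()
}

// runAll builds n items, dispatches them, and reports which callbacks ran.
func runAll(n int) []bool {
	seen := make([]bool, n)
	queue := make([]item, 0, n)
	for i := 0; i < n; i++ {
		i := i // per-iteration copy, needed before Go 1.22
		queue = append(queue, item{callback: func([]error) { seen[i] = true }})
	}
	dispatch(queue, nil)
	return seen
}

func main() {
	fmt.Println(runAll(5))
}
```

Since Go 1.22 loop variables are scoped per iteration, so the explicit copy is mainly defensive on newer toolchains, but it keeps the code safe if `callback` is later turned into a method.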
case <-ctxTest.Done():
	after.Stop()
case <-after.C:
	require.Fail(t, "test timed out")
I think this is a data race. Please see https://golang.org/pkg/testing/#T and golang/go#15758 and golang/go#24680
An approach to achieve the same result may be to move everything else into goroutines and block on timer in the test goroutine somehow. Then if it fires fail the test. This will not work if the other code also needs to be able to do the same. =\
Well bugger. This pattern isn't new in the code base. I'll raise a new issue to refactor it out of the project. It "works" enough for now though, are you ok with leaving this instance in?
Yes, np.
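One possible shape for the alternative described above, as a sketch: the work runs in other goroutines and signals completion on a channel, while the test goroutine itself blocks on either that channel or the timer. `waitOrTimeout` and `demo` are hypothetical helpers, not part of this code base.

```go
package main

import (
	"fmt"
	"time"
)

// waitOrTimeout blocks the calling goroutine until done is closed or d elapses.
// It reports whether done was closed in time.
func waitOrTimeout(done <-chan struct{}, d time.Duration) bool {
	after := time.NewTimer(d)
	defer after.Stop()
	select {
	case <-done:
		return true
	case <-after.C:
		return false
	}
}

// demo exercises both outcomes: work that finishes, and work that never does.
func demo() (finished bool, timedOut bool) {
	done := make(chan struct{})
	go func() { close(done) }() // the "work" runs in another goroutine
	finished = waitOrTimeout(done, time.Second)

	never := make(chan struct{})
	timedOut = !waitOrTimeout(never, 10*time.Millisecond)
	return finished, timedOut
}

func main() {
	fmt.Println(demo())
}
```

In a test, the call would sit in the test function itself, for example `if !waitOrTimeout(done, time.Second) { t.Fatal("test timed out") }`, so the failure is reported from the goroutine running the test.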
}
for i := 0; i < 200; i++ {
	select {
	case <-ctx.Done():
return?
:coneofshame:
- return
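A short sketch of why the `return` matters: without it, a cancelled context only ends the current `select`, and the loop carries on with the next iteration. The `produce` and `demo` functions are hypothetical and only mirror the shape of the loop in the diff.

```go
package main

import (
	"context"
	"fmt"
)

// produce sends up to n values on out and stops as soon as ctx is cancelled.
// It returns how many values were sent.
func produce(ctx context.Context, out chan<- int, n int) int {
	sent := 0
	for i := 0; i < n; i++ {
		select {
		case <-ctx.Done():
			return sent // leave the loop entirely, not just this select
		case out <- i:
			sent++
		}
	}
	return sent
}

// demo runs produce once with an already-cancelled context and once with a live one.
func demo() (whenCancelled, whenLive int) {
	cancelled, cancel := context.WithCancel(context.Background())
	cancel()
	// Unbuffered channel with no reader: only the ctx.Done() case can proceed.
	whenCancelled = produce(cancelled, make(chan int), 200)

	// Buffered channel large enough to accept every send.
	whenLive = produce(context.Background(), make(chan int, 200), 200)
	return whenCancelled, whenLive
}

func main() {
	fmt.Println(demo())
}
```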
Does this PR still need to remain open @tiedotguy?
It'd be nice to get it in, but without a response for giving it a real test, I think no :( We can always revisit it if needed, and there's a non-zero chance that it won't matter with future refactorings anyway.
This is a continuation of #204, just moved into a branch on the main repository so I don't need to deal with any permissions issues.
It merges master to the original branch, and addresses my feedback.
cc @JorgenEvens