
[cloudwatch] Batch requests to Cloudwatch to reduce cost #279

Closed
wants to merge 9 commits

Conversation

tiedotguy
Collaborator

This is a continuation of #204, just moved into a branch on the main repository so I don't need to deal with any permissions issues.

It merges master into the original branch and addresses my feedback.

cc @JorgenEvens

JorgenEvens and others added 9 commits (December 2, 2018 21:50)

  • Cloudwatch is billed per request, and by batching requests to Cloudwatch we can significantly reduce the cost of running statsd with Cloudwatch (a batching sketch follows this list).
  • Also pulls out viper options to be constants.
  • Personally I found the thread logic difficult to follow. This should hopefully clarify what's going on. The logic is unchanged, but structured differently.
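A minimal sketch of the batching idea from the first commit message, for illustration only; it is not the PR's implementation. It assumes the aws-sdk-go v1 CloudWatch client, and maxDatumsPerRequest is an assumed example limit rather than a value taken from the PR or from CloudWatch's documented request limits.

```go
package cloudwatchsketch

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/cloudwatch"
	"github.com/aws/aws-sdk-go/service/cloudwatch/cloudwatchiface"
)

const maxDatumsPerRequest = 20 // assumed batch size for illustration only

// putBatched sends metricData in chunks, so each PutMetricData request (and its
// per-request cost) covers many datums instead of one request per datum.
func putBatched(cw cloudwatchiface.CloudWatchAPI, namespace string, metricData []*cloudwatch.MetricDatum) error {
	for start := 0; start < len(metricData); start += maxDatumsPerRequest {
		end := start + maxDatumsPerRequest
		if end > len(metricData) {
			end = len(metricData)
		}
		_, err := cw.PutMetricData(&cloudwatch.PutMetricDataInput{
			Namespace:  aws.String(namespace),
			MetricData: metricData[start:end],
		})
		if err != nil {
			return err
		}
	}
	return nil
}
```

Compared with one PutMetricData call per datum, the number of billable requests drops by roughly the batch size.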

wooly commented Nov 29, 2019

Thanks @tiedotguy, will have a look at this next week for you.

@tiedotguy
Collaborator Author

Hi all, hope you had a happy and productive (or non-productive!) break.

Have you had an opportunity to give this a spin with a real CW backend?

start := 0
// This is safe to read outside the timerMutex, as only the
// goroutine that created the initial timer can ever be here
<-client.timer.C
Contributor

But isn't there a race between clearFlushTimer() and this read of client.timer?

Collaborator Author

I don't believe so:

Only one goroutine (call it "A") can be on line 230 at a time, and it's the goroutine that wrote to the timer. Essentially the flush timer is a mutex - if it's present, then the thing that created it will reach here, otherwise it won't. At line 235 the "flush timer as a mutex" is released.

I've called out in my comment above clearFlushTimer that it's possible for another goroutine (call it "B") to come along and take the data before "A" does, however that is protected by queueMutex, and just results in "A" having nothing to do.

That being said - I refactored this area specifically to make it more clear. Even if it's safe, if it's still not clear, then I need to make it clearer. A reader of the code shouldn't have to think it through in that detail.

If you can confirm or refute my logic, I'll see about making it easier to follow / fixing it.
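For reference, a rough sketch of the pattern as described, with hypothetical names; it is not the PR's code, just the "flush timer as a mutex" idea in isolation.

```go
package cloudwatchsketch

import (
	"sync"
	"time"

	"github.com/aws/aws-sdk-go/service/cloudwatch"
)

type client struct {
	timerMutex    sync.Mutex
	timer         *time.Timer
	queueMutex    sync.Mutex
	queue         []*cloudwatch.MetricDatum
	flushInterval time.Duration
}

func (c *client) enqueue(d *cloudwatch.MetricDatum) {
	c.queueMutex.Lock()
	c.queue = append(c.queue, d)
	c.queueMutex.Unlock()

	c.timerMutex.Lock()
	if c.timer == nil {
		// Goroutine "A" creates the timer, so it alone owns the wait below.
		c.timer = time.NewTimer(c.flushInterval)
		go c.waitAndFlush()
	}
	c.timerMutex.Unlock()
}

func (c *client) waitAndFlush() {
	// Safe to read outside timerMutex: only the goroutine that created the
	// timer ever gets here, and only it sets c.timer back to nil.
	<-c.timer.C

	c.timerMutex.Lock()
	c.timer = nil // release the "flush timer as a mutex"
	c.timerMutex.Unlock()

	c.queueMutex.Lock()
	queue := c.queue
	c.queue = nil
	c.queueMutex.Unlock()

	// queue may be empty if another goroutine ("B", e.g. a forced flush, not
	// shown) drained it first under queueMutex; "A" then simply has nothing to do.
	c.flush(queue)
}

func (c *client) flush(queue []*cloudwatch.MetricDatum) { /* batched PutMetricData calls go here */ }
```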

Contributor

I didn't have enough time to properly check the code here. It's probably fine if you say so :)

client.queueMutex.Unlock()

// Run actual function
metricData := []*cloudwatch.MetricDatum{}
Contributor

It may be worth preallocating this slice by counting metrics in the queue. Just an idea.

Collaborator Author

Good idea

  • preallocate queue
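The suggestion amounts to something like this sketch; q.metricData is a hypothetical field name standing in for however a queued request carries its datums.

```go
// Count the queued datums first, then allocate the destination slice once.
total := 0
for _, q := range queue {
	total += len(q.metricData)
}
metricData := make([]*cloudwatch.MetricDatum, 0, total)
for _, q := range queue {
	metricData = append(metricData, q.metricData...)
}
```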

Contributor

It also may or may not be beneficial to have a slice of values rather than pointers. Of course if they are already pointers when they reach this code then it may not make any difference.

// TODO: Metrics about how many metrics were sent, and how many requests were required. Maybe others.

for _, q := range queue {
go q.callback(errors)
Contributor

If q wasn't a pointer it would have been a data race. See the bottom of the page https://github.com/golang/go/wiki/CommonMistakes#using-goroutines-on-loop-iterator-variables
It's probably worth refactor-proofing it by putting q.callback into a variable and then go-calling it.
Hm, if this is a field and not a method then a race isn't possible: the pointer receiver isn't captured because, well, it's not a method.

Collaborator Author

Better to be explicit and make sure it's ok.

  • protect q
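The defensive version suggested above looks roughly like this (a sketch, not the exact change in the commit):

```go
for _, q := range queue {
	cb := q.callback // copy into a loop-local variable so a future refactor
	go cb(errors)    // can't accidentally capture the loop variable q
}
```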

case <-ctxTest.Done():
after.Stop()
case <-after.C:
require.Fail(t, "test timed out")
Contributor

I think this is a data race. Please see https://golang.org/pkg/testing/#T and golang/go#15758 and golang/go#24680

Contributor

An approach to achieve the same result may be to move everything else into goroutines and block on the timer in the test goroutine somehow. Then, if it fires, fail the test. This will not work if the other code also needs to be able to do the same. =\
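For example, one way to keep the failure call on the test goroutine (a sketch; runTestBody is a hypothetical stand-in for whatever the test currently does inline):

```go
done := make(chan struct{})
go func() {
	defer close(done)
	runTestBody(ctxTest) // hypothetical: the work that previously ran on the test goroutine
}()

select {
case <-done:
	// finished in time
case <-time.After(5 * time.Second):
	t.Fatal("test timed out") // Fatal/FailNow must run on the test goroutine
}
```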

Collaborator Author

Well bugger. This pattern isn't new in the code base. I'll raise a new issue to refactor it out of the project. It "works" well enough for now, though; are you ok with leaving this instance in?

Contributor

Yes, np.

}
for i := 0; i < 200; i++ {
select {
case <-ctx.Done():
Contributor

return?

Collaborator Author

:coneofshame:

  • return

@MovieStoreGuy
Contributor

Does this PR still need to remain open, @tiedotguy?

@tiedotguy
Collaborator Author

It'd be nice to get it in, but without a response about giving it a real test, I think no :(

We can always revisit it if needed, and there's a non-zero chance that it won't matter with future refactorings anyway.

@tiedotguy tiedotguy closed this Jul 7, 2020
@tiedotguy tiedotguy deleted the feature/reduce-requests branch October 5, 2020 02:45