Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EC2Metadata Client's custom timeout ignores http.DefaultClient customizations post initialization #514

Closed
jasdel opened this issue Jan 18, 2016 · 12 comments
Assignees
Labels
bug This issue is a bug.

Comments

@jasdel
Copy link
Contributor

jasdel commented Jan 18, 2016

In #504 and #511 changes were made to EC2Metadata Client's custom timeout to not disregard the any customizations to http.DefaultTransport, but this change now will ignore any customizations made to http.DefaultClient. In addition to any customizations made after the metadata client has been initialized.

From @pwaller's comment

Aha. So with #511 now the transport is not touched. So (2) no longer applies from the above post. That's a great start.

I see two remaining problems:

  1. The first problem from the above post still applies (you have to re-initialise .Credentials).
  2. If http.DefaultClient is modified, those choices won't be respected, because a new http.Client is constructed from blank rather than copying the http.DefaultClient.
@jasdel jasdel self-assigned this Jan 18, 2016
@jasdel jasdel changed the title EC2Metadata Client's custom timeout ignores http.DefaultClient customizations EC2Metadata Client's custom timeout ignores http.DefaultClient customizations post initialization Jan 18, 2016
@jasdel
Copy link
Contributor Author

jasdel commented Jan 18, 2016

@pwaller, The points you mention are similar to ones we will need to consider in the future how the SDK should introduce custom timeout logic across services and operations for conditions like ServiceUnavailable.

Modifying http.DefaultClient and http.DefaultTransport at runtime seems to easily run the risk of introducing data races. It is preferable to create a custom client and pass that around as needed to ensure data races are not introduced. It is a data race bug to be mutate shared client's parameters while being used to make requests concurrently.

To address this in the most generic way ensuring that the client isn't modified at all. I'm taking a look at if we can take advantage of http.Request.Cancel. This would allow the SDK to cancel a request, of the client's transport supported it. Which actually is similar to http.Client.Timeout's functionality.

@jasdel jasdel added bug This issue is a bug. Review Needed labels Jan 18, 2016
@pwaller
Copy link
Contributor

pwaller commented Jan 19, 2016

@jasdel, only the application itself (and not libraries) should modify the default timeout, and I would expect that any well behaved application would do so before initiating any requests to EC2.

One other consideration: I have seen this timeout fail on occasion on EC2 instances. This is more annoying than if the request had waited longer and eventually succeeded. I would have rather it waited a bit longer and completed, or even that it would have retried a few times.

Maybe one option is to relax the timeout once we know the metadata endpoint is accessible? That way you get both properties: reliability if the machine/metadata endpoint is slow, and the ability to give up on Role based credentials if the metadata endpoint is inaccessible.

@jasdel
Copy link
Contributor Author

jasdel commented Jan 19, 2016

@pwaller Correct, originally this timeout was introduced in order to reduce the timeout delay (about 2min) when running on a non EC2 instance. It provided balance between quicker failures on non-EC2 instances. We could adjust the default timeout to something more appropriate, and the aws.Config.EC2MetadataDisableTimeoutOverride options is available to disable this modification all together.

I agree this shorter timeout should be relaxed or removed once a connection was made.

@arvenil
Copy link

arvenil commented Feb 1, 2016

... and here I'am receiving DATA RACES :/ :D
Also it doesn't make sense to me to modify http.DefaultClient.
A lot of libraries uses http.DefaultClient so a) this introduces data races (as I don't see a way to safely lock http.DefaultClient while modifying it) and b) messes up configuration for other libraries and c) doesn't allow root application to modify defaults.

Removing everything and leaving cfg.HTTPClient = http.DefaultClient gets rid of data races.
Would you accept PR that removes this block of code? (it's worth to try nicely get rid of those data races - if not I will stay with a fork:/)

Here is a part of example data race I've received.

WARNING: DATA RACE
Write by goroutine 8:
  sync/atomic.CompareAndSwapInt32()
      /private/var/folders/bn/49l0ffts381_wqsh02663nr40000gn/T/workdir/go/src/runtime/race_amd64.s:279 +0xb
  sync.(*Mutex).Lock()
      /private/var/folders/bn/49l0ffts381_wqsh02663nr40000gn/T/workdir/go/src/sync/mutex.go:43 +0x4d
  net/http.(*Transport).getIdleConn()
      /private/var/folders/bn/49l0ffts381_wqsh02663nr40000gn/T/workdir/go/src/net/http/transport.go:440 +0xc2
  net/http.(*Transport).getConn()
      /private/var/folders/bn/49l0ffts381_wqsh02663nr40000gn/T/workdir/go/src/net/http/transport.go:512 +0x96
  net/http.(*Transport).RoundTrip()
      /private/var/folders/bn/49l0ffts381_wqsh02663nr40000gn/T/workdir/go/src/net/http/transport.go:228 +0x62a
  net/http.send()
      /private/var/folders/bn/49l0ffts381_wqsh02663nr40000gn/T/workdir/go/src/net/http/client.go:220 +0x73d
  net/http.(*Client).send()
      /private/var/folders/bn/49l0ffts381_wqsh02663nr40000gn/T/workdir/go/src/net/http/client.go:143 +0x1f7
  net/http.(*Client).doFollowingRedirects()
      /private/var/folders/bn/49l0ffts381_wqsh02663nr40000gn/T/workdir/go/src/net/http/client.go:380 +0x1059
  net/http.(*Client).Do()
      /private/var/folders/bn/49l0ffts381_wqsh02663nr40000gn/T/workdir/go/src/net/http/client.go:178 +0x1e2
(...)

Previous read by goroutine 12:
  github.com/a/b/vendor/github.com/aws/aws-sdk-go/aws/ec2metadata.NewClient()
      /Users/kamil/go/src/github.com/a/b/vendor/github.com/aws/aws-sdk-go/aws/ec2metadata/service.go:54 +0xb5f
  github.com/a/b/vendor/github.com/aws/aws-sdk-go/aws/defaults.CredChain()
      /Users/kamil/go/src/github.com/a/b/vendor/github.com/aws/aws-sdk-go/aws/defaults/defaults.go:94 +0x245
  github.com/a/b/vendor/github.com/aws/aws-sdk-go/aws/defaults.Get()
      /Users/kamil/go/src/github.com/a/b/vendor/github.com/aws/aws-sdk-go/aws/defaults/defaults.go:34 +0xb14
  github.com/a/b/vendor/github.com/aws/aws-sdk-go/aws/session.New()
(...)

@jasdel Thank you for finding this before me!

@arvenil
Copy link

arvenil commented Feb 1, 2016

To be honest I really don't get why set this timeout on default client. It's application decision to set it on http.DefaultClient. I also see you can pass your own httpClient through cfg.HTTPClient so I really don't see a reason why simple

    if cfg.HTTPClient == nil {
        cfg.HTTPClient = http.DefaultClient
    }

isn't a better solution

@jasdel, only the application itself (and not libraries) should modify the default timeout

Why then this library is different?:P

@arvenil
Copy link

arvenil commented Feb 1, 2016

Also the problem is not that other libraries writes, but that they might read http.DefaultClient while AWS library is writing (which also is happening for me).

@arvenil
Copy link

arvenil commented Feb 2, 2016

Wow, looks like I was referring to older version (still had older one) that was without #511
My sincere apologies.
So, now I don't have data races but as the main topic says http.DefaultClient is not used at all.
Maybe something like this could be a better solution that fits everyone.

if httpClientZero(cfg.HTTPClient) {
    if aws.BoolValue(cfg.EC2MetadataDisableHttpDefaultClientOverride) {
        cfg.HTTPClient = http.DefaultClient
    } else {
        // If the http client is unmodified and this feature is not disabled
        // set custom timeouts for EC2Metadata requests.
        cfg.HTTPClient = &http.Client{
            // use a shorter timeout than default because the metadata
            // service is local if it is running, and to fail faster
            // if not running on an ec2 instance.
            Timeout: 5 * time.Second,
        }
    }
}

@arvenil
Copy link

arvenil commented Feb 2, 2016

... doh, nvm my previous comments, with new version, cfg.EC2MetadataDisableHttpDefaultClientOverride works for me, thanks guys! You are amazing! :D

@jasdel
Copy link
Contributor Author

jasdel commented Feb 2, 2016

Thanks for the update @arvenil, That was going to be my first question is you'd sync'ed yet with the latest version. the usage of http.DefaultClient is is actually sourced from aws/defaults#GetConfig which a session.New() will automatically call. This ensures a http client is always provided to the service clients. In addition this is why the SDK wouldn't be able to check for nil of HTTPClient, because in practice it will never be nil.

@jasdel jasdel added bug This issue is a bug. and removed bug This issue is a bug. Review Needed labels Feb 8, 2016
@jasdel
Copy link
Contributor Author

jasdel commented Feb 8, 2016

We've reviewed this issue and think that leaving the EC2Metadata service client mostly as it is. The best way to control the timeouts would be to set the Timeout field on the http.Client passed into the SDK's config, set the timeout on http.DefaultClient, or to disable to timeout override all together with aws#Config.EC2MetadataDisabletimeoutOverride.

The following statement shows how to disable the timeout all together.

sess := session.New(aws.NewConfig().WithEC2MetadataDisabletimeoutOverride(true))

s3Svc := s3.New(sess)

Modifying the http.DefaultClient should be done prior to creating a session. Modifying the http.DefaultClient after sessions, or service clients have been created could introduce race conditions if the SDK were to ever make background request post initialization.

@jasdel jasdel closed this as completed Feb 8, 2016
@pwaller
Copy link
Contributor

pwaller commented Feb 8, 2016

Thanks @jasdel, seems reasonable to me. Did you decide against relaxing the timeout after the first successful attempt?

@jasdel
Copy link
Contributor Author

jasdel commented Feb 8, 2016

We decided not to modify the timeouts for now due to the need to add locking around usage of the Config.HTTPClient. When the SDK looks at adding more comprehensive request timeout logic generically for a per-API operation level I think this will be re-evaluated since how the SDK uses timeouts will most likely change.

skotambkar added a commit to skotambkar/aws-sdk-go that referenced this issue May 20, 2021
* Adds HTTP based de/serialization typed error
* This PR doesn't replace the old usages of Err codes for de/serialization error.
* The serialization/ deserialization error types are generic, and not transport specific. This conforms with layered error design discussed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug.
Projects
None yet
Development

No branches or pull requests

3 participants