Intermittent SAML and 2FA Push Notification Timeouts from Okta #298
Hm, haven't heard anything from our users at Segment. It might be that the undocumented API we're using has changed in some subtle way, as in #293, but we're somehow not hitting the same codepaths. Your only recourse is to compare HTTP flows between the two and reverse engineer it :( Could also be network gremlins, as you suggest.
Several folks here have been running into this problem intermittently. #299 does appear to fix it.
I can confirm that HTTP/2 over TLS is present in all of our failure cases, while HTTP/1.1 over TLS is negotiated in all of my successful repros. In version 1.0.4 this happens even though the Get implementation explicitly requests HTTP/1.1 (lib/okta.go, line 594 at 98c40a4): the Okta server's Server Hello picks h2. The fix in #299 is to explicitly set the client so it sticks to HTTP/1.1.
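For anyone curious what forcing HTTP/1.1 looks like in Go, here's a minimal sketch (not necessarily the exact change in #299; the example.com URL is just a placeholder). Setting Transport.TLSNextProto to a non-nil empty map disables ALPN-based h2 negotiation, so the client stays on HTTP/1.1:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

func main() {
	// A non-nil, empty TLSNextProto map disables HTTP/2 negotiation, so the
	// client speaks HTTP/1.1 even when the server's ALPN response offers h2.
	client := &http.Client{
		Transport: &http.Transport{
			TLSNextProto: map[string]func(string, *tls.Conn) http.RoundTripper{},
		},
	}

	// Placeholder URL for demonstration only.
	resp, err := client.Get("https://example.com")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("negotiated protocol:", resp.Proto) // expect HTTP/1.1
}
```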
Previously the content length was calculated from an empty, uninitialised byte array. With this change it is now calculated from the actual data array used as the body. Setting the content length to 0 seemed to be causing an issue with recent changes to Okta's infrastructure, as noticed in segmentio#298.
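A minimal sketch of the bug class that commit message describes, assuming a hand-built request like the one in okta.go; the function and variable names here are illustrative, not the actual aws-okta code:

```go
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net/http"
	"net/url"
)

// buildAuthnRequest is an illustrative sketch. The bug class: the body comes
// from `payload`, but ContentLength was computed from a different,
// never-populated slice, so the transport saw a zero length alongside a
// non-empty body.
func buildAuthnRequest(endpoint string, payload []byte) (*http.Request, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return nil, err
	}
	return &http.Request{
		Method:     "POST",
		URL:        u,
		Proto:      "HTTP/1.1",
		ProtoMajor: 1,
		ProtoMinor: 1,
		Header:     http.Header{"Content-Type": []string{"application/json"}},
		Body:       ioutil.NopCloser(bytes.NewReader(payload)),
		// Before the fix (conceptually): ContentLength: int64(len(emptySlice)) // always 0
		ContentLength: int64(len(payload)), // after the fix: length of the real body
	}, nil
}

func main() {
	// Placeholder endpoint and payload for demonstration only.
	req, err := buildAuthnRequest("https://example.okta.com/api/v1/authn",
		[]byte(`{"username":"u","password":"p"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println("Content-Length:", req.ContentLength)
}
```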
I started noticing this problem yesterday, and after poking some debug prints into aws-okta I noticed that it was always sending a Content-Length of 0 (https://github.com/segmentio/aws-okta/blob/master/lib/okta.go#L599). PR raised: #300
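For anyone else who wants to confirm what's going over the wire before patching, here's a rough sketch using net/http/httputil to dump the outgoing request as the transport would send it; the URL and payload below are placeholders:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httputil"
)

func main() {
	// Placeholder body and org URL; substitute your own Okta org.
	body := []byte(`{"username":"user@example.com","password":"..."}`)

	req, err := http.NewRequest("POST", "https://example.okta.com/api/v1/authn", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")

	// DumpRequestOut renders the request as the client transport would send
	// it, including the Content-Length header, which makes a zero-length
	// body easy to spot.
	dump, err := httputil.DumpRequestOut(req, true)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", dump)
}
```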
Noticing this a lot today, intermittently across different AWS profiles and Okta instances. |
How consistent is this for folks? I.e., what percentage of auth attempts time out? I haven't been able to repro personally, but a handful of users at Segment say they are affected.
For me personally, it's been super consistent for an hour (100%) and then stopped entirely for hours, although there's been a bit of inconsistency in which part of the flow times out (first request vs MFA push, etc.). Similar reports from many of my colleagues (but equally, some colleagues are entirely unaffected). On every occasion it's been broken, saml2aws has worked entirely fine (it uses the same underlying API), and I've even manually stepped through the sign-in flow fine with curl.
Having some trouble with our CI publishing pipeline, but the tag v1.0.5 is there. That's enough for Homebrew. If somebody else could submit a PR there, that'd be great (we don't use it). I'm working on getting the binaries and packages published to our GitHub Releases. Update: it's just the Linux binary that failed to publish. The RPMs and DEBs are live on packagecloud.
Big shoutout to @Chippiewill and @mvallaly-rally for their PRs. |
Thanks for getting this fixed so fast. |
I raised Homebrew/homebrew-core#61790 to bump the version in Homebrew. |
Amazing community effort to get this bug root caused and fixed quickly. @nickatsegment, even with partial deprecation, you've got a large and caring userbase. |
Great work! Any progress on getting the assets pushed into the 1.0.5 GitHub release?
@seanorama See #301 (comment). TLDR: no. |
* Calculate OktaClient Content-Length correctly (segmentio#300)

  Fixes: segmentio#298

* Update issue templates
* Fix cred process expiration (segmentio#303)
* Added Ubuntu 2020 (Focal) to Makefile.release (segmentio#304)
* disable github releases (currently broken) (segmentio#305)

Co-authored-by: Will Gardner <willg@rdner.io>
Co-authored-by: Nick Irvine <nick@segment.com>
Co-authored-by: Zoltán Reegn <zoltan.reegn@gmail.com>
Co-authored-by: Yossi Eliaz <zozo123@users.noreply.github.com>
* Calculate OktaClient Content-Length correctly (segmentio#300)

  Fixes: segmentio#298

* Update issue templates
* Fix cred process expiration (segmentio#303)
* Added Ubuntu 2020 (Focal) to Makefile.release (segmentio#304)
* disable github releases (currently broken) (segmentio#305)
* Update AWS Go SDK To v1.25.35 (segmentio#307)

  Fixes STS regional endpoint support.

* Add STS Regional Endpoint Support To Other STS Clients (segmentio#308)
* Update keyring to v1.1.6 (segmentio#309)

  Recent versions of kwallet have removed the old support for the kde4-compatible kwallet dbus interface. This means newer kde5-based OS installs (e.g. Kubuntu 20.04) can no longer use the kwallet backend with aws-okta. This was fixed upstream in the keyring lib back in 2019, but the dependency hasn't been bumped since then.

Co-authored-by: Will Gardner <willg@rdner.io>
Co-authored-by: Nick Irvine <nick@segment.com>
Co-authored-by: Zoltán Reegn <zoltan.reegn@gmail.com>
Co-authored-by: Yossi Eliaz <zozo123@users.noreply.github.com>
Co-authored-by: Andrew Babichev <andrew.babichev@gmail.com>
Posting to see if any other users are having similar issues. Since 9/24/20, several of our users have been getting HTTP timeouts waiting for responses from Okta. The behavior is inconsistent, but it takes three forms:
1. Initial SAML Authn never responds. No MFA notification.

   getting creds via SAML: Failed to authenticate with okta. If your credentials have changed, use 'aws-okta add': &url.Error{Op:"Post", URL:"https://convoy.okta.com/api/v1/authn", Err:(*http.httpError)(0xc0002fc0a0)}

2. MFA prompt, but the push notification never shows up on the mobile device. Eventual error:

   getting creds via SAML: Post "https://[company].okta.com/api/v1/authn/factors/[id]/verify?rememberDevice=true": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

3. MFA notification arrives on the phone and I complete it, but aws-okta doesn't seem to register it and fails with a timeout:

   getting creds via SAML: Failed authn verification for okta. Err: Post "https://[company].okta.com/api/v1/authn/factors/[id]/verify": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Right now, my assumption is that this is a network-level issue on Okta's servers, but it's only affecting aws-okta. Okta GUI authentication works fine.
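For context on where that error text comes from: it's the standard wording Go's http.Client produces when its Timeout fires before response headers arrive. A minimal sketch that reproduces the same shape of error against a server that never responds (the 2-second timeout is arbitrary; aws-okta's actual value may differ):

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

func main() {
	// A test server that accepts the connection but never writes headers,
	// standing in for an endpoint that hangs.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		<-r.Context().Done() // return only once the client gives up
	}))
	defer srv.Close()

	// Arbitrary timeout for the sketch; the real client's value may differ.
	client := &http.Client{Timeout: 2 * time.Second}

	_, err := client.Post(srv.URL, "application/json", nil)
	fmt.Println(err)
	// Prints something like:
	// Post "http://127.0.0.1:...": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
}
```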