
ExpiredTokenException #49

Closed

boivie-at-sony opened this issue Apr 13, 2016 · 12 comments

@boivie-at-sony

I have set up the KPL to post to Kinesis using an assumed role, with STSAssumeRoleSessionCredentialsProvider as the credentials provider.
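For reference, a minimal sketch of this kind of setup (the role ARN, session name, region, and stream name are placeholders, not the actual configuration):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider;
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

public class KplStsExample {
    public static void main(String[] args) {
        // Assume an IAM role via STS and hand the provider to the KPL.
        AWSCredentialsProvider provider =
                new STSAssumeRoleSessionCredentialsProvider.Builder(
                        "arn:aws:iam::123456789012:role/kinesis-writer", "kpl-session").build();

        KinesisProducerConfiguration config = new KinesisProducerConfiguration()
                .setCredentialsProvider(provider)
                .setRegion("eu-west-1");

        KinesisProducer producer = new KinesisProducer(config);
        producer.addUserRecord("my-stream", "42",
                ByteBuffer.wrap("payload".getBytes(StandardCharsets.UTF_8)));
    }
}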

This worked well for some time, but after a few hours I started getting these errors:

17:28:54.774 [kpl-callback-pool-0-thread-0] WARN  c.s.bui.kinesis.KinesisOutput - Record failed to put, partitionKey=42, attempts:
Delay after prev attempt: 1508 ms, Duration: 4 ms, Code: 400, Message: {"__type":"ExpiredTokenException","message":"The security token included in the request is expired"}
17:28:54.774 [kpl-callback-pool-0-thread-0] ERROR c.s.bui.kinesis.KinesisOutput - Exception while posting to kinesis
com.amazonaws.services.kinesis.producer.UserRecordFailedException: null
    at com.amazonaws.services.kinesis.producer.KinesisProducer$MessageHandler.onPutRecordResult(KinesisProducer.java:188) [amazon-kinesis-producer-0.10.2.jar:na]
    at com.amazonaws.services.kinesis.producer.KinesisProducer$MessageHandler.access$000(KinesisProducer.java:127) [amazon-kinesis-producer-0.10.2.jar:na]
    at com.amazonaws.services.kinesis.producer.KinesisProducer$MessageHandler$1.run(KinesisProducer.java:134) [amazon-kinesis-producer-0.10.2.jar:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_72-internal]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_72-internal]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72-internal]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72-internal]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72-internal]

Does the Java code renew credentials often enough and hand them over to the native daemon?

@boivie-at-sony
Author

Hmm... I think this can be attributed to clock drift. Closing it.

@boivie-at-sony
Author

Now it has happened again on a production system, and it was definitely not clock drift:

[2016-06-02 03:48:40.434936] [0x00007fba4c9d9700] [error] [retrier.cc:59] PutRecords failed: {"__type":"ExpiredTokenException","message":"The security token included in the request is expired"}
03:48:40.439 [kpl-callback-pool-0-thread-0] WARN  c.s.bui.kinesis.KinesisOutput - Record failed to put, partitionKey=42, attempts:
Delay after prev attempt: 1501 ms, Duration: 8 ms, Code: 400, Message: {"__type":"ExpiredTokenException","message":"The security token included in the request is expired"}
03:48:40.439 [kpl-callback-pool-0-thread-0] WARN  c.s.bui.kinesis.KinesisOutput - Record failed to put, partitionKey=music-prod, attempts:
Delay after prev attempt: 1501 ms, Duration: 8 ms, Code: 400, Message: {"__type":"ExpiredTokenException","message":"The security token included in the request is expired"}

It doesn't recover even after a few hours.

As mentioned earlier, we're using the STSAssumeRoleSessionCredentialsProvider to get temporary, time-limited credentials, and I know IAM is very picky about them being refreshed before they expire.

Is this really supported well enough?

@boivie-at-sony boivie-at-sony reopened this Jun 2, 2016
@alexmnyc

Bump. Happens in 0.12.1 with DefaultCredentialsProvider.

@jeremysears
Contributor

I also see this in 0.12.1 with the DefaultCredentialsProvider, while including the STS libs in the classpath.

@pfifer
Contributor

pfifer commented Jan 17, 2017

Thanks for reporting this. It appears that when the credentials are refreshed, the new values aren't making it into the client.

@jeremysears do you mean the DefaultAWSCredentialsProviderChain?

My suspicion is that the code that handles sending the credentials to the native component isn't seeing the credential change, allowing the credentials to expire. I still need to investigate this some more.

Can everyone who is seeing this tell me what type of credentials you are using:

  • Instance Profile
  • STS Credentials
  • ECS Container Credentials
  • Other Credentials

@jeremysears
Contributor

Yes, the DefaultAWSCredentialsProviderChain. I'm not sure if this is related, but I included the STS libs so that I could run some utilities locally with the role we use on our servers. However, I see the error on our servers, where we're using EC2 Container Credentials (no STS, but the libs are now available). Without the STS lib available in the classpath we don't see this issue on our servers, with no other code changes.

@pfifer
Contributor

pfifer commented Jan 18, 2017

I think I have an idea of the cause. Both the ECS credentials and the STS credentials can throw an exception if something goes wrong on the remote side. If my suspicion is correct, the credential update thread is getting killed when an unhandled exception occurs.

From the reports it sounds like this doesn't happen all the time, but it does happen occasionally. I'll see about adding some checks around the credential retrieval, and providing some type of auto-restart should the thread be killed.
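If that guess is right, the mechanism could look roughly like the hypothetical sketch below (not the actual KPL code): when a periodic refresh task lets an exception escape, the executor suppresses all further executions, so the refresh silently stops and the credentials last handed to the native process eventually expire.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.amazonaws.auth.AWSCredentialsProvider;

// Hypothetical illustration of the suspected failure mode, not KPL code.
public class FragileRefreshExample {
    public static void schedule(AWSCredentialsProvider provider) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Per the ScheduledExecutorService contract, an exception thrown by the
        // task suppresses every subsequent execution: one failed getCredentials()
        // call (e.g. a transient STS or instance-metadata error) ends the
        // periodic refresh for good.
        scheduler.scheduleAtFixedRate(
                () -> provider.getCredentials(),
                0, 5, TimeUnit.SECONDS);
    }
}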

@jeremysears
Contributor

I also see this when the STS libs are not included in the classpath.

@pfifer
Contributor

pfifer commented Feb 15, 2017

@jeremysears If my guess is correct, it's not related to the STS credentials specifically, but to any credentials provider that can throw an exception on retrieval. Regardless of whether this is the cause, strengthening the credential refresh thread would be beneficial.

Could those affected please comment or add a reaction, to help us prioritize the change.

Thanks

@pfifer pfifer added the bug label Feb 15, 2017
@jmmirick

We just experienced this in production yesterday, on 2 of 6 servers in our data center using KPL 0.12.3 with STSAssumeRoleSessionCredentialsProvider. They both started at about the same time. Unfortunately we didn't notice until 16 hours after it started :-( and by then the KPL had ballooned from 15 MB to 3.5 GB. +1 for addressing this issue!

@JohntaviousB

JohntaviousB commented Nov 4, 2017

Facing a similar issue with InstanceProfileCredentials

@fraajad

fraajad commented Jan 31, 2018

Also experiencing this with InstanceProfileCredentials.

pfifer added a commit to pfifer/amazon-kinesis-producer that referenced this issue Apr 10, 2018
Runtime exceptions while updating credentials will no longer kill the
credential update thread.

Interrupted exceptions can be thrown while a thread pool/executor is
being shut down. This now handles that case without a message.

Mitigates/Fixes awslabs#49
@pfifer pfifer added this to the v0.12.9 milestone Apr 10, 2018
sahilpalvia pushed a commit that referenced this issue Apr 10, 2018
Runtime exceptions while updating credentials will no longer kill the
credential update thread.

Interrupted exceptions can be thrown while a thread pool/executor is
being shut down. This now handles that case without a message.

Mitigates/Fixes #49
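For context, a hedged sketch of the kind of hardening the commit above describes, assuming the refresh runs on a dedicated thread (class and method names and the 5-second interval are illustrative, not the actual KPL implementation): runtime exceptions are swallowed so the loop keeps retrying, while an InterruptedException during shutdown simply ends the loop quietly.

import java.util.concurrent.TimeUnit;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSCredentialsProvider;

// Hypothetical hardened refresh loop; not the actual KPL code.
final class ResilientCredentialsRefresher implements Runnable {
    private final AWSCredentialsProvider provider;

    ResilientCredentialsRefresher(AWSCredentialsProvider provider) {
        this.provider = provider;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                AWSCredentials creds = provider.getCredentials();
                handOffToDaemon(creds);          // placeholder for the native handoff
                TimeUnit.SECONDS.sleep(5);
            } catch (InterruptedException e) {
                // Expected while the executor/thread is being shut down:
                // restore the interrupt flag and exit without logging anything.
                Thread.currentThread().interrupt();
            } catch (RuntimeException e) {
                // A transient failure in the credentials provider must not kill
                // the thread; fall through and retry on the next iteration.
            }
        }
    }

    private void handOffToDaemon(AWSCredentials creds) {
        // Placeholder: the real KPL forwards refreshed credentials to its native process.
    }
}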