
ExpiredTokenException #49

Closed

boivie-at-sony opened this issue Apr 13, 2016 · 12 comments

@boivie-at-sony

I have set up the KPL to post to Kinesis using an assumed role, with STSAssumeRoleSessionCredentialsProvider as the credentials provider.
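For reference, a minimal sketch of this kind of setup (the role ARN, session name, region, and stream name are placeholders, not the actual configuration):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider;
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

public class KplStsExample {
    public static void main(String[] args) {
        // Assume an IAM role via STS and hand the provider to the KPL.
        AWSCredentialsProvider provider =
                new STSAssumeRoleSessionCredentialsProvider.Builder(
                        "arn:aws:iam::123456789012:role/kinesis-writer", "kpl-session").build();

        KinesisProducerConfiguration config = new KinesisProducerConfiguration()
                .setCredentialsProvider(provider)
                .setRegion("eu-west-1");

        KinesisProducer producer = new KinesisProducer(config);
        producer.addUserRecord("my-stream", "42",
                ByteBuffer.wrap("payload".getBytes(StandardCharsets.UTF_8)));
    }
}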

This worked well for some time, but after a few hours I started getting these errors:

17:28:54.774 [kpl-callback-pool-0-thread-0] WARN  c.s.bui.kinesis.KinesisOutput - Record failed to put, partitionKey=42, attempts:
Delay after prev attempt: 1508 ms, Duration: 4 ms, Code: 400, Message: {"__type":"ExpiredTokenException","message":"The security token included in the request is expired"}
17:28:54.774 [kpl-callback-pool-0-thread-0] ERROR c.s.bui.kinesis.KinesisOutput - Exception while posting to kinesis
com.amazonaws.services.kinesis.producer.UserRecordFailedException: null
    at com.amazonaws.services.kinesis.producer.KinesisProducer$MessageHandler.onPutRecordResult(KinesisProducer.java:188) [amazon-kinesis-producer-0.10.2.jar:na]
    at com.amazonaws.services.kinesis.producer.KinesisProducer$MessageHandler.access$000(KinesisProducer.java:127) [amazon-kinesis-producer-0.10.2.jar:na]
    at com.amazonaws.services.kinesis.producer.KinesisProducer$MessageHandler$1.run(KinesisProducer.java:134) [amazon-kinesis-producer-0.10.2.jar:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_72-internal]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_72-internal]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72-internal]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72-internal]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72-internal]

Does the Java code renew credentials often enough and hand them over to the native daemon?

@boivie-at-sony
Author

Hmm... I think this can be attributed to clock drift. Closing it.

@boivie-at-sony
Author

Now it has happened again on a production system, and it was definitely not clock drift:

[2016-06-02 03:48:40.434936] [0x00007fba4c9d9700] [error] [retrier.cc:59] PutRecords failed: {"__type":"ExpiredTokenException","message":"The security token included in the request is expired"}
03:48:40.439 [kpl-callback-pool-0-thread-0] WARN  c.s.bui.kinesis.KinesisOutput - Record failed to put, partitionKey=42, attempts:
Delay after prev attempt: 1501 ms, Duration: 8 ms, Code: 400, Message: {"__type":"ExpiredTokenException","message":"The security token included in the request is expired"}
03:48:40.439 [kpl-callback-pool-0-thread-0] WARN  c.s.bui.kinesis.KinesisOutput - Record failed to put, partitionKey=music-prod, attempts:
Delay after prev attempt: 1501 ms, Duration: 8 ms, Code: 400, Message: {"__type":"ExpiredTokenException","message":"The security token included in the request is expired"}

It doesn't recover even after a few hours.

As mentioned earlier, we're using the STSAssumeRoleSessionCredentialsProvider to get temporary, time-limited credentials, and I know IAM is very picky about them being refreshed before they expire.

Is this really supported well enough?

@boivie-at-sony boivie-at-sony reopened this Jun 2, 2016
@alexmnyc

Bump. Happens in 0.12.1 with DefaultCredentialsProvider.

@jeremysears
Contributor

I also see this in 0.12.1 with the DefaultCredentialsProvider, while including the STS libs in the classpath.

@pfifer
Contributor

pfifer commented Jan 17, 2017

Thanks for reporting this. It appears that when the credentials are refreshed, the new values aren't making it into the client.

@jeremysears do you mean the DefaultAWSCredentialsProviderChain?

My suspicion is that the code that handles sending the credentials to the native component isn't seeing the credential change, allowing the credentials to expire. I still need to investigate this some more.

Can everyone who is seeing this tell me what type of credentials you are using:

  • Instance Profile
  • STS Credentials
  • ECS Container Credentials
  • Other Credentials

@jeremysears
Contributor

Yes, the DefaultAWSCredentialsProviderChain. I'm not sure if this is related, but I included the STS libs so that I could run some utilities locally with the role we use on our servers. However, I see the error on our servers, where we're using EC2 Container Credentials (no STS, but the libs are now available). Without the STS lib available in the classpath we don't see this issue on our servers, with no other code changes.

@pfifer
Contributor

pfifer commented Jan 18, 2017

I think I have an idea of the cause. Both the ECS credentials and the STS credentials can throw an exception if something goes wrong on the remote side. If my suspicion is correct, the credential update thread is getting killed when an unhandled exception occurs.

From the reports it sounds like this doesn't happen all the time, but it does happen occasionally. I'll see about adding some checks around the credential retrieval, and providing some type of auto-restart should the thread be killed.
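If that guess is right, the mechanism could look roughly like the hypothetical sketch below (not the actual KPL code): when a periodic refresh task lets an exception escape, the executor suppresses all further executions, so the refresh silently stops and the credentials last handed to the native process eventually expire.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.amazonaws.auth.AWSCredentialsProvider;

// Hypothetical illustration of the suspected failure mode, not KPL code.
public class FragileRefreshExample {
    public static void schedule(AWSCredentialsProvider provider) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Per the ScheduledExecutorService contract, an exception thrown by the
        // task suppresses every subsequent execution: one failed getCredentials()
        // call (e.g. a transient STS or instance-metadata error) ends the
        // periodic refresh for good.
        scheduler.scheduleAtFixedRate(
                () -> provider.getCredentials(),
                0, 5, TimeUnit.SECONDS);
    }
}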

@jeremysears
Contributor

I also see this when the STS libs are not included in the classpath.

@pfifer
Contributor

pfifer commented Feb 15, 2017

@jeremysears If my guess is correct, it's not related to the STS credentials specifically, but to any credentials provider that can throw an exception on retrieval. Regardless of whether this is the cause, strengthening the credential refresh thread would be beneficial.

Could those affected please comment or add a reaction, to help us prioritize the change.

Thanks

@pfifer pfifer added the bug label Feb 15, 2017
@jmmirick

We just experienced this in production yesterday, on 2 of 6 servers in our data center using KPL 0.12.3 with STSAssumeRoleSessionCredentialsProvider. They both started at about the same time. Unfortunately we didn't notice until 16 hours after it started :-( and by then the KPL had ballooned from 15 MB to 3.5 GB. +1 for addressing this issue!

@JohntaviousB

JohntaviousB commented Nov 4, 2017

Facing a similar issue with InstanceProfileCredentials

@fraajad

fraajad commented Jan 31, 2018

Also experiencing this with InstanceProfileCredentials.

pfifer added a commit to pfifer/amazon-kinesis-producer that referenced this issue Apr 10, 2018
Runtime exceptions while updating credentials will no longer kill the
credential update thread.

Interrupted exceptions can be thrown while a thread pool/executor is
being shut down. This now handles that case without a message.

Mitigates/Fixes awslabs#49
@pfifer pfifer added this to the v0.12.9 milestone Apr 10, 2018
sahilpalvia pushed a commit that referenced this issue Apr 10, 2018
Runtime exceptions while updating credentials will no longer kill the
credential update thread.

Interrupted exceptions can be thrown while a thread pool/executor is
being shut down. This now handles that case without a message.

Mitigates/Fixes #49
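For context, a hedged sketch of the kind of hardening the commit above describes, assuming the refresh runs on a dedicated thread (class and method names and the 5-second interval are illustrative, not the actual KPL implementation): runtime exceptions are swallowed so the loop keeps retrying, while an InterruptedException during shutdown simply ends the loop quietly.

import java.util.concurrent.TimeUnit;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSCredentialsProvider;

// Hypothetical hardened refresh loop; not the actual KPL code.
final class ResilientCredentialsRefresher implements Runnable {
    private final AWSCredentialsProvider provider;

    ResilientCredentialsRefresher(AWSCredentialsProvider provider) {
        this.provider = provider;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                AWSCredentials creds = provider.getCredentials();
                handOffToDaemon(creds);          // placeholder for the native handoff
                TimeUnit.SECONDS.sleep(5);
            } catch (InterruptedException e) {
                // Expected while the executor/thread is being shut down:
                // restore the interrupt flag and exit without logging anything.
                Thread.currentThread().interrupt();
            } catch (RuntimeException e) {
                // A transient failure in the credentials provider must not kill
                // the thread; fall through and retry on the next iteration.
            }
        }
    }

    private void handOffToDaemon(AWSCredentials creds) {
        // Placeholder: the real KPL forwards refreshed credentials to its native process.
    }
}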