Skip to content
This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

ServiceBusQueue.RenewLocks() fails spontaneously #246

Closed
orzel7 opened this issue Aug 18, 2021 · 11 comments
Closed

ServiceBusQueue.RenewLocks() fails spontaneously #246

orzel7 opened this issue Aug 18, 2021 · 11 comments

Comments

@orzel7
Copy link

orzel7 commented Aug 18, 2021

ServiceBusQueue.RenewLocks() sometimes fails with error:

*Error{Condition: amqp:connection:forced, Description: The connection was inactive for more than the allowed 300000 milliseconds and is closed by container 'LinkTracker'. TrackingId:2c567f5508764c7fbb0f2be74ca7e526_G34, SystemTracker:gateway7, Timestamp:2021-08-18T12:32:45, Info: map[]}

Message Lock duration is set to 1 min
RenewLocks is called with 20 sec. interval
Error occurred after first call to RenewLocks(), i.e after 20 seconds after message has been received.

@orzel7
Copy link
Author

orzel7 commented Aug 18, 2021

It looks RenewLocks() fails always after about 1 hour from creating queue. At the same time, it is still possible to receive messages from queue without errors.

@serbrech
Copy link
Member

Thanks for filing an issue.
A bug fix was merged very recently to fix a bug that would stop to refresh the claim for the RPC client when getting an error. this is used for renewlocks.
If you are able to reliably reproduce, could you try with the latest tagged release, and let me know if this still occurs?
Thanks!

@orzel7
Copy link
Author

orzel7 commented Aug 21, 2021

Did you mean #245? The first thing i did after discovering the problem is update to version 0.10.16, but bug is still there. Currently i'am using REST API call if RenewLocks fails to workaround problem.

@serbrech
Copy link
Member

@orzel7 Hi, we investigated further the Renewlocks code and we'll be pushing some changes that will probably fix the issues you are seeing.
I've been experiencing it sporadically as well.
/cc @richardpark-msft

@serbrech
Copy link
Member

serbrech commented Sep 8, 2021

@orzel7 v0.11.1 includes the RPC link changes that should improve (hopefuly remove) the RenewLocks issues that you were experiencing

@orzel7
Copy link
Author

orzel7 commented Sep 8, 2021

Problem still exists on version 11.1

@serbrech
Copy link
Member

serbrech commented Sep 9, 2021

it's hard to tell what's going on if we can't repro.
the link used to call renewLock is separate from the link created to receive messages, how often do you renew the locks?
it could be that the RPC link that call renewlocks is closed due to inactivity and that we don't reopen it?
if that's the case, we should be able to make it fail after just 300s, as the error message mentions.

we need more info to know how this happens and narrow down the case

@orzel7
Copy link
Author

orzel7 commented Sep 10, 2021

Ok, maybe this is another problem, duplication of #249
RenewLocks fails after 2h5min with error:
unhandled error link 4fbaa5b1-deeb-4e9b-99b1-83d263e22364: status code 401 and description: Unauthorized access. 'Listen' claim(s) are required to perform this operation. Resource: 'sb://......./$management'. TrackingId:15d8b442-221c-4c79-a33c-7d66e737628e_G16, SystemTracker:NoSystemTracker, Timestamp:2021-09-10T08:36:36
It looks, auth token stops refreshing after some period of time.

@orzel7
Copy link
Author

orzel7 commented Sep 10, 2021

I've found the way, how to reproduce this problem quicker. Just change this function(file sas.go from azure-amqp-common.go)

// GetToken gets a CBS SAS token
func (t *TokenProvider) GetToken(audience string) (*auth.Token, error) {
	signature, expiry := t.signer.SignWithDuration(audience, 2*time.Hour)
	return auth.NewToken(auth.CBSTokenTypeSAS, signature, expiry), nil
}

to

func (t *TokenProvider) GetToken(audience string) (*auth.Token, error) {
	signature, expiry := t.signer.SignWithDuration(audience, 20*time.Minute)
	return auth.NewToken(auth.CBSTokenTypeSAS, signature, expiry), nil
}

After this change, RenewLocks will fail after about 25 minutes. It is obvious for me that token for RPC channel is not refreshed properly.

@orzel7
Copy link
Author

orzel7 commented Sep 10, 2021

It comes out of my observations:

  1. Token refreshing mechanism works properly(every 15 min)
  2. New token generated properly
  3. Newly generated token does not take effect . Probably, Service Bus caches tokens until expiration time. Maybe link/connection should be recreated in order to reset previous valid token.

@richardpark-msft
Copy link
Member

Hi @orzel7,

We've moved development of this package to the azure-sdk-for-go repo link.

I've run this test against the new Service Bus library and can't replicate it but we've done some changes in the internals that have probably helped in this regard.

  1. We no longer use separate connections for each link. This prevents us from idling out since the activity is concentrated around a single connection, rather than several.
  2. The negotiateClaim code, responsible for token renewal, is a bit more robust than before and can handle connection recovery.

I'm adding a permanent test into the stress test suite as a result of your bug. You've been a very helpful contributor and I really appreciate your deep dives when you're troubleshooting bugs. I do hope you'll continue to do so for the new package as well.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants