-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Retry using exponential backoff upon connection failure with GGD system tests #1892
Conversation
@@ -166,6 +206,15 @@ TEST( GGD_System, GetGGCIPandCertificate ) | |||
cBuffer, /*lint !e971 can use char without signed/unsigned. */ | |||
ulBufferSize, | |||
&xHostAddressData ); | |||
RETRY_EXPONENTIAL( xStatus - GGD_GetGGCIPandCertificate( clientcredentialMQTT_BROKER_ENDPOINT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xStatus -
? Should it be =
? Also, the previous call to GGD_GetGGCIPandCertificate
was not deleted so this will run twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must have had too much caffeine.
RETRY_EXPONENTIAL( xStatus - GGD_GetGGCIPandCertificate( clientcredentialMQTT_BROKER_ENDPOINT, | ||
clientcredentialGREENGRASS_DISCOVERY_PORT, | ||
clientcredentialIOT_THING_NAME, | ||
cBuffer, /*lint !e971 can use char without signed/unsigned. */ | ||
ulBufferSize, | ||
&xHostAddressData ), | ||
pdPASS, | ||
IOT_TEST_GGD_INITIAL_CONNECTION_RETRY_DELAY, | ||
IOT_TEST_GGD_CONNECT_RETRY_COUNT ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in a for
loop that runs this function 10 times to see if it succeeds every time. If we're adding a retry to deal with network flakiness, then I don't think it still makes sense to require this function to succeed 10 times in a row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So do you suggest removing the retry or removing the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant removing the loop would be best here. I'm not sure why we want to have 10 consecutive successes here, and since this line is usually where this test fails, adding a retry would probably help. I think we should remove the loop and add a retry if everyone else is fine with it.
( int ) xStatus, ( int ) i ); | ||
TEST_ASSERT_EQUAL_INT32_MESSAGE( pdPASS, xStatus, cBuffer ); | ||
} | ||
RETRY_EXPONENTIAL( xStatus - GGD_GetGGCIPandCertificate( clientcredentialMQTT_BROKER_ENDPOINT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=
again
snprintf( cMsgBuffer, nBufferLength, | ||
"GGD_GetGGCIPandCertificate returned %d on iteration %d", | ||
( int ) xStatus, ( int ) i ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't seem to make sense anymore since it's not in a loop. I'm not sure what the purpose of cMsgBuffer
was before this either since it doesn't seem to be used later. Is i
even used anymore in this function?
* @brief The amount of times to retry a request if it fails. | ||
*/ | ||
#ifndef IOT_TEST_GGD_CONNECT_RETRY_COUNT | ||
#define IOT_TEST_GGD_CONNECT_RETRY_COUNT ( 1 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is 1, does that mean there is no retry, but rather a single attempt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 means a single attempt. I shall comment that out. Thank you!
#endif | ||
|
||
/** | ||
* @brief The amount of times to retry a connection if it fails with 1 being a single attempt. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wording still seems a bit off to me. How would you feel about something like:
The number of times to try a connection. Values greater than 1 will retry on failure with an exponential backoff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
Description
GGD_JSONRequestStart method creates a connection to a specified host and port by using the Sockets API. If that connection fails, then the test completely fails. This PR addresses this issue by wrapping the method around RETRY_EXPONENTIAL, which retries the method with exponential backoff if the expected status is not returned.
Also considered retries for the library function instead but this masks the problem that really needs to be handled in the network layer.
Checklist:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.