Skip to content
This repository has been archived by the owner on Dec 8, 2022. It is now read-only.

Retry using exponential backoff upon connection failure with GGD system tests #1892

Merged
merged 14 commits into from
Apr 23, 2020

Conversation

yourslab
Copy link
Contributor

Description

GGD_JSONRequestStart method creates a connection to a specified host and port by using the Sockets API. If that connection fails, then the test completely fails. This PR addresses this issue by wrapping the method around RETRY_EXPONENTIAL, which retries the method with exponential backoff if the expected status is not returned.

Also considered retries for the library function instead but this masks the problem that really needs to be handled in the network layer.

Checklist:

  • I have tested my changes. No regression in existing tests.
  • My code is Linted.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@yourslab yourslab changed the title Retry using exponential backoff upon connection failure with GGD Retry using exponential backoff upon connection failure with GGD systen tests Apr 10, 2020
@@ -166,6 +206,15 @@ TEST( GGD_System, GetGGCIPandCertificate )
cBuffer, /*lint !e971 can use char without signed/unsigned. */
ulBufferSize,
&xHostAddressData );
RETRY_EXPONENTIAL( xStatus - GGD_GetGGCIPandCertificate( clientcredentialMQTT_BROKER_ENDPOINT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xStatus - ? Should it be =? Also, the previous call to GGD_GetGGCIPandCertificate was not deleted so this will run twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must have had too much caffeine.

Comment on lines 209 to 217
RETRY_EXPONENTIAL( xStatus - GGD_GetGGCIPandCertificate( clientcredentialMQTT_BROKER_ENDPOINT,
clientcredentialGREENGRASS_DISCOVERY_PORT,
clientcredentialIOT_THING_NAME,
cBuffer, /*lint !e971 can use char without signed/unsigned. */
ulBufferSize,
&xHostAddressData ),
pdPASS,
IOT_TEST_GGD_INITIAL_CONNECTION_RETRY_DELAY,
IOT_TEST_GGD_CONNECT_RETRY_COUNT );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in a for loop that runs this function 10 times to see if it succeeds every time. If we're adding a retry to deal with network flakiness, then I don't think it still makes sense to require this function to succeed 10 times in a row.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do you suggest removing the retry or removing the loop?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant removing the loop would be best here. I'm not sure why we want to have 10 consecutive successes here, and since this line is usually where this test fails, adding a retry would probably help. I think we should remove the loop and add a retry if everyone else is fine with it.

@yourslab yourslab changed the title Retry using exponential backoff upon connection failure with GGD systen tests Retry using exponential backoff upon connection failure with GGD system tests Apr 10, 2020
( int ) xStatus, ( int ) i );
TEST_ASSERT_EQUAL_INT32_MESSAGE( pdPASS, xStatus, cBuffer );
}
RETRY_EXPONENTIAL( xStatus - GGD_GetGGCIPandCertificate( clientcredentialMQTT_BROKER_ENDPOINT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= again

Comment on lines 210 to 212
snprintf( cMsgBuffer, nBufferLength,
"GGD_GetGGCIPandCertificate returned %d on iteration %d",
( int ) xStatus, ( int ) i );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't seem to make sense anymore since it's not in a loop. I'm not sure what the purpose of cMsgBuffer was before this either since it doesn't seem to be used later. Is i even used anymore in this function?

muneebahmed10
muneebahmed10 previously approved these changes Apr 10, 2020
* @brief The amount of times to retry a request if it fails.
*/
#ifndef IOT_TEST_GGD_CONNECT_RETRY_COUNT
#define IOT_TEST_GGD_CONNECT_RETRY_COUNT ( 1 )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is 1, does that mean there is no retry, but rather a single attempt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 means a single attempt. I shall comment that out. Thank you!

#endif

/**
* @brief The amount of times to retry a connection if it fails with 1 being a single attempt.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wording still seems a bit off to me. How would you feel about something like:

The number of times to try a connection. Values greater than 1 will retry on failure with an exponential backoff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

@gkwicker gkwicker merged commit ee497c1 into master Apr 23, 2020
@yourslab yourslab deleted the ggd/retry-fixes branch April 24, 2020 03:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants