Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle retry for redis io flow #274

Merged
merged 1 commit into from
Jan 5, 2020

Conversation

khorshuheng
Copy link
Collaborator

Currently, the Redis write to the serving store does not handle transient connection failures, which results in data loss. This PR augment the pipeline to retry the connection up to a certain maximum limit. The limit and backoff time are configurable via RedisConfig.

@feast-ci-bot
Copy link
Collaborator

Hi @khorshuheng. Thanks for your PR.

I'm waiting for a gojek member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@davidheryanto
Copy link
Collaborator

/ok-to-test

@davidheryanto
Copy link
Collaborator

davidheryanto commented Oct 29, 2019

/test test-golang-sdk
Test cases in golang-sdk unit test need to be updated to ignore actual results that is not in order
Retrying will likely to pass the test for now
Not related to this pull request

@@ -110,6 +110,8 @@ message Store {
message RedisConfig {
string host = 1;
int32 port = 2;
int32 backoff_ms = 3;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest some comments to the new proto fields.

Suggested change
int32 backoff_ms = 3;
// Optional. The number of milliseconds to wait before retrying failed Redis connection.
// By default, Feast uses exponential backoff policy and "backoff_ms" sets the initial wait duration.
int32 backoff_ms = 3;
// Optional. Maximum total number of retries for connecting to Redis. Default to zero retries.
int32 max_retries = 4;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just nitpick, the initial backoff duration could be renamed as follow

int32 initial_backoff_ms = 3;

So that the generated code will be clearer

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidheryanto @pradithya Thanks for the suggestion, i have implemented the change as suggested.

@zhilingc
Copy link
Collaborator

Looks great!
Is it possible to output failed rows to write to the deadletter table?

@khorshuheng
Copy link
Collaborator Author

Looks great!
Is it possible to output failed rows to write to the deadletter table?

Thanks! There are a few things which needs to be discussed before the implementation:

  1. Where do we plan to store the dead letters? Should we stored them in Kafka, or Big Query? Should it be configurable?

  2. Apart from the error message, what kind of information would be required for the dead letters? Is it sufficient to pass in the key / value bytes? Something like:

FailedRedisMessage

  • keyBytes
  • valueBytes
  • errMessage?

@zhilingc
Copy link
Collaborator

Looks great!
Is it possible to output failed rows to write to the deadletter table?

Thanks! There are a few things which needs to be discussed before the implementation:

  1. Where do we plan to store the dead letters? Should we stored them in Kafka, or Big Query? Should it be configurable?
  2. Apart from the error message, what kind of information would be required for the dead letters? Is it sufficient to pass in the key / value bytes? Something like:

FailedRedisMessage

  • keyBytes
  • valueBytes
  • errMessage?

We currently write deadletters to BQ using the WriteFailedElementToBigQuery PTransform, we can extend it to other types of sinks (kafka, file, etc) later.

As for the failed element message, just key/value bytes together with the error message and timestamp would be great!

@khorshuheng khorshuheng force-pushed the redis-with-retry branch 2 times, most recently from 1db929d to 51c5ec2 Compare November 5, 2019 01:43
@woop woop changed the base branch from 0.3-dev to master November 17, 2019 10:51
@khorshuheng
Copy link
Collaborator Author

Looks great!
Is it possible to output failed rows to write to the deadletter table?

Thanks! There are a few things which needs to be discussed before the implementation:

  1. Where do we plan to store the dead letters? Should we stored them in Kafka, or Big Query? Should it be configurable?
  2. Apart from the error message, what kind of information would be required for the dead letters? Is it sufficient to pass in the key / value bytes? Something like:

FailedRedisMessage

  • keyBytes
  • valueBytes
  • errMessage?

We currently write deadletters to BQ using the WriteFailedElementToBigQuery PTransform, we can extend it to other types of sinks (kafka, file, etc) later.

As for the failed element message, just key/value bytes together with the error message and timestamp would be great!

I have implemented this. Though, as discussed earlier, the bytes value are currently represented as string, due to the BigQuery schema.

@woop
Copy link
Member

woop commented Nov 28, 2019

/lgtm

@woop
Copy link
Member

woop commented Nov 28, 2019

/approve

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khorshuheng, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@woop
Copy link
Member

woop commented Nov 28, 2019

/retest

@woop
Copy link
Member

woop commented Dec 22, 2019

@khorshuheng Happy to merge this in if we can resolve the conflicts.

@woop
Copy link
Member

woop commented Dec 25, 2019

/retest

@woop
Copy link
Member

woop commented Jan 5, 2020

/lgtm

@feast-ci-bot feast-ci-bot merged commit 8a0f53b into feast-dev:master Jan 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants