Improve parallelization in Redis Sink #866

Merged: 4 commits merged on Jul 13, 2020


pyalex
Collaborator

@pyalex pyalex commented Jul 9, 2020

What this PR does / why we need it:

In the Redis sink, we pack rows into batches before flushing them to Redis. The current implementation uses GroupBy with a null key (all rows go into one group), which causes a lot of shuffling and can lead to significant throughput degradation. This PR replaces GroupBy with GroupIntoBatches, using FeatureReference as the key. This should improve parallelization of the Redis sink.
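
To illustrate the difference, here is a minimal plain-Python sketch of keyed batching. It is not the Beam API and not the code in this PR: `batch_per_key`, the sample feature references, and the batch size are hypothetical, chosen only to show how batches end up distributed across keys. In a keyed runner, work for one key is handled together, so a single null key leaves nothing to spread across workers, while one key per feature reference does.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple


def batch_per_key(
    rows: Iterable[Tuple[Hashable, int]], batch_size: int
) -> Iterator[Tuple[Hashable, List[int]]]:
    """Yield (key, batch) pairs with at most batch_size values per batch.

    Hypothetical helper: mimics the idea of batching per key, nothing more.
    """
    buffers: Dict[Hashable, List[int]] = defaultdict(list)
    for key, value in rows:
        buf = buffers[key]
        buf.append(value)
        if len(buf) >= batch_size:
            # Emit a copy so clearing the buffer does not affect the batch.
            yield key, list(buf)
            buf.clear()
    # Emit whatever is left in each buffer at the end of the input.
    for key, buf in buffers.items():
        if buf:
            yield key, list(buf)


# Hypothetical rows: (feature reference, row id).
rows = [
    ("driver:rating", 1),
    ("driver:trips", 2),
    ("driver:rating", 3),
    ("driver:rating", 4),
    ("driver:trips", 5),
]

# Keyed by feature reference: batches belong to different keys.
keyed = list(batch_per_key(rows, batch_size=2))

# Single null key: every batch belongs to the same key.
single = list(batch_per_key(((None, v) for _, v in rows), batch_size=2))
```

With the null key, all three batches share the key `None`; with the feature reference as key, the batches are split between `driver:rating` and `driver:trips` and could be handled independently.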

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

Users can now configure how frequently writes are flushed to a Redis store.

@feast-ci-bot feast-ci-bot added size/L and removed size/M labels Jul 9, 2020
@pyalex pyalex force-pushed the fix-redis-batch branch 2 times, most recently from 2a12816 to 7a46f7f Compare July 9, 2020 10:58
@pyalex pyalex changed the base branch from v0.6-branch to master July 13, 2020 05:19
@pyalex
Collaborator Author

pyalex commented Jul 13, 2020

/test test-end-to-end-auth

@pyalex pyalex added kind/feature New feature or request and removed kind/housekeeping labels Jul 13, 2020
@pyalex
Collaborator Author

pyalex commented Jul 13, 2020

/test test-end-to-end-auth

@woop
Member

woop commented Jul 13, 2020

/lgtm

@feast-ci-bot feast-ci-bot merged commit 25ff687 into feast-dev:master Jul 13, 2020
@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pyalex, woop


pyalex pushed a commit to pyalex/feast that referenced this pull request Jul 17, 2020
* replace group with batch

* featureReference as key

* configurable flush frequency in redis sink

* pull config to RedisSink
pyalex pushed a commit that referenced this pull request Jul 17, 2020
* replace group with batch

* featureReference as key

* configurable flush frequency in redis sink

* pull config to RedisSink
3 participants