feat: Cassandra online store, concurrent fetching for multiple entities #3356

hemidactylus · 2022-11-21T14:49:49Z

This changes the retrieval of features from the Cassandra online store by leveraging the
Cassandra driver's native concurrency capabilities.

When several entities are to be retrieved, the reads are no longer executed sequentially, one entity after another. Instead, they are issued concurrently: the driver keeps the results in the original order, and the call returns only once all of them are available.
In measurements on realistic environments, this yields a 2-3x speedup when retrieving 20 to 100 entities at once.
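The ordering and completion guarantees described above can be illustrated without a Cassandra cluster. The sketch below is not the PR's code. It swaps the driver's execute_concurrent_with_args (from cassandra.concurrent) for the standard library's ThreadPoolExecutor, and fake_read is a hypothetical stand-in for one per-entity read. As with the driver, at most `concurrency` reads are in flight at a time, results come back in input order, and the call returns only when every read has finished.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

K = TypeVar("K")
R = TypeVar("R")


def fetch_sequential(keys: Sequence[K], fetch: Callable[[K], R]) -> List[R]:
    # Baseline: one read at a time, entity after entity.
    return [fetch(k) for k in keys]


def fetch_concurrent(
    keys: Sequence[K], fetch: Callable[[K], R], concurrency: int = 100
) -> List[R]:
    # Stand-in for the driver's execute_concurrent_with_args:
    # at most `concurrency` reads run at once. Executor.map yields results
    # in input order, and list() blocks until every read has completed.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fetch, keys))


def fake_read(entity_key: str) -> dict:
    # Hypothetical stand-in for one single-partition Cassandra read.
    time.sleep(0.05)  # simulated network round trip
    return {"entity_key": entity_key, "value": len(entity_key)}
```

With 20 keys and a 50 ms simulated round trip, the sequential version takes roughly 20 times 50 ms, while fetch_concurrent with concurrency=20 overlaps all the waits. This toy speedup comes purely from overlapping simulated latency and says nothing about the 2-3x figure measured on real deployments.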

Using the Cassandra driver's execute_concurrent_with_args function requires a new parameter that sets the maximum level of concurrency (in practice somewhat bounded by the number of available vCPUs). For transparency, this is exposed as a new parameter, read_concurrency, in the feature store configuration yaml; it is documented and correctly handled by the guided procedure of feast init -t cassandra.
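As a hedged illustration of how such a parameter might be consumed, the hypothetical helper read_concurrency_from below reads read_concurrency from a dict mirroring the online_store section of feature_store.yaml and validates it. Only the read_concurrency name comes from this PR. The other keys, the helper itself, and the fallback of 100 (the default `concurrency` of the driver's execute_concurrent helpers) are assumptions, and Feast's actual config class may behave differently.

```python
from typing import Any, Mapping

# Assumption: fall back to 100, the default `concurrency` of the
# cassandra-driver's execute_concurrent helpers; Feast's own default may differ.
DEFAULT_READ_CONCURRENCY = 100


def read_concurrency_from(online_store: Mapping[str, Any]) -> int:
    """Hypothetical helper: extract and validate `read_concurrency`."""
    value = online_store.get("read_concurrency", DEFAULT_READ_CONCURRENCY)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError(f"read_concurrency must be a positive integer, got {value!r}")
    return value


# Dict mirroring the `online_store` section of feature_store.yaml.
# Keys other than read_concurrency are illustrative assumptions.
online_store = {
    "type": "cassandra",
    "hosts": ["192.168.1.1"],
    "keyspace": "feast_keyspace",
    "read_concurrency": 50,
}
```

The validated value would then be passed as the `concurrency` argument of the concurrent read call.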

minimal handling of exceptions in concurrent query execution

read_concurrency parameter in Cassandra online store config yaml

Signed-off-by: Stefano Lottini <stefano.lottini@datastax.com>
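The exact form of the "minimal handling of exceptions" is not spelled out here. One common pattern, sketched under that assumption, is to call the driver with raise_on_first_error=False, so that failures are reported in place as (success, result_or_exc) pairs instead of being raised, and then surface the first failure once all reads have completed. ExecutionResult below is a local NamedTuple mirroring those pairs, so the sketch runs without cassandra-driver installed; unwrap_results is a hypothetical helper.

```python
from typing import Any, Iterable, List, NamedTuple


class ExecutionResult(NamedTuple):
    # Local mirror of the (success, result_or_exc) pairs returned by the
    # driver's execute_concurrent helpers.
    success: bool
    result_or_exc: Any


def unwrap_results(results: Iterable[ExecutionResult]) -> List[Any]:
    """Hypothetical helper: return results in order, or raise the first failure."""
    unwrapped = []
    for success, result_or_exc in results:
        if not success:
            # On failure, result_or_exc holds the exception instance.
            raise result_or_exc
        unwrapped.append(result_or_exc)
    return unwrapped
```

Raising the first failure keeps the caller's contract simple (all rows or an exception); an alternative would be to log failures and return partial results, trading completeness for availability.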

hemidactylus · 2022-11-29T09:42:46Z

/lgtm

feast-ci-bot · 2022-11-29T09:42:48Z

@hemidactylus: you cannot LGTM your own PR.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

adchia

/lgtm

feast-ci-bot · 2022-11-29T14:04:38Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, hemidactylus

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

~~OWNERS~~ [adchia,hemidactylus]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

# [0.27.0](v0.26.0...v0.27.0) (2022-12-05)

### Bug Fixes

* Changing Snowflake template code to avoid query not implemented … ([#3319](#3319)) ([1590d6b](1590d6b))
* Dask zero division error if parquet dataset has only one partition ([#3236](#3236)) ([69e4a7d](69e4a7d))
* Enable Spark materialization on Yarn ([#3370](#3370)) ([0c20a4e](0c20a4e))
* Ensure that Snowflake accounts for number columns that overspecify precision ([#3306](#3306)) ([0ad0ace](0ad0ace))
* Fix memory leak from usage.py not properly cleaning up call stack ([#3371](#3371)) ([a0c6fde](a0c6fde))
* Fix workflow to contain env vars ([#3379](#3379)) ([548bed9](548bed9))
* Update bytewax materialization ([#3368](#3368)) ([4ebe00f](4ebe00f))
* Update the version counts ([#3378](#3378)) ([8112db5](8112db5))
* Updated AWS Athena template ([#3322](#3322)) ([5956981](5956981))
* Wrong UI data source type display ([#3276](#3276)) ([8f28062](8f28062))

### Features

* Cassandra online store, concurrency in bulk write operations ([#3367](#3367)) ([eaf354c](eaf354c))
* Cassandra online store, concurrent fetching for multiple entities ([#3356](#3356)) ([00fa21f](00fa21f))
* Get Snowflake Query Output As Pyspark Dataframe ([#2504](#2504)) ([#3358](#3358)) ([2f18957](2f18957))

feast-ci-bot added approved size/L labels Nov 21, 2022

concurrent fetching for multiple entities

e9c04f9


hemidactylus force-pushed the sl-cassandra-optimize-bulk-reads branch from ce4a0eb to e9c04f9 November 21, 2022 14:50

hemidactylus requested a review from achals November 22, 2022 10:39

adchia approved these changes Nov 29, 2022


feast-ci-bot assigned adchia Nov 29, 2022

feast-ci-bot added the lgtm label Nov 29, 2022

feast-ci-bot merged commit 00fa21f into feast-dev:master Nov 29, 2022

hemidactylus deleted the sl-cassandra-optimize-bulk-reads branch November 29, 2022 15:17

hemidactylus mentioned this pull request Nov 30, 2022

feat: Cassandra online store, concurrency in bulk write operations #3367

Merged
