Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Adding get_online_features_async to feature store sdk #4172

Merged
merged 4 commits into from
May 9, 2024
Merged

feat: Adding get_online_features_async to feature store sdk #4172

merged 4 commits into from
May 9, 2024

Conversation

breno-costa
Copy link
Contributor

@breno-costa breno-costa commented May 2, 2024

What this PR does / why we need it:

Which issue(s) this PR fixes:

Closes #3927

@breno-costa breno-costa requested review from DvirDukhan and a team as code owners May 2, 2024 19:47
@breno-costa breno-costa requested review from achals and removed request for a team May 2, 2024 19:47
@tokoko
Copy link
Collaborator

tokoko commented May 2, 2024

@breno-costa thanks, I'll take a closer look tomorrow. One comment right now is that we will probably need to add a test for async under online_store in integration tests as well. Tests will have to marked to be skipped for every online store other than redis for now. We can probably defer it for separate PR (?) if you like but it'll be essential eventually.

@breno-costa
Copy link
Contributor Author

breno-costa commented May 3, 2024

I'm running some benchmarks to check latency and throughput for this new implementation.

Just created a benchmark repo with a FastAPI application with sync and async endpoints.
https://github.com/breno-costa/feast-benchmark-async-sdk

Here are some initial results. Don't take them as final results, as I'm still analyzing them. If you have any comments on benchmark repository, please let me know.

Sync implementation

~ wrk --threads 4 --duration 1m --latency "http://localhost:8000/get_online_features"
Running 1m test @ http://localhost:8000/get_online_features
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    30.91ms    7.87ms 176.23ms   80.32%
    Req/Sec    65.00      9.38    90.00     78.16%
  Latency Distribution
     50%   30.49ms
     75%   35.95ms
     90%   39.54ms
     99%   47.80ms
  15572 requests in 1.00m, 31.51MB read
Requests/sec:    259.22
Transfer/sec:    537.18KB

Async implementation

~ wrk --threads 4 --duration 1m --latency "http://localhost:8000/get_online_features_async"
Running 1m test @ http://localhost:8000/get_online_features_async
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    32.30ms    2.95ms  66.73ms   81.85%
    Req/Sec    62.03      5.87    80.00     69.22%
  Latency Distribution
     50%   32.10ms
     75%   33.14ms
     90%   35.50ms
     99%   40.79ms
  14878 requests in 1.00m, 30.11MB read
Requests/sec:    247.59
Transfer/sec:    513.07KB

@tokoko
Copy link
Collaborator

tokoko commented May 3, 2024

that's really useful, I had something like that at the back of my mind. Can I suggest to change fastapi endpoint for sync get_online_features from async to sync? Doing a sync processing behind an async endpoint comes with a performance penalty.

@breno-costa
Copy link
Contributor Author

that's really useful, I had something like that at the back of my mind. Can I suggest to change fastapi endpoint for sync get_online_features from async to sync? Doing a sync processing behind an async endpoint comes with a performance penalty.

Thanks for the heads up. I changed and made a lot of difference. The event loop added some overhead and degraded the performance for sync endpoint. I replaced old values with new ones in last comment.

That said, I didn't see many difference for sync and async implementations for this basic benchmark case. I'll run same script with more features and entities to increase time taken to fetch data from Redis. Maybe add some CPU-bound operation after fetching data to simulate more ML like scenario.

@tokoko
Copy link
Collaborator

tokoko commented May 3, 2024

That's strange 😄 cpu-bound operations will degrade async performance even more, actually. And it's not that realistic of a scenario anyway, one would normally put this async retrieval behind a separate feature server where model inference doesn't take place. one caveat here is that redis in your benchmarks is a local one, so network latency doesn't play as much role. With a remote redis, async will do better, not sure how significantly that would change the picture though.

P.S. another idea that might help... can you also benchmark pure OnlineStore.read(..) calls rather than get_online_features? There's a lot of crazy stuff going on in get_online_features, maybe they are the culprit.

Copy link
Member

@franciscojavierarceo franciscojavierarceo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice lgtm

@breno-costa
Copy link
Contributor Author

@tokoko I've materialized data into a Redis instance on Google Cloud to test this implementation. The server is accessing the Redis instance through a ssh tunnel intentionally to increase network latency. Also increased the number of connections (or users) in my benchmark. When I simulated such kind of scenario, I started to see the benefits of async implementation. Example here:

~ wrk --threads 4 --connections 200 --duration 1m --latency "http://localhost:8000/get_online_features"
Running 1m test @ http://localhost:8000/get_online_features
  4 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.83s   981.20ms   4.65s    71.63%
    Req/Sec    42.35     35.19   160.00     55.26%
  Latency Distribution
     50%    2.32s
     75%    3.55s
     90%    4.31s
     99%    4.49s
  4160 requests in 1.00m, 8.41MB read
  Socket errors: connect 0, read 149, write 0, timeout 0
Requests/sec:     69.23
Transfer/sec:    143.32KB
~ wrk --threads 4 --connections 200 --duration 1m --latency "http://localhost:8000/get_online_features_async"
Running 1m test @ http://localhost:8000/get_online_features_async
  4 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.29s    66.51ms   1.56s    72.30%
    Req/Sec    42.91     24.80   151.00     69.87%
  Latency Distribution
     50%    1.28s
     75%    1.32s
     90%    1.38s
     99%    1.51s
  9206 requests in 1.00m, 18.61MB read
  Socket errors: connect 0, read 142, write 0, timeout 0
Requests/sec:    153.21
Transfer/sec:    317.18KB

@breno-costa
Copy link
Contributor Author

Tests will have to marked to be skipped for every online store other than redis for now. We can probably defer it for separate PR (?) if you like but it'll be essential eventually.

I can manage that in another PR if you don't mind.

Copy link
Collaborator

@tokoko tokoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, lgtm. thanks

@breno-costa
Copy link
Contributor Author

Could you add that label to run tests? I can't add it.

@tokoko
Copy link
Collaborator

tokoko commented May 6, 2024

pretty sure integration tests are still broken, so no point yet :) (#4171). @jeremyary Can you add the label here once credentials are in order?

@jeremyary
Copy link
Collaborator

@tokoko yep, will do. Credentials PR is green now, so hopefully shouldn't be much longer!

@tokoko
Copy link
Collaborator

tokoko commented May 7, 2024

@breno-costa looks like you'll have to rebase on master to get Jeremy's latest PR

@franciscojavierarceo
Copy link
Member

@tokoko actually should just be able to rerun jobs. I just kicked it off.

@jeremyary
Copy link
Collaborator

looked quickly so could be wrong, but assertion errors aren't consistent with other action/PR runs, so may be a legit issue to check out here

Signed-off-by: Breno Costa <brenocosta0901@gmail.com>
Signed-off-by: Breno Costa <brenocosta0901@gmail.com>
@breno-costa
Copy link
Contributor Author

I could finally reproduce the error locally.
There is a difference between sync and async results for timestamp values. I'm investigating that right now.

Signed-off-by: Breno Costa <brenocosta0901@gmail.com>
Signed-off-by: Breno Costa <brenocosta0901@gmail.com>
@breno-costa
Copy link
Contributor Author

Everything is sorted now. All green checks 😄
How do you want to proceed with this PR? @tokoko @franciscojavierarceo

Copy link
Collaborator

@tokoko tokoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. we can merge this

@franciscojavierarceo franciscojavierarceo merged commit 311efc5 into feast-dev:master May 9, 2024
17 checks passed
@franciscojavierarceo
Copy link
Member

Awesome stuff, thanks @breno-costa!!!

franciscojavierarceo pushed a commit that referenced this pull request May 24, 2024
# [0.38.0](v0.37.0...v0.38.0) (2024-05-24)

### Bug Fixes

* Add vector database doc ([#4165](#4165)) ([37f36b6](37f36b6))
* Change checkout action back to v3 from v5 which isn't released yet ([#4147](#4147)) ([9523fff](9523fff))
* Change numpy version <1.25 dependency to <2 in setup.py ([#4085](#4085)) ([2ba71ff](2ba71ff)), closes [#4084](#4084)
* Changed the code the way mysql container is initialized.  ([#4140](#4140)) ([8b5698f](8b5698f)), closes [#4126](#4126)
* Correct nightly install command, move all installs to uv ([#4164](#4164)) ([c86d594](c86d594))
* Default value is not set in Redis connection string using environment variable ([#4136](#4136)) ([95acfb4](95acfb4)), closes [#3669](#3669)
* Get container host addresses from testcontainers (java) ([#4125](#4125)) ([9184dde](9184dde))
* Get rid of empty string `name_alias` during feature view projection deserialization  ([#4116](#4116)) ([65056ce](65056ce))
* Helm chart `feast-feature-server`, improve Service template name ([#4161](#4161)) ([dedc164](dedc164))
* Improve the code related to on-demand-featureview. ([#4203](#4203)) ([d91d7e0](d91d7e0))
* Integration tests for async sdk method ([#4201](#4201)) ([08c44ae](08c44ae))
* Make sure schema is used when calling `get_table_query_string` method for Snowflake datasource ([#4131](#4131)) ([c1579c7](c1579c7))
* Make sure schema is used when generating `from_expression` for Snowflake ([#4177](#4177)) ([5051da7](5051da7))
* Pass native input values to `get_online_features` from feature server ([#4117](#4117)) ([60756cb](60756cb))
* Pass region to S3 client only if set (Java) ([#4151](#4151)) ([b8087f7](b8087f7))
* Pgvector patch ([#4108](#4108)) ([ad45bb4](ad45bb4))
* Update doc ([#4153](#4153)) ([e873636](e873636))
* Update master-only benchmark bucket name due to credential update ([#4183](#4183)) ([e88f1e3](e88f1e3))
* Updating the instructions for quickstart guide. ([#4120](#4120)) ([0c30e96](0c30e96))
* Upgrading the test container so that local tests works with updated d… ([#4155](#4155)) ([93ddb11](93ddb11))

### Features

* Add a Kubernetes Operator for the Feast Feature Server ([#4145](#4145)) ([4a696dc](4a696dc))
* Add delta format to `FileSource`, add support for it in ibis/duckdb ([#4123](#4123)) ([2b6f1d0](2b6f1d0))
* Add materialization support to ibis/duckdb ([#4173](#4173)) ([369ca98](369ca98))
* Add optional private key params to Snowflake config ([#4205](#4205)) ([20f5419](20f5419))
* Add s3 remote storage export for duckdb ([#4195](#4195)) ([6a04c48](6a04c48))
* Adding DatastoreOnlineStore 'database' argument. ([#4180](#4180)) ([e739745](e739745))
* Adding get_online_features_async to feature store sdk ([#4172](#4172)) ([311efc5](311efc5))
* Adding support for dictionary writes to online store  ([#4156](#4156)) ([abfac01](abfac01))
* Elasticsearch vector database ([#4188](#4188)) ([bf99640](bf99640))
* Enable other distance metrics for Vector DB and Update docs ([#4170](#4170)) ([ba9f4ef](ba9f4ef))
* Feast/IKV datetime edgecase errors ([#4211](#4211)) ([bdae562](bdae562))
* Feast/IKV documenation language changes ([#4149](#4149)) ([690a621](690a621))
* Feast/IKV online store contrib plugin integration ([#4068](#4068)) ([f2b4eb9](f2b4eb9))
* Feast/IKV online store documentation ([#4146](#4146)) ([73601e4](73601e4))
* Feast/IKV upgrade client version ([#4200](#4200)) ([0e42150](0e42150))
* Incorporate substrait ODFVs into ibis-based offline store queries ([#4102](#4102)) ([c3a102f](c3a102f))
* Isolate input-dependent calculations in `get_online_features` ([#4041](#4041)) ([2a6edea](2a6edea))
* Make arrow primary interchange for online ODFV execution ([#4143](#4143)) ([3fdb716](3fdb716))
* Move data source validation entrypoint to offline store ([#4197](#4197)) ([a17725d](a17725d))
* Upgrading python version to 3.11, adding support for 3.11 as well. ([#4159](#4159)) ([4b1634f](4b1634f)), closes [#4152](#4152) [#4114](#4114)

### Reverts

* Reverts "fix: Using version args to install the correct feast version" ([#4112](#4112)) ([b66baa4](b66baa4)), closes [#3953](#3953)
franciscojavierarceo pushed a commit that referenced this pull request May 27, 2024
# [0.38.0](v0.37.0...v0.38.0) (2024-05-24)

### Bug Fixes

* Add vector database doc ([#4165](#4165)) ([37f36b6](37f36b6))
* Change checkout action back to v3 from v5 which isn't released yet ([#4147](#4147)) ([9523fff](9523fff))
* Change numpy version <1.25 dependency to <2 in setup.py ([#4085](#4085)) ([2ba71ff](2ba71ff)), closes [#4084](#4084)
* Changed the code the way mysql container is initialized.  ([#4140](#4140)) ([8b5698f](8b5698f)), closes [#4126](#4126)
* Correct nightly install command, move all installs to uv ([#4164](#4164)) ([c86d594](c86d594))
* Default value is not set in Redis connection string using environment variable ([#4136](#4136)) ([95acfb4](95acfb4)), closes [#3669](#3669)
* Get container host addresses from testcontainers (java) ([#4125](#4125)) ([9184dde](9184dde))
* Get rid of empty string `name_alias` during feature view projection deserialization  ([#4116](#4116)) ([65056ce](65056ce))
* Helm chart `feast-feature-server`, improve Service template name ([#4161](#4161)) ([dedc164](dedc164))
* Improve the code related to on-demand-featureview. ([#4203](#4203)) ([d91d7e0](d91d7e0))
* Integration tests for async sdk method ([#4201](#4201)) ([08c44ae](08c44ae))
* Make sure schema is used when calling `get_table_query_string` method for Snowflake datasource ([#4131](#4131)) ([c1579c7](c1579c7))
* Make sure schema is used when generating `from_expression` for Snowflake ([#4177](#4177)) ([5051da7](5051da7))
* Pass native input values to `get_online_features` from feature server ([#4117](#4117)) ([60756cb](60756cb))
* Pass region to S3 client only if set (Java) ([#4151](#4151)) ([b8087f7](b8087f7))
* Pgvector patch ([#4108](#4108)) ([ad45bb4](ad45bb4))
* Update doc ([#4153](#4153)) ([e873636](e873636))
* Update master-only benchmark bucket name due to credential update ([#4183](#4183)) ([e88f1e3](e88f1e3))
* Updating the instructions for quickstart guide. ([#4120](#4120)) ([0c30e96](0c30e96))
* Upgrading the test container so that local tests works with updated d… ([#4155](#4155)) ([93ddb11](93ddb11))

### Features

* Add a Kubernetes Operator for the Feast Feature Server ([#4145](#4145)) ([4a696dc](4a696dc))
* Add delta format to `FileSource`, add support for it in ibis/duckdb ([#4123](#4123)) ([2b6f1d0](2b6f1d0))
* Add materialization support to ibis/duckdb ([#4173](#4173)) ([369ca98](369ca98))
* Add optional private key params to Snowflake config ([#4205](#4205)) ([20f5419](20f5419))
* Add s3 remote storage export for duckdb ([#4195](#4195)) ([6a04c48](6a04c48))
* Adding DatastoreOnlineStore 'database' argument. ([#4180](#4180)) ([e739745](e739745))
* Adding get_online_features_async to feature store sdk ([#4172](#4172)) ([311efc5](311efc5))
* Adding support for dictionary writes to online store  ([#4156](#4156)) ([abfac01](abfac01))
* Elasticsearch vector database ([#4188](#4188)) ([bf99640](bf99640))
* Enable other distance metrics for Vector DB and Update docs ([#4170](#4170)) ([ba9f4ef](ba9f4ef))
* Feast/IKV datetime edgecase errors ([#4211](#4211)) ([bdae562](bdae562))
* Feast/IKV documenation language changes ([#4149](#4149)) ([690a621](690a621))
* Feast/IKV online store contrib plugin integration ([#4068](#4068)) ([f2b4eb9](f2b4eb9))
* Feast/IKV online store documentation ([#4146](#4146)) ([73601e4](73601e4))
* Feast/IKV upgrade client version ([#4200](#4200)) ([0e42150](0e42150))
* Incorporate substrait ODFVs into ibis-based offline store queries ([#4102](#4102)) ([c3a102f](c3a102f))
* Isolate input-dependent calculations in `get_online_features` ([#4041](#4041)) ([2a6edea](2a6edea))
* Make arrow primary interchange for online ODFV execution ([#4143](#4143)) ([3fdb716](3fdb716))
* Move data source validation entrypoint to offline store ([#4197](#4197)) ([a17725d](a17725d))
* Upgrading python version to 3.11, adding support for 3.11 as well. ([#4159](#4159)) ([4b1634f](4b1634f)), closes [#4152](#4152) [#4114](#4114)

### Reverts

* Reverts "fix: Using version args to install the correct feast version" ([#4112](#4112)) ([b66baa4](b66baa4)), closes [#3953](#3953)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add async variant of get_online_features
4 participants