-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Adding get_online_features_async to feature store sdk #4172
Conversation
@breno-costa thanks, I'll take a closer look tomorrow. One comment right now is that we will probably need to add a test for async under online_store in integration tests as well. Tests will have to marked to be skipped for every online store other than redis for now. We can probably defer it for separate PR (?) if you like but it'll be essential eventually. |
I'm running some benchmarks to check latency and throughput for this new implementation. Just created a benchmark repo with a FastAPI application with sync and async endpoints. Here are some initial results. Don't take them as final results, as I'm still analyzing them. If you have any comments on benchmark repository, please let me know. Sync implementation ~ wrk --threads 4 --duration 1m --latency "http://localhost:8000/get_online_features"
Running 1m test @ http://localhost:8000/get_online_features
4 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 30.91ms 7.87ms 176.23ms 80.32%
Req/Sec 65.00 9.38 90.00 78.16%
Latency Distribution
50% 30.49ms
75% 35.95ms
90% 39.54ms
99% 47.80ms
15572 requests in 1.00m, 31.51MB read
Requests/sec: 259.22
Transfer/sec: 537.18KB Async implementation ~ wrk --threads 4 --duration 1m --latency "http://localhost:8000/get_online_features_async"
Running 1m test @ http://localhost:8000/get_online_features_async
4 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 32.30ms 2.95ms 66.73ms 81.85%
Req/Sec 62.03 5.87 80.00 69.22%
Latency Distribution
50% 32.10ms
75% 33.14ms
90% 35.50ms
99% 40.79ms
14878 requests in 1.00m, 30.11MB read
Requests/sec: 247.59
Transfer/sec: 513.07KB |
that's really useful, I had something like that at the back of my mind. Can I suggest to change fastapi endpoint for sync |
Thanks for the heads up. I changed and made a lot of difference. The event loop added some overhead and degraded the performance for sync endpoint. I replaced old values with new ones in last comment. That said, I didn't see many difference for sync and async implementations for this basic benchmark case. I'll run same script with more features and entities to increase time taken to fetch data from Redis. Maybe add some CPU-bound operation after fetching data to simulate more ML like scenario. |
That's strange 😄 cpu-bound operations will degrade async performance even more, actually. And it's not that realistic of a scenario anyway, one would normally put this async retrieval behind a separate feature server where model inference doesn't take place. one caveat here is that redis in your benchmarks is a local one, so network latency doesn't play as much role. With a remote redis, async will do better, not sure how significantly that would change the picture though. P.S. another idea that might help... can you also benchmark pure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice lgtm
@tokoko I've materialized data into a Redis instance on Google Cloud to test this implementation. The server is accessing the Redis instance through a ssh tunnel intentionally to increase network latency. Also increased the number of connections (or users) in my benchmark. When I simulated such kind of scenario, I started to see the benefits of async implementation. Example here: ~ wrk --threads 4 --connections 200 --duration 1m --latency "http://localhost:8000/get_online_features"
Running 1m test @ http://localhost:8000/get_online_features
4 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.83s 981.20ms 4.65s 71.63%
Req/Sec 42.35 35.19 160.00 55.26%
Latency Distribution
50% 2.32s
75% 3.55s
90% 4.31s
99% 4.49s
4160 requests in 1.00m, 8.41MB read
Socket errors: connect 0, read 149, write 0, timeout 0
Requests/sec: 69.23
Transfer/sec: 143.32KB ~ wrk --threads 4 --connections 200 --duration 1m --latency "http://localhost:8000/get_online_features_async"
Running 1m test @ http://localhost:8000/get_online_features_async
4 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.29s 66.51ms 1.56s 72.30%
Req/Sec 42.91 24.80 151.00 69.87%
Latency Distribution
50% 1.28s
75% 1.32s
90% 1.38s
99% 1.51s
9206 requests in 1.00m, 18.61MB read
Socket errors: connect 0, read 142, write 0, timeout 0
Requests/sec: 153.21
Transfer/sec: 317.18KB |
I can manage that in another PR if you don't mind. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, lgtm. thanks
Could you add that label to run tests? I can't add it. |
pretty sure integration tests are still broken, so no point yet :) (#4171). @jeremyary Can you add the label here once credentials are in order? |
@tokoko yep, will do. Credentials PR is green now, so hopefully shouldn't be much longer! |
@breno-costa looks like you'll have to rebase on master to get Jeremy's latest PR |
@tokoko actually should just be able to rerun jobs. I just kicked it off. |
looked quickly so could be wrong, but assertion errors aren't consistent with other action/PR runs, so may be a legit issue to check out here |
Signed-off-by: Breno Costa <brenocosta0901@gmail.com>
Signed-off-by: Breno Costa <brenocosta0901@gmail.com>
I could finally reproduce the error locally. |
Signed-off-by: Breno Costa <brenocosta0901@gmail.com>
Signed-off-by: Breno Costa <brenocosta0901@gmail.com>
Everything is sorted now. All green checks 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. we can merge this
Awesome stuff, thanks @breno-costa!!! |
# [0.38.0](v0.37.0...v0.38.0) (2024-05-24) ### Bug Fixes * Add vector database doc ([#4165](#4165)) ([37f36b6](37f36b6)) * Change checkout action back to v3 from v5 which isn't released yet ([#4147](#4147)) ([9523fff](9523fff)) * Change numpy version <1.25 dependency to <2 in setup.py ([#4085](#4085)) ([2ba71ff](2ba71ff)), closes [#4084](#4084) * Changed the code the way mysql container is initialized. ([#4140](#4140)) ([8b5698f](8b5698f)), closes [#4126](#4126) * Correct nightly install command, move all installs to uv ([#4164](#4164)) ([c86d594](c86d594)) * Default value is not set in Redis connection string using environment variable ([#4136](#4136)) ([95acfb4](95acfb4)), closes [#3669](#3669) * Get container host addresses from testcontainers (java) ([#4125](#4125)) ([9184dde](9184dde)) * Get rid of empty string `name_alias` during feature view projection deserialization ([#4116](#4116)) ([65056ce](65056ce)) * Helm chart `feast-feature-server`, improve Service template name ([#4161](#4161)) ([dedc164](dedc164)) * Improve the code related to on-demand-featureview. ([#4203](#4203)) ([d91d7e0](d91d7e0)) * Integration tests for async sdk method ([#4201](#4201)) ([08c44ae](08c44ae)) * Make sure schema is used when calling `get_table_query_string` method for Snowflake datasource ([#4131](#4131)) ([c1579c7](c1579c7)) * Make sure schema is used when generating `from_expression` for Snowflake ([#4177](#4177)) ([5051da7](5051da7)) * Pass native input values to `get_online_features` from feature server ([#4117](#4117)) ([60756cb](60756cb)) * Pass region to S3 client only if set (Java) ([#4151](#4151)) ([b8087f7](b8087f7)) * Pgvector patch ([#4108](#4108)) ([ad45bb4](ad45bb4)) * Update doc ([#4153](#4153)) ([e873636](e873636)) * Update master-only benchmark bucket name due to credential update ([#4183](#4183)) ([e88f1e3](e88f1e3)) * Updating the instructions for quickstart guide. ([#4120](#4120)) ([0c30e96](0c30e96)) * Upgrading the test container so that local tests works with updated d… ([#4155](#4155)) ([93ddb11](93ddb11)) ### Features * Add a Kubernetes Operator for the Feast Feature Server ([#4145](#4145)) ([4a696dc](4a696dc)) * Add delta format to `FileSource`, add support for it in ibis/duckdb ([#4123](#4123)) ([2b6f1d0](2b6f1d0)) * Add materialization support to ibis/duckdb ([#4173](#4173)) ([369ca98](369ca98)) * Add optional private key params to Snowflake config ([#4205](#4205)) ([20f5419](20f5419)) * Add s3 remote storage export for duckdb ([#4195](#4195)) ([6a04c48](6a04c48)) * Adding DatastoreOnlineStore 'database' argument. ([#4180](#4180)) ([e739745](e739745)) * Adding get_online_features_async to feature store sdk ([#4172](#4172)) ([311efc5](311efc5)) * Adding support for dictionary writes to online store ([#4156](#4156)) ([abfac01](abfac01)) * Elasticsearch vector database ([#4188](#4188)) ([bf99640](bf99640)) * Enable other distance metrics for Vector DB and Update docs ([#4170](#4170)) ([ba9f4ef](ba9f4ef)) * Feast/IKV datetime edgecase errors ([#4211](#4211)) ([bdae562](bdae562)) * Feast/IKV documenation language changes ([#4149](#4149)) ([690a621](690a621)) * Feast/IKV online store contrib plugin integration ([#4068](#4068)) ([f2b4eb9](f2b4eb9)) * Feast/IKV online store documentation ([#4146](#4146)) ([73601e4](73601e4)) * Feast/IKV upgrade client version ([#4200](#4200)) ([0e42150](0e42150)) * Incorporate substrait ODFVs into ibis-based offline store queries ([#4102](#4102)) ([c3a102f](c3a102f)) * Isolate input-dependent calculations in `get_online_features` ([#4041](#4041)) ([2a6edea](2a6edea)) * Make arrow primary interchange for online ODFV execution ([#4143](#4143)) ([3fdb716](3fdb716)) * Move data source validation entrypoint to offline store ([#4197](#4197)) ([a17725d](a17725d)) * Upgrading python version to 3.11, adding support for 3.11 as well. ([#4159](#4159)) ([4b1634f](4b1634f)), closes [#4152](#4152) [#4114](#4114) ### Reverts * Reverts "fix: Using version args to install the correct feast version" ([#4112](#4112)) ([b66baa4](b66baa4)), closes [#3953](#3953)
# [0.38.0](v0.37.0...v0.38.0) (2024-05-24) ### Bug Fixes * Add vector database doc ([#4165](#4165)) ([37f36b6](37f36b6)) * Change checkout action back to v3 from v5 which isn't released yet ([#4147](#4147)) ([9523fff](9523fff)) * Change numpy version <1.25 dependency to <2 in setup.py ([#4085](#4085)) ([2ba71ff](2ba71ff)), closes [#4084](#4084) * Changed the code the way mysql container is initialized. ([#4140](#4140)) ([8b5698f](8b5698f)), closes [#4126](#4126) * Correct nightly install command, move all installs to uv ([#4164](#4164)) ([c86d594](c86d594)) * Default value is not set in Redis connection string using environment variable ([#4136](#4136)) ([95acfb4](95acfb4)), closes [#3669](#3669) * Get container host addresses from testcontainers (java) ([#4125](#4125)) ([9184dde](9184dde)) * Get rid of empty string `name_alias` during feature view projection deserialization ([#4116](#4116)) ([65056ce](65056ce)) * Helm chart `feast-feature-server`, improve Service template name ([#4161](#4161)) ([dedc164](dedc164)) * Improve the code related to on-demand-featureview. ([#4203](#4203)) ([d91d7e0](d91d7e0)) * Integration tests for async sdk method ([#4201](#4201)) ([08c44ae](08c44ae)) * Make sure schema is used when calling `get_table_query_string` method for Snowflake datasource ([#4131](#4131)) ([c1579c7](c1579c7)) * Make sure schema is used when generating `from_expression` for Snowflake ([#4177](#4177)) ([5051da7](5051da7)) * Pass native input values to `get_online_features` from feature server ([#4117](#4117)) ([60756cb](60756cb)) * Pass region to S3 client only if set (Java) ([#4151](#4151)) ([b8087f7](b8087f7)) * Pgvector patch ([#4108](#4108)) ([ad45bb4](ad45bb4)) * Update doc ([#4153](#4153)) ([e873636](e873636)) * Update master-only benchmark bucket name due to credential update ([#4183](#4183)) ([e88f1e3](e88f1e3)) * Updating the instructions for quickstart guide. ([#4120](#4120)) ([0c30e96](0c30e96)) * Upgrading the test container so that local tests works with updated d… ([#4155](#4155)) ([93ddb11](93ddb11)) ### Features * Add a Kubernetes Operator for the Feast Feature Server ([#4145](#4145)) ([4a696dc](4a696dc)) * Add delta format to `FileSource`, add support for it in ibis/duckdb ([#4123](#4123)) ([2b6f1d0](2b6f1d0)) * Add materialization support to ibis/duckdb ([#4173](#4173)) ([369ca98](369ca98)) * Add optional private key params to Snowflake config ([#4205](#4205)) ([20f5419](20f5419)) * Add s3 remote storage export for duckdb ([#4195](#4195)) ([6a04c48](6a04c48)) * Adding DatastoreOnlineStore 'database' argument. ([#4180](#4180)) ([e739745](e739745)) * Adding get_online_features_async to feature store sdk ([#4172](#4172)) ([311efc5](311efc5)) * Adding support for dictionary writes to online store ([#4156](#4156)) ([abfac01](abfac01)) * Elasticsearch vector database ([#4188](#4188)) ([bf99640](bf99640)) * Enable other distance metrics for Vector DB and Update docs ([#4170](#4170)) ([ba9f4ef](ba9f4ef)) * Feast/IKV datetime edgecase errors ([#4211](#4211)) ([bdae562](bdae562)) * Feast/IKV documenation language changes ([#4149](#4149)) ([690a621](690a621)) * Feast/IKV online store contrib plugin integration ([#4068](#4068)) ([f2b4eb9](f2b4eb9)) * Feast/IKV online store documentation ([#4146](#4146)) ([73601e4](73601e4)) * Feast/IKV upgrade client version ([#4200](#4200)) ([0e42150](0e42150)) * Incorporate substrait ODFVs into ibis-based offline store queries ([#4102](#4102)) ([c3a102f](c3a102f)) * Isolate input-dependent calculations in `get_online_features` ([#4041](#4041)) ([2a6edea](2a6edea)) * Make arrow primary interchange for online ODFV execution ([#4143](#4143)) ([3fdb716](3fdb716)) * Move data source validation entrypoint to offline store ([#4197](#4197)) ([a17725d](a17725d)) * Upgrading python version to 3.11, adding support for 3.11 as well. ([#4159](#4159)) ([4b1634f](4b1634f)), closes [#4152](#4152) [#4114](#4114) ### Reverts * Reverts "fix: Using version args to install the correct feast version" ([#4112](#4112)) ([b66baa4](b66baa4)), closes [#3953](#3953)
What this PR does / why we need it:
Which issue(s) this PR fixes:
Closes #3927