-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batch implementation with timer #6452
Batch implementation with timer #6452
Conversation
@HenryYYang for first pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, please add an entry to version history and the redis proxy configuration doc.
test/extensions/filters/network/common/redis/client_impl_test.cc
Outdated
Show resolved
Hide resolved
@HenryYYang, yeah, saw version history in your PR when I was reviewing (docs/root/intro/version_history.rst), so I'll add a note there. |
|
||
// Pretend the the flush timer fires | ||
EXPECT_CALL(*flush_timer_, enabled()).WillOnce(Return(false)); | ||
; // timer is disabled, as it has already fired |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattklein123 this was the python formatter. Should I flag this as an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean?
I suppose I'll get on the tip of mainline master again- that fixed coverage, but broke asan :| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you have some merge issues here. Can you take a look and make sure the diff is what you expect? Also, is this ready to fully review? I thought you mentioned that you want to do some testing first? Thank you!
/wait
@@ -46,6 +46,12 @@ message RedisProxy { | |||
// * '{user1000}.following' and '{user1000}.followers' **will** be sent to the same upstream | |||
// * '{user1000}.following' and '{user1001}.following' **might** be sent to the same upstream | |||
bool enable_hashtagging = 2; | |||
|
|||
// Maximum size of buffer before flush is triggered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you specify what happens if this is unset? Same below?
docs/root/intro/version_history.rst
Outdated
@@ -59,6 +59,9 @@ Version history | |||
* redis: added :ref:`success and error stats <config_network_filters_redis_proxy_per_command_stats>` for commands. | |||
* redis: migrate hash function for host selection to `MurmurHash2 <https://sites.google.com/site/murmurhash>`_ from std::hash. MurmurHash2 is compatible with std::hash in GNU libstdc++ 3.4.20 or above. This is typically the case when compiled on Linux and not macOS. | |||
* redis: added :ref:`latency_in_micros <envoy_api_field_config.filter.network.redis_proxy.v2.RedisProxy.latency_in_micros>` to specify the redis commands stats time unit in microseconds. | |||
* redis: added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please merge master and move these docs to the 1.11.0 section.
This is not ready for full review- I am attempting to do some performance testing in AWS first. |
Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
02bb39d
to
446d3b6
Compare
Ok, had a chance to test in AWS. Results are less promising than hoped. Perhaps my understanding of the TODO: Have to also add in notes on when unset per earlier comment from Matt. TestProcedureUsing an appropriate config file for Envoy (depending on if we are batching or not, ie we add batch params):
Basic Configuration of Toml file
EC2 -> EC2First, we compare performance of batching for different request rates to another EC2 instance running Redis. Test 1
Non-batchrates: 15212.6 rps Batchrates: 17425 rps Test 2
Non-batchrates: 13620 rps Batchrates: 14822 rps EC2 -> ElasticacheTest 1
Non-batchrates: 15294.2 rps Batchrates: 16511.2 rps Test 2
Non-batchrates: 14373.8 rps Batchrates: 14660.8 rps |
@FAYiEKcbD0XFqF2QK2E4viAHg8rMm2VbjYKdjTg so you want to review this? Or you are still working on it? If so, PTAL at CI. /wait |
Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
@mattklein123 sure, let's see if my changes fix the CI issues. Mitch's redirection work came through on merge and broke builds of some tests. However, if time is tight I'd prefer a review on #6446, which has been updated per your comments. Originally I had planned on waiting for Henry and doing more testing, but I think the above results are pretty straightforward, and little perf bump is not unwelcome. |
Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good modulo small comments. Can you add an integration test with buffering and flushing enabled?
/wait
api/envoy/config/filter/network/redis_proxy/v2/redis_proxy.proto
Outdated
Show resolved
Hide resolved
api/envoy/config/filter/network/redis_proxy/v2/redis_proxy.proto
Outdated
Show resolved
Hide resolved
test/extensions/filters/network/common/redis/client_impl_test.cc
Outdated
Show resolved
Hide resolved
|
||
// Pretend the the flush timer fires | ||
EXPECT_CALL(*flush_timer_, enabled()).WillOnce(Return(false)); | ||
; // timer is disabled, as it has already fired |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean?
test/extensions/filters/network/common/redis/client_impl_test.cc
Outdated
Show resolved
Hide resolved
re |
Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
…lure Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
@mattklein123 ready for review (your comments addressed, integration test added). Note that asan/tsan have failed on unrelated tests- for one build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice looks great, just a few nits. Yeah there are some flakes in master right now. You can always use the /retest
command to try to deflake.
/wait
api/envoy/config/filter/network/redis_proxy/v2/redis_proxy.proto
Outdated
Show resolved
Hide resolved
Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
Signed-off-by: Nicolas Flacco <nflacco@lyft.com>
@mattklein123 fixed things according to your comments; all checks now pass on latest master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work!
* master: (26 commits) docs: update docs to recommend /retest repokitteh command (envoyproxy#6655) http timeout integration test: wait for 15s for upstream reset (envoyproxy#6646) access log: add response code details to the access log formatter (envoyproxy#6626) build: add ppc build badge to README (envoyproxy#6629) Revert dispatcher stats (envoyproxy#6649) Batch implementation with timer (envoyproxy#6452) fault filter: reset token bucket on data start (envoyproxy#6627) event: update libevent dependency to fix race condition (envoyproxy#6637) examples: standardize docker-compose version and yaml extension (envoyproxy#6613) quiche: Implement SpdyUnsafeArena using SpdySimpleArena (envoyproxy#6612) router: support customizable retry back-off intervals (envoyproxy#6568) api: create OpenRCA service proto file (envoyproxy#6497) ext_authz: option for clearing route cache of authorized requests (envoyproxy#6503) build: update jinja to 2.10.1. (envoyproxy#6623) tools: check spelling in pre-push hook (envoyproxy#6631) security: blameless postmortem template. (envoyproxy#6553) Implementing Endpoint lease for ClusterLoadAssigment (envoyproxy#6477) add HTTP integration tests exercising timeouts (envoyproxy#6621) event: fix DispatcherImplTest::InitializeStats flake (envoyproxy#6619) Add tag extractor for RDS route config name (envoyproxy#6618) ... Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Envoy supports Redis pipelined commands, but does not batch commands. By batching, I mean when non-pipelined or pipelined commands are sent through Envoy, Envoy should not send them upstream to Redis until a certain number of commands has been received, and then write them to the connection buffer for a particular upstream Redis in one go. See #6365 for more details and discussion.
This PR implements a timer solution to the problem. The algorithm is as follows:
Batching is defaulted to off in the config- zero batch size, zero timer, so it should behave the same as the old way. Note that there is the extra overhead of a timer check in
ClientImpl::flushBufferAndResetTimer()
even if we have zero batch size- this may add slight overhead, but I have not been able to characterize the performance degradation in any meaningful way*. If there is degradation, or if desired by reviewers I can add a 3rd if statement to handle this inclient_impl.cc
ieif (config_.maxBufferSizeBeforeFlush()==0)
where we write without a timer check.Besides updating unit tests, I've added 3 specific cases:
*I have done some evaluation on performance on my local VM- batching is definitely faster with single redis behind envoy. However, for real-world tests against our redis-benchmark machines in AWS we will have to wait until later this week.