replica-server: support table level write throttling #204

qinzuoyan · 2018-12-06T06:51:12Z

Users can enable write throttling for a table by setting its app env:

replica.write_throttling = {delay_threshold_qps}*delay*{delay_ms},{reject_threshold_qps}*reject*{delay_ms_before_reject}

For example:

replica.write_throttling = 1000*delay*100,2000*reject*200

which means:

if the current write rate exceeds the delay threshold of 1000 QPS, each subsequently received request is first delayed by 100 ms and then processed;
if the current write rate exceeds the reject threshold of 2000 QPS, each subsequently received request is first delayed by 200 ms and then rejected.
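A minimal C++ sketch of the per-request check described above. It assumes the "current write rate" is the number of writes counted in the current one-second window (the PR text does not specify the window). The reject threshold is checked first and a rejected write is not counted, mirroring the `_cur_request_count` snippets quoted in the review below. `write_throttler` and `throttle_action` are hypothetical names, and actually sleeping or scheduling the delay is left to the caller.

```cpp
#include <cstdint>

// Hypothetical outcome of a throttling check for one incoming write.
enum class throttle_action { pass, delay, reject };

// Sketch of a per-table write throttler (assumption: 1-second counting window).
class write_throttler
{
public:
    write_throttler(int64_t delay_qps, int64_t delay_ms, int64_t reject_qps, int64_t reject_delay_ms)
        : _delay_qps(delay_qps),
          _delay_ms(delay_ms),
          _reject_qps(reject_qps),
          _reject_delay_ms(reject_delay_ms)
    {
    }

    // `now_ms` is a caller-supplied monotonic clock reading (keeps the sketch testable).
    // On return, `delay_ms` holds how long to wait before processing or rejecting.
    throttle_action check(int64_t now_ms, int64_t &delay_ms)
    {
        int64_t window = now_ms / 1000;
        if (window != _cur_window) { // a new second starts: reset the counter
            _cur_window = window;
            _cur_request_count = 0;
        }
        ++_cur_request_count;
        delay_ms = 0;
        if (_reject_qps > 0 && _cur_request_count > _reject_qps) {
            --_cur_request_count; // rejected writes do not count toward the rate
            delay_ms = _reject_delay_ms;
            return throttle_action::reject;
        }
        if (_delay_qps > 0 && _cur_request_count > _delay_qps) {
            delay_ms = _delay_ms;
            return throttle_action::delay;
        }
        return throttle_action::pass;
    }

private:
    int64_t _delay_qps;
    int64_t _delay_ms;
    int64_t _reject_qps;
    int64_t _reject_delay_ms;
    int64_t _cur_window = -1;
    int64_t _cur_request_count = 0;
};
```

With thresholds of 2 (delay) and 4 (reject), the first two writes in a second pass, the third and fourth are delayed, and every later write in that second is rejected until the window rolls over.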

Users can also configure only one of the 'delay' and 'reject' strategies, for example:

replica.write_throttling = 1000*delay*100

or

replica.write_throttling = 2000*reject*200
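For illustration, a minimal C++ sketch of parsing this format. `write_throttling_config`, `parse_non_negative` and `parse_write_throttling` are hypothetical names, not rDSN's actual API. The sketch assumes thresholds and delays are non-negative integers, that each strategy appears at most once, and that a threshold of 0 means the strategy is not configured.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical parsed form of `replica.write_throttling`.
// A qps threshold of 0 means that strategy is not configured.
struct write_throttling_config
{
    int64_t delay_qps = 0;
    int64_t delay_ms = 0;
    int64_t reject_qps = 0;
    int64_t reject_delay_ms = 0; // delay applied before rejecting
};

// Parses a non-negative decimal integer; rejects empty input, signs and junk.
// The 18-digit limit keeps the value within int64_t range.
static bool parse_non_negative(const std::string &s, int64_t &out)
{
    if (s.empty() || s.size() > 18)
        return false;
    out = 0;
    for (char c : s) {
        if (c < '0' || c > '9')
            return false;
        out = out * 10 + (c - '0');
    }
    return true;
}

// Parses "{qps}*delay*{ms},{qps}*reject*{ms}"; either entry may be omitted.
bool parse_write_throttling(const std::string &env,
                            write_throttling_config &cfg,
                            std::string &err)
{
    cfg = write_throttling_config();
    if (env.empty() || env.back() == ',') {
        err = "empty entry in: " + env;
        return false;
    }
    std::istringstream entries(env);
    std::string entry;
    while (std::getline(entries, entry, ',')) {
        auto p1 = entry.find('*');
        auto p2 = (p1 == std::string::npos) ? p1 : entry.find('*', p1 + 1);
        if (p2 == std::string::npos) {
            err = "expected {qps}*{type}*{ms}: " + entry;
            return false;
        }
        std::string type = entry.substr(p1 + 1, p2 - p1 - 1);
        int64_t qps = 0, ms = 0;
        if (!parse_non_negative(entry.substr(0, p1), qps) || qps == 0 ||
            !parse_non_negative(entry.substr(p2 + 1), ms)) {
            err = "invalid number in: " + entry;
            return false;
        }
        if (type == "delay" && cfg.delay_qps == 0) {
            cfg.delay_qps = qps;
            cfg.delay_ms = ms;
        } else if (type == "reject" && cfg.reject_qps == 0) {
            cfg.reject_qps = qps;
            cfg.reject_delay_ms = ms;
        } else {
            err = "unknown or duplicate type in: " + entry;
            return false;
        }
    }
    return true;
}
```

Under these assumptions, "2000*reject*200" yields a config with only the reject strategy set, while inputs such as "1000*delay" or "1000*slow*100" are refused with an error message.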

neverchanje · 2018-12-06T07:12:24Z

Do you have concrete results, e.g. screenshots, that you could post here?

include/dsn/dist/replication/replication.codes.h

src/dist/replication/lib/replica_config.cpp

qinzuoyan · 2018-12-06T09:41:01Z

Functional verification on a test cluster.

Cluster: c4tst-benchmark
ReplicaServer nodes: 5
Partitions in the test table: 128

Client: pegasus-YCSB
Client processes: 10
Threads per process: 10
Value size per record: 10 KB
Handling of failed writes: retry immediately

Before starting the test (around 16:20), the test table's throttling env was set to: [1]

replica.write_throttling = 1000*delay*100,1200*reject*200

After some time (around 16:42), it was changed to: [2]

replica.write_throttling = 2000*delay*100,2200*reject*200

After some time (around 16:55), it was changed to: [3]

replica.write_throttling = 2000*delay*50,2200*reject*200

After some time (around 17:00), it was changed to: [4]

replica.write_throttling = 2000*delay*50,2200*reject*50

After some time (around 17:07), it was changed to: [5]

replica.write_throttling = 2000*delay*500,2500*reject*500

After some time (around 17:17), it was changed to: [6]

replica.write_throttling = 2000*delay*100

After some time (around 17:22), it was changed to: [7]

replica.write_throttling = 2000*delay*500

The following 4 metrics were recorded in Falcon:

QPS
P99 server-side Set latency (unit: nanoseconds)
Delay count (number of writes delayed by throttling per table in the last 10 seconds)
Reject count (number of writes rejected by throttling per table in the last 10 seconds)

The curves changed as follows:

src/dist/replication/lib/replica.h

src/dist/replication/lib/throttling_controller.h

src/dist/replication/lib/replica.cpp

src/dist/replication/lib/throttling_controller.h

src/core/perf_counter/perf_counters.cpp

src/dist/replication/lib/throttling_controller.h

acelyc111 · 2018-12-07T06:59:23Z

src/dist/replication/lib/throttling_controller.cpp

+        changed = false;
+        return true;
+    }
+    reset(changed, old_str);


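When changing from one value to another, is throttling briefly turned off in between?

Since everything runs serially, briefly turning it off in between is not a problem. But the logic here is a bit tricky, so I'll refactor it.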
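One way to apply a new value with no window in which throttling is off is to parse it into a temporary and only then replace the active setting. The sketch below assumes this approach and is not the refactored rDSN code: `throttling_state`, `delay_setting` and `update` are hypothetical, only the single `{qps}*delay*{ms}` form is handled, and disabling throttling via an empty value is omitted. The output parameters follow the `parse_from_env` comments quoted later in this review.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical active setting (delay strategy only, for brevity).
struct delay_setting
{
    int64_t qps = 0; // 0 means throttling is disabled
    int64_t delay_ms = 0;
};

class throttling_state
{
public:
    // `parse_error` is set on failure, `changed` on success,
    // and `old_env_value` when `changed` is true.
    bool update(const std::string &env_value,
                std::string &parse_error,
                bool &changed,
                std::string &old_env_value)
    {
        changed = false;
        if (env_value == _env_value)
            return true; // same value: keep the current setting untouched
        char type[16] = {0};
        char extra = 0;
        long long qps = 0, ms = 0;
        // "%15[a-z]" reads the strategy name; a 4th conversion means trailing junk.
        int n = std::sscanf(env_value.c_str(), "%lld*%15[a-z]*%lld%c", &qps, type, &ms, &extra);
        if (n != 3 || std::string(type) != "delay" || qps <= 0 || ms < 0) {
            parse_error = "invalid env value: " + env_value;
            return false; // the previous setting stays in force
        }
        delay_setting parsed;
        parsed.qps = qps;
        parsed.delay_ms = ms;
        // Swap in the fully parsed setting in one step: no intermediate
        // state in which throttling is turned off.
        old_env_value = _env_value;
        _env_value = env_value;
        _setting = parsed;
        changed = true;
        return true;
    }

    const delay_setting &setting() const { return _setting; }

private:
    std::string _env_value;
    delay_setting _setting;
};
```

Because the old setting is replaced only after the new value parses successfully, an invalid value leaves the previous throttling in force instead of switching it off.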

src/dist/replication/lib/replica_2pc.cpp

src/dist/replication/lib/throttling_controller.cpp

src/dist/replication/lib/throttling_controller.h

neverchanje · 2018-12-08T10:53:32Z

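For a more complete solution, quota would indeed be a better choice than qps, but I understand that supporting only qps in the first version keeps it simple 😄
Putting JSON in app_env would look ugly when listing apps, but JSON is the more extensible approach.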

shengofsun · 2018-12-08T15:07:08Z

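> For a more complete solution, quota would indeed be a better choice than qps, but I understand that supporting only qps in the first version keeps it simple 😄
> Putting JSON in app_env would look ugly when listing apps, but JSON is the more extensible approach.

What I meant is: which is better, throttling by "qps" or by "MB/s"? The two seem about equally hard to implement.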

shengofsun · 2018-12-09T01:12:08Z

src/dist/replication/lib/replica_2pc.cpp

+                _counter_recent_write_throttling_delay_count->increment();
+            } else { // type == throttling_controller::REJECT
+                request->add_ref();
+                tasking::enqueue(LPC_WRITE_THROTTLING_DELAY,


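Also, shouldn't a reject be returned immediately? If you delay for a while before rejecting, the client has very likely timed out already, which triggers a retry. Returning the reject as early as possible gives the client SDK a chance to surface the error message to the caller, which makes error handling and our troubleshooting easier.
Otherwise the client only sees a timeout, and the cause can only be found in the server logs.

Going further, shouldn't the choice between rejecting and delaying a message depend on the client's timeout, rather than on a value picked off the top of one's head?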

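I thought about this carefully, and after much deliberation concluded that it is better for the server to apply a uniform delay when rejecting.
Consider how typical users use the client:

Many users retry immediately after catching an exception (some experienced users sleep for a while first, but in our experience they are few). Overly frequent requests increase the pressure on the server, and an immediate retry is also likely to fail again. So fundamentally I don't think "return the reject as early as possible" is the best strategy; I want a mechanism that deliberately introduces a delay. Doing the delay on the client side would require all users to upgrade to the latest client, and the mechanism would have to be added for every language. The simpler approach is to introduce the delay on the server side.

Most users simply catch PException and either retry or give up, usually without distinguishing the specific exception type. When a write is rejected after a delay, the client receives PERR_BUSY or PERR_TIMEOUT; since write timeouts are usually set fairly large, it is generally PERR_BUSY.

I did consider deriving the delay from the client's timeout, but at the time I was concerned that some language clients do not send the timeout. I can improve this further: if the client sends its timeout, use it (e.g. delay = client_timeout / 2); otherwise use the configured delay. This way the client receives ERR_BUSY whenever possible.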

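The timeout appears to be a field of the message, so doesn't every client have to send it? Not sending it just means the default value is used.

Delaying on the server side feels like an odd design. Once you throw the error back to the client, the client has a chance to learn the cause instead of just seeing a timeout; if you don't, the on-call burden stays on the server side. Think of it this way: when a business team comes to you asking what went wrong this time, would you rather the stack trace show err_busy or err_timeout? Besides, upgrading the client version isn't a big deal.

Moreover, with a server-side delay, all of these requests have to be queued in asio and compete with other timers for resources.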

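I now allow the delay for reject to be set to 0, so both approaches are available; I still think a server-side delay is worth providing.
As for the cost of a server-side delay that you mentioned, I think it only uses a bit more memory, and the pressure on the asio queue is small. The delays here are generally under 1 second, so how much can pile up? Our read QPS often reaches hundreds of thousands; compared with the timers used for timeouts, this load is nothing to worry about.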
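Little's law gives a rough estimate of the backlog this creates: in steady state, the average number of requests waiting on a delay timer equals the rate of throttled requests times the delay. The figures below are hypothetical, chosen only to show the arithmetic; they are not measurements from this PR.

```latex
N_{\text{pending}} \approx \lambda_{\text{throttled}} \cdot d
\quad\Longrightarrow\quad
N_{\text{pending}} \approx 2000~\text{req/s} \times 0.5~\text{s} = 1000
```

Under these assumptions, even a sustained throttled rate with a sub-second delay leaves on the order of a thousand pending timers, which supports the view that the asio queue is not the bottleneck.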

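I checked: the C++ client sets client_timeout, but the Java client does not.
So I just submitted an improvement: XiaoMi/pegasus-java-client#24
I'll also improve this PR to use the client_timeout sent by the client whenever possible, to avoid timeouts caused by the delay.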

neverchanje · 2018-12-09T13:50:12Z

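> What I meant is: which is better, throttling by "qps" or by "MB/s"? The two seem about equally hard to implement.

Both qps and MB/s could be supported. As I recall, HBase offers three kinds of throttling: qps, size, and quota (capacity units). Having only one for now is not a big problem; I feel a business's qps budget tends to match its actual qps fairly well.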

qinzuoyan · 2018-12-09T14:02:36Z

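I considered both qps-based and size-based throttling, and covered both when I presented the throttling design to everyone on Wednesday. I even wrote part of the size-based implementation, but later felt it was not mature enough and removed it. The main reasons are:

Inaccurate calculation. When a message is checked for throttling, the rpc buffer has not been deserialized yet and still includes the header and other information, so the computed size is inaccurate (especially when values are small).
Usage habits. Most businesses have fairly stable value sizes, so QPS-based throttling is usually sufficient, and from experience we are more used to thinking in QPS. Consider day-to-day operations: when setting throttling for a table, are we more used to reasoning about size or about qps?
Monitoring. Our qps monitoring is comprehensive, while size monitoring is still lacking, so qps-based throttling is also easier to monitor.

Our current principles for developing features: driven by business needs; a simple and effective design; implement the minimal set of requirements and avoid premature optimization; easy to extend.
If qps-based throttling proves insufficient in practice, size-based throttling can be added later.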

shengofsun · 2018-12-10T03:53:27Z

Size vs. qps is not a big deal; I just wanted to bring it up.

src/dist/replication/lib/throttling_controller.cpp

src/dist/replication/lib/throttling_controller.h

neverchanje · 2018-12-10T08:56:19Z

src/dist/replication/lib/throttling_controller.cpp

+    if (_reject_qps > 0 && _cur_request_count > _reject_qps) {
+        _cur_request_count--;
+        int64_t client_timeout = request->header->client.timeout_ms;
+        if (client_timeout > 0) {


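Suppose client_timeout = 20s and env.delay_ms = 1s. With this algorithm, wouldn't the request be delayed 10s before being executed? That is wrong; it should still be delayed only 1s, to avoid leaving the system underloaded.
A more verbose but robust way to write it:

if(client_timeout > 0 && client_timeout/2 < _delay_ms) { delay_ms = client_timeout/2; } else { delay_ms = _delay_ms; }

Look again: it uses std::min().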
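A minimal sketch of the capped delay, using the helper name `calculate_request_delay_ms` proposed later in this review (the author declined to extract it as unnecessary, so this is illustrative only). It assumes both arguments are in milliseconds and that a non-positive `client_timeout_ms` means the client sent no timeout.

```cpp
#include <algorithm>
#include <cstdint>

// Caps the configured delay at half of the client's timeout, when known,
// so the client can still receive the response before it times out.
// A longer client timeout never increases the delay beyond `delay_ms`.
int64_t calculate_request_delay_ms(int64_t client_timeout_ms, int64_t delay_ms)
{
    if (client_timeout_ms > 0)
        return std::min(delay_ms, client_timeout_ms / 2);
    return delay_ms; // no client timeout: fall back to the configured delay
}
```

For the example in the comment above, `calculate_request_delay_ms(20000, 1000)` returns 1000, so a 20 s client timeout does not stretch the delay to 10 s; the timeout only lowers the delay when half of it is shorter than the configured value.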

neverchanje · 2018-12-10T08:58:56Z

src/dist/replication/lib/throttling_controller.cpp

+    }
+    if (_delay_qps > 0 && _cur_request_count > _delay_qps) {
+        int64_t client_timeout = request->header->client.timeout_ms;
+        if (client_timeout > 0) {


Both delay and reject use this delay_ms calculation, so the code is duplicated here. Please extract it into a function:

int64_t calculate_request_delay_ms(int64_t client_timeout_ms, int64_t delay_ms) { }

It's only a few lines of code; there's no need.

neverchanje · 2018-12-10T09:02:22Z

src/dist/replication/lib/throttling_controller.cpp

+        if (client_timeout > 0) {
+            delay_ms = std::min(_delay_ms, client_timeout / 2);
+        } else {
+            delay_ms = _delay_ms;


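I suggest renaming _delay_ms to _env_delay_ms to show that it is the delay set in app_env. _delay_ms and delay_ms look too similar and are hard to tell apart.

I think it's fine; the IDE shows them in different colors. If this counts as hard to distinguish, then many constructors (whose parameters differ from the private members only by an underscore) have the same problem...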

neverchanje · 2018-12-10T09:14:03Z

src/dist/replication/lib/throttling_controller.h

+    // 'parse_error' is set when return false.
+    // 'changed' is set when return true.
+    // 'old_env_value' is set when 'changed' is set to true.
+    bool parse_from_env(const std::string &env_value,


Please document the correct env_value format in the comment:

Here is the correct format for `env_value`: [qps(int32)]*["delay"/"reject"]*[delay_ms(int32)]; a delay entry and a reject entry may be combined with a comma.

No major issues, so let's leave it as is; any change would need another round of approvals. This has dragged on too long.

replica-server: support table level write throttling

d1ec95f

qinzuoyan added the type/enhancement and component/replica-server labels Dec 6, 2018

qinzuoyan requested review from shengofsun, neverchanje, acelyc111, hycdong, zhangyifan27, vagetablechicken and mentoswang December 6, 2018 06:51

qinzuoyan mentioned this pull request Dec 6, 2018 in apache/incubator-pegasus#230 "server: support table level write throttling" (merged)

neverchanje reviewed Dec 6, 2018

include/dsn/dist/replication/replication.codes.h (outdated, resolved)

neverchanje reviewed Dec 6, 2018

src/dist/replication/lib/replica_config.cpp (outdated, resolved)

fix according to code review

4d024cc

neverchanje reviewed Dec 6, 2018

src/dist/replication/lib/replica.h (outdated, resolved)

fix according to code review

50858de

neverchanje reviewed Dec 7, 2018

src/dist/replication/lib/throttling_controller.h (outdated, resolved)

neverchanje reviewed Dec 7, 2018

src/dist/replication/lib/replica.cpp (outdated, resolved)

neverchanje reviewed Dec 7, 2018

src/dist/replication/lib/throttling_controller.h (outdated, resolved)

neverchanje reviewed Dec 7, 2018

src/dist/replication/lib/throttling_controller.h (resolved)

fix according to code review

f5e4866

acelyc111 reviewed Dec 7, 2018

qinzuoyan added 3 commits December 7, 2018 16:11

fix according to code review

967629e

smallfix

b2942f5

fix according to code review

37a8fe7

neverchanje reviewed Dec 7, 2018

src/dist/replication/lib/throttling_controller.cpp (outdated, resolved)

neverchanje reviewed Dec 7, 2018

src/dist/replication/lib/throttling_controller.h (resolved)

fix according to code review

44b075d

shengofsun reviewed Dec 9, 2018

fix according to code review

c4d990f

qinzuoyan dismissed neverchanje’s stale review via c4d990f December 10, 2018 03:15

neverchanje previously approved these changes Dec 10, 2018

fix according to code review

1c64e6b

qinzuoyan dismissed neverchanje’s stale review via 1c64e6b December 10, 2018 04:24

shengofsun previously approved these changes Dec 10, 2018

neverchanje previously approved these changes Dec 10, 2018

fix according to code review

419ef2e

qinzuoyan dismissed stale reviews from neverchanje and shengofsun via 419ef2e December 10, 2018 08:01

shengofsun previously approved these changes Dec 10, 2018

neverchanje reviewed Dec 10, 2018

src/dist/replication/lib/throttling_controller.cpp (resolved)

fix according to code review

4825fa3

qinzuoyan dismissed shengofsun’s stale review via 4825fa3 December 10, 2018 08:07

neverchanje reviewed Dec 10, 2018

src/dist/replication/lib/throttling_controller.h (outdated, resolved)

fix according to code review

60b2603

neverchanje reviewed Dec 10, 2018

shengofsun approved these changes Dec 10, 2018

neverchanje reviewed Dec 10, 2018

neverchanje approved these changes Dec 10, 2018

qinzuoyan merged commit 62ad5b7 into master Dec 10, 2018

qinzuoyan deleted the table_throttling branch December 10, 2018 13:55
