add degradeCutHalfOpen and fix exceptionCount bug #525

CalvinKirs · 2019-02-27T10:44:37Z

Describe what this PR does / why we need it

A new half-open state before the fuse is closed will pass a little flow, and if it is normal, it will be fully open. Avoid service without recovering large amounts of traffic
fix bug to DEGRADE_GRADE_EXCEPTION_COUNT。
降级这里由之前的两种状态变为三种状态，降级关闭，降级打开，降级半开，当降级关闭后后进入一个半开状态，这里会放五个请求过去，如果这五个请求都正常，那么则会由降级半开状态变为降级关闭，有任何一个请求失败，直接进入降级全开状态（我认为业务方应该捕获正常的业务异常，如果5个请求中有一个失败那一定是系统级别的异常，所以此处进入降级是没有问题的）。另外之前异常数与异常比率这里在处于同一个时间窗口时再次判断是会出现问题的，即无论系统是否正常也一定会再次降级。这里异常数目会做一个判断（异常比例没必要，因为我放进去的五个请求成功的话会使得比例倾斜），如果处于同一个时间窗口，会进行<=操作进行判断，如果不是，则维持原有操作(<)。说明：如果处于同一个时间窗口的话没必要进行降级半开状态，处于不同时将窗口的话如果系统未恢复正常，那么就会有一个很尴尬的情况，他会等待他的异常数或者异常数目进入他的一个阈值才会进行降级。比如我的异常数目是1000，那么这里的话熔断关闭后，这个时候可能系统还没有恢复正常，那么会直接放进来1000个请求。

Merge state

Does this pull request fix one issue?

Describe how you did it

Describe how to verify it

ut

Special notes for reviews

codecov-io · 2019-02-27T10:49:40Z

Codecov Report

Merging #525 into master will increase coverage by 0.1%.
The diff coverage is 37.31%.

@@             Coverage Diff             @@
##             master     #525     +/-   ##
===========================================
+ Coverage     38.02%   38.12%   +0.1%     
- Complexity     1114     1123      +9     
===========================================
  Files           259      259             
  Lines          8171     8306    +135     
  Branches       1113     1130     +17     
===========================================
+ Hits           3107     3167     +60     
- Misses         4661     4726     +65     
- Partials        403      413     +10

Impacted Files	Coverage Δ	Complexity Δ
...sp/sentinel/slots/statistic/data/MetricBucket.java	`77.27% <0%> (-22.73%)`	`19 <0> (ø)`
...a/csp/sentinel/slots/statistic/base/LongAdder.java	`31.91% <0%> (ø)`	`10 <0> (ø)`	⬇️
...a/com/alibaba/csp/sentinel/node/StatisticNode.java	`66.27% <30.76%> (-15.69%)`	`28 <0> (ø)`
.../csp/sentinel/slots/block/degrade/DegradeRule.java	`52.06% <46.55%> (-3.49%)`	`20 <12> (+3)`
...p/sentinel/slots/statistic/metric/ArrayMetric.java	`68.93% <5.26%> (-10.01%)`	`29 <1> (+1)`
...k/flow/controller/WarmUpRateLimiterController.java	`67.64% <70%> (ø)`	`5 <0> (ø)`	⬇️
...s/block/flow/controller/RateLimiterController.java	`79.31% <70%> (ø)`	`7 <0> (ø)`	⬇️
...a/csp/sentinel/slots/statistic/base/LeapArray.java	`65.82% <0%> (-3.8%)`	`24% <0%> (-1%)`
.../com/alibaba/csp/sentinel/eagleeye/StatLogger.java	`66.66% <0%> (-2.57%)`	`11% <0%> (-1%)`
...ibaba/csp/sentinel/eagleeye/StatLogController.java	`65.93% <0%> (-2.2%)`	`5% <0%> (-1%)`
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5d3ccb0...a7dd38c. Read the comment docs.

sczyh30 · 2019-03-04T05:40:22Z

Hi, thanks for contributing. Please illustrate your design and how-to-do in the PR description.

CalvinKirs · 2019-03-04T05:48:31Z

嗨，谢谢你的贡献。请说明您的设计和PR描述中的操作方法。

降级这里由之前的两种状态变为三种状态，降级关闭，降级打开，降级半开，当降级关闭后后进入一个半开状态，这里会放五个请求过去，如果这五个请求都正常，那么则会由降级半开状态变为降级关闭，有任何一个请求失败，直接进入降级全开状态（我认为业务方应该捕获正常的业务异常，如果5个请求中有一个失败那一定是系统级别的异常，所以此处进入降级是没有问题的）。另外之前异常数与异常比率这里在处于同一个时间窗口时再次判断是会出现问题的，即无论系统是否正常也一定会再次降级。这里异常数目会做一个判断（异常比例没必要，因为我放进去的五个请求成功的话会使得比例倾斜），如果处于同一个时间窗口，会进行<=操作进行判断，如果不是，则维持原有操作(<)。说明：如果处于同一个时间窗口的话没必要进行降级半开状态，处于不同时将窗口的话如果系统未恢复正常，那么就会有一个很尴尬的情况，他会等待他的异常数或者异常数目进入他的一个阈值才会进行降级。比如我的异常数目是1000，那么这里的话熔断关闭后，这个时候可能系统还没有恢复正常，那么会直接放进来1000个请求。

sczyh30 · 2019-03-04T07:17:37Z

对于 HALF-OPEN 而言，根据模式的不同，对应的策略也有不同：

对于基于异常判断（异常比率/异常数）的模式，HALF-OPEN 的一般策略是，只尝试一个请求，如果该请求正常则置为 CLOSE 状态，否则重新打回 OPEN 状态；
对于基于 RT 判断的模式，HALF-OPEN 状态同样只尝试一个请求，如果该请求 RT 正常则置为 CLOSE 状态，否则重新打回 OPEN 状态；这里需要考虑以下几点：(1) 若尝试的这个请求失败了，则是不会计算 RT 的（因此 RT 不会超），但请求确实是失败的，这时候要不要打回 OPEN 状态（2）HALF-OPEN 状态判断时不能用 average 值来作判断，而是要用最近一次请求的 RT。

另外考虑兼容性的问题，降级规则可能需要加个 enableHalfOpen 字段来控制是否加入半开启模式。

References:

For HALF-OPEN state, the corresponding action varies according to the circuit breaking strategy:

For exception-based mode (ratio or count), the general strategy of HALF-OPEN is to try only one request, and if the request is normal, it is set to CLOSE state, otherwise it will return to OPEN state;
For RT-based mode, the HALF-OPEN state should also attempt only one request. If RT of the request is normal, it is set to the CLOSE state, otherwise it is back to the OPEN state; here are the following points to consider: (1) If the request fails, the RT will not be calculated (so the RT will not be exceeded), but the request indeed failed. We should consider whether to set back the OPEN state in such scenario. (2) The HALF-OPEN state checking cannot be judged by the average RT. Instead, use the exact RT of the most recent request.

Also considering the compatibility issue, the circuit breaking rule may need an additional enableHalfOpen field to control whether to enable circuit-breaking mode.

sczyh30 · 2019-03-04T07:42:06Z

另外之前异常数与异常比率这里在处于同一个时间窗口时再次判断是会出现问题的，即无论系统是否正常也一定会再次降级。这里异常数目会做一个判断（异常比例没必要，因为我放进去的五个请求成功的话会使得比例倾斜），如果处于同一个时间窗口，会进行<=操作进行判断，如果不是，则维持原有操作(<)。

这里仍然会存在滑动窗口长度 > 降级时间的问题。比如分钟级异常数模式，分钟级滑动窗口的格子数为 60（每个格子时间长度为 1s），则降级时间比较小且异常分布在最近的格子比较多时，仍有可能出现熔断结束后异常数 < 阈值时就会被降级的问题。在这种统计模式下，一般需要设置降级时间 > 统计窗口长度。

# Conflicts: # sentinel-core/src/main/java/com/alibaba/csp/sentinel/node/StatisticNode.java # sentinel-core/src/main/java/com/alibaba/csp/sentinel/slots/statistic/data/MetricBucket.java # sentinel-core/src/main/java/com/alibaba/csp/sentinel/slots/statistic/metric/Metric.java

CalvinKirs · 2019-03-05T02:51:52Z

另外之前异常数与异常比率这里在处于同一个时间窗口时再次判断是会出现问题的，即无论系统是否正常也一定会再次降级。这里异常数目会做一个判断（异常比例没必要，因为我放进去的五个请求成功的话会使得比例倾斜），如果处于同一个时间窗口，会进行<=操作进行判断，如果不是，则维持原有操作（<）。

这里仍然会存在滑动窗口长度>降级时间的问题。比如分钟级异常数模式，分钟级滑动窗口的格子数为60（每个格子时间长度为1s），则降级时间比较小且异常分布在最近的格子比较多时，仍有可能出现熔断结束后异常数<在此种统计模式下，一般需要设置降级时间>统计窗口长度。

目前是采取这样的做法的，基于rt以及异常比例（s，ms级别的），会把当前窗口的进行清0。因为有一点需要考虑，rt这里的话准降级状态个人认为是必须的，成功状态下肯定没问题，五个请求都OK然后系统继续正常（即未降级），但是如果失败了的话这里不进行清除操作的话有可能存在这样一个问题，系统本来就是处于不正常情况，这五个请求耗时很严重。（如果未做dubbo超时限制），那么熔断后这五个请求会将整体的平均值拉的超级高而高出平均值。（极端情况）。异常比例这里的话这里因为时间短所以也是可以接受的，而且只会清除当前秒级别的异常比例。
异常数目这里是会在降级前进行-1操作。（会判断是否是同一个时间窗口），如果不是同一个时间窗口，那么就会不做处理。

CalvinKirs · 2019-03-05T02:59:23Z

降级半开的开关也加上了，默认是关闭的。业务方可以自行选择是否开启

sczyh30 · 2019-03-05T05:44:00Z

sentinel-core/src/main/java/com/alibaba/csp/sentinel/node/Node.java

+    /**
+     * minus exception
+     */
+    void minusException();


这里的设计可能还要再考量一下。Sentinel 底层的统计数据结构是限流、降级等等共用的，并且监控日志也会从实时统计数据生成，因此需要保证正确性。直接进行 decrease 和 rest 操作会使得实时统计数据不准确。

您好，目前来看的话限流这里是用总的pass数来进行计算的。（这样是最合理的而且目前也确实是这样做的）。
基于rt这里也已经完善，是拿最近一次请求的rt与count进行比较的。另外这里的话清除的话只会清除当前时间窗口的一个异常rt总和，最小为1个rt（经过多个时间窗口，每次刚进入时间窗口进进入准降级状态），最大为5个rt（持续一个窗口的一个准降级-降级的rt总和）。
异常数目这里还是原有的逻辑，降级关闭之后放进一个请求，依赖于这个请求的结果来进行是否开关。降级打开之前会对分钟秒级别的纬度减去各自所属的时间窗口的值。如果是新的时间窗口则不做操作。
我这里觉得降级关闭之后应该是一个新的开始，所以-1个异常数这里是可以接受的。不这么做的话异常数目这里位于同一个时间窗口降级关闭之后始终是要再次打开的。因为count已经达到了阈值，必须去减

# Conflicts: # sentinel-core/src/main/java/com/alibaba/csp/sentinel/slots/block/degrade/DegradeRule.java

…aba#525 initially.

…ISSUE alibaba#525].

[ISSUE alibaba#525] Support the message track. Merge to branch msg_track

…ave supported trace_topic name value Configurable by users.

…crease code coverage.

[ISSUE alibaba#525] Support the message track,add the function which supports trace topic name value configurable by users.

jiangli added 2 commits February 27, 2019 15:46

add degradeCutHalfOpen

b06c824

add degradeCutHalfOpen

01fb503

sczyh30 added the to-review To review label Feb 27, 2019

CalvinKirs and others added 6 commits March 1, 2019 18:48

fine tuning

d5783ec

fine tuning

aaf0e4d

Beautiful misunderstanding~

d952ecd

fix 异常数熔断的错误逻辑（同一个时间窗口）

8431b0a

merge

2ef285d

Merge branch 'master' into master

ff6eb12

CalvinKirs changed the title ~~add degradeCutHalfOpen~~ add degradeCutHalfOpen and fix exceptionCount bug Mar 4, 2019

CalvinKirs added 5 commits March 4, 2019 19:50

微调

7fed83f

微调

e544f5b

微调

0bcc3db

微调

bbb630b

冗余代码删除

0cff032

sczyh30 requested changes Mar 5, 2019

View reviewed changes

CalvinKirs and others added 6 commits March 5, 2019 17:03

微调

81e9829

微调

5181c65

微调

f42c9e1

微调

f80a2ba

Merge branch 'master' into master

b390f53

微调

1e91194

CalvinKirs and others added 13 commits March 5, 2019 19:59

Merge remote-tracking branch 'origin/master'

273819f

基于rt的降级修改

bb7046f

错误修改

9666ddc

微调

9ff83a0

微调

7e83a09

基于rt这里如果半开状态下放入的流量出现异常，我们认为这个时候系统是不正常的，所以继续降级

a7a0dad

Merge branch 'master' into master

203e83d

微调

0a3d864

Merge remote-tracking branch 'origin/master'

c12151c

Delete DegradeRule.java

970f814

微调

a4e06b6

Merge remote-tracking branch 'origin/master'

3628e0a

# Conflicts: # sentinel-core/src/main/java/com/alibaba/csp/sentinel/slots/block/degrade/DegradeRule.java

the end

a7dd38c

CalvinKirs closed this Mar 7, 2019

sczyh30 removed the to-review To review label Mar 7, 2019

sczyh30 mentioned this pull request Mar 8, 2019

add half-open and fix degrade-bug #553

Closed

CST11021 pushed a commit to CST11021/Sentinel that referenced this pull request Nov 3, 2021

I have finished developing the new feature for the message track alib…

e04db85

…aba#525 initially.

CST11021 pushed a commit to CST11021/Sentinel that referenced this pull request Nov 3, 2021

I have already fix my code as suggestion.Please help to review them.[…

47556d6

…ISSUE alibaba#525].

CST11021 pushed a commit to CST11021/Sentinel that referenced this pull request Nov 3, 2021

Merge pull request alibaba#597 from zongtanghu/develop

c50ada6

[ISSUE alibaba#525] Support the message track. Merge to branch msg_track

CST11021 pushed a commit to CST11021/Sentinel that referenced this pull request Nov 3, 2021

[ISSUE alibaba#525] Support the message track,these committed codes h…

cde6f40

…ave supported trace_topic name value Configurable by users.

CST11021 pushed a commit to CST11021/Sentinel that referenced this pull request Nov 3, 2021

[ISSUE alibaba#525] Support the message track,add the test case to in…

d480c81

…crease code coverage.

CST11021 pushed a commit to CST11021/Sentinel that referenced this pull request Nov 3, 2021

Merge pull request alibaba#600 from zongtanghu/msg_track

d572ffa

[ISSUE alibaba#525] Support the message track,add the function which supports trace topic name value configurable by users.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add degradeCutHalfOpen and fix exceptionCount bug #525

add degradeCutHalfOpen and fix exceptionCount bug #525

CalvinKirs commented Feb 27, 2019 •

edited

Loading

codecov-io commented Feb 27, 2019 •

edited

Loading

sczyh30 commented Mar 4, 2019

CalvinKirs commented Mar 4, 2019 •

edited

Loading

sczyh30 commented Mar 4, 2019 •

edited

Loading

sczyh30 commented Mar 4, 2019

CalvinKirs commented Mar 5, 2019 •

edited

Loading

CalvinKirs commented Mar 5, 2019 •

edited

Loading

sczyh30 Mar 5, 2019

CalvinKirs Mar 6, 2019

add degradeCutHalfOpen and fix exceptionCount bug #525

add degradeCutHalfOpen and fix exceptionCount bug #525

Conversation

CalvinKirs commented Feb 27, 2019 • edited Loading

Describe what this PR does / why we need it

Does this pull request fix one issue?

Describe how you did it

Describe how to verify it

Special notes for reviews

codecov-io commented Feb 27, 2019 • edited Loading

Codecov Report

sczyh30 commented Mar 4, 2019

CalvinKirs commented Mar 4, 2019 • edited Loading

sczyh30 commented Mar 4, 2019 • edited Loading

sczyh30 commented Mar 4, 2019

CalvinKirs commented Mar 5, 2019 • edited Loading

CalvinKirs commented Mar 5, 2019 • edited Loading

sczyh30 Mar 5, 2019

Choose a reason for hiding this comment

CalvinKirs Mar 6, 2019

Choose a reason for hiding this comment

CalvinKirs commented Feb 27, 2019 •

edited

Loading

codecov-io commented Feb 27, 2019 •

edited

Loading

CalvinKirs commented Mar 4, 2019 •

edited

Loading

sczyh30 commented Mar 4, 2019 •

edited

Loading

CalvinKirs commented Mar 5, 2019 •

edited

Loading

CalvinKirs commented Mar 5, 2019 •

edited

Loading