-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: improve capacity unit calculation #339
Conversation
src/server/info_collector.cpp
Outdated
_cu_fetch_retry_count = 3; | ||
_cu_fetch_retry_wait_seconds = 1; | ||
|
||
_st_fetch_interval_seconds = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
改 _storage_fetch_interval_seconds,不然这个配置名反人类了
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
加全吧, _storage_size_fetch_interval_seconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
那改成storage_size_fetch_interval_seconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
配置文件里面该成 storage_size_fetch_interval_seconds,但是变量名不用改,毕竟也没啥歧义,不会造成困扰。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/server/info_collector.cpp
Outdated
@@ -26,6 +26,7 @@ namespace server { | |||
|
|||
DEFINE_TASK_CODE(LPC_PEGASUS_APP_STAT_TIMER, TASK_PRIORITY_COMMON, ::dsn::THREAD_POOL_DEFAULT) | |||
DEFINE_TASK_CODE(LPC_PEGASUS_CU_STAT_TIMER, TASK_PRIORITY_COMMON, ::dsn::THREAD_POOL_DEFAULT) | |||
DEFINE_TASK_CODE(LPC_PEGASUS_ST_STAT_TIMER, TASK_PRIORITY_COMMON, ::dsn::THREAD_POOL_DEFAULT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里也一样,还是不追求名字短,追求可读吧
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
continue; | ||
if (pc.partition_flags != 0) // already calculated | ||
continue; | ||
pc.partition_flags = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partition_flags 只有这里会用到吗?如果其它地方也改了partition_flags会不会有问题?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
app_partitions 是个临时变量,partition_flags 在这个函数里根本不会用到,所以不管它有没有在其他地方使用,这里就是被用来当做一个临时flag。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
在这个函数里面,只有这里使用,用于临时flag。
src/server/info_collector.cpp
Outdated
_cu_fetch_retry_count = 3; | ||
_cu_fetch_retry_wait_seconds = 1; | ||
|
||
_st_fetch_interval_seconds = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
加全吧, _storage_size_fetch_interval_seconds?
src/server/config.ini
Outdated
app_stat_interval_seconds = 10 | ||
|
||
cu_stat_app = stat | ||
cu_fetch_interval_seconds = 8 | ||
st_fetch_interval_seconds = 3600 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
跟上面的配置为什么相差这么大?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
改进点3:增加storage size统计,默认每小时统计一次,统计每个表的存储空间(和app_stat命令一样)。因为变化不大,所以没必要太频繁。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/server/info_collector.cpp
Outdated
@@ -65,6 +66,17 @@ info_collector::info_collector() | |||
"cu_fetch_interval_seconds", | |||
8, // default value 8s | |||
"capacity unit fetch interval seconds"); | |||
_cu_fetch_retry_count = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
写成std::min(3, _cu_fetch_interval_seconds)
吧, 避免cu_fetch_interval_seconds配的很小, 跟重试有重叠
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/server/info_collector.cpp
Outdated
@@ -65,6 +66,17 @@ info_collector::info_collector() | |||
"cu_fetch_interval_seconds", | |||
8, // default value 8s | |||
"capacity unit fetch interval seconds"); | |||
_cu_fetch_retry_count = 3; | |||
_cu_fetch_retry_wait_seconds = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个如果固定为1的话, 写成静态常量好了
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/server/info_collector.cpp
Outdated
|
||
void info_collector::on_storage_size_stat(int remaining_retry_count) | ||
{ | ||
ddebug("start to stat storage size"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
去掉这行日志吧, 没有意义
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个还是有必要的,表示在正常工作,和available_detector里面一样。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我是感觉正常情况下, 这种周期性的任务就没必要打日志了, 只在异常分支打日志, 不然日志文件中会充满这类日志.
如果是debug问题, 有其他手段, 比如pstack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
毕竟没有多少日志,这对查找问题还是有帮助的,先不改了
src/server/info_collector.cpp
Outdated
} | ||
return; | ||
} | ||
_result_writer->set_result(st_stat.timestamp, "st", st_stat.dump_to_json()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
非得简化的话,sz
可能更好一点
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
storage_size不应当是ss吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
std::string dump_to_json() const | ||
{ | ||
std::map<int32_t, std::vector<int64_t>> values; | ||
for (auto kv : cu_value_by_app) { | ||
auto &pair = kv.second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kv
没加引用, pair
也没有必要加
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
都得加
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -66,9 +66,11 @@ class info_collector | |||
void on_app_stat(); | |||
AppStatCounters *get_app_counters(const std::string &app_name); | |||
|
|||
void on_capacity_unit_stat(); | |||
void on_capacity_unit_stat(int remaining_retry_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
改成 retry_count
吧, 函数本身没有remaining这层概念
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我觉得有remaining语义更清楚,表示还允许多少次重试。不然只是retry_count,那么retry_count=3可能被理解为这次调用是第3次重试。
Former-commit-id: c2b15d46396af85529ba4c164e0f7c13f0f139fd [formerly c5dbab2] Former-commit-id: 50b17f8d276466cc00d671e3d40fa3f7749aa49c
What problem does this PR solve?
Current capacity unit calculation (committed in #318) has these problems:
cu@
in sort_key to diff with storage size stat):[app_partition_count, stat_partition_count, storage_size_in_mb]
, wherestat_partition_count
is the count of partitions whose storage is accounted.perf-counters-by-substr
, we use simple sub-string search to replace regex matching, which probably reduces loading pressure of replica-server and gives better performance.What is changed and how it works?
As mentioned above
Check List
Tests