-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Syslog rate limit design #1049
Syslog rate limit design #1049
Conversation
Hi @wen587 , comment fixed. Could you please review again? |
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please ask for more approvals to merge.
@venkatmahalingam , @lguohan , @madhupalu , @tzack000 , could you please review and sign-off? |
|
||
![container_config_on_fly](/doc/syslog/images/container_rate_limit_config_flow.svg). | ||
|
||
> Note: according to test, syslog rate limit configuration on host side would not affect container side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox IMHO, it's bit confusing comparing to how it's done in AUTO TECHSUPPORT, where global configuration overrides feature rate limit configuration. Here, using 'GLOBAL', does not reflect the actual intentions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about HOST?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox unfortunately i don't have neither a strong opinion here, nor a good suggestion
|
||
![cli_rate_limit_config_flow](/doc/syslog/images/cli_rate_limit_config_flow.svg) | ||
|
||
#### Host side flow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox this approach introduces double host side configuration: first time with rsyslog-config
service and second time with hostcfgd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, if configuration does not change, hostcfgd will not restart rsyslog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox how do you plan to achieve that? The only reliable way which i see is to parse the config_db.json
(taking into consideration also init_cfg.json
) and fill daemon's internal cache on start
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will parse /etc/rsyslog.conf for the first time, and comparing its configuration with CONFIG DB value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox what is the guarantee that rsyslog.conf
will be up-to-date before the hostcfgd
is started?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check hostcfgd logic: https://github.com/sonic-net/sonic-host-services/blob/bc8698d1d760fefedaeb4742ad19b25ef2b3c17b/scripts/hostcfgd#L1661. Hostcfgd will delay itself by waiting for other system services. Thus, rsyslog-config.service and rsyslog.service should be started before hostcfgd,
|
||
![host_config_on_fly](/doc/syslog/images/host_rate_limit_config_flow.svg). | ||
|
||
#### Container side flow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox this approach introduces double container side configuration: first time with container preStartAction
and second time with containercfgd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, if configuration does not change, hostcfgd will not restart rsyslog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox how do you plan to achieve that? The only reliable way which i see is to parse the config_db.json
(taking into consideration also init_cfg.json
) and fill daemon's internal cache on start
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will parse /etc/rsyslog.conf for the first time, and comparing its configuration with CONFIG DB value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox makes sense
|
||
#### CLI change | ||
|
||
Config rate limit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox why not to have like this:
config
|--- syslog
|--- rate-limit
|--- host --interval <interval> --burst <burst>
|--- container <service_name> --interval <interval> --burst <burst>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. But I don't see an obviously benefit comparing to the current design. As the code has been implemented for a long time, I would like to keep it as rate-limit-host
and rate-limit-container
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox ok. Got it.
|
||
> Note: set interval or burst to 0 will disable rate limit. | ||
|
||
Show rate limit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox why not to have like this:
show
|--- syslog
|--- rate-limit
|--- host
|--- container <service_name>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. But I don't see an obviously benefit comparing to the current design. As the code has been implemented for a long time, I would like to keep it as rate-limit-host
and rate-limit-container
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Junchao-Mellanox ok. Got it.
Last call on HLD review, @tzack000 and @venkatmahalingam if you have further comments this is the time. Otherwise I will cont with merging it. |
@qiluo-msft can you please merge this PR? Thanks. |
@Junchao-Mellanox @liat-grozovik The document seems to have special characters and hence the page deployment workflow is failing. Can you check if the failure on page deployment workflow is not due to this merge ? |
Hi, what do you mean by "page deployment workflow"? where can I get the log? |
@StormLiangMS can you please back port this to 202211? Thanks. |
Overview:
Logging in SONiC is organized with rsyslogd. Each container has its own rsyslogd instance plus a daemon running on a host. The rsyslogd instance which is running on the host is used to collect the messages from within containers and store them at /var/log/syslog path. Rsyslog config file are generated from templates:
Currently, each container has hardcoded message rate limiting to avoid receiving flooded log messages:
There is no rate limiting configured on host side for now.
The SystemLogRateLimitInterval determines the amount of time that is being measured for rate limiting. The SystemLogRateLimitBurst defines the amount of messages, that have to occur in the time limit of SystemLogRateLimitInterval, to trigger rate limiting. For example, SystemLogRateLimitInterval=300, SystemLogRateLimitBurst=20000, it means that if one daemon generate more than 20000 messages in 300 seconds, rsyslogd will start to drop messages after that(FIFO).
This feature allows user to configure SystemLogRateLimitInterval and SystemLogRateLimitBurst for host, containers.
Related PR: