Add TACACS server monitor design document. #1467
base: master
Thanks for addressing the issue.
cc: @shdasari The idea of adjusting the priority based on the responsiveness of the server is good. As mentioned, it should alleviate issue #1462. However, I feel that adjusting the configuration based on a temporary network event might not be the best approach. How about adding some state information to the STATE_DB (or COUNTER_DB, or another appropriate DB)? The tacplus PAM configuration could then be adjusted to account for a temporarily unreachable server. Once the network event is detected to be resolved, the temporary adjustment could be backed out, with suitable logs to advise the admin of the actions taken.
@a-barboza, currently, if SONiC TACACS can connect to the first server, it will not try connecting to the other, lower-priority servers, which means it cannot detect their network status. If we want the connection status of the lower-priority servers, we need to create a new daemon that runs in the background and periodically checks each server. Also, in this design the monitor changes server priority based on a time window and an event count threshold, for example more than 50% of connections failed within 5 minutes. This is a simple solution that can handle almost every scenario. So, how about I update the design doc to:
@ycoheNvidia @a-barboza Would you like to review again? Do you have more comments?
@a-barboza, I updated the design doc according to your suggestion. Please give your comments.
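The time-window rule mentioned earlier in this thread (more than 50% of connections failed within 5 minutes) could be sketched roughly as follows. This is only an illustration: the `FailureWindow` class, its method names, and the use of plain numeric timestamps in seconds are assumptions made for the example, not part of the design doc.

```python
from collections import deque


class FailureWindow:
    """Hypothetical helper: track connection results in a sliding time window."""

    def __init__(self, window_seconds=300, failure_ratio=0.5):
        self.window_seconds = window_seconds  # 5 minutes, as in the example above
        self.failure_ratio = failure_ratio    # "more than 50%" of connections
        self.events = deque()                 # (timestamp, succeeded), oldest first

    def record(self, timestamp, succeeded):
        """Store one connection attempt and drop attempts that are too old."""
        self.events.append((timestamp, succeeded))
        self._expire(timestamp)

    def _expire(self, now):
        # Remove events that have fallen out of the time window.
        while self.events and now - self.events[0][0] > self.window_seconds:
            self.events.popleft()

    def is_unhealthy(self, now):
        """Return True when the failure share in the window exceeds the ratio."""
        self._expire(now)
        if not self.events:
            return False
        failures = sum(1 for _, ok in self.events if not ok)
        return failures / len(self.events) > self.failure_ratio
```

With two failures out of three attempts the window reports unhealthy; at exactly 50% it does not, because the rule is "more than 50%".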
### Config DB schema
#### TACPLUS_MONITOR Table schema
How can TACPLUS_MONITOR be disabled? For example, if the table is not defined, does it mean that TACPLUS Monitor is disabled, i.e. no monitoring, and effective priorities are same as configured priorities?
Added an 'enable' flag to disable this feature. When the feature is disabled, the effective priorities will be the same as the configured priorities.
How does a user get notified that the order has been changed?
When the monitor finds a latency or unreachable issue, a warning message is written to syslog. In a production environment, some other service needs to check SONiC device health via syslog and send alerts to the user; that service is not part of SONiC. For the timeout issue, if all 8 servers are unreachable, Monit will handle it by sending an alert event for the timeout.
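A rough sketch of how such warnings might be produced is shown below. The function names, the logger name, and the message wording are hypothetical; the only values taken from the design doc are the latency of -1 for an unreachable server and the default threshold of 20 ms. To actually reach syslog, a handler such as `logging.handlers.SysLogHandler` would need to be attached to the logger, which is left out here.

```python
import logging

UNREACHABLE = -1  # latency value the design doc uses for an unreachable server

logger = logging.getLogger("tacplus_monitor")  # hypothetical logger name


def build_warnings(latencies_ms, high_latency_threshold_ms=20):
    """Return one warning string per unreachable or high-latency server."""
    warnings = []
    for server, latency in sorted(latencies_ms.items()):
        if latency == UNREACHABLE:
            warnings.append(f"TACACS+ server {server} is unreachable")
        elif latency > high_latency_threshold_ms:
            warnings.append(
                f"TACACS+ server {server} latency {latency} ms exceeds "
                f"threshold {high_latency_threshold_ms} ms"
            )
    return warnings


def log_warnings(latencies_ms, high_latency_threshold_ms=20):
    """Emit each warning at WARNING level through the logger."""
    for message in build_warnings(latencies_ms, high_latency_threshold_ms):
        logger.warning(message)
```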
@lguohan, could you review and sign off on this PR?
doc/aaa/TACACS+ Server Monitor.md
Outdated
- When hostcfgd generate TACACS config file, server priority calculated according to following rules:
  - Get server priority info from CONFIG_DB TACPLUS_SERVER table.
  - Change high latency server to 1, this is because 1 is the smallest priority, and SONiC device will use high priority server first.
  - Un-reachable server will not include in TACACS config file.
If an unreachable server is excluded and the server later becomes reachable, how can we include it back?
Fixed: hostcfgd will add the server back when it finds that an unreachable server has become reachable.
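The priority rules quoted in the hunk above, together with the re-inclusion behaviour described in this reply, might look roughly like the sketch below. The function name and the dictionary-based inputs are assumptions for illustration only; the real hostcfgd reads CONFIG_DB and the latency table directly. Because the result is recomputed from the configured priorities on every call, a server that becomes reachable again is included again without extra state.

```python
UNREACHABLE = -1  # latency value the design doc uses for an unreachable server


def effective_priorities(configured, latencies_ms, enable=True,
                         high_latency_threshold_ms=20):
    """Hypothetical sketch: derive the priorities written to the TACACS config.

    configured   -- {server: priority} as read from the TACPLUS_SERVER table
    latencies_ms -- {server: latency in ms, or -1 when unreachable}
    """
    if not enable:
        # Feature disabled: effective priorities equal configured priorities.
        return dict(configured)

    result = {}
    for server, priority in configured.items():
        latency = latencies_ms.get(server)
        if latency is None:
            # Assumption: no measurement yet, so keep the configured priority.
            result[server] = priority
        elif latency == UNREACHABLE:
            # Unreachable servers are left out of the config file.
            continue
        elif latency > high_latency_threshold_ms:
            # High-latency servers drop to 1, the smallest priority.
            result[server] = 1
        else:
            result[server] = priority
    return result
```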
config_key = 'config' ; The configuration key
; Attributes
time_window = 1*5DIGIT ; Monitor time window in minute, default is 5
high_latency_threshold = 1*5DIGIT ; High latency threshold in ms, default is 20
Missing YANG model design.
Fixed, YANG model added.
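Independently of the YANG model itself, the constraints in the TACPLUS_MONITOR schema hunk (up to five digits for `time_window` and `high_latency_threshold`, defaults of 5 and 20, plus the boolean `enable` flag added during this review) could be checked with a sketch like the one below. The function name, the output key names, and the choice of `false` as the default for `enable` are assumptions; the schema shown here does not state a default for that flag.

```python
import re

# Defaults from the schema; the "enable" default is an assumption.
DEFAULTS = {"enable": "false", "time_window": "5", "high_latency_threshold": "20"}

_FIVE_DIGITS = re.compile(r"\d{1,5}")  # matches the ABNF rule 1*5DIGIT


def parse_monitor_config(entry):
    """Validate a TACPLUS_MONITOR 'config' entry given as a dict of strings."""
    merged = {**DEFAULTS, **entry}
    for field in ("time_window", "high_latency_threshold"):
        if not _FIVE_DIGITS.fullmatch(merged[field]):
            raise ValueError(f"{field} must be 1 to 5 digits, got {merged[field]!r}")
    if merged["enable"].lower() not in ("true", "false"):
        raise ValueError(f"enable must be a boolean, got {merged['enable']!r}")
    return {
        "enable": merged["enable"].lower() == "true",
        "time_window_minutes": int(merged["time_window"]),
        "high_latency_threshold_ms": int(merged["high_latency_threshold"]),
    }
```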
doc/aaa/TACACS+ Server Monitor.md
Outdated
### Functional Requirement
- Monit TACACS+ server unreachable event from COUNTER_DB.
- Monit TACACS+ server slow response event from COUNTER_DB.
Which component writes to COUNTER_DB? It is not clear from the design doc.
The Monit service will write to COUNTER_DB; I added the detail to the design doc.
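How the latency value is measured before it is written is not described in this thread. One possible approach, shown purely as an assumption, is to time a TCP connection to the TACACS+ port (49) and return -1 when the connection fails, matching the convention that -1 means unreachable. The function name and timeout are hypothetical, and the write to COUNTER_DB is intentionally left out.

```python
import socket
import time

TACACS_PORT = 49   # well-known TACACS+ TCP port
UNREACHABLE = -1   # latency value the design doc uses for an unreachable server


def probe_latency_ms(host, port=TACACS_PORT, timeout_s=1.0):
    """Hypothetical probe: time a TCP connect, or return -1 on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            pass  # connection established; close it immediately
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return UNREACHABLE
    return (time.monotonic() - start) * 1000.0
```

TCP connect time is only a proxy for authentication latency; a real probe might need to complete a TACACS+ exchange to be representative.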
doc/aaa/TACACS+ Server Monitor.md
Outdated
- Hostcfgd will monitor TACPLUS_SERVER_LATENCY table, and will re-generate TACACS config file when following event happen:
  - Any server latency is -1, which means the server is unreachable.
  - Any server latency is bigger than high_latency_threshold.
- When hostcfgd generate TACACS config file, server priority calculated according to following rules:
We need an option to maintain backward compatibility.
Fixed: added an 'enable' flag; this feature can be disabled with this flag.
doc/aaa/TACACS+ Server Monitor.md
Outdated
- TACACS+ monitor also will write warning message to syslog when following event happen:
  - Any server latency is -1, which means the server is unreachable.
  - Any server latency is bigger than high_latency_threshold.
- Hostcfgd will monitor TACPLUS_SERVER_LATENCY table, and will re-generate TACACS config file when following event happen:
What is the threshold for determining high latency versus not? How do we choose the threshold?
The threshold is 20 ms, which is based on experience handling TACACS server latency/unreachable issues. I can share more detail in the review meeting, but it may not be necessary to write that in the public doc.
commented.
; Key
config_key = 'config' ; The configuration key
; Attributes
enable = BOOLEAN ; Enable Monitor feature
Add TACACS server monitor design document.
The TACACS server monitor can change TACACS server priority based on syslog and can resolve the following issue:
#1462