Add TACACS server monitor design document. #1467
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
# TACACS+ server monitor design | ||
|
||
## Overview | ||
|
||
A SONiC device is usually configured with multiple TACACS+ servers. When a server is unreachable, the device tries to connect to the next TACACS+ server. | ||
|
||
A SONiC device communicates with a TACACS+ server in the following scenarios: | ||
1. A remote user logs in to the SONiC device. | ||
2. A remote user runs commands on the SONiC device. | ||
|
||
Each server has a timeout, 5 seconds by default. If the first server is unreachable, the SONiC device blocks until that timeout expires whenever a user logs in or runs a command. | ||
|
||
To mitigate this issue, SONiC will add a TACACS+ server monitor that changes server priority: a server that is unreachable or slow to respond will be downgraded. | ||
|
||
### Functional Requirement | ||
- Monitor TACACS+ server unreachable events from COUNTER_DB. | ||
- Monitor TACACS+ server slow response events from COUNTER_DB. | ||
- Change server priority based on unreachable events and slow response events. | ||
- Do not change any other server attribute. | ||
- Do not change any other TACACS+ config. | ||
|
||
### Counter DB schema | ||
#### TACPLUS_SERVER_LATENCY Table schema | ||
``` | ||
; Key | ||
server_key = IPAddress ; TACACS+ server’s address | ||
; Attributes | ||
latency = 1*10DIGIT ; server network latency in ms, -1 if the connection to the server timed out | ||
``` | ||
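
A minimal sketch of how a consumer might interpret one entry of this table. The helper name `parse_latency_entry` and the sample addresses are hypothetical and not part of the design; only the `latency` attribute and the `-1` convention come from the schema above.

```python
from typing import Dict, Optional


def parse_latency_entry(fields: Dict[str, str]) -> Optional[int]:
    """Return the latency in ms, or None when the entry marks a connect timeout (-1)."""
    latency_ms = int(fields["latency"])
    return None if latency_ms == -1 else latency_ms


# Hypothetical table content, keyed by server address as in the schema.
sample_table = {
    "192.0.2.1": {"latency": "12"},
    "192.0.2.2": {"latency": "-1"},
}
parsed = {server: parse_latency_entry(fields) for server, fields in sample_table.items()}
```

Mapping `-1` to `None` keeps the "unreachable" case from being mistaken for a very small latency in later comparisons.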
|
||
### Config DB schema | ||
#### TACPLUS_MONITOR Table schema | ||
Review comment: How can TACPLUS_MONITOR be disabled? For example, if the table is not defined, does it mean that the TACPLUS monitor is disabled, i.e. no monitoring, and the effective priorities are the same as the configured priorities?
Reply: Added an 'enable' flag to disable this feature. When the feature is disabled, the effective priorities will be the same as the configured priorities. |
||
``` | ||
; Key | ||
config_key = 'config' ; The configuration key | ||
; Attributes | ||
time_window = 1*5DIGIT ; Monitor time window in minutes, default is 5 | ||
high_latency_threshold = 1*5DIGIT ; High latency threshold in ms, default is 20 | ||
Review comment: Missing YANG model design.
Reply: Fixed, YANG model added. |
||
``` | ||
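
As a sketch of how the defaults in this schema might be applied when a field is absent, assuming the attributes arrive as a string-to-string mapping. The `MonitorConfig` class and `load_monitor_config` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MonitorConfig:
    time_window_min: int = 5             # schema default: 5 minutes
    high_latency_threshold_ms: int = 20  # schema default: 20 ms


def load_monitor_config(fields: Dict[str, str]) -> MonitorConfig:
    """Build a config from TACPLUS_MONITOR attributes, falling back to schema defaults."""
    defaults = MonitorConfig()
    return MonitorConfig(
        time_window_min=int(fields.get("time_window", defaults.time_window_min)),
        high_latency_threshold_ms=int(
            fields.get("high_latency_threshold", defaults.high_latency_threshold_ms)
        ),
    )
```

For example, `load_monitor_config({"time_window": "10"})` keeps the 20 ms threshold and only overrides the window.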
|
||
# 3 Limitation | ||
|
||
- A server priority change has a delay of 1 minute, because the Monit service runs the profile every minute. | ||
liuh-80 marked this conversation as resolved.
|
||
|
||
# 4 Design | ||
|
||
``` | ||
+------------+ | ||
| Monit | | ||
+-----+------+ | ||
| | ||
+------------v--------------+ +---------------------+ | ||
| | | | | ||
| | | | | ||
| TACACS+ Monitor |------>| COUNTER_DB | | ||
| | | | | ||
| | | | | ||
       +------------+--------------+       +---------------------+ | ||
| | ||
+---------v---------+ +-------+--------+ | ||
| | | | | ||
| TACACS config file+---------------> config file | | ||
| generate script | | | | ||
+-------------------+ +-------+--------+ | ||
|
||
``` | ||
- The TACACS+ monitor is a Monit profile. | ||
- The TACACS+ monitor periodically checks TACACS+ server latency and writes the latency to COUNTER_DB. | ||
- The latency in the COUNTER_DB TACPLUS_SERVER_LATENCY table is the average latency over the most recent time window. | ||
- The time window size is defined in the CONFIG_DB TACPLUS_MONITOR table. | ||
- The TACACS+ monitor also writes a warning message to syslog when either of the following events happens: | ||
  - Any server latency is -1, which means the server is unreachable. | ||
  - Any server latency is greater than high_latency_threshold. | ||
- Hostcfgd monitors the TACPLUS_SERVER_LATENCY table and regenerates the TACACS config file when either of the following events happens: | ||
Review comment: What is the threshold that separates high latency from normal latency, and how do we choose it?
Reply: The threshold is 20 ms, which is based on experience handling TACACS server latency and unreachability issues. I can share more detail in the review meeting, but it may not be necessary to write that in a public doc. |
||
  - Any server latency is -1, which means the server is unreachable. | ||
  - Any server latency is greater than high_latency_threshold. | ||
- When hostcfgd generates the TACACS config file, server priority is calculated according to the following rules: | ||
Review comment: We need an option to maintain backward compatibility.
Reply: Fixed, added an 'enable' flag; the feature can be disabled with this flag. |
||
  - Get server priority info from the CONFIG_DB TACPLUS_SERVER table. | ||
  - Change the priority of a high-latency server to 1, because 1 is the lowest priority and the SONiC device uses higher-priority servers first. | ||
  - An unreachable server is not included in the TACACS config file. | ||
Review comment: If an unreachable server is excluded, how can we include it again when it later becomes reachable?
Reply: Fixed, hostcfgd will add the server back when it finds that an unreachable server has become reachable. |
||
  - If another server also has priority 1 in CONFIG_DB, change its priority to 2. | ||
  - If another server's priority is not 1, use its original priority from CONFIG_DB. | ||
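
The averaging and warning behavior in the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not the monitor's actual code: the class `LatencyWindow`, the function `warning_message`, and the message wording are hypothetical, and the design does not say how timed-out probes enter the average. Here timeouts are left out of the average, and -1 is reported only when the window holds no successful probe.

```python
from collections import deque
from typing import Deque, Optional, Tuple

TIMEOUT = -1  # the design uses -1 for "connect to server timed out"


class LatencyWindow:
    """Keeps probe results for one server over the most recent time window."""

    def __init__(self, window_minutes: int = 5) -> None:
        self.window_s = window_minutes * 60
        self.samples: Deque[Tuple[float, int]] = deque()

    def add(self, now_s: float, latency_ms: int) -> None:
        self.samples.append((now_s, latency_ms))
        # Drop samples that have fallen out of the time window.
        while self.samples and self.samples[0][0] <= now_s - self.window_s:
            self.samples.popleft()

    def average(self) -> int:
        # Assumption: timeouts are excluded; no successful sample means unreachable.
        ok = [ms for _, ms in self.samples if ms != TIMEOUT]
        return round(sum(ok) / len(ok)) if ok else TIMEOUT


def warning_message(server: str, avg_ms: int, threshold_ms: int = 20) -> Optional[str]:
    """Return the syslog warning text for a server, or None if no warning applies."""
    if avg_ms == TIMEOUT:
        return f"TACACS+ server {server} is unreachable"
    if avg_ms > threshold_ms:
        return f"TACACS+ server {server} latency {avg_ms} ms exceeds {threshold_ms} ms"
    return None
```

Timestamps are passed in explicitly so the window logic can be exercised without waiting for real time to elapse.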
|
||
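
The priority rules listed above can be sketched as a small function. This is a hedged illustration rather than the hostcfgd implementation: `effective_priorities` is a hypothetical name, a server with no latency entry is assumed to be healthy, and the bump from priority 1 to 2 is assumed to apply only when at least one server has been demoted for high latency.

```python
from typing import Dict

UNREACHABLE = -1


def effective_priorities(
    configured: Dict[str, int],
    latency_ms: Dict[str, int],
    threshold_ms: int = 20,
) -> Dict[str, int]:
    """Compute the priorities to write into the generated TACACS config file."""
    # Assumption: a server without a latency entry is treated as healthy (latency 0).
    slow = {s for s in configured if latency_ms.get(s, 0) > threshold_ms}
    result: Dict[str, int] = {}
    for server, priority in configured.items():
        if latency_ms.get(server, 0) == UNREACHABLE:
            continue                  # unreachable servers are left out of the file
        if server in slow:
            result[server] = 1        # high-latency server gets the lowest priority
        elif priority == 1 and slow:
            result[server] = 2        # keep healthy servers ahead of demoted ones
        else:
            result[server] = priority  # otherwise use the CONFIG_DB priority
    return result
```

Because the result is recomputed from CONFIG_DB and the current latency table on every call, a server that becomes reachable again reappears with its configured priority.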
# 5 References | ||
|
||
## TACACS+ Authentication | ||
https://github.com/sonic-net/SONiC/blob/master/doc/aaa/TACACS%2B%20Authentication.md | ||
## SONiC TACACS+ improvement | ||
https://github.com/sonic-net/SONiC/blob/master/doc/aaa/TACACS%2B%20Design.md |
Review comment: Which component writes to COUNTER_DB? It is not clear from the design doc.
Reply: The Monit service will write to COUNTER_DB; detail added to the design doc.