-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Warm reboot: Add basic schema for warm start in configDB and stateDB. #553
Changes from all commits
be4391b
b6f7a64
822d25c
416bc27
3720444
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,13 @@ | ||
Schema data is defined in ABNF [RFC5234](https://tools.ietf.org/html/rfc5234) syntax. | ||
|
||
### Definitions of common tokens | ||
## Definitions of common tokens | ||
name = 1*DIGIT/1*ALPHA | ||
ref_hash_key_reference = "[" hash_key "]" ;The token is a refernce to another valid DB key. | ||
hash_key = name ; a valid key name (i.e. exists in DB) | ||
|
||
|
||
## Application DB schema | ||
|
||
### PORT_TABLE | ||
Stores information for physical switch ports managed by the switch chip. device_names are defined in [port_config.ini](../portsyncd/port_config.ini). Ports to the CPU (ie: management port) and logical ports (loopback) are not declared in the PORT_TABLE. See INTF_TABLE. | ||
|
||
|
@@ -620,7 +622,44 @@ Equivalent RedisDB entry: | |
12) "0" | ||
127.0.0.1:6379> | ||
|
||
### Configuration files | ||
|
||
## Configuration DB schema | ||
|
||
### WARM\_RESTART | ||
;Stores system warm start configuration | ||
;Status: work in progress | ||
|
||
key = WARM_RESTART:name ; name is the name of SONiC docker or "system" for global configuration. | ||
|
||
enable = "true" / "false" ; Default value as false. | ||
; If "system" warm start knob is true, docker level knob will be ignored. | ||
; If "system" warm start knob is false, docker level knob takes effect. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After discussion with @zhenggen-xu , a little more clarification about the use cases for system knob and docker knobs. <2> Upon docker service start, first to check system knob, if enabled, docker warm start should be performed, otherwise system warm reboot will be ruined. If system knob disabled, while docker knob enabled, this is likely an individual docker warm restart request. Within each application, when the system knob or docker knob enabled, we do further check on the actual warm start state ( restart_count), if no warm start state data available, the database has been flushed, do cold start. Otherwise warm start. The exact logic has been implemented in WarmStart::checkWarmStart(): There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am fine with this logic. |
||
|
||
timer_name = 1*255VCHAR, ; timer_name is the name of warm start timer for the whole system or the specific docker, | ||
; Ex. "neighbor_timer", "bgp_timer". The name should not contain "," character. | ||
; There could be more than one timer in a docker or the system, separated by ",". | ||
|
||
timer_duration = 1*4DIGIT, ; timer duration, in seconds. Separated by "," if there are more than on timer. | ||
|
||
|
||
## State DB schema | ||
|
||
### WARM\_RESTART\_TABLE | ||
;Stores application and orchdameon warm start status | ||
;Status: work in progress | ||
|
||
key = WARM_RESTART_TABLE:process_name ; process_name is a unique process identifier. | ||
restart_count = 1*10DIGIT ; a number between 0 and 2147483647, | ||
; count of warm start times. | ||
|
||
state = "init" / "restored" / "reconciled" ; init: process init with warm start enabled. | ||
; restored: process restored to the previous | ||
; state using saved data. | ||
; reconciled: process reconciled with up to date | ||
; dynanic data like port state, neighbor, routes | ||
; and so on. | ||
|
||
## Configuration files | ||
What configuration files should we have? Do apps, orch agent each need separate files? | ||
|
||
[port_config.ini](https://github.com/stcheng/swss/blob/mock/portsyncd/port_config.ini) - defines physical port information | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docker knob always override system level knob, that was the use cases we dicussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tricky point is the default value is false. If docker knob always override system level knob, when warm restart is enabled at system level, each docker has to be enabled too.
What is the use case of having warm restart enabled at system level, while disabled for certain docker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if docker knob does not have this attribute, then use system level.
the use case was rodney's, snmp docker snmp warm reboot is available but experimental, so in production he wants to explicitly disable it although system level is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rodnymolina for snmp case, probably it should alway do cold boot as other dockers like lldp, dhcp_relay until its warm reboot support is stable enough.
If the existence of certain attribute is treated as one condition, then it becomes tri-state variable: true/false/not-exist, to me, it seems error prone. We don't have such case in sonic yet(?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, more specific knobs take precedence over global configuration. I think we can use this principle here as well.
If we don't like the not-exist case, we need this knob for every docker. For the time-being, we can define the knobs for dockers which have code to support warm reboot, for the ones do not support warm reboot, they do not look at the knobs anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If docker knob always takes precedence over system knob, the not-exist case is asked implicitly, otherwise why system knob is needed if it gets preempted all the time. That means the default value for enable is not-exist.
The knobs are also used to control whether to flush DB.
Ex. based on the current schema, if system knob or one docker knob enabled, we keep the appDB and some other DBs data.
But if docker knob always takes precedence, and it is disabled, should the DB be flushed or not. If DB is not flushed, the specific docker will have to handle the scenarios of DB with existing data though it is supposed to do a cold start. If DB is flushed, how about other dockers?
Better not to support the case of system knob enabled, while docker knob disabled and takes effect. That is one of the reasons for
If "system" warm start knob is true, docker level knob will be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should just simplify it by putting it into DEVICE_METADATA:localhost
warm_restart@: none, system, or a list of individual docker names
for most of use cases, it is system or none. if a user/developer wants to customized, he will specify a list of docker names.
warm_restart_timer_bgp: time_duration
warm_restart_timer_neighbor: time_duration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite a few un-documented configurations have been put under DEVICE_METADATA.
If the same set of warm restart configurations are used, squashing them into DEVICE_METADATA:localhost doesn't seem to bring any benefit other than to make the configuration more obfuscated. Also more parameters might be needed for warm restart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jipanyang As talked, this "system" knob was meant for whole system warm reboot, not a global setting of docker restart, that would lead to a different discussion.
Could you please help to provide (the current) work flow of using these knobs during the shutdown and bringing up of the dockers/system? We can then think whether the whole work flow make sense and keeps everybody on the same page.