Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warm reboot: Add basic schema for warm start in configDB and stateDB. #553

Merged
merged 5 commits into from
Aug 8, 2018
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 41 additions & 2 deletions doc/swss-schema.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,13 @@
Schema data is defined in ABNF [RFC5234](https://tools.ietf.org/html/rfc5234) syntax.

### Definitions of common tokens
## Definitions of common tokens
name = 1*DIGIT/1*ALPHA
ref_hash_key_reference = "[" hash_key "]" ;The token is a refernce to another valid DB key.
hash_key = name ; a valid key name (i.e. exists in DB)


## Application DB schema

### PORT_TABLE
Stores information for physical switch ports managed by the switch chip. device_names are defined in [port_config.ini](../portsyncd/port_config.ini). Ports to the CPU (ie: management port) and logical ports (loopback) are not declared in the PORT_TABLE. See INTF_TABLE.

Expand Down Expand Up @@ -620,7 +622,44 @@ Equivalent RedisDB entry:
12) "0"
127.0.0.1:6379>

### Configuration files

## Configuration DB schema

### WARM\_RESTART
;Stores system warm start configuration
;Status: work in progress

key = WARM_RESTART:name ; name is the name of SONiC docker or "system" for global configuration.

enable = "true" / "false" ; Default value as false.
; If "system" warm start knob is true, docker level knob will be ignored.
; If "system" warm start knob is false, docker level knob takes effect.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docker knob always override system level knob, that was the use cases we dicussed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tricky point is the default value is false. If docker knob always override system level knob, when warm restart is enabled at system level, each docker has to be enabled too.

What is the use case of having warm restart enabled at system level, while disabled for certain docker?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if docker knob does not have this attribute, then use system level.

the use case was rodney's, snmp docker snmp warm reboot is available but experimental, so in production he wants to explicitly disable it although system level is enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rodnymolina for snmp case, probably it should alway do cold boot as other dockers like lldp, dhcp_relay until its warm reboot support is stable enough.

If the existence of certain attribute is treated as one condition, then it becomes tri-state variable: true/false/not-exist, to me, it seems error prone. We don't have such case in sonic yet(?).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, more specific knobs take precedence over global configuration. I think we can use this principle here as well.
If we don't like the not-exist case, we need this knob for every docker. For the time-being, we can define the knobs for dockers which have code to support warm reboot, for the ones do not support warm reboot, they do not look at the knobs anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If docker knob always takes precedence over system knob, the not-exist case is asked implicitly, otherwise why system knob is needed if it gets preempted all the time. That means the default value for enable is not-exist.

The knobs are also used to control whether to flush DB.
Ex. based on the current schema, if system knob or one docker knob enabled, we keep the appDB and some other DBs data.

But if docker knob always takes precedence, and it is disabled, should the DB be flushed or not. If DB is not flushed, the specific docker will have to handle the scenarios of DB with existing data though it is supposed to do a cold start. If DB is flushed, how about other dockers?

Better not to support the case of system knob enabled, while docker knob disabled and takes effect. That is one of the reasons for If "system" warm start knob is true, docker level knob will be ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should just simplify it by putting it into DEVICE_METADATA:localhost

warm_restart@: none, system, or a list of individual docker names

for most of use cases, it is system or none. if a user/developer wants to customized, he will specify a list of docker names.

warm_restart_timer_bgp: time_duration
warm_restart_timer_neighbor: time_duration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a few un-documented configurations have been put under DEVICE_METADATA.
If the same set of warm restart configurations are used, squashing them into DEVICE_METADATA:localhost doesn't seem to bring any benefit other than to make the configuration more obfuscated. Also more parameters might be needed for warm restart.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jipanyang As talked, this "system" knob was meant for whole system warm reboot, not a global setting of docker restart, that would lead to a different discussion.

Could you please help to provide (the current) work flow of using these knobs during the shutdown and bringing up of the dockers/system? We can then think whether the whole work flow make sense and keeps everybody on the same page.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion with @zhenggen-xu , a little more clarification about the use cases for system knob and docker knobs.
<1> Upon system reboot, the system enable knob will be checked. If enabled, database data will be preserved, if not, database will be flushed.
No need to check docker level knobs in this case since the whole system is being rebooted .

<2> Upon docker service start, first to check system knob, if enabled, docker warm start should be performed, otherwise system warm reboot will be ruined. If system knob disabled, while docker knob enabled, this is likely an individual docker warm restart request.

Within each application, when the system knob or docker knob enabled, we do further check on the actual warm start state ( restart_count), if no warm start state data available, the database has been flushed, do cold start. Otherwise warm start.

The exact logic has been implemented in WarmStart::checkWarmStart():
https://github.com/Azure/sonic-swss/pull/547/files

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with this logic.


timer_name = 1*255VCHAR, ; timer_name is the name of warm start timer for the whole system or the specific docker,
; Ex. "neighbor_timer", "bgp_timer". The name should not contain "," character.
; There could be more than one timer in a docker or the system, separated by ",".

timer_duration = 1*4DIGIT, ; timer duration, in seconds. Separated by "," if there are more than on timer.


## State DB schema

### WARM\_RESTART\_TABLE
;Stores application and orchdameon warm start status
;Status: work in progress

key = WARM_RESTART_TABLE:process_name ; process_name is a unique process identifier.
restart_count = 1*10DIGIT ; a number between 0 and 2147483647,
; count of warm start times.

state = "init" / "restored" / "reconciled" ; init: process init with warm start enabled.
; restored: process restored to the previous
; state using saved data.
; reconciled: process reconciled with up to date
; dynanic data like port state, neighbor, routes
; and so on.

## Configuration files
What configuration files should we have? Do apps, orch agent each need separate files?

[port_config.ini](https://github.com/stcheng/swss/blob/mock/portsyncd/port_config.ini) - defines physical port information
Expand Down