Warm reboot: Support vlanmgrd process warm restart #550
Merged: lguohan merged 7 commits into `sonic-net:master` from `jipanyang:warm_reboot_collab_3_vlanmgrd` on Aug 16, 2018.
Commits:
- 892c5b8 Support vlanmgrd process warm restart (jipanyang)
- c146887 [VS]: add test case for vlanmgrd warm restart (jipanyang)
- 9fe9242 Adapt to the new warm reboot schema (jipanyang)
- ac0e4fd Update warm_restart common functions (jipanyang)
- f74ccac warm_restart common functions already available, remove them from thi… (jipanyang)
- 9b83c1e Use fixed CFG_WARM_RESTART_TABLE_NAME and STATE_WARM_RESTART_TABLE_NA… (jipanyang)
- faf1130 Remove hardcoded names for warm restart config table and state table (jipanyang)
New file (diff hunk `@@ -0,0 +1,202 @@`):
```python
from swsscommon import swsscommon
import os
import re
import time
import json


# Get restart count of all processes supporting warm restart
def swss_get_RestartCount(state_db):
    restart_count = {}
    warmtbl = swsscommon.Table(state_db, swsscommon.STATE_WARM_RESTART_TABLE_NAME)
    keys = warmtbl.getKeys()
    assert len(keys) != 0
    for key in keys:
        (status, fvs) = warmtbl.get(key)
        assert status == True
        for fv in fvs:
            if fv[0] == "restart_count":
                restart_count[key] = int(fv[1])
    print(restart_count)
    return restart_count


# Check that the restart count was incremented by 1 for all processes supporting warm restart
def swss_check_RestartCount(state_db, restart_count):
    warmtbl = swsscommon.Table(state_db, swsscommon.STATE_WARM_RESTART_TABLE_NAME)
    keys = warmtbl.getKeys()
    print(keys)
    assert len(keys) > 0
    for key in keys:
        (status, fvs) = warmtbl.get(key)
        assert status == True
        for fv in fvs:
            if fv[0] == "restart_count":
                assert int(fv[1]) == restart_count[key] + 1
            elif fv[0] == "state":
                assert fv[1] == "reconciled"


def check_port_oper_status(appl_db, port_name, state):
    portTbl = swsscommon.Table(appl_db, swsscommon.APP_PORT_TABLE_NAME)
    (status, fvs) = portTbl.get(port_name)
    assert status == True

    oper_status = "unknown"
    for v in fvs:
        if v[0] == "oper_status":
            oper_status = v[1]
            break
    assert oper_status == state


# Check that the restart count was incremented by 1 for a single process
def swss_app_check_RestartCount_single(state_db, restart_count, name):
    warmtbl = swsscommon.Table(state_db, swsscommon.STATE_WARM_RESTART_TABLE_NAME)
    keys = warmtbl.getKeys()
    print(keys)
    print(restart_count)
    assert len(keys) > 0
    for key in keys:
        if key != name:
            continue
        (status, fvs) = warmtbl.get(key)
        assert status == True
        for fv in fvs:
            if fv[0] == "restart_count":
                assert int(fv[1]) == restart_count[key] + 1
            elif fv[0] == "state":
                assert fv[1] == "reconciled"


def create_entry(tbl, key, pairs):
    fvs = swsscommon.FieldValuePairs(pairs)
    tbl.set(key, fvs)

    # FIXME: better to wait until the DB creates them
    time.sleep(1)


def create_entry_tbl(db, table, key, pairs):
    tbl = swsscommon.Table(db, table)
    create_entry(tbl, key, pairs)


def del_entry_tbl(db, table, key):
    tbl = swsscommon.Table(db, table)
    tbl._del(key)


def create_entry_pst(db, table, key, pairs):
    tbl = swsscommon.ProducerStateTable(db, table)
    create_entry(tbl, key, pairs)


def how_many_entries_exist(db, table):
    tbl = swsscommon.Table(db, table)
    return len(tbl.getKeys())
```
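The FIXME in `create_entry` notes that the fixed `time.sleep(1)` stands in for waiting until the DB has actually created the entries. A common alternative is to poll for the condition with a timeout. Below is a minimal sketch. `wait_until` is a hypothetical helper and is not part of swsscommon or the test framework shown here.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll predicate() until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper. Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)  # back off briefly before checking again
```

For example, the final neighbor check in the test function that follows could become `assert wait_until(lambda: tbl.get("Vlan20:11.0.0.11")[0])`. This relies on `Table.get` returning a `(status, fvs)` pair, as it does throughout this file. The test then waits only as long as needed instead of a fixed interval.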
```python
def test_VlanMgrdWarmRestart(dvs):

    conf_db = swsscommon.DBConnector(swsscommon.CONFIG_DB, dvs.redis_sock, 0)
    appl_db = swsscommon.DBConnector(swsscommon.APPL_DB, dvs.redis_sock, 0)
    state_db = swsscommon.DBConnector(swsscommon.STATE_DB, dvs.redis_sock, 0)

    dvs.runcmd("ifconfig Ethernet16 0")
    dvs.runcmd("ifconfig Ethernet20 0")

    dvs.runcmd("ifconfig Ethernet16 up")
    dvs.runcmd("ifconfig Ethernet20 up")

    time.sleep(1)

    # enable warm restart
    # TODO: use cfg command to config it
    create_entry_tbl(
        conf_db,
        swsscommon.CFG_WARM_RESTART_TABLE_NAME, "swss",
        [
            ("enable", "true"),
        ]
    )

    # create vlan
    create_entry_tbl(
        conf_db,
        "VLAN", "Vlan16",
        [
            ("vlanid", "16"),
        ]
    )
    # create vlan
    create_entry_tbl(
        conf_db,
        "VLAN", "Vlan20",
        [
            ("vlanid", "20"),
        ]
    )
    # create vlan member entries in config db. Don't use Ethernet0/4/8/12,
    # since IPs were configured on them in previous tests.
    create_entry_tbl(
        conf_db,
        "VLAN_MEMBER", "Vlan16|Ethernet16",
        [
            ("tagging_mode", "untagged"),
        ]
    )

    create_entry_tbl(
        conf_db,
        "VLAN_MEMBER", "Vlan20|Ethernet20",
        [
            ("tagging_mode", "untagged"),
        ]
    )

    time.sleep(1)

    dvs.runcmd("ifconfig Vlan16 11.0.0.1/29 up")
    dvs.runcmd("ifconfig Vlan20 11.0.0.9/29 up")

    dvs.servers[4].runcmd("ifconfig eth0 11.0.0.2/29")
    dvs.servers[4].runcmd("ip route add default via 11.0.0.1")

    dvs.servers[5].runcmd("ifconfig eth0 11.0.0.10/29")
    dvs.servers[5].runcmd("ip route add default via 11.0.0.9")

    time.sleep(1)

    # Ping should work between servers via VS VLAN interfaces
    ping_stats = dvs.servers[4].runcmd("ping -c 1 11.0.0.10")
    time.sleep(1)

    tbl = swsscommon.Table(appl_db, "NEIGH_TABLE")
    (status, fvs) = tbl.get("Vlan16:11.0.0.2")
    assert status == True

    (status, fvs) = tbl.get("Vlan20:11.0.0.10")
    assert status == True

    bv_before = dvs.runcmd("bridge vlan")
    print(bv_before)

    restart_count = swss_get_RestartCount(state_db)

    dvs.runcmd(['sh', '-c', 'pkill -x vlanmgrd; cp /var/log/swss/sairedis.rec /var/log/swss/sairedis.rec.b; echo > /var/log/swss/sairedis.rec'])
    dvs.runcmd(['sh', '-c', 'supervisorctl start vlanmgrd'])
    time.sleep(2)

    bv_after = dvs.runcmd("bridge vlan")
    assert bv_after == bv_before

    # No create/set/remove operations should be passed down to syncd for vlanmgrd warm restart
    num = dvs.runcmd(['sh', '-c', r'grep \|c\| /var/log/swss/sairedis.rec | wc -l'])
    assert num == '0\n'
    num = dvs.runcmd(['sh', '-c', r'grep \|s\| /var/log/swss/sairedis.rec | wc -l'])
    assert num == '0\n'
    num = dvs.runcmd(['sh', '-c', r'grep \|r\| /var/log/swss/sairedis.rec | wc -l'])
    assert num == '0\n'

    # new IP on server 5
    dvs.servers[5].runcmd("ifconfig eth0 11.0.0.11/29")

    # Ping should work between servers via VS VLAN interfaces
    ping_stats = dvs.servers[4].runcmd("ping -c 1 11.0.0.11")

    # the new neighbor should be learned on VS
    (status, fvs) = tbl.get("Vlan20:11.0.0.11")
    assert status == True

    swss_app_check_RestartCount_single(state_db, restart_count, "vlanmgrd")
```
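The three `grep ... | wc -l` checks assert that the recording contains no line with `|c|`, `|s|` or `|r|`. In other words, no create, set or remove operation reached syncd after vlanmgrd restarted. Below is a minimal sketch of the same check done in Python. Like the grep patterns, it assumes an operation's type appears between pipe characters on its line. `count_sai_ops` is a hypothetical helper, not part of the test framework.

```python
def count_sai_ops(lines, ops=("c", "s", "r")):
    """Return {op: number of lines containing "|op|"} for each op code.

    Uses the same plain substring match as `grep '|c|' sairedis.rec | wc -l`.
    """
    lines = list(lines)  # allow a one-shot iterator (e.g. a file) to be scanned per op
    counts = {}
    for op in ops:
        token = "|" + op + "|"
        counts[op] = sum(1 for line in lines if token in line)
    return counts
```

The test uses the return value of `dvs.runcmd` as command output, so the three checks could be folded into one assertion:

`assert count_sai_ops(dvs.runcmd("cat /var/log/swss/sairedis.rec").splitlines()) == {"c": 0, "s": 0, "r": 0}`

This keeps one shell call and reports all three counts if the assertion fails.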
Review comment:
I am not sure this approach is bulletproof.
What if we later change the vlan_filtering option, or enable more options on the bridge? The older version may not have set such an option while the new vlanmgrd enables it, and the warm restart would miss it.
I think the right approach is still to remove everything and recreate it. This is mainly control plane, so we can still avoid data plane disruption.
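To make this concern concrete, here is a minimal sketch in a toy model where the kernel's bridge state is a plain dict. The functions `init_bridge_reuse` and `init_bridge_recreate`, the option names, and the dict model are all hypothetical; none of this is vlanmgrd code. The sketch shows that a warm start which skips creating an already-present bridge keeps the older version's attributes. By contrast, remove-and-recreate always ends up with exactly what a cold start would create.

```python
# Hypothetical toy model: bridges are a dict of name -> {attribute: value}.
# Names and options are illustrative only, not taken from vlanmgrd.

OLD_OPTS = {"vlan_filtering": "1"}                     # what an older version set
NEW_OPTS = {"vlan_filtering": "1", "new_option": "1"}  # hypothetical newer version


def init_bridge_reuse(kernel, opts, warm):
    """On warm start, reuse an existing bridge untouched."""
    if warm and "Bridge" in kernel:
        return  # bridge kept as-is, so options added since are never applied
    kernel["Bridge"] = dict(opts)


def init_bridge_recreate(kernel, opts, warm):
    """Remove and recreate: warm and cold start share one code path."""
    kernel.pop("Bridge", None)     # delete whatever the previous version created
    kernel["Bridge"] = dict(opts)  # create exactly what a cold start would


if __name__ == "__main__":
    for init in (init_bridge_reuse, init_bridge_recreate):
        kernel = {}
        init(kernel, OLD_OPTS, warm=False)  # cold start with the old version
        init(kernel, NEW_OPTS, warm=True)   # warm restart into the new version
        print(init.__name__, kernel["Bridge"] == NEW_OPTS)
```

Running it prints `False` for the reuse variant and `True` for the recreate variant. In the real system, the recreate path has the cost raised in the next reply: deleting the bridge and VLANs is visible to the control plane.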
Reply:
Removing a VLAN doesn't affect the data plane directly, but the BGP docker and BGP would be affected, causing route flapping.
If the vlan_filtering option does change (unlikely for now), the easier way is probably to handle that explicitly.
Reply:
I am not particularly worried about vlan_filtering. I am more worried about future bridge attributes, such as options to disable unknown multicast or unknown unicast.
For the BGP docker, I think we can use docker pause.
https://docs.docker.com/engine/reference/commandline/pause/
Reply:
Deleting and creating the bridge and VLANs happens in the Linux kernel, and zebra listens for those events. Pausing the docker in this case may trigger unknown side effects, since we don't know the exact timing of the netlink messages. It also makes interface handling more complex.
For disabling unknown multicast and unknown unicast, we should probably add configuration options; that was in the original VLAN trunk pull request. We may bring it back and refine the change later.
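The alternative argued here is to handle new bridge attributes explicitly, for example as configuration options, instead of deleting the bridge. That amounts to an idempotent reconcile of the existing bridge. Below is a minimal sketch in a toy dict model. `reconcile_bridge_attrs` and the attribute names are hypothetical. A real implementation would apply changes through the kernel rather than by updating a dict.

```python
# Hypothetical toy model: a bridge is a dict of attribute -> value.


def reconcile_bridge_attrs(bridge, desired):
    """Set only the attributes whose current value differs from the desired one.

    In this model the bridge itself is never removed; only differing attributes
    are updated. Returns the names of the attributes that were changed.
    """
    changed = []
    for attr, value in sorted(desired.items()):
        if bridge.get(attr) != value:
            bridge[attr] = value
            changed.append(attr)
    return changed


if __name__ == "__main__":
    bridge = {"vlan_filtering": "1"}                      # left by the previous version
    desired = {"vlan_filtering": "1", "new_option": "0"}  # hypothetical new option
    print(reconcile_bridge_attrs(bridge, desired))  # ['new_option']
    print(reconcile_bridge_attrs(bridge, desired))  # [] (nothing left to change)
```

The trade-off is the one in this thread. The bridge stays in place, but every attribute that a newer version cares about must be listed and handled explicitly. The reviewer's concern is that such a list can silently fall out of step with the cold-boot path.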
Reply:
My point is that you never know what you will add to the bridge in the future. Therefore, we should create exactly the same bridge as we create during cold boot.
To ensure that, the cleanest approach is to remove and recreate it, so that we share the same code path as cold boot.
If this causes control plane disruption, we should stop the BGP container and do BGP graceful restart (GR).
(In reply to comment 205924420.)
Reply:
Besides system-level warm reboot, we want to support docker warm restart. We intend to use the same code path for cold boot and warm boot whenever possible.
In this case, the code runs in the constructor phase. I don't see why a new option has to be put here instead of being a configuration option. Also, we use dockers to separate the services, so stopping other dockers while operating in one docker doesn't seem that clean.