[ceos]: check ceos testbed health via snmp and restart ceos if necessary (#2855)

Quite a few testbed failures are causing KVM tests to fail. I logged into the affected cEOS devices and found that the management IP had disappeared even though the configuration was good. Restarting the cEOS container makes the neighbor healthy again. As a mitigation, this PR checks neighbor health via SNMP and restarts the cEOS testbed if necessary.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
lguohan authored Jan 26, 2021
1 parent db247dd commit 1e12790
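
The mitigation described in the commit message boils down to one decision: restart the cEOS container unless the neighbor answers an SNMP query for its system name. The commit itself does this with the Ansible `snmp_facts` module and `set_fact`; the following is only a minimal shell sketch of the same logic, assuming net-snmp's `snmpget` is installed. The function names `probe_sysname` and `decide_force_restart` are hypothetical and do not appear in the repository.

```shell
#!/usr/bin/env bash

# Hypothetical probe (not part of the commit): ask the neighbor for
# sysName.0 over SNMP v2c. Prints the value on success and nothing on
# failure (timeout, unreachable host, or snmpget not installed).
probe_sysname() {
    local host="$1" community="$2"
    snmpget -v2c -c "$community" -t 2 -r 1 -Oqv \
        "$host" SNMPv2-MIB::sysName.0 2>/dev/null || true
}

# Hypothetical decision helper mirroring the two set_fact tasks:
# default to a restart, and skip it only when a sysName came back.
decide_force_restart() {
    local sysname="$1"
    if [ -n "$sysname" ]; then
        echo "no"     # neighbor answered SNMP, leave the container running
    else
        echo "yes"    # no answer, assume the management IP is gone
    fi
}
```

A caller could combine the two as `force_restart=$(decide_force_restart "$(probe_sysname "$host" "$community")")`, where `$host` and `$community` are placeholders for the neighbor's management address and read-only community string. Defaulting to a restart keeps the failure mode safe: an unreachable neighbor is always recreated rather than silently left broken.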
Showing 6 changed files with 19 additions and 94 deletions.
14 changes: 13 additions & 1 deletion ansible/roles/eos/tasks/ceos.yml
@@ -1,3 +1,15 @@
+- snmp_facts: host={{ ansible_host }} version=v2c is_eos=true community={{ snmp_rocommunity }}
+  delegate_to: localhost
+  register: snmp_data
+  ignore_errors: true
+
+- name: set force_restart=yes for ceos container
+  set_fact: force_restart=yes
+
+- name: set force_restart=no for ceos container
+  set_fact: force_restart=no
+  when: snmp_data.ansible_facts.ansible_sysname is defined

- include_tasks: ceos_config.yml

- name: Create cEOS container ceos_{{ vm_set_name }}_{{ inventory_hostname }}
@@ -8,7 +20,7 @@
command: /sbin/init systemd.setenv=INTFTYPE=eth systemd.setenv=ETBA=1 systemd.setenv=SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT=1 systemd.setenv=CEOS=1 systemd.setenv=EOS_PLATFORM=ceoslab systemd.setenv=container=docker systemd.setenv=MGMT_INTF=eth0
pull: no
state: started
-restart: yes
+restart: "{{ force_restart }}"
tty: yes
network_mode: container:net_{{ vm_set_name }}_{{ inventory_hostname }}
detach: True
1 change: 1 addition & 0 deletions ansible/roles/vm_set/tasks/add_ceos_list.yml
@@ -10,6 +10,7 @@
docker_image_info:
name:
- "{{ ceos_image_orig }}"
- "{{ ceos_image }}"
become: yes
register: ceos_stat

5 changes: 3 additions & 2 deletions ansible/testbed-cli.sh
@@ -308,10 +308,11 @@ function refresh_dut
ansible_options="-e sonic_vm_storage_location=$sonic_vm_dir"
fi

-    ANSIBLE_SCP_IF_SSH=y ansible-playbook -vvv -i $vmfile testbed_refresh_dut.yml --vault-password-file="${passwd}" -l "$server" \
+    ANSIBLE_SCP_IF_SSH=y ansible-playbook -i $vmfile testbed_add_vm_topology.yml --vault-password-file="${passwd}" -l "$server" \
         -e topo_name="$topo_name" -e duts_name="$duts" -e VM_base="$vm_base" \
         -e ptf_ip="$ptf_ip" -e topo="$topo" -e vm_set_name="$vm_set_name" \
-        -e ptf_imagename="$ptf_imagename" -e ptf_ipv6="$ptf_ipv6" \
+        -e ptf_imagename="$ptf_imagename" -e vm_type="$vm_type" -e ptf_ipv6="$ptf_ipv6" \
+        -e force_stop_sonic_vm="yes" \
         $ansible_options $@

echo Done
1 change: 1 addition & 0 deletions ansible/testbed_add_vm_topology.yml
@@ -86,6 +86,7 @@
when: duts_name.split(',')|length > 1

roles:
+- { role: vm_set, action: 'stop_sonic_vm', when: force_stop_sonic_vm is defined }
- { role: vm_set, action: 'start_sonic_vm' }
- { role: vm_set, action: 'start_sid' }
- { role: vm_set, action: 'add_topo' }
90 changes: 0 additions & 90 deletions ansible/testbed_refresh_dut.yml

This file was deleted.

2 changes: 1 addition & 1 deletion tests/kvmtest.sh
@@ -92,7 +92,7 @@ fi
pushd $SONIC_MGMT_DIR/ansible
if [ -n "$refresh_dut" ]; then
# Refresh dut in the virtual switch topology
-./testbed-cli.sh -m $inventory -t $testbed_file refresh-dut $tbname password.txt
+./testbed-cli.sh -m $inventory -t $testbed_file -k ceos refresh-dut $tbname password.txt
sleep 120
fi
