[chassis] Fix issues regarding database service failure handling and mid-plane connectivity for namespace. #10500

Merged on May 24, 2022 (19 commits)
1 change: 0 additions & 1 deletion device/nokia/x86_64-nokia_ixr7250e_sup-r0/chassisdb.conf
@@ -2,4 +2,3 @@ start_chassis_db=1
 chassis_db_address=10.6.0.100
 lag_id_start=1
 lag_id_end=512
-midplane_subnet=10.6.0.0/16
8 changes: 6 additions & 2 deletions dockers/docker-database/flush_unused_database
@@ -3,6 +3,7 @@ import swsssdk
 import redis
 import subprocess
 import time
+import syslog

 while(True):
     output = subprocess.Popen(['sonic-db-cli', 'PING'], stdout=subprocess.PIPE, text=True).communicate()[0]
@@ -24,5 +25,8 @@ for instname, v in instlists.items():
         if dbinst == instname:
             continue

-        r = redis.Redis(host=insthost, unix_socket_path=instsocket, db=dbid)
-        r.flushdb()
+        try:
+            r = redis.Redis(host=insthost, unix_socket_path=instsocket, db=dbid)
+            r.flushdb()
+        except (redis.exceptions.ConnectionError):
+            syslog.syslog(syslog.LOG_INFO,"flushdb:Redis Unix Socket connection error for path {} and dbaname {}".format(instsocket, dbname))
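
The guarded flush added in this hunk can be sketched in isolation. This is a minimal sketch and not the script itself: `flush_instance`, `make_client`, and `StubConnectionError` are hypothetical names introduced here, and the stub exception stands in for `redis.exceptions.ConnectionError` so the example runs without redis-py or a live socket.

```python
import syslog


class StubConnectionError(Exception):
    """Hypothetical stand-in for redis.exceptions.ConnectionError."""


def flush_instance(make_client, socket_path, dbname,
                   conn_error=StubConnectionError, log=syslog.syslog):
    """Flush one database; log and carry on if its socket is unreachable.

    make_client is any zero-argument callable returning an object with a
    flushdb() method (in the real script this would be a redis.Redis client).
    Returns True when the flush ran, False when the connection failed.
    """
    try:
        make_client().flushdb()
        return True
    except conn_error:
        # Same idea as the hunk: report at INFO level instead of crashing.
        log(syslog.LOG_INFO,
            "flushdb: Redis unix socket connection error for path {} and dbname {}"
            .format(socket_path, dbname))
        return False
```

Passing the exception type and logger as parameters is a choice made for this sketch so the failure path can be exercised without Redis; the script in the PR catches the redis-py exception directly.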
42 changes: 34 additions & 8 deletions files/build_templates/docker_image_ctl.j2
@@ -118,12 +118,8 @@ function preStartAction()

 function setPlatformLagIdBoundaries()
 {
-    CHASSIS_CONF=/usr/share/sonic/device/$PLATFORM/chassisdb.conf
-    if [ -f "$CHASSIS_CONF" ]; then
-        source $CHASSIS_CONF
-        docker exec -i ${DOCKERNAME} $SONIC_DB_CLI CHASSIS_APP_DB SET "SYSTEM_LAG_ID_START" "$lag_id_start"
-        docker exec -i ${DOCKERNAME} $SONIC_DB_CLI CHASSIS_APP_DB SET "SYSTEM_LAG_ID_END" "$lag_id_end"
-    fi
+    docker exec -i ${DOCKERNAME} $SONIC_DB_CLI CHASSIS_APP_DB SET "SYSTEM_LAG_ID_START" "$lag_id_start"
+    docker exec -i ${DOCKERNAME} $SONIC_DB_CLI CHASSIS_APP_DB SET "SYSTEM_LAG_ID_END" "$lag_id_end"
 }
 function waitForAllInstanceDatabaseConfigJsonFilesReady()
 {
@@ -158,13 +154,40 @@ sleep 1
 function postStartAction()
 {
 {%- if docker_container_name == "database" %}
+    CHASSISDB_CONF="/usr/share/sonic/device/$PLATFORM/chassisdb.conf"
+    [ -f $CHASSISDB_CONF ] && source $CHASSISDB_CONF
     if [ "$DEV" ]; then
         # Enable the forwarding on eth0 interface in namespace.
         SYSCTL_NET_CONFIG="/etc/sysctl.d/sysctl-net.conf"
         docker exec -i database$DEV sed -i -e "s/^net.ipv4.conf.eth0.forwarding=0/net.ipv4.conf.eth0.forwarding=1/;
                                                 s/^net.ipv6.conf.eth0.forwarding=0/net.ipv6.conf.eth0.forwarding=1/" $SYSCTL_NET_CONFIG
         docker exec -i database$DEV sysctl --system -e
         link_namespace $DEV
+
+
+        if [[ -n "$midplane_subnet" ]]; then
[Review thread on this line marked resolved by judyjoseph]
+            # Use /16 for loopback interface
[Review comment, Contributor] I don't see any code to clean up all of this Linux networking state when the database docker restarts.
[Reply, Contributor Author] ip netns delete should remove all the resources.
+            ip netns exec "$NET_NS" ip addr add 127.0.0.1/16 dev lo
+            ip netns exec "$NET_NS" ip addr del 127.0.0.1/8 dev lo
+
+            # Create eth1 in database instance
+            ip link add name ns-eth1"$NET_NS" link eth1-midplane type macvlan mode bridge
+            ip link set dev ns-eth1"$NET_NS" netns "$NET_NS"
+            ip netns exec "$NET_NS" ip link set ns-eth1"$NET_NS" name eth1
+
+            # Configure IP address and enable eth1
+            lc_slot_id=$(python3 -c 'import sonic_platform.platform; platform_chassis = sonic_platform.platform.Platform().get_chassis(); print(platform_chassis.get_my_slot())' 2>/dev/null)
+            lc_ip_address=`echo $midplane_subnet | awk -F. '{print $1 "." $2}'`.$lc_slot_id.$(($DEV + 10))
+            lc_subnet_mask=${midplane_subnet#*/}
+            ip netns exec "$NET_NS" ip addr add $lc_ip_address/$lc_subnet_mask dev eth1
+            ip netns exec "$NET_NS" ip link set dev eth1 up
+
+            # Allow localnet routing on the new interfaces if midplane is using a
+            # subnet in the 127/8 range.
+            if [[ "${midplane_subnet#127}" != "$midplane_subnet" ]]; then
+                ip netns exec "$NET_NS" bash -c "echo 1 > /proc/sys/net/ipv4/conf/eth1/route_localnet"
+            fi
+        fi
     fi
     # Setup ebtables configuration
     ebtables_config
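
The address arithmetic in this hunk (first two octets of `midplane_subnet`, then the slot ID, then the namespace index plus 10) can be checked with a small sketch. `namespace_midplane_address` is a hypothetical helper written for illustration, and slot 3 is an assumed value; the subnet is the one from the removed `chassisdb.conf` line.

```python
def namespace_midplane_address(midplane_subnet, slot_id, asic_id):
    """Mirror the shell arithmetic: <octet1>.<octet2>.<slot>.<asic + 10>/<prefix>."""
    # Equivalent of ${midplane_subnet#*/}: everything after the slash.
    network, prefix_len = midplane_subnet.split("/")
    # Equivalent of awk -F. '{print $1 "." $2}': the first two octets.
    first, second = network.split(".")[:2]
    return "{}.{}.{}.{}/{}".format(first, second, slot_id, asic_id + 10, prefix_len)


# Hypothetical line card in slot 3, namespace asic0.
print(namespace_midplane_address("10.6.0.0/16", 3, 0))  # 10.6.3.10/16
```

Like the shell version, this assumes the subnet is at most a /16, so the last two octets are free to carry the slot and namespace numbers.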
@@ -180,7 +203,8 @@ function postStartAction()
         # then we catch python exception of file not valid
         # that comes to syslog which is unwanted so wait till database
         # config is ready and then ping
-        until [[ ($(docker exec -i database$DEV pgrep -x -c supervisord) -gt 0) && ($($SONIC_DB_CLI PING | grep -c PONG) -gt 0) ]]; do
+        until [[ ($(docker exec -i database$DEV pgrep -x -c supervisord) -gt 0) && ($($SONIC_DB_CLI PING | grep -c PONG) -gt 0) &&
+                 ($(docker exec -i database$DEV sonic-db-cli PING | grep -c PONG) -gt 0) ]]; do
[Review comment, Contributor] Line 178 already runs SONIC_DB_CLI PING, and line 179 also runs sonic-db-cli PING through docker exec. What is the difference between the two?
[Reply, Contributor Author] Since we are in a namespace context, SONIC_DB_CLI maps to sonic-db-cli -n asicx.
[Reply, Contributor] Doesn't this cause an issue in the supervisor's database containers? Since midplane_subnet is not defined there, the midplane interface doesn't exist in the container, so trying to PING the databases from within the container will fail.
           sleep 1;
         done
         if [[ ("$BOOT_TYPE" == "warm" || "$BOOT_TYPE" == "fastfast") && -f $WARM_DIR/dump.rdb ]]; then
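
The readiness loop above polls three conditions joined with `&&` and sleeps one second between attempts. A minimal sketch of the same pattern follows; `wait_until_ready` is a hypothetical helper, and the `timeout` parameter is an addition for this sketch, since the shell loop waits indefinitely.

```python
import time


def wait_until_ready(checks, interval=1.0, timeout=None,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until every check returns True, like the shell `until` loop.

    checks is a list of zero-argument callables. all() short-circuits in
    the same way the shell's && does, so later checks are skipped while an
    earlier one still fails.
    Returns True once all checks pass, False if the timeout expires first.
    """
    deadline = None if timeout is None else clock() + timeout
    while not all(check() for check in checks):
        if deadline is not None and clock() >= deadline:
            return False
        sleep(interval)
    return True
```

In the script's terms, the three checks would be "supervisord is running in the container", "PING through `$SONIC_DB_CLI` returns PONG", and "PING from inside the container returns PONG"; the reviewer's open question is whether the third check can ever pass where the midplane interface is absent.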
@@ -222,7 +246,9 @@ function postStartAction()
               ($(docker exec -i ${DOCKERNAME} $SONIC_DB_CLI CHASSIS_APP_DB PING | grep -c True) -gt 0) ]]; do
             sleep 1
         done
-        setPlatformLagIdBoundaries
+        if [[ -n "$lag_id_start" && -n "$lag_id_end" ]]; then
+            setPlatformLagIdBoundaries
+        fi
         REDIS_SOCK="/var/run/redis-chassis/redis_chassis.sock"
     fi
     chgrp -f redis $REDIS_SOCK && chmod -f 0760 $REDIS_SOCK
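
With `chassisdb.conf` now sourced once in `postStartAction`, the LAG ID boundaries are written only when both values are set. The guard can be sketched as follows; `parse_chassisdb_conf` and `lag_id_boundaries` are hypothetical helpers, and the parser assumes plain `key=value` lines, whereas the shell's `source` evaluates the file as shell code.

```python
def parse_chassisdb_conf(text):
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def lag_id_boundaries(conf):
    """Return (start, end) only when both keys are present and non-empty."""
    start, end = conf.get("lag_id_start"), conf.get("lag_id_end")
    if start and end:
        return start, end
    return None


sample = "start_chassis_db=1\nchassis_db_address=10.6.0.100\nlag_id_start=1\nlag_id_end=512\n"
print(lag_id_boundaries(parse_chassisdb_conf(sample)))  # ('1', '512')
```

A platform whose file omits either key yields `None`, which corresponds to skipping `setPlatformLagIdBoundaries` in the script.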
42 changes: 0 additions & 42 deletions files/image_config/interfaces/interfaces-config.sh
@@ -60,48 +60,6 @@ for intf_pid in $(ls -1 /var/run/dhclient*.Ethernet*.pid 2> /dev/null); do
     [[ -f ${intf_pid} ]] && kill `cat ${intf_pid}` && rm -f ${intf_pid}
 done

-
-# Setup eth1 if we connect to a remote chassis DB.
-PLATFORM=${PLATFORM:-`sonic-cfggen -H -v DEVICE_METADATA.localhost.platform`}
-CHASSISDB_CONF="/usr/share/sonic/device/$PLATFORM/chassisdb.conf"
-[[ -f $CHASSISDB_CONF ]] && source $CHASSISDB_CONF
-
-ASIC_CONF="/usr/share/sonic/device/$PLATFORM/asic.conf"
-[[ -f $ASIC_CONF ]] && source $ASIC_CONF
-
-if [[ -n "$midplane_subnet" && ($NUM_ASIC -gt 1) ]]; then
-    for asic_id in `seq 0 $((NUM_ASIC - 1))`; do
-        NET_NS="asic$asic_id"
-
-        PIDS=`ip netns pids "$NET_NS" 2>/dev/null`
-        if [[ "$?" -ne "0" ]]; then # namespace doesn't exist
-            continue
-        fi
-
-        # Use /16 for loopback interface
-        ip netns exec $NET_NS ip addr add 127.0.0.1/16 dev lo
-        ip netns exec $NET_NS ip addr del 127.0.0.1/8 dev lo
-
-        # Create eth1 in database instance
-        ip link add name ns-eth1 link eth1-midplane type ipvlan mode l2
-        ip link set dev ns-eth1 netns $NET_NS
-        ip netns exec $NET_NS ip link set ns-eth1 name eth1
-
-        # Configure IP address and enable eth1
-        lc_slot_id=$(python3 -c 'import sonic_platform.platform; platform_chassis = sonic_platform.platform.Platform().get_chassis(); print(platform_chassis.get_my_slot())' 2>/dev/null)
-        lc_ip_address=`echo $midplane_subnet | awk -F. '{print $1 "." $2}'`.$lc_slot_id.$((asic_id + 10))
-        lc_subnet_mask=${midplane_subnet#*/}
-        ip netns exec $NET_NS ip addr add $lc_ip_address/$lc_subnet_mask dev eth1
-        ip netns exec $NET_NS ip link set dev eth1 up
-
-        # Allow localnet routing on the new interfaces if midplane is using a
-        # subnet in the 127/8 range.
-        if [[ "${midplane_subnet#127}" != "$midplane_subnet" ]]; then
-            ip netns exec $NET_NS bash -c "echo 1 > /proc/sys/net/ipv4/conf/eth1/route_localnet"
-        fi
-    done
-fi
-
 # Read sysctl conf files again
 sysctl -p /etc/sysctl.d/90-dhcp6-systcl.conf
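
Both the removed block and its replacement in `docker_image_ctl.j2` decide whether to enable `route_localnet` with a string test, `${midplane_subnet#127}`, which is true for any value that merely begins with the characters `127`. As a point of comparison, a stricter check can be sketched with Python's standard `ipaddress` module; `needs_route_localnet` is a hypothetical name, and this is an alternative for illustration, not what the PR does.

```python
import ipaddress

LOOPBACK_RANGE = ipaddress.ip_network("127.0.0.0/8")


def needs_route_localnet(midplane_subnet):
    """True when the midplane subnet lies inside 127.0.0.0/8."""
    # strict=False tolerates host bits being set in the configured value.
    subnet = ipaddress.ip_network(midplane_subnet, strict=False)
    # Guard on the IP version first: subnet_of() raises TypeError when
    # asked to compare an IPv6 network against an IPv4 one.
    return subnet.version == 4 and subnet.subnet_of(LOOPBACK_RANGE)


print(needs_route_localnet("127.1.0.0/16"))  # True
print(needs_route_localnet("10.6.0.0/16"))   # False
```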
