Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to support bcmsh and swss logs on multi npu platforms #4783

Closed
wants to merge 30 commits into from

Conversation

arlakshm
Copy link
Contributor

@arlakshm arlakshm commented Jun 16, 2020

Changes to support bcmsh and swss logs on multi npu platforms
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com

- Why I did it
This PR has following changes

  • support accessing the bcmsh and bcmcmd utilites on multi NPU platforms
  • store swss.rec and sairedis.rec logs for a every NPU to a different directory

- How I did it
Changes done

  • move the link of /var/run/sswsyncd from docker-syncd-brcm.mk to docker_image_ctl.j2
  • move the link for /var/log/swss from docker_orchagent.mk to docker_image_ctl.j2 to support multi NPUs
  • update the bcmsh and bcmcmd scripts to take the syncd docker name of as argument on multi NPU platforms

- How to verify it
- Verify the bcmsh and bcmcmd works

root@sonic:/home/admin# bcmcmd syncd0 "l3 intf show"
l3 intf show
Free L3INTF entries: 8186
Unit  Intf  VRF Group VLAN    Source Mac     MTU TTL Tunnel InnerVlan  NATRealm
------------------------------------------------------------------
0     1     0     0     1    00:be:75:3a:ef:50  9100 0    0     0     0
0     2     0     0     4095 00:be:75:3a:ef:50  9100 0    0     0     0
0     3     0     0     4095 00:be:75:3a:ef:50  9100 0    0     0     0
0     4     0     0     4095 00:be:75:3a:ef:50  9100 0    0     0     0
0     5     0     0     4095 00:be:75:3a:ef:50  9100 0    0     0     0

Verify the swss.rec and sairedis.rec are present of each swss instance on the host

root@sonic:/home/admin# ls -l /var/log/swss2
total 5604
-rw-r--r-- 1 root root 3349874 Jun 16 23:48 sairedis.rec
-rw-r--r-- 1 root root 2387772 Jun 17 00:31 swss.rec
root@sonic:/home/admin# ls -l /var/log/swss1
total 6044
-rw-r--r-- 1 root root 3978515 Jun 16 23:48 sairedis.rec
-rw-r--r-- 1 root root 2203857 Jun 17 00:31 swss.rec

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
prsunny
prsunny previously approved these changes Jun 23, 2020
Copy link
Contributor

@prsunny prsunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, request @lguohan to provide feedback.

abdosi
abdosi previously approved these changes Jun 23, 2020
{%- endif %}
{%- endif %}
{%- if docker_container_name == "swss" %}
-v /var/log/swss$DEV:/var/log/swss:rw \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know if logrotate works if changed the place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added changes for logrotation in the latest commit

@@ -1,3 +1,29 @@
#!/bin/bash
function print_error() {
echo "No such container $NS"
echo "usage bcmcmd <container name> <command>"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean by ?

should we just say bcmcmd -n [asicid] "cmd"?

add an option seems better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to bcmcmd -n [asicid] "cmd" in the latest commit

@@ -1,3 +1,29 @@
#!/bin/bash
function print_error() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we duplicating the file here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit

xumia and others added 15 commits June 29, 2020 15:01
* Add the test signing certificates for secure boot
* Remove unnecessary ca key file
* Regenerate the certificates to not expose the ca key
Add changes for syslog support for containers running in namespaces on multi ASIC platforms.
On Multi ASIC platforms

Rsyslog service is only running on the host. There is no rsyslog service running in each namespace.
On multi ASIC platforms the rsyslog service on the host will be listening on the docker0 ip address instead of loopback address.
The rsyslog.conf on the containers is modified to have omfwd target ip to be docker0 ipaddress instead of loopback ip

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
* src/sonic-platform-common 82bbeab...42781ff (1):
  > [SfpBase] Fix key name typo in docstring (sonic-net#99)

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
1.  Upgrade SAI headers to v1.6.3
2.  Fix traffic lost during FFB related to buffer config + optimize buffer config timing for FB
3.  Add ACL fields BTH, IP flags
4.  Add ACL infrastructure of different fields per ASIC type
…platform. (sonic-net#4779)

* Support for connecting to DB in namespace via IP:port ( using docker bridge network ) for applications in multi-asic platform.

* Added the default IP as 127.0.0.1 if the IPaddress derivation from interface fails.
Moved the localhost loopback IP binding logic into the supervisor.j2 file.
* sonic-sairedis submodule update
* Update BRCM SAI to 3.7.5.1
…et#4843)

Updated the NAT iptables patch for 4.19 buster

Depends on PR : sonic-net/sonic-linux-kernel#147

1 Known issue:

With both NAT patch files for 4.19 buster kernel, seeing 1 display issue in iptables like explained below

On Docker NAT, iptables supported version is 1.6.0 and on base OS it’s 1.8.2. So seeing an display issue of which fullcone option is not showing in version 1.8.2 iptables output and no issues in functionality.

Display issue – For example of comparsion:

NAT Docker:
root@sonic:/home/admin# docker exec -it nat bash
root@sonic:/# iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DNAT all -- * * 0.0.0.0/0 0.0.0.0/0 to:1.1.1.1 fullcone

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 30 packets, 2749 bytes)
pkts bytes target prot opt in out source destination

Chain POSTROUTING (policy ACCEPT 30 packets, 2749 bytes)
pkts bytes target prot opt in out source destination
root@sonic:/#

Base OS:
root@sonic:/home/admin# iptables-legacy -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
1 36 DNAT all -- * * 0.0.0.0/0 0.0.0.0/0 to:1.1.1.1

Chain INPUT (policy ACCEPT 1 packets, 36 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 41 packets, 3572 bytes)
pkts bytes target prot opt in out source destination

Chain POSTROUTING (policy ACCEPT 41 packets, 3572 bytes)
pkts bytes target prot opt in out source destination
root@sonic:/home/admin#

To fix this issue, iptables need to update from 1.6.0 to 1.8.2 version and have to update the NAT docker from stretch to buster. Will raise a new PR with this.

Signed-off-by: Akhilesh Samineni akhilesh.samineni@broadcom.com

Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
… both host and docker namespace (sonic-net#4838)

* [sonic-buildimage] Changes to make network specific sysctl
common for both host and docker namespace (in multi-npu).

This change is triggered with issue found in multi-npu platforms
where in docker namespace
net.ipv6.conf.all.forwarding was 0 (should be 1) because of
which RS/RA message were triggered and link-local router were learnt.

Beside this there were some other sysctl.net.ipv6* params whose value
in docker namespace is not same as host namespace.

So to make we are always in sync in host and docker namespace
created common file that list all sysctl.net.* params and used
both by host and docker namespace. Any change will get applied
to both namespace.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments and made sure to invoke augtool
only one and do string concatenation of all set commands

* Address Review Comments.
…is no one (sonic-net#4860)

If some table with a list of tuples (interface name, ip prefix) has ip prefixes without a mask length, it will cause issues in SONiC. For example quagga and frr will treat ipv4 address without a mask, so "10.20.30.40" address will be treated as "10.0.0.0/8", which is dangerous.

The fix here is that when pfx_filter get a tuple (interface name, ip prefix), where the ip prefix doesn't have prefix mask length, add a mask by default: "/32 for ipv4 addresses, /128 for ipv6 addresses".

Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
* Tests of FRR templates which rendered by sonic-cfggen
…t#4825)

* Loopback IP changes for multi ASIC devices
multi ASIC will have 2 Loopback Interfaces

- Loopback0 has globally unique IP address, which is advertised by the multi ASIC device to its peers.
This way all the external devices will see this device as a single device.
- Loopback4096 is assigned an IP address which has a scope is within the device. Each ASIC has a different ip address for Loopback4096. This ip address will be used as Router-Id by the bgp instance on multi ASIC devices.

This PR implements this change for multi ASIC devices

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
…c-net#4880)

make swss build depends only on libsairedis instead of syncd. This allows to build swss without depending
on vendor sai library.

Currently, libsairedis build also buils syncd which requires vendor SAI lib. This makes difficult to build
swss docker in buster while still keeping syncd docker in stretch, as swss requires libsairedis which also
build syncd and requires vendor to provide SAI for buster. As swss docker does not really contain syncd
binary, so it is not necessary to build syncd for swss docker.

* [submodule]: update sonic-sairedis

* ccbb3bc 2020-06-28 | add option to build without syncd (HEAD, origin/master, origin/HEAD) [Guohan Lu]
* 4247481 2020-06-28 | install saidiscovery into syncd package [Guohan Lu]
* 61b8e8e 2020-06-26 | Revert "sonic-sairedis: Add support to sonic-sairedis for gearbox phys (sonic-net#624)" (sonic-net#630) [Danny Allen]
* 85e543c 2020-06-26 | add a README to tests directory to describe how to run 'make check' (sonic-net#629) [Syd Logan]
* 2772f15 2020-06-26 | sonic-sairedis: Add support to sonic-sairedis for gearbox phys (sonic-net#624) [Syd Logan]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
paavaanan and others added 8 commits July 2, 2020 22:08
- Xilinx/pericom peripherals are not actively used in DellEMC S6100 switch.
- These peripherals are throwing PCIE corrected messages in some of the units and filling syslog.
- Since it is not usable disabling it at startup.
* src/sonic-platform-daemons abe115e...9b8bfa1 (1):
  > [xcvrd] Update key names in 'get_media_settings_value()' (sonic-net#63)
also update submodule

* 01f810f 2020-07-02 | fix compiling issue for gcc8.3 (sonic-net#1339) [lguohan]
* 9b13120 2020-07-03 | Fix in script to avoid orchagent crash when port down followed by fdb delete (sonic-net#1340) [rupesh-k]
* 9b01844 2020-07-01 | [qosorch] Update QoS scheduler params for shaping features (sonic-net#1296) [Michael Li]
* 86b5e99 2020-07-02 | [mirrororch] Port Mirroring implementation (sonic-net#1314) [rupesh-k]
* c05601c 2020-06-24 | [portsyncd]: add debug message if a port cannot be found in port able (sonic-net#1328) [lguohan]
* a0b6412 2020-06-23 | COPP_DEL_fix: DEL for one trap group from SONIC is resetting all the trap IDs (sonic-net#1273) [SinghMinu]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
…onic-net#4890)

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
thermalctld throwing error messages because it is not yet fully configured, disabling it for now on arista platforms.

Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
@arlakshm arlakshm dismissed stale reviews from abdosi and prsunny via 969a914 July 6, 2020 05:17
@arlakshm arlakshm requested a review from jleveque July 6, 2020 05:20
@arlakshm
Copy link
Contributor Author

arlakshm commented Jul 6, 2020

retest vsimage please

Comment on lines 21 to 40
if [[ ($NUM_ASIC -gt 1) ]]; then
OPTIND=1

while getopts ":n:h:" opt; do
case "${opt}" in
h) help
exit 0
;;
n) DEV=${OPTARG}
echo $DEV
[ $DEV -lt $NUM_ASIC -a $DEV -ge 0 ] || help
;;
esac
done
shift "$((OPTIND-1))"

if [ -z "${DEV}" ]; then
help
fi
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent indentation. Please unify to four (4) spaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed indentation in the latest commit

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
Copy link
Contributor

@jleveque jleveque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comments. Also, it appears you are replacing files/image_config/logrotate/logrotate.d/rsyslog with logrotate-rsyslog.j2, in which case you should also delete the former as part of this PR.

@@ -20,4 +20,7 @@ fi

sonic-cfggen -d -t /usr/share/sonic/templates/rsyslog.conf.j2 -a "{\"udp_server_ip\": \"$udp_server_ip\"}" >/etc/rsyslog.conf

#render the logrotate config files for rsyslog as well.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/#render/# Render/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
@arlakshm
Copy link
Contributor Author

arlakshm commented Jul 6, 2020

As comments. Also, it appears you are replacing files/image_config/logrotate/logrotate.d/rsyslog with logrotate-rsyslog.j2, in which case you should also delete the former as part of this PR.

Removed the file files/image_config/logrotate/logrotate.d/rsyslog

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
@jleveque jleveque self-requested a review July 6, 2020 22:21
@arlakshm
Copy link
Contributor Author

arlakshm commented Jul 7, 2020

retest mellanox please

{% if namespaces > 1 %}
{% for ns in range(namespaces) %}
/var/log/swss{{ns}}/sairedis.rec
/var/log/swss{{ns}}/swss.rec
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious about other logs such as quagga, teamd? they are also running in the each namespace? are we aggregating all of them into one log file?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not fan using templating everywhere. I think a better solution here is to add option to both swss and sairedis to allow specify the log file name, and we can use swss.asic{n}.rec, sairedis.asic{n}.rec. then, we do not need to use template for rsyslog file.

In the future, if we need to dynamically incrase the asic number of the platform, there is no need to change this file.

{% if namespaces > 1 %}
{% for ns in range(namespaces) %}
if [ $(echo $1 | grep -c "/var/log/swss{{ns}}/") -gt 0 ]; then
pgrep -x orchagent | xargs /bin/kill -HUP 2>/dev/null || true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this kill all orchagent instances?

Copy link
Collaborator

@lguohan lguohan Jul 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably do not need to do template here since, logrotate will pass the absolute path to the log file to this postrotate script, and from that absoluate path, you will know whethere it is multiasic or not, and which asic id it is if it is multi-asic.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you should get into the container and do the pkill

@lguohan
Copy link
Collaborator

lguohan commented Jul 8, 2020

it looks like the syslog will take more time, if you can separate the bcmsh with syslog enhancement, I think I can approve for bcmsh pr.

@lguohan
Copy link
Collaborator

lguohan commented Jul 22, 2020

do we still need this pr? I thought this pr is splited into two?

@arlakshm
Copy link
Contributor Author

Closing this PR as the bcmshell support is added in the PR #4926

@arlakshm arlakshm closed this Jul 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.