Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic port configuration - solve lldp issues when adding/removing ports #9386

Merged
merged 3 commits into from
Mar 26, 2022

Conversation

tomer-israel
Copy link
Contributor

Why I did it

when adding and removing ports after init stage we saw two issues:

first:
In several cases, after removing a port, lldpmgr is continuing to try to add a port to lldp with lldpcli command. the execution of this command is continuing to fail since the port is not existing anymore.

second:
after adding a port, we sometimes see this warning messgae:
"Command failed 'lldpcli configure ports Ethernet18 lldp portidsubtype local etp5b': 2021-07-27T14:16:54 [WARN/lldpctl] cannot find port Ethernet18"

we added these changes in order to solve it.

How I did it

port create events are taken from app db only.
lldpcli command is executed only when linux port is up.

when delete port event is received we remove this command from pending_cmds dictionary

How to verify it

manual tests and running lldp tests

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106

Description for the changelog

Dynamic port configuration - solve lldp issues when adding/removing ports

A picture of a cute animal (not mandatory but encouraged)

@tomer-israel
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@zhenggen-xu
Copy link
Collaborator

do we know exactly why we see this Command failed 'lldpcli configure ports Ethernet18 lldp portidsubtype local etp5b': 2021-07-27T14:16:54 [WARN/lldpctl] cannot find port Ethernet18? If the netdev operation status is up, still happening? It would be better to understand the reason rather than retrying.

@@ -100,31 +109,24 @@ class LldpManager(daemon_base.DaemonBase):
def is_port_up(self, port_name):
"""
Determine if a port is up or down by looking into the oper-status for the port in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change oper-status to netdev-oper-status to be more clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be changed

return port_oper_status == "up"
else:
return False
else:
# Retrieve PortInitDone entry from the Port table
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we log info for debug purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case the port is not up or not exist on table we have log info for it outside the function


if port_alias == None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have such case since you check port_alias above? Even if so, let's use AND check in above to share the log_info?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not needed, i will remove it

self.pending_cmds.pop(key, None)
else:
self.log_error("unknown operation")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove trailing spaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

if fvp and "up" in dict(fvp).get("oper_status",""):
self.generate_pending_lldp_config_cmd_for_port(key, dict(fvp))
else:
self.pending_cmds.pop(key, None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if fvp is empty, should we do no-op?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we have SET command fvp should always be with values.

but I will change it in a way that:
self.pending_cmds.pop(key, None)
will be executed only if oper_state is "down"

@dprital dprital added the Request for 202111 Branch For PRs being requested for 202111 branch label Dec 1, 2021
@tomer-israel
Copy link
Contributor Author

do we know exactly why we see this Command failed 'lldpcli configure ports Ethernet18 lldp portidsubtype local etp5b': 2021-07-27T14:16:54 [WARN/lldpctl] cannot find port Ethernet18? If the netdev operation status is up, still happening? It would be better to understand the reason rather than retrying.

we still don't know what is the reason for this warning message, even when the host interface is up we see this message.
in the next retry few seconds after the execution of this cmd passed.

@tomer-israel
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@tomer-israel
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@tomer-israel
Copy link
Contributor Author

failed checkers due to unknown reason:

-- Docs: https://docs.pytest.org/en/latest/warnings.html
------ generated xml file: /data/sonic-mgmt/tests/logs/1vlan/posttest.xml ------
=========== 3 passed, 2595 deselected, 27 warnings in 138.42 seconds ===========
INFO:root:Can not get Allure report URL. Please check logs
##[error]Bash exited with code '1'.

@tomer-israel
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@tomer-israel
Copy link
Contributor Author

not sure why we see these failures on kvm tests-t0:

tacacs/test_authorization.py::test_authorization_tacacs_only_some_server_down[vlab-01] FAILED [ 25%]Loading callback plugin unnamed of type old, v1.0 from /usr/local/lib/python2.7/dist-packages/pytest_ansible-2.2.2-py2.7.egg/pytest_ansible/module_dispatcher/v28.pyc
tacacs/test_authorization.py:87: Failed
DEBUG:root:/data/sonic-mgmt/tests/tacacs/utils.py::stop_tacacs_server#27: [ptf-01] AnsibleModule::service, args=[], kwargs={"state": "stopped", "name": "tacacs_plus"}
DEBUG:root:/data/sonic-mgmt/tests/tacacs/utils.py::stop_tacacs_server#27: [ptf-01] AnsibleModule::service Result => {"failed": false, "state": "stopped", "name": "tacacs_plus", "invocation": {"module_args": {"state": "stopped", "sleep": null, "name": "tacacs_plus",

@zhenggen-xu
Copy link
Collaborator

do we know exactly why we see this Command failed 'lldpcli configure ports Ethernet18 lldp portidsubtype local etp5b': 2021-07-27T14:16:54 [WARN/lldpctl] cannot find port Ethernet18? If the netdev operation status is up, still happening? It would be better to understand the reason rather than retrying.

we still don't know what is the reason for this warning message, even when the host interface is up we see this message. in the next retry few seconds after the execution of this cmd passed.

This does not make sense. In case host interface is up, lldpctl should not reporting cannot find port error. We should change the check criterion to avoid retrying and without knowing why.

@tomer-israel
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@tomer-israel
Copy link
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@tomer-israel
Copy link
Contributor Author

do we know exactly why we see this Command failed 'lldpcli configure ports Ethernet18 lldp portidsubtype local etp5b': 2021-07-27T14:16:54 [WARN/lldpctl] cannot find port Ethernet18? If the netdev operation status is up, still happening? It would be better to understand the reason rather than retrying.

we still don't know what is the reason for this warning message, even when the host interface is up we see this message. in the next retry few seconds after the execution of this cmd passed.

This does not make sense. In case host interface is up, lldpctl should not reporting cannot find port error. We should change the check criterion to avoid retrying and without knowing why.

I checked it again and I noticed that the command is failing (return code is not 0) but the error message (stderr) is an empty string and not "cannot find port Ethernet..."

only after 2 or 3 retries, the return code is 0 - we see this only on several random ports, not always.

so, I guess that checking the oper state of the host interface (from state db) did solve these warning messages ("cannot find port Ethernet...")
I still don't know why the return code is sometimes not 0 without any reason and stderr on these cases is empty

@tomer-israel
Copy link
Contributor Author

/AzurePipelines run Azure.sonic-buildimage (Test kvmtest-t0)

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 9386 in repo Azure/sonic-buildimage

@tomer-israel
Copy link
Contributor Author

/azpw run Azure.sonic-buildimage (Test kvmtest-t0)

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-buildimage (Test kvmtest-t0)

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@liat-grozovik
Copy link
Collaborator

@zhenggen-xu could you please recheck last comment?
@dprital please follow up on this PR

@zhenggen-xu
Copy link
Collaborator

zhenggen-xu commented Dec 21, 2021

do we know exactly why we see this Command failed 'lldpcli configure ports Ethernet18 lldp portidsubtype local etp5b': 2021-07-27T14:16:54 [WARN/lldpctl] cannot find port Ethernet18? If the netdev operation status is up, still happening? It would be better to understand the reason rather than retrying.

we still don't know what is the reason for this warning message, even when the host interface is up we see this message. in the next retry few seconds after the execution of this cmd passed.

This does not make sense. In case host interface is up, lldpctl should not reporting cannot find port error. We should change the check criterion to avoid retrying and without knowing why.

I checked it again and I noticed that the command is failing (return code is not 0) but the error message (stderr) is an empty string and not "cannot find port Ethernet..."

only after 2 or 3 retries, the return code is 0 - we see this only on several random ports, not always.

so, I guess that checking the oper state of the host interface (from state db) did solve these warning messages ("cannot find port Ethernet...") I still don't know why the return code is sometimes not 0 without any reason and stderr on these cases is empty

what is the return code during the failure? Can we find out a bit more why it was the case? Sorry for insist here, to understand the exact reason and wait on that condition would be more consistent and deterministic. Retry might be simple to hide the issue but might not be reliable depending on the failure.

@tomer-israel
Copy link
Contributor Author

do we know exactly why we see this Command failed 'lldpcli configure ports Ethernet18 lldp portidsubtype local etp5b': 2021-07-27T14:16:54 [WARN/lldpctl] cannot find port Ethernet18? If the netdev operation status is up, still happening? It would be better to understand the reason rather than retrying.

we still don't know what is the reason for this warning message, even when the host interface is up we see this message. in the next retry few seconds after the execution of this cmd passed.

This does not make sense. In case host interface is up, lldpctl should not reporting cannot find port error. We should change the check criterion to avoid retrying and without knowing why.

I checked it again and I noticed that the command is failing (return code is not 0) but the error message (stderr) is an empty string and not "cannot find port Ethernet..."
only after 2 or 3 retries, the return code is 0 - we see this only on several random ports, not always.
so, I guess that checking the oper state of the host interface (from state db) did solve these warning messages ("cannot find port Ethernet...") I still don't know why the return code is sometimes not 0 without any reason and stderr on these cases is empty

what is the return code during the failure? Can we find out a bit more why it was the case? Sorry for insist here, to understand the exact reason and wait on that condition would be more consistent and deterministic. Retry might be simple to hide the issue but might not be reliable depending on the failure.

the return code in these cases is 1, I don't think it is saying something about the failure..
I agree that it is a strange failure that can point on a real failure but I didn't find something that can explain it..

@dprital
Copy link
Collaborator

dprital commented Jan 21, 2022

@zhenggen-xu - Can you please approve this PR ?

@dprital
Copy link
Collaborator

dprital commented Feb 6, 2022

@zhenggen-xu - Can you please approve this PR ?

@dprital
Copy link
Collaborator

dprital commented Feb 13, 2022

@zhenggen-xu - can you please review and approve ?

Copy link
Collaborator

@zhenggen-xu zhenggen-xu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should open a issue link to here for later to understand why lldpcli would fail even when linux port was up.

@dprital
Copy link
Collaborator

dprital commented Mar 9, 2022

We should open a issue link to here for later to understand why lldpcli would fail even when linux port was up.

Issue was opened: Endless LLDP warning messages when removing a port

@dprital
Copy link
Collaborator

dprital commented Mar 15, 2022

@qiluo-msft - Who can merge this PR ? Thanks.

@qiluo-msft
Copy link
Collaborator

Add unit test?
@abdosi Could you help review?

@qiluo-msft qiluo-msft requested a review from abdosi March 15, 2022 23:46
@dprital
Copy link
Collaborator

dprital commented Mar 22, 2022

@abdosi - can you please review \ approve ?

@dprital
Copy link
Collaborator

dprital commented Mar 24, 2022

@qiluo-msft \ @abdosi - This PR was reviewed and approved by @zhenggen-xu . Can you please review as well as we would like to merge it ? Thanks.

Copy link
Contributor

@abdosi abdosi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dprital
Copy link
Collaborator

dprital commented Mar 25, 2022

@qiluo-msft - As this PR was approved , who can merge it ?

@qiluo-msft qiluo-msft merged commit cc938e7 into sonic-net:master Mar 26, 2022
judyjoseph pushed a commit that referenced this pull request Mar 28, 2022
…orts (#9386)

#### Why I did it
when adding and removing ports after init stage we saw two issues:

first:
In several cases, after removing a port, lldpmgr is continuing to try to add a port to lldp with lldpcli command. the execution of this command is continuing to fail since the port is not existing anymore.

second:
after adding a port, we sometimes see this warning messgae:
"Command failed 'lldpcli configure ports Ethernet18 lldp portidsubtype local etp5b': 2021-07-27T14:16:54 [WARN/lldpctl] cannot find port Ethernet18"

we added these changes in order to solve it.
#### How I did it
port create events are taken from app db only.
lldpcli command is executed only when linux port is up.

when delete port event is received we remove this command from  pending_cmds dictionary

#### How to verify it
manual tests and running lldp tests


#### Description for the changelog
Dynamic port configuration - solve lldp issues when adding/removing ports
@rraghav-cisco
Copy link

@abdosi and @zhenggen-xu ; can this be ported to 202205, pls ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Included in 202111 Branch Request for 202111 Branch For PRs being requested for 202111 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants