[Mellanox] Add hw-mgmt patch to support SDK OFFLINE event handling during ISSU #6550

Merged
merged 1 commit into from
Jan 27, 2021


keboliu
Collaborator

@keboliu keboliu commented Jan 25, 2021

- Why I did it
During ISSU, the "mlxsw_minimal" driver still tries to access the firmware. In some cases the FW can return a wrong critical threshold value, which causes the switch to shut down.

- How I did it
To prevent the "mlxsw_minimal" driver from accessing the ASIC during ISSU, the SDK raises an "OFFLINE" 'udev' event
at the very beginning of the flow. When hw-management receives this event, it removes the "mlxsw_minimal" driver.
There is no need to implement the opposite "ONLINE" event, since this flow ends with "kexec".
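
The handling described above can be sketched roughly as follows. This is only an illustration and not the code in the hw-mgmt patch: the function names `command_for_event` and `handle_event`, the use of `modprobe -r` to remove the driver, and the way the event action reaches the handler as a plain string are all assumptions made for the example.

```python
import subprocess
from typing import Callable, List, Optional

DRIVER = "mlxsw_minimal"  # driver name taken from the PR description


def command_for_event(action: str) -> Optional[List[str]]:
    """Map an event action to the command to run, or None to ignore it.

    Hypothetical helper: the real hw-management handler may remove the
    driver in a different way.
    """
    if action.strip().lower() == "offline":
        # Assumption: unloading the module with `modprobe -r` is enough to
        # stop the driver from querying the ASIC during ISSU.
        return ["modprobe", "-r", DRIVER]
    # "online" (and anything else) is ignored on purpose: the flow ends
    # with kexec, so nothing has to be undone here.
    return None


def handle_event(
    action: str,
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> bool:
    """Run the command for the given action; return True if one was run."""
    cmd = command_for_event(action)
    if cmd is None:
        return False
    run(cmd)
    return True
```

Passing the runner as a parameter keeps the sketch testable without root privileges, for example `handle_event("offline", run=print)` only prints the command that would be executed.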

- How to verify it
Repeatedly perform a warm reboot and make sure that no switch shutdown occurs.

- Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

…in service firmware upgrade

In order to prevent the "mlxsw_minimal" driver from accessing the ASIC during the in
service firmware upgrade flow, the SDK will raise an "OFFLINE" 'udev' event
at the very beginning of the flow. When this event is received,
hw-management will remove the "mlxsw_minimal" driver.
There is no need to implement the opposite "ONLINE" event, since this flow
ends with "kexec".

Signed-off-by: Kebo Liu <kebol@nvidia.com>
@keboliu
Collaborator Author

keboliu commented Jan 26, 2021

retest vsimage please

@keboliu
Collaborator Author

keboliu commented Jan 26, 2021

retest vsimage

@lguohan
Collaborator

lguohan commented Jan 27, 2021

/Azurepipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@keboliu
Collaborator Author

keboliu commented Jan 27, 2021

retest vsimage please

@liat-grozovik
Collaborator

This is the same fix as in 201911, which was approved and merged, so I am going ahead and merging it as well.

@liat-grozovik liat-grozovik merged commit 9ff5644 into sonic-net:master Jan 27, 2021
@keboliu keboliu deleted the master-sdk-hw-mgmt-sync branch January 28, 2021 01:12
lguohan pushed a commit that referenced this pull request Jan 28, 2021
…in service firmware upgrade (#6550)
