-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mlnx-sfp-plugin] enhancement to support transceiver sensor monitoring #1839
Conversation
* Modify the eeprom folder to make it accessably from pmon container * implement the get_transceiver_change_event API file change list modified: device/mellanox/x86_64-mlnx_lssn2700-r0/plugins/sfputil.py modified: device/mellanox/x86_64-mlnx_msn2100-r0/plugins/sfputil.py modified: device/mellanox/x86_64-mlnx_msn2410-r0/plugins/sfputil.py modified: device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py modified: device/mellanox/x86_64-mlnx_msn2740-r0/plugins/sfputil.py signed-off-by Liu Kebo kebol@mellanox.com
@@ -64,6 +92,22 @@ def get_presence(self, port_num): | |||
return True | |||
|
|||
return False | |||
|
|||
def get_presence_status_file_path(self, port_num): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the code added in this PR is duplicated among all the plugins. Is there a reason this functionality can't live in sfputilbase.py? Will it not be shared by other platforms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the file path is specific for each platform, it's even various among platforms from one vendor, so this has to be specific for each platform.
|
||
INSIDE_DOCKER = determine_in_docker() | ||
|
||
if INSIDE_DOCKER: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only connect to the Redis DB if running inside a Docker container? If this is to determine whether or not the plugin is being used by xcvrd, it's not an intuitive check. I think it would be better to create a function to connect to the State DB and call that function in get_transceiver_change_event()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sfputil plugin is also called by 'show interface transceiver' CLI on the host, but the swsscommon is not available on the host, to make it work both on the host and the docker, need to conditionally import swsscommon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revised.
|
||
try: | ||
path = "/sys/class/i2c-adapter/i2c-2/2-0048/hwmon/hwmon7/qsfp%d_status" % (port_num+1) | ||
#path = "/bsp/qsfp/qsfp%d_status" % (port_num+1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#path = [](start = 12, length = 7)
remove commented code. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
@@ -149,3 +193,21 @@ def reset(self, port_num): | |||
return False | |||
|
|||
return False | |||
|
|||
def get_transceiver_change_event(self, timeout=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_transceiver_change_event [](start = 8, length = 28)
This is vendor independent code. Let's move it out of plugin.
Who is using it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qiluo-msft this function is intended to let each vendor have their own implementation, because each vendor may have different ways to get the SFP plug in/out notification. Here is the implementation on mlnx platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue https://github.com/Azure/SONiC/blob/gh-pages/doc/transceiver-monitor-hld.md is not a good design, at least for Mellanox platform.
Mellanox syncd docker could directly get SFP notification and write to STATE_DB. There is no benefit to write to redis once, blocking waiting on it in another container (pmon) and write redis again. The disadvantages are new dependencies (like in this file), fragile system, and hard to debug.
If this is true for Mellanox, how could it benefit other vendors?
In reply to: 200867307 [](ancestors = 200867307)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree with you if we only consider mlnx platform, and at the very beginning, my design is to post the transceiver info to DB from syncd since mlnx can get the sfp change event from there. But after discussing with Guohan and community, we should have a common solution for all the vendors, design as common sfp change event API and left it to be implemented by each vendor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally understood. This is a perfect to time to revisit the design.
In reply to: 201230542 [](ancestors = 201230542)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking in -- are we or are we not revisiting the design? I believe the current design is the common solution for all vendors that @keboliu mentioned above. Therefore, I don't believe there is a need to revisit the design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jleveque Yes, this the common solution for all vendors that have discussed.
@qiluo-msft would like to get your view on this.
more background information about this, for other vendors, may possible to get this event from sysfs, which will be very easy for them to implement this common API since sysfs is accessible from pmon. you may check with Hui who have some investigation on other platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As comments.
|
||
# Subscribe to state table for SFP change notifications | ||
sel = swsscommon.Select() | ||
sel_tbl = swsscommon.NotificationConsumer(state_db, 'TRANSCEIVER_NOTIFY') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TRANSCEIVER_NOTIFY [](start = 57, length = 18)
Just curious which component provide the notification? As https://github.com/Azure/SONiC/blob/gh-pages/doc/OIDsforSensorandTransciver.MD, why let the provider write STATE DB directly? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qiluo-msft on top of the previous comments, on mlnx platform, this notification is exposed from another daemon inside syncd(##1841), and that daemon post the notification to the DB. For other vendors, they may have their implementation, for example, get this notification directly from sysfs.
@@ -64,6 +92,21 @@ def get_presence(self, port_num): | |||
return True | |||
|
|||
return False | |||
|
|||
def get_presence_status_file_path(self, port_num): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_presence_status_file_path [](start = 8, length = 29)
Who is calling this function? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, not called anymore, call in the original solution. removed.
INSIDE_DOCKER = determine_in_docker() | ||
|
||
if INSIDE_DOCKER: | ||
from swsscommon import swsscommon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swsscommon [](start = 9, length = 10)
Changing behaviors based on library availability or docker environment is not a good idea. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revised. the swsscommon only imported in the new add function 'get_transceiver_change_event', since it will only be called from pmon container.
the check on mellanox build failure seems not related to this PR, can we trigger the build check again w/o push a new commit?
|
check fail due to Jenkins issue?
|
Yes, the Jenkins server stuck on 7/17, and we restart the service. You could always add a new Github comment "Retest this please" to trigger the build check without committing code. #Closed |
let's merge and test and kebo will do new iterations
@jleveque , any pending issue on your side? if not, can you approve and let's test. |
#1839) * [mlnx-sfpplugin] enhancement to support tranceiver sensor monitoring * Modify the eeprom folder to make it accessably from pmon container * implement the get_transceiver_change_event API file change list modified: device/mellanox/x86_64-mlnx_lssn2700-r0/plugins/sfputil.py modified: device/mellanox/x86_64-mlnx_msn2100-r0/plugins/sfputil.py modified: device/mellanox/x86_64-mlnx_msn2410-r0/plugins/sfputil.py modified: device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py modified: device/mellanox/x86_64-mlnx_msn2740-r0/plugins/sfputil.py signed-off-by Liu Kebo kebol@mellanox.com * remove commented code * revise the get_transceiver_change_event implementation and remove unused function * remove blank
0b5f90b (HEAD -> 202012, origin/202012) [show techsupport] fix bash errors in generate_dump script (sonic-net#1844) 388c50c [202012][warmboot] Add new preboot health check: verify db integrity (sonic-net#1839) d73dc98 [config] support for configuring muxcable to standby mode of operation (sonic-net#1837) Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
0b5f90b (HEAD -> 202012, origin/202012) [show techsupport] fix bash errors in generate_dump script (#1844) 388c50c [202012][warmboot] Add new preboot health check: verify db integrity (#1839) d73dc98 [config] support for configuring muxcable to standby mode of operation (#1837) Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
- What I did
enhancement to support tranceiver sensor monitoring
- How I did it
* Modify the eeprom folder to make it accessably from pmon container
* implement the get_transceiver_change_event API
- How to verify it
run test on mlnx platform.
- Description for the changelog
file change list
modified: device/mellanox/x86_64-mlnx_lssn2700-r0/plugins/sfputil.py
modified: device/mellanox/x86_64-mlnx_msn2100-r0/plugins/sfputil.py
modified: device/mellanox/x86_64-mlnx_msn2410-r0/plugins/sfputil.py
modified: device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py
modified: device/mellanox/x86_64-mlnx_msn2740-r0/plugins/sfputil.py
- A picture of a cute animal (not mandatory but encouraged)