Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add asic sensors info parsing in minigraph.py for enabling ASIC thermal mo… #19068

Closed
wants to merge 1 commit into from

Conversation

tudupa
Copy link

@tudupa tudupa commented May 24, 2024

…nitoring

Why I did it

ASIC thermal monitoring needs to be enabled for Nokia platforms. This PR helps enable ASIC thermal monitoring via minigraph.xml. minigraph.py parses Asic sensors information present in the minigraph.xml to enable support for ASIC sensors.
For more information on ASIC thermal monitoring, please refer to https://github.com/sonic-net/SONiC/blob/master/doc/asic_thermal_monitoring_hld.md

Work item tracking
  • Microsoft ADO (number only):

How I did it

Added support in minigraph.py to parse asic sensors information present in the DeviceInfo section of the minigraph.xml. It parses the AsicSensors section of the minigraph.xml and sonic-cfggen then writes the parsed entry into the config_db.

How to verify it

A unit test test_minigraph_asic_sensors is written in test_minigraph_case.py to parse a sample minigraph.xml containing the asic_sensors information and is verified that the test passes. This was verified by running make target/python-wheels/bookworm/sonic_config_engine-1.0-py3-none-any.whl.
Other tests include

  1. Testing by running "config load minigraph -y" and verifying that ASIC sensors info is written to config_db
  2. Output of show platform temperature should show a value for ASIC sensor temperature.
show platform temperature 
   Sensor    Temperature    High TH    Low TH    Crit High TH    Crit Low TH    Warning          Timestamp
---------  -------------  ---------  --------  --------------  -------------  ---------  -----------------
     ASIC         55            100       N/A           110.0            N/A      False  20240521 18:04:00
CPU CORE         59.055        100       N/A           106.0            N/A      False  20240521 18:04:00
PCB BACK         37.875         80       N/A             N/A            N/A      False  20240521 18:04:00
PCB FRONT         38.5           80       N/A             N/A            N/A      False  20240521 18:04:00
  PCB MID         43.25          80       N/A             N/A            N/A      False  20240521 18:04:00
  1. Config_db.json shows the correct entry for ASIC sensors
sudo cat /etc/sonic/config_db.json | grep "ASIC_SENSORS" -A 20
    "ASIC_SENSORS": {
        "ASIC_SENSORS_POLLER_INTERVAL": {
            "interval": "10"
        },
        "ASIC_SENSORS_POLLER_STATUS": {
            "admin_status": "enable"
        }
    },
  1. Negative testing done to verify if config load minigraph works correctly if the ASIC sensors info is not present in the minigraph.xml

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

…nitoring

Signed-off-by: tudupa <tanmaya.udupa@nokia.com>
@judyjoseph
Copy link
Contributor

/azpw ms_conflict

@judyjoseph judyjoseph added the Chassis 🤖 Modular chassis support label May 28, 2024
@judyjoseph judyjoseph self-assigned this May 28, 2024
@judyjoseph judyjoseph requested a review from arlakshm May 29, 2024 05:42
@@ -1234,8 +1235,13 @@ def parse_deviceinfo(meta, hwsku):
if hostname is not None:
key = "%s|%s" % (hostname.text, key)
sys_ports[key] = {"system_port_id": system_port_id, "switch_id": switch_id, "core_index": core_id, "core_port_index": core_port_id, "speed": speed, "num_voq": num_voq}

return port_speeds, port_descriptions, sys_ports
asic_sensor = device_info.find(str(QName(ns,"AsicSensors")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have "AsicSensors" and the attributes in the production minigraph for SUP and Linecard.

Copy link
Contributor

@judyjoseph judyjoseph Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a control parameter eg: poller, similar to "controllable" here in platform.json file to control this,

If the new control flag we add is true, for any of the ASIC* thermals, we can include the ASIC_SENSOR config in config_db.

    "ASIC_SENSORS": {
        "ASIC_SENSORS_POLLER_INTERVAL": {
            "interval": "10"
        },
        "ASIC_SENSORS_POLLER_STATUS": {
            "admin_status": "enable"
        }
    },

The Poller interval could be controllable too if we can add it to thermals in platform.json. If this is agreeable, we can update/reach out to authors of https://github.com/sonic-net/SONiC/blob/master/doc/asic_thermal_monitoring_hld.md (REF : sonic-net/SONiC#557)

@prgeor to comment, Any other platforms use ASIC_SENSOR config now?

@judyjoseph
Copy link
Contributor

@kenneth-arista @anamehra to review too to check if these ASIC_SENSORS config is needed in other platforms -- to get the ASIC thermals.

Copy link
Contributor

@prgeor prgeor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@judyjoseph I don't think we should add platform specific sensor details in minigraph. The HLD quoted is quite old.

@judyjoseph
Copy link
Contributor

judyjoseph commented Jun 14, 2024

@judyjoseph I don't think we should add platform specific sensor details in minigraph. The HLD quoted is quite old.

Yes we should not take it from minigraph

We could have a flag eg: sai_asic_poll : true/false based on which we can push this above attributes to config_db ? Have to make sure it is backward compatible too.

@mlok-nokia
Copy link
Contributor

mlok-nokia commented Jun 21, 2024

@judyjoseph @prgeor The following is the proposal. This proposal is similar to the Fabirc port monitoring for chassis. Please check if this is the right approach. Thanks.

  1. Create a new file call asic_sensors_polling_config.json file in the vendor's platform directory as below:
    {
    "ASIC_SENSORS": {
    "ASIC_SENSORS_POLLER_INTERVAL": {
    "interval": "10"
    },
    "ASIC_SENSORS_POLLER_STATUS": {
    "admin_status": "enable"
    }
    }
    }
  1. Add a new module asic_sensor_config.,py in src/sonci-config-engine directory with a function get_asic_sensors_polling_config(). This function checks if file asic_senors_polling_config.json exists, return it's config json data
  2. Modify the minigrpah.py or sonic-cfggen to call Update README.md #2) get_asic_sensors_polling_config(). if return a data, results['ASIC_SENSORS'] = returned data.

@spilkey-cisco
Copy link
Contributor

How will asic_sensors_polling_config.json be applied on a multi-asic system? Will the same configuration be applied to all asic namespaces? Will there be separate sections within the json file for each asic namespace? Well each section in the json have a field to say which asic namespaces to be applied to? Will there be a separate json file for each asic namespace?

@mlok-nokia
Copy link
Contributor

How will asic_sensors_polling_config.json be applied on a multi-asic system? Will the same configuration be applied to all asic namespaces? Will there be separate sections within the json file for each asic namespace? Well each section in the json have a field to say which asic namespaces to be applied to? Will there be a separate json file for each asic namespace?

asic_sensor config will be appled to all asic and includes the localhost namespace.

@prgeor
Copy link
Contributor

prgeor commented Jul 17, 2024

@mlok-nokia
Copy link
Contributor

@tudupa @judyjoseph can we use this approach? https://github.com/bmridul/SONiC/blob/3ed3cca2866e3647b8af9871b33dc53b5dbae1a6/doc/pmon/pmon-sensormon.md

I don't think this can be done in the 202205 branch on Nokia platform. Nokia platform has its NDK to manager/monitor the temperature and let Supervisor card drive its FAN speed.

@judyjoseph
Copy link
Contributor

judyjoseph commented Jul 18, 2024

@tudupa @judyjoseph can we use this approach? https://github.com/bmridul/SONiC/blob/3ed3cca2866e3647b8af9871b33dc53b5dbae1a6/doc/pmon/pmon-sensormon.md

I don't think this can be done in the 202205 branch on Nokia platform. Nokia platform has its NDK to manager/monitor the temperature and let Supervisor card drive its FAN speed.

To add Prince, we have this functionality to poll the ASIC sensors from swss for a while now. sonic-net/sonic-swss#1517.

We could use this existing method instead of a new sensord daemon, I see the new daemon is mainly for other non-asic platform sensors.

@abdosi
Copy link
Contributor

abdosi commented Aug 7, 2024

@bmridul : Can you provide your inputs/comment on this ?

@judyjoseph
Copy link
Contributor

@judyjoseph @prgeor The following is the proposal. This proposal is similar to the Fabirc port monitoring for chassis. Please check if this is the right approach. Thanks.

  1. Create a new file call asic_sensors_polling_config.json file in the vendor's platform directory as below:
    {
    "ASIC_SENSORS": {
    "ASIC_SENSORS_POLLER_INTERVAL": {
    "interval": "10"
    },
    "ASIC_SENSORS_POLLER_STATUS": {
    "admin_status": "enable"
    }
    }
    }

  2. Add a new module asic_sensor_config.,py in src/sonci-config-engine directory with a function get_asic_sensors_polling_config(). This function checks if file asic_senors_polling_config.json exists, return it's config json data

  3. Modify the minigrpah.py or sonic-cfggen to call Update README.md #2) get_asic_sensors_polling_config(). if return a data, results['ASIC_SENSORS'] = returned data.

Reviving this discussion to get this PR merged : @mlok-nokia

Can we add this data to /usr/share/sonic/device/x86_64-nokia_ixr7250e_36x400g-r0/platform.json itself as a new json file

    "interfaces": {},
    "asic_sensors": {
        "poll_interval": "10",
        "poll_admin_status": "enable"
    }

We could add the parsing of these new attributes from platform file in minigraph.py itself. We do similar parsing in portconfig.py today.

@judyjoseph
Copy link
Contributor

@bmridul : Can you provide your inputs/comment on this ?

@bmridul were you able to check if Cisco platforms already use the SAI ASIC polling ?

@bmridul
Copy link
Contributor

bmridul commented Oct 3, 2024

@bmridul : Can you provide your inputs/comment on this ?

@bmridul were you able to check if Cisco platforms already use the SAI ASIC polling ?

Hi Judy,
yes. Cisco platforms already use it. For Cisco platforms, we are adding this config during platform initialization on the router as the config is needed to enable NPU temperature sensors which are using to implement fan control logic for the router.

In regards to the PR, I prefer that the data come from platform.json as you suggested.

@abdosi
Copy link
Contributor

abdosi commented Oct 23, 2024

@judyjoseph has new design to use platform.json to achieve this. closing this approach.
each vendor need to create seprate PR for their platform.

@mlok-nokia @anamehra @kenneth-arista for viz.

@abdosi abdosi closed this Oct 23, 2024
@judyjoseph
Copy link
Contributor

@mlok-nokia I have the approach updated in this HLD : sonic-net/SONiC#1841. Since this attribute is not actually derived from minigraph.xml, I think we should parse platform.json and push to DB in config/main.py:load_minigraph() ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Chassis 🤖 Modular chassis support
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants