[PDDF] PSU fan status is always NOT OK while thermalctld is enabled #8129

Open
seanwu-ec opened this issue Jul 8, 2021 · 4 comments
Labels: Triaged

@seanwu-ec
Contributor

seanwu-ec commented Jul 8, 2021

Description

The PSU fan status is always Not OK while pmon's thermalctld is enabled, as shown below:

admin@sonic:~$ show platform fan
  Drawer    LED         FAN               Speed    Direction    Presence    Status          Timestamp
--------  -----  ----------  ------------------  -----------  ----------  --------  -----------------
     N/A  green   PSU1_FAN1  77.15555555555555%      exhaust     Present    Not OK  20210617 18:45:49
     N/A  green   PSU2_FAN1  76.08888888888889%      exhaust     Present    Not OK  20210617 18:45:49

Suggestion for change

For a PSU fan, PddfFan.get_target_speed() should raise NotImplementedError instead of returning 0.
https://github.com/Azure/sonic-buildimage/blob/4f2bc1fbeddc49af62c8f1acb748e251d043e792/platform/pddf/platform-api-pddf-base/sonic_platform_pddf_base/pddf_fan.py#L227
Otherwise, the PSU fan fails the over_speed check every time, because its real speed is always well above the 0% target.
https://github.com/Azure/sonic-platform-daemons/blob/2d2749ab77ea0cfb9b1a9a0a5c7eeffbde9daed8/sonic-thermalctld/scripts/thermalctld#L349
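
To illustrate why raising NotImplementedError helps, here is a minimal, self-contained sketch. It is not the actual thermalctld or PDDF code: the names NOT_AVAILABLE, try_get, is_over_speed, fan_status_ok, and the two stub fan classes are hypothetical, and the 20% tolerance is an assumed value. It assumes the daemon treats an unimplemented getter as "not available" and skips the comparison in that case.

```python
NOT_AVAILABLE = "N/A"  # hypothetical sentinel meaning "platform does not report this"


def try_get(getter, default=NOT_AVAILABLE):
    """Call getter; fall back to default if the platform API does not implement it."""
    try:
        return getter()
    except NotImplementedError:
        return default


def is_over_speed(speed, target, tolerance):
    """Return True when speed exceeds target by more than tolerance percent."""
    if NOT_AVAILABLE in (speed, target, tolerance):
        return False  # nothing to compare against, so do not flag the fan
    return speed > target * (1 + tolerance / 100.0)


class PsuFanReturningZero:
    """Stub PSU fan mimicking the reported behavior: target speed is 0."""

    def get_speed(self):
        return 77  # percent, close to the value in the report

    def get_target_speed(self):
        return 0

    def get_speed_tolerance(self):
        return 20  # percent, assumed for illustration


class PsuFanRaising(PsuFanReturningZero):
    """Stub PSU fan with the suggested change applied."""

    def get_target_speed(self):
        raise NotImplementedError


def fan_status_ok(fan):
    """Simplified status check: the fan is OK unless it is over speed."""
    speed = try_get(fan.get_speed)
    target = try_get(fan.get_target_speed)
    tolerance = try_get(fan.get_speed_tolerance)
    return not is_over_speed(speed, target, tolerance)


if __name__ == "__main__":
    print("target speed 0:      OK =", fan_status_ok(PsuFanReturningZero()))
    print("NotImplementedError: OK =", fan_status_ok(PsuFanRaising()))
```

With a target of 0, any non-zero speed exceeds the threshold, so the fan is flagged. With the exception, the target is treated as unavailable and the over-speed comparison is skipped.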

Steps to reproduce the issue:

  1. Make sure all PSUs are properly seated and powered.
  2. Make sure thermalctld is running in the pmon container (or invoke it manually: python3 /usr/local/bin/thermalctld).
  3. Run show platform fan. The PSU fan status is reported as Not OK.
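
For anyone who wants to check this in a script rather than by eye, here is a rough sketch that scans the text printed by show platform fan and lists the PSU fans whose status is not OK. The function name not_ok_psu_fans is hypothetical, the sample text is copied from the output in the description, and the parsing assumes columns are separated by two or more spaces, which holds for that sample but is not guaranteed for every release.

```python
import re

# Sample copied from the "show platform fan" output in the description.
SAMPLE = """\
  Drawer    LED         FAN               Speed    Direction    Presence    Status          Timestamp
--------  -----  ----------  ------------------  -----------  ----------  --------  -----------------
     N/A  green   PSU1_FAN1  77.15555555555555%      exhaust     Present    Not OK  20210617 18:45:49
     N/A  green   PSU2_FAN1  76.08888888888889%      exhaust     Present    Not OK  20210617 18:45:49
"""


def not_ok_psu_fans(table_text):
    """Return the names of PSU fans whose Status column is not 'OK'."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    # Assumption: columns are separated by runs of two or more spaces.
    header = re.split(r"\s{2,}", lines[0].strip())
    fan_col = header.index("FAN")
    status_col = header.index("Status")

    failing = []
    for ln in lines[1:]:
        if ln.lstrip().startswith("-"):
            continue  # skip the dashed separator row
        cells = re.split(r"\s{2,}", ln.strip())
        if len(cells) != len(header):
            continue  # skip rows that do not match the header layout
        if cells[fan_col].startswith("PSU") and cells[status_col] != "OK":
            failing.append(cells[fan_col])
    return failing


if __name__ == "__main__":
    print(not_ok_psu_fans(SAMPLE))
```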

Describe the results you received:

PSU fan status is 'Not OK'

Describe the results you expected:

PSU fan status should be 'OK'

Output of show version:

SONiC Software Version: SONiC.master-8115.22792-d40be3086
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: d40be3086
Build date: Wed Jul  7 07:58:49 UTC 2021
Built by: AzDevOps@sonic-build-workers-000GRR

Platform: x86_64-accton_as9716_32d-r0
HwSKU: Accton-AS9716-32D
ASIC: broadcom
ASIC Count: 1
Serial Number: N/A
Model Number: N/A
Hardware Revision: N/A
Uptime: 18:13:11 up  1:43,  4 users,  load average: 2.18, 1.76, 1.79

Docker images:
REPOSITORY                    TAG                           IMAGE ID            SIZE
docker-platform-monitor       latest                        b8d4aae7ead7        627MB
docker-platform-monitor       master-8115.22792-d40be3086   b8d4aae7ead7        627MB
docker-macsec                 latest                        815692b903ff        427MB
docker-macsec                 master-8115.22792-d40be3086   815692b903ff        427MB
docker-teamd                  latest                        13c4073538c7        424MB
docker-teamd                  master-8115.22792-d40be3086   13c4073538c7        424MB
docker-snmp                   latest                        bd3e67e44b70        454MB
docker-snmp                   master-8115.22792-d40be3086   bd3e67e44b70        454MB
docker-database               latest                        9d22b800e462        413MB
docker-database               master-8115.22792-d40be3086   9d22b800e462        413MB
docker-lldp                   latest                        b89a34f2a4e9        453MB
docker-lldp                   master-8115.22792-d40be3086   b89a34f2a4e9        453MB
docker-orchagent              latest                        16bc98c8190f        442MB
docker-orchagent              master-8115.22792-d40be3086   16bc98c8190f        442MB
docker-nat                    latest                        9fc5997ea17c        427MB
docker-nat                    master-8115.22792-d40be3086   9fc5997ea17c        427MB
docker-sonic-mgmt-framework   latest                        704e6ec89696        570MB
docker-sonic-mgmt-framework   master-8115.22792-d40be3086   704e6ec89696        570MB
docker-sonic-telemetry        latest                        a2946d1dcd84        501MB
docker-sonic-telemetry        master-8115.22792-d40be3086   a2946d1dcd84        501MB
docker-dhcp-relay             latest                        7a3c6b47ce19        420MB
docker-dhcp-relay             master-8115.22792-d40be3086   7a3c6b47ce19        420MB
docker-fpm-frr                latest                        49377a9cfebf        442MB
docker-fpm-frr                master-8115.22792-d40be3086   49377a9cfebf        442MB
docker-sflow                  latest                        f14bcfdaa9a0        425MB
docker-sflow                  master-8115.22792-d40be3086   f14bcfdaa9a0        425MB
docker-router-advertiser      latest                        3322539bfe10        413MB
docker-router-advertiser      master-8115.22792-d40be3086   3322539bfe10        413MB
docker-syncd-brcm             latest                        6ad9b367a389        705MB
docker-syncd-brcm             master-8115.22792-d40be3086   6ad9b367a389        705MB

@zhangyanzhao added the Triaged label on Jul 21, 2021

@zhangyanzhao
Collaborator

@adyeung will take a look

@FuzailBrcm
Contributor

@seanwu-ec
Thanks for raising this. Your suggestion seems correct, but I need to test further because we had not enabled thermalctld locally (or had enabled it only with some restrictions). I will work on it and push the fix.

@seanwu-ec
Contributor Author

Understood. I appreciate that, @FuzailBrcm.
If you know of any downsides or reasons not to enable thermalctld, please let us know. We recently re-enabled it because some customers complained that show platform fan/temperature doesn't work.

@FuzailBrcm
Contributor

Added the fix for this issue as part of #7834.

FuzailBrcm added a commit to FuzailBrcm/sonic-buildimage that referenced this issue Dec 17, 2021
lguohan pushed a commit that referenced this issue Jan 3, 2022
Why I did it
Some platforms need to run a few steps before the PDDF service actually starts.

* Adding pre_pddf_init script in the service file
* Raising exception for get_target_speed() for PSU-fan in PDDF (#8129)