[ssd_generic]Fix ssd no model information #302

Open
wants to merge 4 commits into base: master

hantienEdgecore

SSD information is now shown correctly even when the model cannot be identified from the generic SSD information.
When the model name is unknown, the Virtium tool is used to get the SSD health and temperature.

How has this been tested?
Manual testing on Edgecore switches (e.g. AS4630_54PE and others).

Why I did it

For some SSDs (such as Transcend), model information cannot be obtained from the generic SSD information.

How I did it

When no model name is obtained, use the Virtium command to get the SSD information.
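
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: the function `select_ssd_handler`, the handler table, and the command strings are hypothetical names chosen for the example. It assumes handlers are looked up by model-name prefix and that the Virtium handler is used whenever no prefix matches.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]


def select_ssd_handler(model: Optional[str],
                       handlers: Dict[str, Handler],
                       fallback: Handler) -> Handler:
    """Return the handler whose key prefixes the model name, else the fallback."""
    if model:
        for prefix, handler in handlers.items():
            if model.startswith(prefix):
                return handler
    # Unknown or missing model name: fall back (here, to the Virtium tool).
    return fallback


# Hypothetical handlers; each only builds a placeholder command line.
def innodisk_cmd(dev: str) -> str:
    return "innodisk-tool {}".format(dev)


def virtium_cmd(dev: str) -> str:
    return "virtium-tool {}".format(dev)


handlers = {"InnoDisk": innodisk_cmd}
# "TS32XBTMM1600" matches no registered prefix, so the fallback is chosen.
chosen = select_ssd_handler("TS32XBTMM1600", handlers, virtium_cmd)
print(chosen("/dev/sda"))
```

Using a fallback this way means every unrecognized disk is treated as if the Virtium tool can read it.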

How to verify it

admin@sonic:/$ sudo show platform ssdhealth --vendor
Device Model : TS32XBTMM1600
Health : 87%
Temperature : 32C
SMART attributes
ID Attribute High Raw Low Raw Value Worst Threshold
1 Raw_Read_Error_Rate 0 0 100 100 0
5 Reserved_Attribute 0 0 100 100 0
9 Power_On_Hours 0 710 100 100 0
12 Power_Cycle_Count 0 9910 100 100 0
160 Uncorrectable_Sector_Count 0 0 100 100 0
161 Valid_Spare_Block 0 55 100 100 0
163 Reserved_Attribute 0 13 100 100 0
164 Reserved_Attribute 0 418014 100 100 0
165 Maximum_Erase_Count 0 437 100 100 0
166 Reserved_Attribute 0 337 100 100 0
167 Average_Erase_Count 0 406 100 100 0
168 NAND_Endurance 0 3000 100 100 0
169 Remaining_Life_Left 0 87 100 100 0
175 Reserved_Attribute 0 0 100 100 0
176 Reserved_Attribute 0 0 100 100 0
177 Reserved_Attribute 0 208 100 100 50
178 Reserved_Attribute 0 0 100 100 0
181 Total_Program_Fail 0 0 100 100 0
182 Total_Erase_Fail 0 0 100 100 0
192 Sudden_Power_Lost_Count 0 106 100 100 0
194 Temperature_Celsius 0 32 100 100 0
195 Hardware_ECC_Recovered 0 1134 100 100 0
196 Reallocated_Event_Count 0 0 100 100 16
197 Current_Pending_Sector_Count 0 0 100 100 0
198 Reserved_Attribute 0 0 100 100 0
199 UDMA_CRC_Error_Count 0 0 100 100 50
232 Reserved_Attribute 0 100 100 100 0
241 Total_LBAs_Written 0 146660 100 100 0
242 Total_LBAs_Read 0 96506 100 100 0
245 Reserved_Attribute 0 418014 100 100 0

@linux-foundation-easycla

linux-foundation-easycla bot commented Aug 17, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: hantienEdgecore / name: Hantien (e601d0e)

@hantienEdgecore changed the title from "[ssd_generic]Fix ssd no vendor information" to "[ssd_generic]Fix ssd no model information" on Aug 17, 2022
@lgtm-com

lgtm-com bot commented Aug 19, 2022

This pull request introduces 2 alerts when merging d9bcba3 into 030a382 - view on LGTM.com

new alerts:

  • 1 for Redundant comparison
  • 1 for Unreachable code

@prgeor
Collaborator

prgeor commented Nov 18, 2022

@hantienEdgecore can you fix the build failure and the LGTM errors?

@prgeor
Collaborator

prgeor commented Mar 7, 2023

@hantienEdgecore What is the output of smartctl -a /dev/sda?

@hantienEdgecore
Author

The output of "smartctl -a /dev/sda" is shown below.

root@as4630-54npe-1:/home/admin# smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-18-2-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: TS32XBTMM1600
Serial Number: F318410050
Firmware Version: O0918B
User Capacity: 32,017,047,552 bytes [32.0 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
TRIM Command: Available, deterministic, zeroed
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Mar 8 13:41:30 2023 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x71) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 1) minutes.
Conveyance self-test routine
recommended polling time: ( 1) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0000 100 100 000 Old_age Offline - 0
5 Reallocated_Sector_Ct 0x0000 100 100 000 Old_age Offline - 0
9 Power_On_Hours 0x0000 100 100 000 Old_age Offline - 1281
12 Power_Cycle_Count 0x0000 100 100 000 Old_age Offline - 226
160 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
161 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 54
163 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 14
164 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 494160
165 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 506
166 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 406
167 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 481
168 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 3000
169 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 84
175 Program_Fail_Count_Chip 0x0000 100 100 000 Old_age Offline - 0
176 Erase_Fail_Count_Chip 0x0000 100 100 000 Old_age Offline - 0
177 Wear_Leveling_Count 0x0000 100 100 050 Old_age Offline - 199
178 Used_Rsvd_Blk_Cnt_Chip 0x0000 100 100 000 Old_age Offline - 0
181 Program_Fail_Cnt_Total 0x0000 100 100 000 Old_age Offline - 0
182 Erase_Fail_Count_Total 0x0000 100 100 000 Old_age Offline - 0
192 Power-Off_Retract_Count 0x0000 100 100 000 Old_age Offline - 161
194 Temperature_Celsius 0x0000 100 100 000 Old_age Offline - 44
195 Hardware_ECC_Recovered 0x0000 100 100 000 Old_age Offline - 2367
196 Reallocated_Event_Count 0x0000 100 100 016 Old_age Offline - 0
197 Current_Pending_Sector 0x0000 100 100 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0000 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0000 100 100 050 Old_age Offline - 0
232 Available_Reservd_Space 0x0000 100 100 000 Old_age Offline - 100
241 Total_LBAs_Written 0x0000 100 100 000 Old_age Offline - 350798
242 Total_LBAs_Read 0x0000 100 100 000 Old_age Offline - 218513
245 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 494160

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
6 0 65535 Read_scanning was never started
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

@@ -57,8 +57,9 @@ def __init__(self, diskdev):
self.fetch_vendor_ssd_info(diskdev, vendor)
self.parse_vendor_ssd_info(vendor)
else:
# No handler registered for this disk model
pass
# unknown model name , use Virtium to get information
Collaborator

I think this logic to assume it will be Virtium is a hack. Is there a better way to identify the SSD vendor? We should enhance the existing logic to identify the vendor.

Author

I checked the information reported by SMARTCTL, INNODISK and VIRTIUM for the SSD in the same device.
None of them reports the vendor. You can see this in the smartctl output that I posted: all we can learn is "Device Model: TS32XBTMM1600" and "Serial Number: F318410050". These cannot be used as a matching condition because they are just arbitrary model names and serial numbers, and there are many SSDs of this kind in my work environment. The only way I have found to get the correct SSD health and temperature is to use the VIRTIUM command.
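
For reference, the raw values in the smartctl attribute table posted above can also be read directly. The sketch below is illustrative only and is not part of this pull request: `parse_smart_raw_values` and the regular expression are hypothetical. Reading attribute 169 as remaining life is an assumption based on the vendor output in the description, where 169 is labelled Remaining_Life_Left and matches the reported health. smartctl itself lists it as Unknown_Attribute.

```python
import re
from typing import Dict

# Columns: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
ATTR_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_smart_raw_values(text: str) -> Dict[int, int]:
    """Map SMART attribute ID to its raw value for each table row found."""
    raw = {}
    for line in text.splitlines():
        match = ATTR_RE.match(line)
        if match:
            raw[int(match.group(1))] = int(match.group(6))
    return raw


# Two rows copied from the smartctl output in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
169 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 84
194 Temperature_Celsius 0x0000 100 100 000 Old_age Offline - 44
"""

values = parse_smart_raw_values(SAMPLE)
temperature_c = values.get(194)
remaining_life = values.get(169)  # assumption: vendor-specific meaning
print(temperature_c, remaining_life)
```

Whether attribute 169 carries the same meaning on other drives that report no vendor is not established by this thread, so this does not replace a vendor-specific tool.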
