Input File Format
The smartctl CLI collects the current SMART data of drives and generates a CSV file. For details, see Table 1.
The following SMART attributes, selected from the SMART data commonly used in the industry, are extracted as the input data for random forest model training and prediction.
| SMART ID | CSV File Column Name | Attribute Name | Remarks |
|---|---|---|---|
| 1 | ssdSmart.smart_1(_raw)_value | Raw Read Error Rate | For SSDs, the raw value of this item is calculated with correctable errors and uncorrectable RAISE errors included. |
| 5 | ssdSmart.smart_5(_raw)_value | Reallocated Sector Ct | Number of reallocated sectors. Some space is reserved inside SSDs during the manufacturing. A faulty storage block is isolated and a sound block takes over. The idea behind this is the same as HDD sector reallocation except that reallocation is seldom performed for normal HDDs and frequently on SSDs. For SSDs, the raw value increases with the use time. The SSDs are normal as long as the growth is stable. Raw value = 100 – (100 × Number of replaced blocks/Total number of required blocks). You can estimate the remaining service life of a drive based on this value. |
| 9 | ssdSmart.smart_9(_raw)_value | Power On Hours | Power-on hours of a drive after delivery. The unit is hour, minute, or second set by the manufacturer. You can determine whether a drive has been used based on this value. |
| 12 | ssdSmart.smart_12(_raw)_value | Power Cycle Count | The raw value of this item indicates the number of times a drive is powered on and off. The value is small for a new drive. |
| 170 | ssdSmart.smart_170(_raw)_value | Grown Failing Block Count | Total number of grown blocks that fail to be read or written. |
| 171 | ssdSmart.smart_171(_raw)_value | Program_Fail Block Count | Number of blocks whose flash programming fails. |
| 172 | ssdSmart.smart_172(_raw)_value | Erase Fail Block Count | Number of blocks that fail to be erased. |
| 173 | ssdSmart.smart_173(_raw)_value | Wear Leveling Count | Average number of wear leveling operations performed on all sound blocks. |
| 174 | ssdSmart.smart_174(_raw)_value | Unexpected Power Loss Count | Number of unexpected power loss events after a drive is put into use. |
| 175 | ssdSmart.smart_175(_raw)_value | Program Fail Count Chip | Number of blocks that have programming errors. |
| 177 | ssdSmart.smart_177(_raw)_value | Wear Range Delta | Gap between the wear percentages of most and least worn blocks. |
| 180 | ssdSmart.smart_180(_raw)_value | Unused Reserved Block Count Total | SSDs reserve some space for replacing damaged storage blocks. The raw value of this item indicates the number of reserved storage blocks that are not used. |
| 181 | ssdSmart.smart_181(_raw)_value | Program Fail Count | Number of programming failures displayed in four bytes. |
| 182 | ssdSmart.smart_182(_raw)_value | Erase Fail Count | Number of block erasure failures since a drive is put into use. This value is displayed in four bytes. |
| 183 | ssdSmart.smart_183(_raw)_value | SATA Downshift Error Count | Number of SATA rate downshift errors. Generally, compatibility issues between drives and the mainboard cause the SATA transmission rate to decrease. |
| 184 | ssdSmart.smart_184(_raw)_value | Init Bad Block Count | Number of bad blocks that exist upon delivery. |
| 187 | ssdSmart.smart_187(_raw)_value | Reported Uncorrectable Errors | Number of errors that are reported to the OS and cannot be corrected through hardware ECC. If the raw value is not 0, drive data needs to be backed up. In most cases, it is the same as the number of uncorrectable RAISE errors that are reported to the OS. |
| 188 | ssdSmart.smart_188(_raw)_value | Command Timeout | Number of times that operations are terminated due to drive timeout. Generally, the raw value is 0. If the value is far greater than 0, the possible cause is that the power supply is faulty, the data cable is oxidized, or the drive is faulty. |
| 190 | ssdSmart.smart_190(_raw)_value | Airflow Temperature | Airflow temperature on the surface of a drive platter. |
| 192 | ssdSmart.smart_192(_raw)_value | Power-Off Retract Count | For SSDs, the raw value of this item indicates the number of unsafe power-offs, that is, the number of unexpected power-offs. |
| 194 | ssdSmart.smart_194(_raw)_value | Temperature | The raw value of this item indicates the current temperature inside a drive. |
| 195 | ssdSmart.smart_195(_raw)_value | On the fly ECC Uncorrectable Error Count | Number of uncorrectable errors. |
| 196 | ssdSmart.smart_196(_raw)_value | ReallocationEvents Count | Number of reallocation events. The raw value of this item indicates the accumulated number of attempts to transfer data from reallocated sectors to standby sectors. Both successful and unsuccessful transfer operations are counted. |
| 197 | ssdSmart.smart_197(_raw)_value | Current Pending Sector Count | The raw value of this item indicates the number of pending sectors, that is, the number of sectors to be reallocated. |
| 198 | ssdSmart.smart_198(_raw)_value | Offline Uncorrectable Sector Count | The raw value of this item indicates the total number of uncorrectable errors when data is read from or written to sectors. If the raw value increases, the platter surface medium or mechanical subsystem is faulty, and some sectors cannot be read. If a file is using these sectors, the OS will return a drive read error message. These sectors will be reallocated upon the next write operation. |
| 199 | ssdSmart.smart_199(_raw)_value | Ultra ATA CRC Error Rate | The raw value of this item indicates the accumulated number of data line transmission errors detected by Interface Cyclic Redundancy Check (ICRC). |
| 206 | ssdSmart.smart_206(_raw)_value | Soft ECC Correction | Number of errors corrected by software ECC. |
| 232 | ssdSmart.smart_232(_raw)_value | Endurance Remaining | Percentage of the number of erase operations to the designed maximum number of erase operations. |
| 233 | ssdSmart.smart_233(_raw)_value | Available Reserved Space | Remaining reserved space. |
| 241 | ssdSmart.smart_241(_raw)_value | Total LBAs Written | Total number of written logical block addressing (LBA) blocks. |
| 242 | ssdSmart.smart_242(_raw)_value | Total LBAs Read | Total number of read LBA blocks. |
| 244 | ssdSmart.smart_244(_raw)_value | Lifetime Writes from Host | Total amount of data written by the host to a drive after the drive is put into use. The value is stored in 4 bytes at an increment of 64 GB data. |
| 245 | ssdSmart.smart_245(_raw)_value | Lifetime Reads from Host | Total amount of data read by the host from a drive after the drive is put into use. The value is stored in 4 bytes at an increment of 64 GB data. |


| disk_sn | timestamp | fault | ssdSmart.smart_1_value | ssdSmart.smart_1_raw_value | ssdSmart.smart_5_value | ssdSmart.smart_5_raw_value | ...... |
|---|---|---|---|---|---|---|---|
| ZHZ3TFBD | 2021-7-9 | 0 | 84 | 926673 | 100 | 0 | ...... |
| ...... | ...... | ...... | ...... | ...... | ...... | ...... | ...... |

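As an illustration, the CSV header implied by the two tables above can be generated from the SMART IDs. The following is a minimal Python sketch, not the tool's own code: `SMART_IDS`, `META_COLUMNS`, and `expected_columns` are hypothetical names, and the order of the columns after ssdSmart.smart_5_raw_value is an assumption because the sample layout elides them.

```python
# SMART IDs listed in the attribute table above.
SMART_IDS = [
    1, 5, 9, 12, 170, 171, 172, 173, 174, 175, 177, 180, 181, 182,
    183, 184, 187, 188, 190, 192, 194, 195, 196, 197, 198, 199,
    206, 232, 233, 241, 242, 244, 245,
]

# Leading columns shown in the sample CSV layout.
META_COLUMNS = ["disk_sn", "timestamp", "fault"]


def expected_columns(include_fault=True):
    """Return the CSV header: metadata columns, then value/raw_value pairs.

    Assumption: SMART columns follow ascending ID order, as the first
    two pairs in the sample layout suggest.
    """
    header = [c for c in META_COLUMNS if include_fault or c != "fault"]
    for smart_id in SMART_IDS:
        header.append(f"ssdSmart.smart_{smart_id}_value")
        header.append(f"ssdSmart.smart_{smart_id}_raw_value")
    return header


if __name__ == "__main__":
    # Print the first seven column names, matching the sample layout.
    print(",".join(expected_columns()[:7]))
```

Passing `include_fault=False` drops the fault column, which reflects the note that this column is not required for the fault_predict interface.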
Each SMART item has two values: current value (for example, ssdSmart.smart_1_value) and raw value (for example, ssdSmart.smart_1_raw_value).
- raw_value:
  Original value defined by the manufacturer, from which VALUE is derived.
  The raw value is the actual value of each parameter while the drive is running. Most SMART tools display it in decimal format. The meaning of a raw value depends on the parameter:
  - For some parameters, the raw value does not directly reflect the drive status. The status can be obtained only after the drive converts the raw value into a normalized value (current value) using its built-in formula.
  - For some parameters, the raw value is a cumulative count. For example, if the raw value of Start/Stop Count is 50, the drive has started/stopped 50 times since delivery.
  - For some parameters, the raw value is an instantaneous reading. For example, if the raw value of Temperature is 44, the current temperature of the drive is 44°C.
  Therefore, only the raw values of some parameters directly reflect the current working status of a drive.
- value (current value):
  The current value of each SMART item is calculated from the raw value while the drive is running, using a formula determined by the drive manufacturer. The value range is 1 to 253. 253 indicates the best case, and 1 indicates the worst case.
  Before delivery, the manufacturer presets a maximum value for each SMART item, that is, the factory value. The basis and calculation method for determining this value are confidential and vary with the drive model. Generally, the maximum value is 100, 200, or 253. The current value of a new drive can be regarded as the preset maximum value (except for some items such as temperature). As the use time and the number of reported errors increase, the current value gradually decreases according to the measured data. The closer the current value is to the threshold, the shorter the remaining life of the drive and the higher the possibility of faults. The current value is therefore an indicator for determining the health status of the drive or estimating its service life.
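Run the following command to display the SMART data of a drive (/dev/sda in this example). The columns of the CSV file are taken from its output:
smartctl -a /dev/sda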
The first column indicates the serial number of the drive, as shown in Figure 1.
The second column indicates the timestamp.
The third column indicates whether the drive is faulty. (This column is not required for the fault_predict interface.)
The fourth column ssdSmart.smart_1_value corresponds to the value in box 1 in Figure 2.
The fifth column ssdSmart.smart_1_raw_value corresponds to the value in box 2 in Figure 2.
The sixth column ssdSmart.smart_5_value corresponds to the value in box 3 in Figure 2.
The seventh column ssdSmart.smart_5_raw_value corresponds to the value in box 4 in Figure 2.
...
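The column mapping above can be sketched in Python. This is a rough illustration rather than the collection tool itself. It assumes that boxes 1 and 2 in Figure 2 are the VALUE and RAW_VALUE fields of the smartctl attribute table, and that the serial number comes from the "Serial Number:" line. `parse_smartctl` is a hypothetical helper, and the sample text is illustrative: VALUE and RAW_VALUE mirror the sample CSV row, while WORST and THRESH are invented.

```python
# Illustrative excerpt in the layout printed by `smartctl -a`.
SAMPLE_OUTPUT = """\
Serial Number:    ZHZ3TFBD
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   084   084   006    Pre-fail  Always       -       926673
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
"""


def parse_smartctl(text):
    """Map smartctl output to CSV-style column names."""
    row = {}
    for line in text.splitlines():
        if line.startswith("Serial Number:"):
            row["disk_sn"] = line.split(":", 1)[1].strip()
            continue
        # Attribute rows have 10 fields; the last one (RAW_VALUE) may
        # itself contain spaces, so split at most 9 times.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header line or unrelated output
        smart_id = int(fields[0])
        row[f"ssdSmart.smart_{smart_id}_value"] = int(fields[3])
        # Keep the raw value as text; it may need filtering later.
        row[f"ssdSmart.smart_{smart_id}_raw_value"] = fields[9].strip()
    return row


if __name__ == "__main__":
    print(parse_smartctl(SAMPLE_OUTPUT))
```

The raw value is deliberately left as a string here, because some drives report more than one number in that field.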
Pay attention to the following during data collection:
- The SMART data implementation varies by manufacturer, so you need to filter the SMART data during collection.
For example, the collected SMART data of Seagate Enterprise Capacity 3.5 HDD ST4000NM0035-1V4107 is as follows:

The algorithm requires SMART 188, but this drive reports the raw value of that attribute as three numbers (0 0 0). Filter the value during collection: keep the first 0 and discard the other two so that the column contains numeric data.
- SMART data of some manufacturers may lack required information.
For example, the collected SMART data of HUH728080ALE600 is as follows:

If required SMART data is missing, the program reports an error and exits.
For example, if the ssdSmart.smart_1_value column is missing from the input file, the following error message is displayed:
KeyError: "['ssdSmart.smart_1_value'] not in index"
To continue the training, add the missing column to the data file and fill it with 0s. Training can then proceed, but this workaround affects the prediction accuracy of the model.
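Both precautions can be sketched with the Python standard library. This is an illustration under stated assumptions, not the product's preprocessing code. `first_number` and `clean_rows` are hypothetical helpers, and the sample CSV text is made up for the demonstration. Raw values whose first token is not an integer are outside the scope of this sketch.

```python
import csv
import io


def first_number(raw):
    """Keep only the first whitespace-separated number, e.g. '0 0 0' -> 0."""
    return int(str(raw).split()[0])


def clean_rows(rows, required_columns):
    """Make raw-value columns numeric and add missing columns as 0.

    Filling a missing column with 0 only lets training continue;
    as noted above, it affects prediction accuracy.
    """
    cleaned = []
    for row in rows:
        out = dict(row)
        for col in required_columns:
            if col not in out:
                out[col] = 0
            elif col.endswith("_raw_value"):
                out[col] = first_number(out[col])
        cleaned.append(out)
    return cleaned


# Made-up input: SMART 188 has a three-number raw value, and the
# ssdSmart.smart_1_value column is absent.
SAMPLE_CSV = "disk_sn,ssdSmart.smart_188_raw_value\nZHZ3TFBD,0 0 0\n"

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    required = ["ssdSmart.smart_1_value", "ssdSmart.smart_188_raw_value"]
    print(clean_rows(rows, required))
```

Checking for missing columns before training, instead of waiting for the KeyError, also makes it easier to report which drive models lack which attributes.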

