Input File Format
The smartctl CLI collects the current SMART data of drives and generates a CSV file in the specified format.
After filtering common SMART data used in the industry, the following SMART data are extracted as the input data for random forest model training and prediction:
| SMART ID | CSV File Column Name | Attribute Name | Remarks |
|---|---|---|---|
| 1 | hddSmart.smart_1(_raw)_value | Raw_Read_Error_Rate | Records the rate of errors when data is read from a drive platter. The raw value varies from manufacturer to manufacturer because of different calculation methods, and the decimal value is generally meaningless. |
| 5 | hddSmart.smart_5(_raw)_value | Reallocated_Sector_Ct | Records the number of bad sectors that are reallocated to good standby sectors. When a drive has bad sectors, they can be remapped to good standby sectors so that the drive can still be used. However, after the number of bad sectors reaches a certain value, the standby sectors are used up and remapping can no longer be performed. Any further bad sectors remain visible and cannot be rectified. |
| 7 | hddSmart.smart_7(_raw)_value | Seek_Error_Rate | Records the rate of errors when the head fails to seek tracks due to mechanical problems. Possible causes include head servo component faults, drive overheating, or drive damage. The raw value varies from manufacturer to manufacturer because of different calculation methods, and the decimal value is generally meaningless. |
| 193 | hddSmart.smart_193(_raw)_value | Load_Cycle_Count | Records the number of cycles in which the head moves between the landing zone and the platter. |
| 197 | hddSmart.smart_197(_raw)_value | Current_Pending_Sector | Records the number of pending sectors. |
| 198 | hddSmart.smart_198(_raw)_value | Offline_Uncorrectable | Records the number of uncorrectable sectors. |
| 199 | hddSmart.smart_199(_raw)_value | UDMA_CRC_Error_Count | Records the number of cyclic redundancy check errors that occur during drive communication. |
| 188 | hddSmart.smart_188(_raw)_value | Command_Timeout | Records the number of times that the connection to the drive times out. Generally, the value is 0. If the value is far greater than 0, the power supply may be faulty, the data cable port may be oxidized, or more serious faults have occurred. |
| 187 | hddSmart.smart_187(_raw)_value | Reported_Uncorrect | Records the number of errors that cannot be corrected by hardware error checking and correction. |

Each SMART item has two values: current value (for example, hddSmart.smart_1_value) and raw value (for example, hddSmart.smart_1_raw_value).
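The naming pattern can be sketched in a few lines of Python. This is only an illustration: the helper name `smart_columns` is hypothetical, and the order of the SMART columns after `hddSmart.smart_5_raw_value` is an assumption (it simply follows the order of the table above), because the sample file in this document elides the remaining columns.

```python
# SMART IDs taken from the table above, in the order the table lists them.
SMART_IDS = [1, 5, 7, 193, 197, 198, 199, 188, 187]


def smart_columns(smart_ids=SMART_IDS):
    """Return the current-value and raw-value column names for each SMART ID."""
    columns = []
    for smart_id in smart_ids:
        columns.append(f"hddSmart.smart_{smart_id}_value")      # current (normalized) value
        columns.append(f"hddSmart.smart_{smart_id}_raw_value")  # raw value
    return columns


# Metadata columns first, then two columns per SMART item.
HEADER = ["disk_sn", "timestamp", "fault"] + smart_columns()
print(",".join(HEADER))
```

With nine SMART items, the header holds 3 metadata columns plus 18 SMART columns.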
- raw_value:
Original value defined by the manufacturer, from which VALUE is derived.
The raw value is the actual value of each parameter during the running of the drive. Most SMART tools display data in decimal format. The meaning of a raw value varies with the parameter as follows:
- The raw value does not directly reflect the drive status, which can be obtained only after the raw value is converted into a normalized value (current value) through the built-in calculation formula of the drive.
- The raw value is accumulated directly. For example, if the value of Start/Stop Count is 50, the drive has started/stopped 50 times since delivery.
- The raw values of some parameters are instant values. For example, if the raw value of Temperature is 44, the current temperature of the drive is 44°C.
Therefore, the raw values of some parameters can directly reflect the current working status of a drive.
- value (current value):
The current value of each SMART item is calculated following a certain formula based on the raw value when the drive is running. The value range is 1 to 253. 253 indicates the best case, and 1 indicates the worst case. The calculation formula is determined by the drive manufacturer.
Before delivery, the manufacturer presets a maximum value for each SMART item, that is, the factory value. The basis and calculation method for determining this value are confidential and vary with the drive model. Generally, the maximum value is 100, 200, or 253. The value of a new drive can be understood to be the preset maximum value (except for some items such as temperature). As the use time and reported errors increase, the current value gradually decreases according to the measured data. Therefore, as the current value approaches the threshold, the life of the drive decreases and the possibility of faults increases. The current value is an indicator for determining the health status of the drive or estimating its service life.
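As a rough illustration of how the current value relates to the threshold, the sketch below computes the remaining margin. The function names are hypothetical. The value 84 is the SMART 1 current value from the sample row in this document, while the threshold 44 is an assumed figure used only for the example; real thresholds are reported per attribute by the drive. The failure rule (current value less than or equal to the threshold) follows the convention documented for smartctl. This check is separate from the random forest model, which consumes the current and raw values directly.

```python
def threshold_margin(value, thresh):
    """Return how far the current (normalized) value sits above its threshold."""
    return value - thresh


def has_failed(value, thresh):
    """An attribute counts as failed once its current value is <= the threshold."""
    return value <= thresh


current_value = 84       # SMART 1 current value from the sample row
assumed_threshold = 44   # hypothetical threshold, for illustration only

print(threshold_margin(current_value, assumed_threshold))
print(has_failed(current_value, assumed_threshold))
```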
| disk_sn | timestamp | fault | hddSmart.smart_1_value | hddSmart.smart_1_raw_value | hddSmart.smart_5_value | hddSmart.smart_5_raw_value | ... |
|---|---|---|---|---|---|---|---|
| ZHZ3TFBD | 2021-7-9 | 0 | 84 | 926673 | 100 | 0 | ... |
| ... | ... | ... | ... | ... | ... | ... | ... |

smartctl -a /dev/sda
The first column indicates the serial number of the drive, as shown in figure 1.
The second column indicates the timestamp.
The third column indicates whether the drive is faulty. (This column is not required for the fault_predict interface.)
The fourth column hddSmart.smart_1_value corresponds to the value in box 1 in figure 2.
The fifth column hddSmart.smart_1_raw_value corresponds to the value in box 2 in figure 2.
The sixth column hddSmart.smart_5_value corresponds to the value in box 3 in figure 2.
The seventh column hddSmart.smart_5_raw_value corresponds to the value in box 4 in figure 2.
...
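The mapping from the smartctl attribute table to CSV columns can be sketched as follows. This is a minimal example, not the collection tool itself: the function name `parse_smart_attributes` is hypothetical, and the sample text is illustrative. Its VALUE and RAW_VALUE entries match the sample row above, whereas the FLAG, WORST, and THRESH entries are placeholders. The sketch assumes the usual ten-column attribute table that smartctl prints (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE).

```python
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   084   064   044    Pre-fail  Always       -       926673
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
"""


def parse_smart_attributes(text):
    """Map each attribute line to its current-value and raw-value CSV columns."""
    row = {}
    for line in text.splitlines():
        # Split into at most 10 fields so that a raw value containing
        # spaces stays together in the last field.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip the header and any non-attribute lines
        smart_id = int(fields[0])
        row[f"hddSmart.smart_{smart_id}_value"] = int(fields[3])
        # The raw value is kept as text here because its format is vendor specific.
        row[f"hddSmart.smart_{smart_id}_raw_value"] = fields[9].strip()
    return row


row = parse_smart_attributes(SAMPLE_OUTPUT)
print(row)
```

The raw value is deliberately left as a string at this stage, since some drives report it in a form that needs filtering before it can be treated as a number.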
Pay attention to the following during data collection:
- SMART implementations vary by manufacturer, so you need to filter the SMART data during collection.
For example, the following figure shows the collected SMART data of Seagate Enterprise Capacity 3.5 HDD ST4000NM0035-1V4107.
The algorithm needs SMART 188, but this drive reports the raw value of that attribute as 0 0 0. Filter the value during collection: retain the first 0 and discard the other two so that the column holds numeric data.
- SMART data of some manufacturers may lack required information.
For example, the following figure shows the collected SMART data of HUH728080ALE600.
If necessary SMART information is missing, the program reports an error and exits.
For example, if the hddSmart.smart_1_value column in the input file is missing, the following error message is displayed:
KeyError: "['hddSmart.smart_1_value'] not in index"
To continue the training, add the missing column to the data file and fill it with 0s. This workaround allows the training to resume but reduces the prediction accuracy of the model.
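Both precautions can be applied before the file reaches the training program. The sketch below uses only Python's standard `csv` module and is not the loader used by the algorithm, which is not shown in this document. The names `first_number` and `load_rows` and the `fill_missing` switch are hypothetical, and the sample CSV is illustrative: it combines the sample row above with the 0 0 0 raw value described for SMART 188.

```python
import csv
import io


def first_number(raw):
    """Keep only the first whitespace-separated token, for example '0 0 0' -> 0."""
    return int(raw.split()[0])


def load_rows(csv_text, required, fill_missing=False):
    """Read CSV text, check the required columns, and make their values numeric."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = reader.fieldnames or []
    missing = [name for name in required if name not in present]
    if missing and not fill_missing:
        raise KeyError(f"missing required columns: {missing}")
    rows = []
    for record in reader:
        for name in missing:
            # Zero fill lets training continue but reduces prediction accuracy.
            record[name] = "0"
        for name in required:
            record[name] = first_number(record[name])
        rows.append(record)
    return rows


SAMPLE_CSV = (
    "disk_sn,timestamp,fault,hddSmart.smart_1_value,hddSmart.smart_188_raw_value\n"
    "ZHZ3TFBD,2021-7-9,0,84,0 0 0\n"
)
REQUIRED = ["hddSmart.smart_1_value", "hddSmart.smart_188_raw_value"]

rows = load_rows(SAMPLE_CSV, REQUIRED)
print(rows[0]["hddSmart.smart_188_raw_value"])
```

Calling `load_rows` with a column that is absent from the file raises `KeyError` by default. Passing `fill_missing=True` instead fills that column with 0, which mirrors the workaround described above.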

