Parameter Package
| Parameter | Description | Input/Output | Type | Value Range | Default Value |
|---|---|---|---|---|---|
| is_diff_feature | Feature engineering parameter. Specifies whether to add differential features. | Input | bool | True/False | True |
| is_diff_roll_feature | Feature engineering parameter. Specifies whether to add differential rolling sum features, long-period differential features, and maximum/minimum differential value features within a period. | Input | bool | True/False | True |
| is_min_max_feature | Feature engineering parameter. Specifies whether to add maximum/minimum raw value features within a period. | Input | bool | True/False | True |
| long_period | Feature engineering parameter. Specifies the period used for differential rolling calculations. | Input | int | [1, sys.maxsize] | 14 |
| alert_threshold | Fault determination parameter. A drive is considered faulty when the number of alarms for its SN (drive serial number) exceeds this threshold. | Input | int | [1, ∞) | 5 |
| data_tail | Data preprocessing parameter. Only the last data_tail records of each SN are retained for training. | Input | int | [1, ∞) | 10 |
| model | Source data parameter. SSD SMART data varies by manufacturer, so specify the manufacturer to determine which SMART columns are used. | Input | String | 'DEFAULT', 'MA1', 'MA2', 'MB1', 'MB2', 'MC1', 'MC2' | 'DEFAULT' |
| log_file | Log file path. | Input | String | - | /var/log/smartmaintainkit.log |
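- sys.maxsize is the largest value of Python's Py_ssize_t type (2**63 - 1 on 64-bit platforms). Python 3 int values are unbounded, so sys.maxsize serves here as the practical upper limit.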
- is_diff_roll_feature takes effect only when is_diff_feature is set to True. If is_diff_feature is set to False, is_diff_roll_feature is ignored.
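The feature engineering, preprocessing, and fault determination parameters can be pictured with a small pandas sketch. It is an assumption about what each flag computes, not the kit's implementation: records are ordered by timestamp within each disk_sn, a differential feature is the change from the previous record, the rolling and min/max features use a window of long_period records (the table does not state which window the raw min/max features use), and an alarm is a record the model flags as faulty. build_features, is_faulty, _group_roll, and the alarm column are hypothetical names.

```python
import pandas as pd


def _group_roll(df, col, window, how):
    """Aggregate col over a rolling window of `window` records within each disk_sn."""
    roll = df.groupby("disk_sn")[col].rolling(window, min_periods=1)
    # Drop the disk_sn index level so the result aligns with df's own index.
    return getattr(roll, how)().reset_index(level=0, drop=True)


def build_features(df, cols, is_diff_feature=True, is_diff_roll_feature=True,
                   is_min_max_feature=True, long_period=14, data_tail=10):
    """Hypothetical sketch of the preprocessing and feature engineering flags."""
    df = df.sort_values(["disk_sn", "timestamp"]).copy()
    for col in cols:
        if is_min_max_feature:
            # Maximum/minimum raw value over the last long_period records (assumed window).
            df[f"{col}_max"] = _group_roll(df, col, long_period, "max")
            df[f"{col}_min"] = _group_roll(df, col, long_period, "min")
        if is_diff_feature:
            diff = f"{col}_diff"
            # Change since the previous record of the same SN; 0 for the first record.
            df[diff] = df.groupby("disk_sn")[col].diff().fillna(0)
            # Rolling differential features are built from the diff column,
            # so they exist only when is_diff_feature is True.
            if is_diff_roll_feature:
                df[f"{diff}_sum"] = _group_roll(df, diff, long_period, "sum")
                df[f"{diff}_max"] = _group_roll(df, diff, long_period, "max")
                df[f"{diff}_min"] = _group_roll(df, diff, long_period, "min")
                # Long-period difference: change over long_period records.
                df[f"{diff}_long"] = df.groupby("disk_sn")[col].diff(long_period).fillna(0)
    # data_tail: keep only the last data_tail records of each SN.
    return df.groupby("disk_sn").tail(data_tail)


def is_faulty(pred, alert_threshold=5):
    """Return, per SN, whether its number of alarm records exceeds alert_threshold."""
    # pred is assumed to have columns disk_sn and alarm (1 = record flagged as faulty).
    return pred.groupby("disk_sn")["alarm"].sum() > alert_threshold


# Two drives with a short history; small windows keep the example readable.
df = pd.DataFrame({
    "disk_sn": ["A"] * 4 + ["B"] * 3,
    "timestamp": [1, 2, 3, 4, 1, 2, 3],
    "smart_5_raw_value": [0, 0, 2, 5, 1, 1, 1],
})
feats = build_features(df, ["smart_5_raw_value"], long_period=2, data_tail=2)
print(feats)

pred = pd.DataFrame({"disk_sn": ["A"] * 6 + ["B"] * 2, "alarm": [1] * 6 + [1, 0]})
print(is_faulty(pred, alert_threshold=5))
```

With data_tail=2, only the last two records of each drive survive, and drive A is flagged because its 6 alarms exceed the threshold of 5.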
Examples:
- Call the interface with the default parameters.
fault_train(r'fault_test.data', r'model.pkl')
- Change the log file path to temp.log.
param = {'log_file': 'temp.log'}
fault_train(r'fault_test.data', r'model.pkl', param)
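The second example passes only log_file, which suggests that keys omitted from param keep their defaults. The sketch below models that behavior under this assumption: DEFAULT_PARAM copies the defaults from the parameter table, and merge_param is a hypothetical helper (not part of the kit) that overlays user values and checks them against the documented types and value ranges before fault_train is called.

```python
import sys

# Defaults copied from the parameter table.
DEFAULT_PARAM = {
    "is_diff_feature": True,
    "is_diff_roll_feature": True,
    "is_min_max_feature": True,
    "long_period": 14,
    "alert_threshold": 5,
    "data_tail": 10,
    "model": "DEFAULT",
    "log_file": "/var/log/smartmaintainkit.log",
}
MODELS = {"DEFAULT", "MA1", "MA2", "MB1", "MB2", "MC1", "MC2"}


def merge_param(param=None):
    """Hypothetical helper: overlay user values on the defaults and validate them."""
    merged = dict(DEFAULT_PARAM)
    for key, value in (param or {}).items():
        if key not in DEFAULT_PARAM:
            raise KeyError(f"unknown parameter: {key}")
        merged[key] = value
    for key in ("is_diff_feature", "is_diff_roll_feature", "is_min_max_feature"):
        if not isinstance(merged[key], bool):
            raise TypeError(f"{key} must be a bool")
    for key in ("long_period", "alert_threshold", "data_tail"):
        value = merged[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= sys.maxsize:
            raise ValueError(f"{key} must be an int in [1, sys.maxsize]")
    if merged["model"] not in MODELS:
        raise ValueError(f"model must be one of {sorted(MODELS)}")
    return merged


param = merge_param({"log_file": "temp.log"})
print(param)
```

Validating up front surfaces a typo such as 'data_tial' or an out-of-range value before a long training run starts.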
The following table lists the SMART IDs that the dataset must contain for each model value.
| SMART ID | MA1 | MA2 | MB1 | MB2 | MC1 | MC2 |
|---|---|---|---|---|---|---|
| 1 | √ | - | - | - | √ | √ |
| 5 | √ | √ | √ | √ | √ | √ |
| 9 | √ | √ | √ | √ | √ | √ |
| 12 | √ | √ | √ | √ | √ | √ |
| 170 | - | √ | - | - | √ | √ |
| 171 | √ | √ | - | - | √ | √ |
| 172 | √ | √ | - | - | √ | √ |
| 173 | √ | - | - | - | √ | √ |
| 174 | √ | √ | - | - | √ | √ |
| 175 | √ | √ | - | - | - | - |
| 177 | - | - | √ | √ | - | - |
| 180 | √ | - | √ | √ | √ | √ |
| 181 | - | - | √ | √ | - | - |
| 182 | - | - | √ | √ | - | - |
| 183 | - | √ | √ | √ | √ | √ |
| 184 | √ | √ | √ | √ | √ | √ |
| 187 | √ | √ | √ | √ | √ | √ |
| 188 | √ | - | - | - | √ | √ |
| 190 | √ | √ | √ | √ | - | - |
| 192 | - | √ | - | - | - | - |
| 194 | √ | √ | √ | - | √ | √ |
| 195 | √ | - | √ | √ | √ | √ |
| 196 | √ | - | - | - | √ | √ |
| 197 | √ | √ | √ | √ | - | - |
| 198 | √ | - | - | - | √ | √ |
| 199 | √ | √ | √ | √ | √ | √ |
| 206 | - | - | - | - | √ | √ |
| 232 | - | √ | - | - | - | - |
| 233 | - | √ | - | - | - | - |
| 241 | - | √ | √ | √ | - | - |
| 242 | - | √ | √ | √ | - | - |
| 244 | - | - | √ | √ | - | - |
| 245 | - | - | √ | √ | - | - |

Remarks: √ indicates a SMART ID that the dataset must contain for the model. - indicates that the SMART ID is not required for the model.
The DEFAULT value of model uses the intersection of the SMART IDs of all models in the preceding table, that is, only the SMART IDs marked √ for every model.
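The counting script below builds DEFAULT with a set intersection, so a SMART ID belongs to DEFAULT only if every model column marks it √. The short sketch below recomputes that set explicitly; the per-model sets are copied from the preceding table, and smart_ids and default_ids are illustrative names.

```python
from functools import reduce

# SMART IDs marked √ for each model in the preceding table.
smart_ids = {
    "MA1": {1, 5, 9, 12, 171, 172, 173, 174, 175, 180, 184, 187, 188, 190, 194, 195, 196, 197, 198, 199},
    "MA2": {5, 9, 12, 170, 171, 172, 174, 175, 183, 184, 187, 190, 192, 194, 197, 199, 232, 233, 241, 242},
    "MB1": {5, 9, 12, 177, 180, 181, 182, 183, 184, 187, 190, 194, 195, 197, 199, 241, 242, 244, 245},
    "MB2": {5, 9, 12, 177, 180, 181, 182, 183, 184, 187, 190, 195, 197, 199, 241, 242, 244, 245},
    "MC1": {1, 5, 9, 12, 170, 171, 172, 173, 174, 180, 183, 184, 187, 188, 194, 195, 196, 198, 199, 206},
    "MC2": {1, 5, 9, 12, 170, 171, 172, 173, 174, 180, 183, 184, 187, 188, 194, 195, 196, 198, 199, 206},
}

# DEFAULT keeps only the IDs that every model requires.
default_ids = sorted(reduce(set.intersection, smart_ids.values()))
print(default_ids)  # [5, 9, 12, 184, 187, 199]
```

Because DEFAULT requires the fewest columns, more records typically pass the completeness check for DEFAULT than for any single model.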
The following script quickly counts, for each model, the number of records in a dataset that contain values for all columns required by that model:
import pandas as pd

# SMART IDs required by each model (see the preceding table).
smart_id_dict = {
    'MA1': [1, 5, 9, 12, 171, 172, 173, 174, 175, 180, 184, 187, 188, 190, 194, 195, 196, 197, 198, 199],
    'MA2': [5, 9, 12, 170, 171, 172, 174, 175, 183, 184, 187, 190, 192, 194, 197, 199, 232, 233, 241, 242],
    'MB1': [5, 9, 12, 177, 180, 181, 182, 183, 184, 187, 190, 194, 195, 197, 199, 241, 242, 244, 245],
    'MB2': [5, 9, 12, 177, 180, 181, 182, 183, 184, 187, 190, 195, 197, 199, 241, 242, 244, 245],
    'MC1': [1, 5, 9, 12, 170, 171, 172, 173, 174, 180, 183, 184, 187, 188, 194, 195, 196, 198, 199, 206],
    'MC2': [1, 5, 9, 12, 170, 171, 172, 173, 174, 180, 183, 184, 187, 188, 194, 195, 196, 198, 199, 206],
}
# DEFAULT uses only the SMART IDs shared by all models.
smart_id_dict['DEFAULT'] = sorted(set.intersection(*(set(ids) for ids in smart_id_dict.values())))

data = pd.read_csv(r'ssd_fault_data.csv', low_memory=False)  # Enter the path of the data file to be checked.
for key, ids in smart_id_dict.items():
    feature_smart_col = ([f'ssdSmart.smart_{sid}_value' for sid in ids] +
                         [f'ssdSmart.smart_{sid}_raw_value' for sid in ids])
    feature_col = ['disk_sn', 'timestamp', 'fault'] + feature_smart_col
    try:
        # Count the records in which every required column has a value.
        count = data[feature_col].dropna(axis=0).shape[0]
    except KeyError:
        # At least one required column is missing from the dataset.
        count = 0
    print(f'{key}: {count}')
The script prints the following data volume for each model in the ssd_fault_data.csv dataset:
MA1: 15068
MA2: 80597
MB1: 39963
MB2: 83407
MC1: 191758
MC2: 191758
DEFAULT: 370840
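A count of 0 can mean either that a required column is absent (the KeyError branch) or that every record has at least one empty required field. To tell the two cases apart, a check along the lines of the hypothetical missing_columns helper below lists the absent columns for one model. It assumes the same ssdSmart.smart_<ID>_value and ssdSmart.smart_<ID>_raw_value naming that the counting script uses.

```python
import pandas as pd

BASE_COLS = ["disk_sn", "timestamp", "fault"]


def missing_columns(data, smart_ids):
    """Return the required columns that are absent from data (hypothetical helper)."""
    required = list(BASE_COLS)
    for sid in smart_ids:
        # Same column naming convention as the counting script.
        required += [f"ssdSmart.smart_{sid}_value", f"ssdSmart.smart_{sid}_raw_value"]
    return [col for col in required if col not in data.columns]


# A tiny frame that only has the SMART 5 columns.
data = pd.DataFrame(columns=BASE_COLS + ["ssdSmart.smart_5_value", "ssdSmart.smart_5_raw_value"])
print(missing_columns(data, [5, 9]))
# ['ssdSmart.smart_9_value', 'ssdSmart.smart_9_raw_value']
```

Running it on the real dataset with each model's ID list shows which SMART columns must be collected before that model can be trained.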
Recommended Parameters
MA1
param = {'model': 'MA1'}
fault_train(r'ssd_MA1_fault_train.csv', r'model.pkl', param)
MA2
param = {'model': 'MA2'}
fault_train(r'ssd_MA2_fault_train.csv', r'model.pkl', param)
MB1
param = {'model': 'MB1'}
fault_train(r'ssd_MB1_fault_train.csv', r'model.pkl', param)
MB2
param = {'model': 'MB2'}
fault_train(r'ssd_MB2_fault_train.csv', r'model.pkl', param)
MC1
param = {'model': 'MC1', 'long_period': 100}
fault_train(r'ssd_MC1_fault_train.csv', r'model.pkl', param)
MC2
param = {'model': 'MC2', 'long_period': 100}
fault_train(r'ssd_MC2_fault_train.csv', r'model.pkl', param)
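The recommendations differ only in model and, for MC1 and MC2, a long_period of 100 instead of the default 14, so they can be kept in one mapping. In the sketch below, RECOMMENDED_PARAM copies the values above, while train_args and the ssd_<model>_fault_train.csv file-name pattern are assumptions taken from these examples. fault_train itself comes from the kit and is not reproduced here.

```python
# Recommended param values from this section; other keys keep their defaults.
RECOMMENDED_PARAM = {
    "MA1": {"model": "MA1"},
    "MA2": {"model": "MA2"},
    "MB1": {"model": "MB1"},
    "MB2": {"model": "MB2"},
    "MC1": {"model": "MC1", "long_period": 100},
    "MC2": {"model": "MC2", "long_period": 100},
}


def train_args(model, model_path="model.pkl"):
    """Hypothetical helper: build the positional arguments for fault_train."""
    data_path = f"ssd_{model}_fault_train.csv"
    # Return a copy so callers cannot modify the shared recommendation table.
    return data_path, model_path, dict(RECOMMENDED_PARAM[model])


args = train_args("MC1")
print(args)
# fault_train(*args)  # requires the kit's fault_train to be importable
```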