Parameter Package

Parameter	Description	Input/Output	Type	Value Range	Default Value
is_diff_feature	Feature engineering parameter, indicating whether to add differential features.	Input	bool	True/False	True
is_diff_roll_feature	Feature engineering parameter, indicating whether to add differential rolling sum features, long-period differential features, and differential maximum/minimum features within a period.	Input	bool	True/False	True
is_min_max_feature	Feature engineering parameter, indicating whether to add maximum/minimum raw value features within a period.	Input	bool	True/False	True
long_period	Feature engineering parameter, indicating the period for differential rolling calculations.	Input	int	[1, sys.maxsize]	14
alert_threshold	Fault determination parameter. When the number of alarms for an SN exceeds the threshold, the drive is considered faulty.	Input	int	[1, ∞)	5
data_tail	Data preprocessing parameter. Only the last data_tail records of each SN are retained for training.	Input	int	[1, ∞)	10
model	Source data parameter. SSD SMART data varies with the manufacturer. Specify the manufacturer to determine which SMART columns are used.	Input	String	['DEFAULT', 'MA1', 'MA2', 'MB1', 'MB2', 'MC1', 'MC2']	'DEFAULT'
log_file	Log file path.	Input	String	-	/var/log/smartmaintainkit.log

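In Python, sys.maxsize is the largest value that a variable of type Py_ssize_t can hold (2**63 - 1 on 64-bit platforms). It is used here as the upper bound of the value range.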
The is_diff_roll_feature parameter takes effect only when is_diff_feature is set to True. If is_diff_feature is set to False, is_diff_roll_feature has no effect.
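
The meaning of data_tail and alert_threshold in the table can be illustrated with a minimal sketch. It is not the library's implementation: keep_tail and faulty_sns are hypothetical helper names, records are assumed to be dicts with the disk_sn and timestamp fields, and "exceeds" is assumed to mean strictly greater than.

```python
from collections import Counter, defaultdict


def keep_tail(records, data_tail=10):
    """Keep only the last `data_tail` records of each SN, ordered by timestamp."""
    by_sn = defaultdict(list)
    for record in records:
        by_sn[record['disk_sn']].append(record)

    kept = []
    for sn_records in by_sn.values():
        sn_records.sort(key=lambda r: r['timestamp'])
        kept.extend(sn_records[-data_tail:])  # the newest data_tail records
    return kept


def faulty_sns(alarm_sns, alert_threshold=5):
    """Return the SNs whose alarm count exceeds `alert_threshold`.

    `alarm_sns` holds one SN entry per alarm raised (an assumption made
    for this illustration).
    """
    counts = Counter(alarm_sns)
    return {sn for sn, n in counts.items() if n > alert_threshold}
```

With the default alert_threshold of 5, an SN with six alarms is reported as faulty and an SN with five alarms is not.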
Examples:
1. Call the interface with the default parameters.
```
fault_train(r'fault_test.data', r'model.pkl')
```
2. Change the log file path to temp.log.
```
param = {'log_file': 'temp.log'}
fault_train(r'fault_test.data', r'model.pkl', param)
```
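
Before passing a custom param as in example 2, the values can be checked against the types and ranges listed in the parameter table. The following is a minimal sketch under that assumption. check_param is a hypothetical helper and is not part of the package, which may perform its own validation.

```python
import sys

MODELS = ('DEFAULT', 'MA1', 'MA2', 'MB1', 'MB2', 'MC1', 'MC2')
BOOL_KEYS = ('is_diff_feature', 'is_diff_roll_feature', 'is_min_max_feature')
INT_KEYS = ('long_period', 'alert_threshold', 'data_tail')


def check_param(param):
    """Return a list of problems found in `param` (empty if none)."""
    problems = []
    for key, value in param.items():
        if key in BOOL_KEYS:
            if not isinstance(value, bool):
                problems.append(f'{key} must be True or False')
        elif key in INT_KEYS:
            # bool is a subclass of int, so it is rejected explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value < 1:
                problems.append(f'{key} must be an int >= 1')
            elif key == 'long_period' and value > sys.maxsize:
                problems.append('long_period must not exceed sys.maxsize')
        elif key == 'model':
            if value not in MODELS:
                problems.append(f'model must be one of {MODELS}')
        elif key == 'log_file':
            if not isinstance(value, str):
                problems.append('log_file must be a string path')
        else:
            problems.append(f'unknown parameter: {key}')

    # is_diff_roll_feature only takes effect when is_diff_feature is True.
    if param.get('is_diff_feature', True) is False and param.get('is_diff_roll_feature') is True:
        problems.append('is_diff_roll_feature has no effect when is_diff_feature is False')
    return problems
```

For example, check_param({'log_file': 'temp.log'}) returns an empty list, while check_param({'long_period': 0}) returns one problem.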

**Table 1** Description of the model parameter
SMART ID	MA1	MA2	MB1	MB2	MC1	MC2
1	√	-	-	-	√	√
5	√	√	√	√	√	√
9	√	√	√	√	√	√
12	√	√	√	√	√	√
170	-	√	-	-	√	√
171	√	√	-	-	√	√
172	√	√	-	-	√	√
173	√	-	-	-	√	√
174	√	√	-	-	√	√
175	√	√	-	-	-	-
177	-	-	√	√	-	-
180	√	-	√	√	√	√
181	-	-	√	√	-	-
182	-	-	√	√	-	-
183	-	√	√	√	√	√
184	√	√	√	√	√	√
187	√	√	√	√	√	√
188	√	-	-	-	√	√
190	√	√	√	√	-	-
192	-	√	-	-	-	-
194	√	√	√	-	√	√
195	√	-	√	√	√	√
196	√	-	-	-	√	√
197	√	√	√	√	-	-
198	√	-	-	-	√	√
199	√	√	√	√	√	√
206	-	-	-	-	√	√
232	-	√	-	-	-	-
233	-	√	-	-	-	-
241	-	√	√	√	-	-
242	-	√	√	√	-	-
244	-	-	√	√	-	-
245	-	-	√	√	-	-
Remarks	√: The SMART ID must be present in the dataset used for the model. -: The SMART ID is not required for the model.

The DEFAULT value uses only the SMART IDs that are common to all models in the preceding table, that is, their intersection (5, 9, 12, 184, 187, and 199).

The following script quickly checks how many records in a dataset are usable for each model, that is, the number of rows that have a value in every column the model requires:

```
import pandas as pd

# SMART IDs required by each model (see Table 1).
smart_id_dict = {
    'MA1': [1, 5, 9, 12, 171, 172, 173, 174, 175, 180, 184, 187, 188, 190, 194, 195, 196, 197, 198, 199],
    'MA2': [5, 9, 12, 170, 171, 172, 174, 175, 183, 184, 187, 190, 192, 194, 197, 199, 232, 233, 241, 242],
    'MB1': [5, 9, 12, 177, 180, 181, 182, 183, 184, 187, 190, 194, 195, 197, 199, 241, 242, 244, 245],
    'MB2': [5, 9, 12, 177, 180, 181, 182, 183, 184, 187, 190, 195, 197, 199, 241, 242, 244, 245],
    'MC1': [1, 5, 9, 12, 170, 171, 172, 173, 174, 180, 183, 184, 187, 188, 194, 195, 196, 198, 199, 206],
    'MC2': [1, 5, 9, 12, 170, 171, 172, 173, 174, 180, 183, 184, 187, 188, 194, 195, 196, 198, 199, 206]
}
# DEFAULT requires only the SMART IDs common to all models.
smart_id_dict['DEFAULT'] = sorted(set(smart_id_dict['MA1']).intersection(
    smart_id_dict['MA2'], smart_id_dict['MB1'], smart_id_dict['MB2'], smart_id_dict['MC1'], smart_id_dict['MC2']))

data = pd.read_csv(r'ssd_fault_data.csv', low_memory=False)  # Enter the path of the data file to be checked.

for model, smart_ids in smart_id_dict.items():
    feature_smart_col = [f'ssdSmart.smart_{smart_id}_value' for smart_id in smart_ids] + \
                        [f'ssdSmart.smart_{smart_id}_raw_value' for smart_id in smart_ids]
    feature_col = ['disk_sn', 'timestamp', 'fault'] + feature_smart_col

    try:
        # Count the rows that have a value in every required column.
        count = data[feature_col].dropna(axis=0, subset=feature_col).shape[0]
    except KeyError:
        # At least one required column is missing from the dataset.
        count = 0

    print(model + ': ' + str(count))
```

The data volume of each model in the ssd_fault_data.csv dataset is output as follows:

```
MA1: 15068
MA2: 80597
MB1: 39963
MB2: 83407
MC1: 191758
MC2: 191758
DEFAULT: 370840
```
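
The script above reports a count of 0 both when no row is complete and when a required column is missing from the dataset (the KeyError branch). To tell the two cases apart, the required column names can be compared with the columns that are actually present. The following is a minimal sketch. required_columns and missing_columns are hypothetical helper names, and the column naming follows the script above.

```python
def required_columns(smart_ids):
    """Build the column names the check script expects for the given SMART IDs."""
    columns = ['disk_sn', 'timestamp', 'fault']
    columns += [f'ssdSmart.smart_{smart_id}_value' for smart_id in smart_ids]
    columns += [f'ssdSmart.smart_{smart_id}_raw_value' for smart_id in smart_ids]
    return columns


def missing_columns(available, smart_ids):
    """Return the required columns that are absent from `available`."""
    available = set(available)
    return [col for col in required_columns(smart_ids) if col not in available]
```

With the DataFrame from the script above, missing_columns(data.columns, smart_id_dict['MB2']) lists the columns that would have to be added before the MB2 model can be used. An empty list means that every required column is present.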

Recommended Parameters

MA1

```
param = {'model': 'MA1'}
fault_train(r'ssd_MA1_fault_train.csv', r'model.pkl', param)
```

MA2

```
param = {'model': 'MA2'}
fault_train(r'ssd_MA2_fault_train.csv', r'model.pkl', param)
```

MB1

```
param = {'model': 'MB1'}
fault_train(r'ssd_MB1_fault_train.csv', r'model.pkl', param)
```

MB2

```
param = {'model': 'MB2'}
fault_train(r'ssd_MB2_fault_train.csv', r'model.pkl', param)
```

MC1

```
param = {'model': 'MC1', 'long_period': 100}
fault_train(r'ssd_MC1_fault_train.csv', r'model.pkl', param)
```

MC2

```
param = {'model': 'MC2', 'long_period': 100}
fault_train(r'ssd_MC2_fault_train.csv', r'model.pkl', param)
```
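
When several models are trained from one script, the recommended parameters above can be kept in a single lookup table. The following is a minimal sketch. RECOMMENDED_PARAM and recommended_param are hypothetical names introduced for illustration; only the parameter values come from this section.

```python
# Recommended parameters per model, as listed in this section.
RECOMMENDED_PARAM = {
    'MA1': {'model': 'MA1'},
    'MA2': {'model': 'MA2'},
    'MB1': {'model': 'MB1'},
    'MB2': {'model': 'MB2'},
    'MC1': {'model': 'MC1', 'long_period': 100},
    'MC2': {'model': 'MC2', 'long_period': 100},
}


def recommended_param(model, **overrides):
    """Return a copy of the recommended parameters for `model`.

    Keyword arguments override or extend the recommended values, so the
    shared table is never modified by the caller.
    """
    param = dict(RECOMMENDED_PARAM[model])
    param.update(overrides)
    return param
```

The result can be passed as the third argument of fault_train, for example fault_train(r'ssd_MC1_fault_train.csv', r'model.pkl', recommended_param('MC1', log_file='temp.log')).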
