
Parameter Package

| Parameter | Description | Input/Output | Type | Value Range | Default Value |
|---|---|---|---|---|---|
| is_diff_feature | Feature engineering parameter. Specifies whether to add differential features. | Input | bool | True/False | True |
| is_diff_roll_feature | Feature engineering parameter. Specifies whether to add differential rolling-sum features, long-period differential features, and maximum/minimum differential-value features within a period. | Input | bool | True/False | True |
| is_min_max_feature | Feature engineering parameter. Specifies whether to add maximum/minimum raw-value features within a period. | Input | bool | True/False | True |
| long_period | Feature engineering parameter. Specifies the period used for differential rolling calculations. | Input | int | [1, sys.maxsize] | 14 |
| alert_threshold | Fault determination parameter. If the number of alarms for a drive SN (serial number) exceeds this threshold, the drive is considered faulty. | Input | int | [1, ∞) | 5 |
| data_tail | Data preprocessing parameter. Only the last data_tail records of each SN are retained for training. | Input | int | [1, ∞) | 10 |
| model | Source data parameter. SSD SMART data varies by manufacturer, so specify the drive model to determine which SMART columns are used (see Table 1). | Input | String | ['DEFAULT', 'MA1', 'MA2', 'MB1', 'MB2', 'MC1', 'MC2'] | 'DEFAULT' |
| log_file | Log file path. | Input | String | - | /var/log/smartmaintainkit.log |

  1. In Python 3, sys.maxsize is the largest value a variable of type Py_ssize_t can hold (2**63 - 1 on 64-bit platforms). Python int values themselves are unbounded, so sys.maxsize serves here as the practical upper limit.
  2. The is_diff_roll_feature parameter takes effect only when is_diff_feature is set to True. If is_diff_feature is set to False, is_diff_roll_feature has no effect.

    Examples:

    1. Call the interface with the default parameters.
      fault_train(r'fault_test.data', r'model.pkl')
    2. Change the log file path to temp.log.
      param = {'log_file': 'temp.log'}
      fault_train(r'fault_test.data', r'model.pkl', param)
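The preprocessing, feature engineering, and fault determination parameters in the parameter table can be illustrated with a small pandas sketch. This is a hypothetical interpretation, not SmartMaintainKit's implementation: the function names (keep_tail, add_features, faulty_sns), the generated column names, the use of long_period as the window for the min/max features, and the one-row-per-alarm input of faulty_sns are all assumptions. The disk_sn and timestamp column names follow the data-checking script later in this section.

```python
import pandas as pd


def keep_tail(df: pd.DataFrame, data_tail: int = 10) -> pd.DataFrame:
    """data_tail (hypothetical sketch): keep only the last `data_tail` records of each SN."""
    return df.sort_values(['disk_sn', 'timestamp']).groupby('disk_sn').tail(data_tail)


def add_features(df: pd.DataFrame, col: str, is_diff_feature: bool = True,
                 is_diff_roll_feature: bool = True, is_min_max_feature: bool = True,
                 long_period: int = 14) -> pd.DataFrame:
    """Hypothetical per-SN features for one SMART column `col` (assumed definitions)."""
    out = df.sort_values(['disk_sn', 'timestamp']).copy()

    def rolling(source: str, how: str) -> pd.Series:
        # Window of `long_period` records, computed separately for each SN.
        return out.groupby('disk_sn')[source].transform(
            lambda s: getattr(s.rolling(long_period, min_periods=1), how)())

    if is_diff_feature:
        # Differential feature: change from the previous record of the same SN.
        out[f'{col}_diff'] = out.groupby('disk_sn')[col].diff()
        # The rolling differential features are built on the diff column,
        # so they are only added when is_diff_feature is True.
        if is_diff_roll_feature:
            out[f'{col}_diff_roll_sum'] = rolling(f'{col}_diff', 'sum')
            out[f'{col}_diff_long'] = out.groupby('disk_sn')[col].diff(long_period)
            out[f'{col}_diff_roll_max'] = rolling(f'{col}_diff', 'max')
            out[f'{col}_diff_roll_min'] = rolling(f'{col}_diff', 'min')
    if is_min_max_feature:
        # Maximum/minimum raw values within the window.
        out[f'{col}_roll_max'] = rolling(col, 'max')
        out[f'{col}_roll_min'] = rolling(col, 'min')
    return out


def faulty_sns(alarms: pd.DataFrame, alert_threshold: int = 5) -> list:
    """Return the SNs whose alarm count exceeds alert_threshold (assumes one row per alarm)."""
    counts = alarms.groupby('disk_sn').size()
    return sorted(counts[counts > alert_threshold].index)
```

For example, with long_period=2 a raw series of 0, 2, 5 for one SN yields differences NaN, 2, 3 and a differential rolling sum of NaN, 2, 5. Note that "exceeds" is read as strictly greater than, so an SN with exactly alert_threshold alarms is not flagged in this sketch.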
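Table 1 Description of the model parameter

| SMART ID | MA1 | MA2 | MB1 | MB2 | MC1 | MC2 |
|---|---|---|---|---|---|---|
| 1 | ✓ | - | - | - | ✓ | ✓ |
| 5 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| 9 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| 12 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| 170 | - | ✓ | - | - | ✓ | ✓ |
| 171 | ✓ | ✓ | - | - | ✓ | ✓ |
| 172 | ✓ | ✓ | - | - | ✓ | ✓ |
| 173 | ✓ | - | - | - | ✓ | ✓ |
| 174 | ✓ | ✓ | - | - | ✓ | ✓ |
| 175 | ✓ | ✓ | - | - | - | - |
| 177 | - | - | ✓ | ✓ | - | - |
| 180 | ✓ | - | ✓ | ✓ | ✓ | ✓ |
| 181 | - | - | ✓ | ✓ | - | - |
| 182 | - | - | ✓ | ✓ | - | - |
| 183 | - | ✓ | ✓ | ✓ | ✓ | ✓ |
| 184 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| 187 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| 188 | ✓ | - | - | - | ✓ | ✓ |
| 190 | ✓ | ✓ | ✓ | ✓ | - | - |
| 192 | - | ✓ | - | - | - | - |
| 194 | ✓ | ✓ | ✓ | - | ✓ | ✓ |
| 195 | ✓ | - | ✓ | ✓ | ✓ | ✓ |
| 196 | ✓ | - | - | - | ✓ | ✓ |
| 197 | ✓ | ✓ | ✓ | ✓ | - | - |
| 198 | ✓ | - | - | - | ✓ | ✓ |
| 199 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| 206 | - | - | - | - | ✓ | ✓ |
| 232 | - | ✓ | - | - | - | - |
| 233 | - | ✓ | - | - | - | - |
| 241 | - | ✓ | ✓ | ✓ | - | - |
| 242 | - | ✓ | ✓ | ✓ | - | - |
| 244 | - | - | ✓ | ✓ | - | - |
| 245 | - | - | ✓ | ✓ | - | - |

Remarks: ✓ indicates a SMART ID that the dataset must contain for that model; - indicates that the SMART ID is not required by that model.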

The DEFAULT value uses only the SMART IDs that are common to all models in the preceding table: 5, 9, 12, 184, 187, and 199.

The following script quickly checks how many complete records (records with no missing value in any required column) a dataset contains for each model:

import pandas as pd

# SMART IDs required by each drive model (see Table 1).
smart_id_dict = {
    'MA1': [1, 5, 9, 12, 171, 172, 173, 174, 175, 180, 184, 187, 188, 190, 194, 195, 196, 197, 198, 199],
    'MA2': [5, 9, 12, 170, 171, 172, 174, 175, 183, 184, 187, 190, 192, 194, 197, 199, 232, 233, 241, 242],
    'MB1': [5, 9, 12, 177, 180, 181, 182, 183, 184, 187, 190, 194, 195, 197, 199, 241, 242, 244, 245],
    'MB2': [5, 9, 12, 177, 180, 181, 182, 183, 184, 187, 190, 195, 197, 199, 241, 242, 244, 245],
    'MC1': [1, 5, 9, 12, 170, 171, 172, 173, 174, 180, 183, 184, 187, 188, 194, 195, 196, 198, 199, 206],
    'MC2': [1, 5, 9, 12, 170, 171, 172, 173, 174, 180, 183, 184, 187, 188, 194, 195, 196, 198, 199, 206],
}
# DEFAULT uses only the SMART IDs common to every model.
smart_id_dict['DEFAULT'] = sorted(set.intersection(*(set(ids) for ids in smart_id_dict.values())))

data = pd.read_csv(r'ssd_fault_data.csv', low_memory=False)  # Path of the data file to check.

for model, smart_ids in smart_id_dict.items():
    feature_smart_col = ([f'ssdSmart.smart_{smart_id}_value' for smart_id in smart_ids]
                         + [f'ssdSmart.smart_{smart_id}_raw_value' for smart_id in smart_ids])
    feature_col = ['disk_sn', 'timestamp', 'fault'] + feature_smart_col

    try:
        # Count records with no missing value in any required column.
        count = data[feature_col].dropna(axis=0).shape[0]
    except KeyError:
        # At least one required column is absent from the dataset.
        count = 0

    print(f'{model}: {count}')

For the ssd_fault_data.csv dataset, the script prints the following number of complete records for each model:

MA1: 15068
MA2: 80597
MB1: 39963
MB2: 83407
MC1: 191758
MC2: 191758
DEFAULT: 370840
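A count of 0 can have two causes: every record has a missing value in some required column, or at least one required column is absent from the dataset (the KeyError branch of the script). To tell them apart, the hypothetical helper missing_smart_columns below lists the required columns that are absent for one model. It builds column names the same way as the checking script; the helper itself is an illustration, not part of SmartMaintainKit.

```python
def missing_smart_columns(columns, smart_ids):
    """Return the required columns for `smart_ids` that are absent from `columns` (hypothetical helper)."""
    # Same column naming as the checking script above.
    required = ['disk_sn', 'timestamp', 'fault']
    for smart_id in smart_ids:
        required += [f'ssdSmart.smart_{smart_id}_value', f'ssdSmart.smart_{smart_id}_raw_value']
    present = set(columns)
    return [col for col in required if col not in present]


# Usage with the objects from the checking script (not run here):
# print(missing_smart_columns(data.columns, smart_id_dict['MB2']))
```

An empty list means all columns are present, so a count of 0 must come from missing values rather than missing columns.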

Recommended Parameters

MA1

param = {'model': 'MA1'}
fault_train(r'ssd_MA1_fault_train.csv', r'model.pkl', param)

MA2

param = {'model': 'MA2'}
fault_train(r'ssd_MA2_fault_train.csv', r'model.pkl', param)

MB1

param = {'model': 'MB1'}
fault_train(r'ssd_MB1_fault_train.csv', r'model.pkl', param)

MB2

param = {'model': 'MB2'}
fault_train(r'ssd_MB2_fault_train.csv', r'model.pkl', param)

MC1

param = {'model': 'MC1', 'long_period': 100}
fault_train(r'ssd_MC1_fault_train.csv', r'model.pkl', param)

MC2

param = {'model': 'MC2', 'long_period': 100}
fault_train(r'ssd_MC2_fault_train.csv', r'model.pkl', param)
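Each recommended configuration overrides only model (and, for MC1 and MC2, long_period). Assuming fault_train fills in any omitted keys from the default values in the parameter table, the effective configuration can be previewed with the hypothetical helper build_param below. Both the merge behavior and the validation rules are assumptions derived from the documented defaults and value ranges, not SmartMaintainKit code.

```python
# Documented defaults from the parameter table.
DEFAULT_PARAM = {
    'is_diff_feature': True,
    'is_diff_roll_feature': True,
    'is_min_max_feature': True,
    'long_period': 14,
    'alert_threshold': 5,
    'data_tail': 10,
    'model': 'DEFAULT',
    'log_file': '/var/log/smartmaintainkit.log',
}
VALID_MODELS = ('DEFAULT', 'MA1', 'MA2', 'MB1', 'MB2', 'MC1', 'MC2')
BOOL_KEYS = ('is_diff_feature', 'is_diff_roll_feature', 'is_min_max_feature')
INT_KEYS = ('long_period', 'alert_threshold', 'data_tail')


def build_param(overrides=None):
    """Hypothetical: merge `overrides` over the documented defaults and check value ranges."""
    param = {**DEFAULT_PARAM, **(overrides or {})}
    unknown = set(param) - set(DEFAULT_PARAM)
    if unknown:
        raise ValueError(f'unknown parameter(s): {sorted(unknown)}')
    for key in BOOL_KEYS:
        if not isinstance(param[key], bool):
            raise TypeError(f'{key} must be True or False')
    for key in INT_KEYS:
        value = param[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f'{key} must be an int >= 1')
    if param['model'] not in VALID_MODELS:
        raise ValueError(f'model must be one of {VALID_MODELS}')
    return param


# Recommended parameters for MC1, as listed above.
mc1 = build_param({'model': 'MC1', 'long_period': 100})
print(mc1['long_period'], mc1['data_tail'])  # 100 10
```

Merging with {**defaults, **overrides} keeps the override dict small, which matches how the recommended parameters above are written.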