The following code example shows how to calculate the accuracy metrics.

import numpy as np

# Predicted condition: alert_sn_list, non_alert_sn_list
# Actual condition: test_bad_list, test_good_list
def get_alert_result(alert_sn_list, non_alert_sn_list, test_bad_list, test_good_list):
    def extract_same_element(list1, list2):
        set1 = set(list1)
        set2 = set(list2)
        return list(set1.intersection(set2))

    # true positive: faulty disks in the alert list (correct)
    tp_sn_list = extract_same_element(alert_sn_list, test_bad_list)
    tp = len(tp_sn_list)

    # false positive, healthy disks in alert list (wrong)
    fp_sn_list = list(set(alert_sn_list) - set(tp_sn_list))
    fp = len(fp_sn_list)

    # true negative: healthy disks in the non-alert list (correct)
    tn_sn_list = extract_same_element(non_alert_sn_list, test_good_list)
    tn = len(tn_sn_list)

    # false negative: faulty disks in the non-alert list (wrong)
    fn_sn_list = list(set(non_alert_sn_list) - set(tn_sn_list))
    fn = len(fn_sn_list)

    print('confusion_matrix [tn, fp, fn, tp]')
    print([tn, fp, fn, tp])  # confusion matrix

    if (tp + fp) != 0:
        precision = 100 * tp / (tp + fp)
    else:
        precision = np.nan
        print('precision cannot be calculated because tp and fp are both zero.')
    if (tp + fn) != 0:
        recall = 100 * tp / (tp + fn)
    else:
        recall = np.nan
        print('recall cannot be calculated because tp and fn are both zero.')
    if (fp + tn) != 0:
        fpr = 100 * fp / (fp + tn)
    else:
        fpr = np.nan
        print('FPR cannot be calculated because fp and tn are both zero.')
    if (precision + recall) != 0:
        # precision and recall are percentages, so divide by 100 to get F1 in [0, 1]
        f1 = 2 * (precision * recall) / (precision + recall) / 100  # F1 score
    else:
        f1 = np.nan
        print('F1 score cannot be calculated because precision + recall equals zero.')
    if (tp + tn + fp + fn) != 0:
        accuracy = (tp + tn) / (tp + tn + fp + fn)
    else:
        accuracy = np.nan
        print('accuracy cannot be calculated because tp, tn, fp, and fn are all zero.')

    # Use the % operator so the values are substituted into the format strings
    print('precision: %f%%' % precision)
    print('recall: %f%%' % recall)
    print('FPR: %f%%' % fpr)
    print('f1 score: %f' % f1)
    print('accuracy: %f' % accuracy)

The input parameters of the code example are described below.

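alert_sn_list is the list of SNs of the disks detected as faulty, that is, the output of the fault_predict interface.
non_alert_sn_list is the list of SNs of the disks predicted to be healthy, that is, every disk SN that appears in the input data but is not output by the fault_predict interface.
test_bad_list is the list of SNs of the disks that are actually faulty.
test_good_list is the list of SNs of the disks that are actually healthy.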
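
The relationship between the input data, alert_sn_list, and non_alert_sn_list can be sketched as follows. This is a minimal sketch under stated assumptions: `all_sn_list` (every disk SN in the input data) and the helper `split_by_alert` are hypothetical names introduced for illustration and are not part of the fault_predict interface, and the serial numbers are made up.

```python
def split_by_alert(all_sn_list, alert_sn_list):
    """Hypothetical helper: derive non_alert_sn_list from the input SNs.

    all_sn_list is assumed to hold every disk SN in the input data;
    alert_sn_list is assumed to be the output of fault_predict.
    """
    alert_set = set(alert_sn_list)
    seen = set()
    non_alert_sn_list = []
    # Keep the input order, drop duplicates, and exclude alerted disks.
    for sn in all_sn_list:
        if sn not in alert_set and sn not in seen:
            seen.add(sn)
            non_alert_sn_list.append(sn)
    return non_alert_sn_list


# Made-up serial numbers, for illustration only.
all_sn_list = ["SN001", "SN002", "SN003", "SN004", "SN005"]
alert_sn_list = ["SN002", "SN005"]
non_alert_sn_list = split_by_alert(all_sn_list, alert_sn_list)
print(non_alert_sn_list)
```

Deriving non_alert_sn_list this way keeps the two predicted lists disjoint, which the confusion matrix counts rely on.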

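The code example prints the validation results for the following machine learning metrics:

confusion_matrix [tn, fp, fn, tp], precision, recall (also called FDR in the fault prediction field), FPR (also called FAR in the fault prediction field), f1 score, and accuracy.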
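
For reference, these metrics follow the standard confusion-matrix definitions. They are written here in the form the code example uses, where precision, recall, and FPR are percentages and F1 and accuracy are fractions between 0 and 1:

```latex
\begin{aligned}
\text{precision} &= 100 \cdot \frac{TP}{TP + FP} \\
\text{recall} &= 100 \cdot \frac{TP}{TP + FN} \\
\text{FPR} &= 100 \cdot \frac{FP}{FP + TN} \\
F_1 &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{100 \cdot (\text{precision} + \text{recall})} \\
\text{accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}
\end{aligned}
```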

Figure 1 Implementation principle

In the fault prediction field, FDR (Fault Detection Rate) corresponds to recall in machine learning, and FAR (False Alarm Rate) corresponds to FPR (False Positive Rate) in machine learning.
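
This correspondence can be illustrated with a small worked example. It is a minimal sketch: the helper name `fdr_far` is hypothetical and the counts are made up for illustration, not measured results.

```python
import math


def fdr_far(tp, fp, tn, fn):
    """Return (FDR, FAR) as percentages; NaN when a denominator is zero."""
    # FDR is recall: the share of actually faulty disks that were alerted.
    fdr = 100 * tp / (tp + fn) if (tp + fn) != 0 else math.nan
    # FAR is FPR: the share of actually healthy disks that were alerted.
    far = 100 * fp / (fp + tn) if (fp + tn) != 0 else math.nan
    return fdr, far


# Made-up counts: 8 faulty disks alerted, 2 faulty disks missed,
# 5 healthy disks alerted by mistake, 985 healthy disks not alerted.
fdr, far = fdr_far(tp=8, fp=5, tn=985, fn=2)
print('FDR (recall): %.2f%%' % fdr)
print('FAR (FPR): %.2f%%' % far)
```

A high FDR with a low FAR means that most faulty disks are caught while few healthy disks are flagged by mistake.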