Accuracy Calculation
The following code example calculates accuracy and the related evaluation metrics.
import numpy as np

# Predicted condition: alert_sn_list, non_alert_sn_list
# Actual condition: test_bad_list, test_good_list
def get_alert_result(alert_sn_list, non_alert_sn_list, test_bad_list, test_good_list):
    def extract_same_element(list1, list2):
        # Return the SNs that appear in both lists (duplicates removed)
        return list(set(list1) & set(list2))

    # True positive: faulty disks in the alert list (correct)
    tp_sn_list = extract_same_element(alert_sn_list, test_bad_list)
    tp = len(tp_sn_list)
    # False positive: healthy disks in the alert list (wrong)
    fp_sn_list = extract_same_element(alert_sn_list, test_good_list)
    fp = len(fp_sn_list)
    # True negative: healthy disks in the non-alert list (correct)
    tn_sn_list = extract_same_element(non_alert_sn_list, test_good_list)
    tn = len(tn_sn_list)
    # False negative: faulty disks in the non-alert list (wrong)
    fn_sn_list = extract_same_element(non_alert_sn_list, test_bad_list)
    fn = len(fn_sn_list)

    # Confusion matrix
    print('confusion_matrix [tn, fp, fn, tp]')
    print([tn, fp, fn, tp])

    # Precision, recall (FDR) and FPR (FAR) are expressed as percentages
    if (tp + fp) != 0:
        precision = 100 * tp / (tp + fp)
    else:
        precision = np.nan
        print('precision cannot be calculated because tp and fp are both zero!')
    if (tp + fn) != 0:
        recall = 100 * tp / (tp + fn)
    else:
        recall = np.nan
        print('recall cannot be calculated because tp and fn are both zero!')
    if (fp + tn) != 0:
        fpr = 100 * fp / (fp + tn)
    else:
        fpr = np.nan
        print('FPR cannot be calculated because fp and tn are both zero!')

    # F1 score, rescaled from percent to a fraction in [0, 1]
    if np.isnan(precision) or np.isnan(recall):
        f1 = np.nan
        print('F1 score cannot be calculated because precision or recall is undefined!')
    elif (precision + recall) != 0:
        f1 = 2 * (precision * recall) / (precision + recall) / 100
    else:
        f1 = np.nan
        print('F1 score cannot be calculated because precision + recall equals zero!')

    # Accuracy as a fraction in [0, 1]
    if (tp + tn + fp + fn) != 0:
        accuracy = (tp + tn) / (tp + tn + fp + fn)
    else:
        accuracy = np.nan
        print('accuracy cannot be calculated because tp, tn, fp and fn are all zero!')

    print('precision: %f%%' % precision)
    print('recall: %f%%' % recall)
    print('FPR: %f%%' % fpr)
    print('f1 score: %f' % f1)
    print('accuracy: %f' % accuracy)
    return [tn, fp, fn, tp], precision, recall, fpr, f1, accuracy
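The input parameters are described as follows.
- alert_sn_list: list of serial numbers (SNs) of disks detected as faulty, that is, the output of the fault_predict interface.
- non_alert_sn_list: list of SNs of disks predicted to be healthy, that is, all disks that appear in the input data but are not output by the fault_predict interface.
- test_bad_list: list of SNs of disks that are actually faulty.
- test_good_list: list of SNs of disks that are actually healthy.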
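By the definition above, non_alert_sn_list is the set difference between the disks in the input data and the disks returned by fault_predict. The following is a minimal sketch, assuming the SNs of all input disks are available as a plain list; the names all_sn_list and build_non_alert_list are hypothetical and are not part of the fault_predict interface.

```python
def build_non_alert_list(all_sn_list, alert_sn_list):
    """Return the SNs that are in the input data but were not alerted.

    all_sn_list is a hypothetical list holding the SN of every input disk.
    The order of first appearance in all_sn_list is kept; duplicates are dropped.
    """
    alert_set = set(alert_sn_list)
    seen = set()
    result = []
    for sn in all_sn_list:
        if sn not in alert_set and sn not in seen:
            seen.add(sn)
            result.append(sn)
    return result


# Illustrative SNs only
all_sn_list = ['SN001', 'SN002', 'SN003', 'SN004']
alert_sn_list = ['SN002']  # stands in for the output of fault_predict
non_alert_sn_list = build_non_alert_list(all_sn_list, alert_sn_list)
print(non_alert_sn_list)  # ['SN001', 'SN003', 'SN004']
```

Preserving the input order is a convenience for inspection; get_alert_result converts the lists to sets, so order does not affect the metrics.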
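The code example prints the following validation metrics for the machine learning model: confusion_matrix [tn, fp, fn, tp], precision, recall (also called FDR in fault prediction), FPR (also called FAR in fault prediction), F1 score, and accuracy.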
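As a worked check of the arithmetic behind these metrics, the sketch below computes them directly from the four confusion-matrix counts, using the same scaling as get_alert_result (precision, recall and FPR in percent; F1 score and accuracy as fractions). The helper name metrics_from_counts and the sample counts are illustrative assumptions, not real test results.

```python
import math


def metrics_from_counts(tn, fp, fn, tp):
    """Return the metrics printed by get_alert_result, computed from raw counts.

    precision, recall (FDR) and fpr (FAR) are percentages; f1 and accuracy are
    fractions in [0, 1]. A ratio with a zero denominator becomes NaN.
    """
    def ratio(num, den):
        return num / den if den != 0 else math.nan

    precision = 100 * ratio(tp, tp + fp)
    recall = 100 * ratio(tp, tp + fn)
    fpr = 100 * ratio(fp, fp + tn)
    if math.isnan(precision) or math.isnan(recall) or precision + recall == 0:
        f1 = math.nan
    else:
        # Harmonic mean of precision and recall, rescaled from percent to [0, 1]
        f1 = 2 * precision * recall / (precision + recall) / 100
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    return {'precision': precision, 'recall': recall, 'fpr': fpr,
            'f1': f1, 'accuracy': accuracy}


# Hypothetical counts: 100 disks, 8 of them actually faulty, 6 alerted
result = metrics_from_counts(tn=90, fp=2, fn=4, tp=4)
for name, value in result.items():
    print('%s: %.4f' % (name, value))
```

With these counts, precision is 4/6 ≈ 66.67%, recall (FDR) is 4/8 = 50%, FPR (FAR) is 2/92 ≈ 2.17%, accuracy is 94/100 = 0.94, and the F1 score equals 2TP/(2TP + FP + FN) = 8/14 ≈ 0.571.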
Figure 1 Implementation principle
In fault prediction, FDR (Fault Detection Rate) corresponds to recall in machine learning, and FAR (False Alarm Rate) corresponds to FPR (False Positive Rate).
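Written in terms of the confusion-matrix counts that get_alert_result computes, the two correspondences are as follows. FDR is the share of actually faulty disks that are alerted, and FAR is the share of actually healthy disks that are wrongly alerted.

```latex
\mathrm{FDR} = \mathrm{recall} = \frac{TP}{TP + FN} \times 100\%, \qquad
\mathrm{FAR} = \mathrm{FPR} = \frac{FP}{FP + TN} \times 100\%
```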