Accuracy Calculation
The following code example shows how to calculate accuracy.
import numpy as np


# Predicted condition: alert_sn_list, non_alert_sn_list
# Actual condition: test_bad_list, test_good_list
def get_alert_result(alert_sn_list, non_alert_sn_list, test_bad_list, test_good_list):
    def extract_same_element(list1, list2):
        # Return the elements that appear in both lists.
        set1 = set(list1)
        set2 = set(list2)
        return list(set1.intersection(set2))

    # True positives: faulty disks in the alert list (correct)
    tp_sn_list = extract_same_element(alert_sn_list, test_bad_list)
    tp = len(tp_sn_list)
    # False positives: healthy disks in the alert list (wrong)
    fp_sn_list = list(set(alert_sn_list) - set(tp_sn_list))
    fp = len(fp_sn_list)
    # True negatives: healthy disks in the non-alert list (correct)
    tn_sn_list = extract_same_element(non_alert_sn_list, test_good_list)
    tn = len(tn_sn_list)
    # False negatives: faulty disks in the non-alert list (wrong)
    fn_sn_list = list(set(non_alert_sn_list) - set(tn_sn_list))
    fn = len(fn_sn_list)

    print('confusion_matrix [tn, fp, fn, tp]')
    print([tn, fp, fn, tp])  # confusion matrix

    # precision, recall, and FPR are expressed as percentages.
    if (tp + fp) != 0:
        precision = 100 * tp / (tp + fp)
    else:
        precision = np.nan
        print('precision cannot be calculated because tp and fp are both zero!')
    if (tp + fn) != 0:
        recall = 100 * tp / (tp + fn)
    else:
        recall = np.nan
        print('recall cannot be calculated because tp and fn are both zero!')
    if (fp + tn) != 0:
        fpr = 100 * fp / (fp + tn)
    else:
        fpr = np.nan
        print('FPR cannot be calculated because fp and tn are both zero!')

    # F1 score and accuracy are expressed as fractions between 0 and 1.
    # If precision or recall is NaN, the F1 score is NaN as well.
    if (precision + recall) != 0:
        f1 = 2 * (precision * recall) / (precision + recall) / 100  # F1 score
    else:
        f1 = np.nan
        print('F1 score cannot be calculated because precision + recall equals zero!')
    if (tp + tn + fp + fn) != 0:
        accuracy = (tp + tn) / (tp + tn + fp + fn)
    else:
        accuracy = np.nan
        print('accuracy cannot be calculated because tp, tn, fp, and fn are all zero!')

    # Apply the format string with %; passing the value as a second
    # argument to print() would output the raw format string instead.
    print('precision: %f%%' % precision)
    print('recall: %f%%' % recall)
    print('FPR: %f%%' % fpr)
    print('f1 score: %f' % f1)
    print('accuracy: %f' % accuracy)
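The input parameters for the accuracy calculation are described below.
- alert_sn_list: serial numbers (SNs) of the disks predicted to be faulty, that is, the output of the fault_predict interface.
- non_alert_sn_list: SNs of the disks predicted to be healthy, that is, all disks in the input data whose SNs were not returned by the fault_predict interface.
- test_bad_list: SNs of the disks that are actually faulty.
- test_good_list: SNs of the disks that are actually healthy.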
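Because non_alert_sn_list is defined as every disk in the input data that fault_predict did not return, it can be derived from the alert list instead of being maintained by hand. The following is a minimal sketch under that assumption. The names all_sn_list and build_non_alert_sn_list, as well as the sample SNs, are hypothetical and are not part of the fault_predict interface.

```python
def build_non_alert_sn_list(all_sn_list, alert_sn_list):
    """Return the SNs that are in the input data but not in the alert list.

    Hypothetical helper: input order is preserved and duplicates are dropped.
    """
    alert_set = set(alert_sn_list)
    seen = set()
    non_alert_sn_list = []
    for sn in all_sn_list:
        if sn not in alert_set and sn not in seen:
            seen.add(sn)
            non_alert_sn_list.append(sn)
    return non_alert_sn_list


# Illustrative data only: five disks in the input, two of them alerted.
all_sn_list = ["SN001", "SN002", "SN003", "SN004", "SN005"]
alert_sn_list = ["SN002", "SN005"]
non_alert_sn_list = build_non_alert_sn_list(all_sn_list, alert_sn_list)
print(non_alert_sn_list)  # ['SN001', 'SN003', 'SN004']
```

Deriving the list this way keeps the alert and non-alert lists disjoint, which the confusion matrix counts above rely on.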
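The code example prints the validation results for the following machine learning metrics:
confusion_matrix [tn, fp, fn, tp], precision, recall (also called FDR in the fault prediction field), FPR (also called FAR in the fault prediction field), f1 score, and accuracy.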
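For reference, the metrics listed above follow the standard confusion matrix definitions, written here with TP, FP, TN, and FN for the counts tp, fp, tn, and fn. Note that the code example scales precision, recall, and FPR by 100 to report percentages, while the F1 score and accuracy are reported as fractions.

```latex
\begin{aligned}
\text{precision} &= \frac{TP}{TP + FP} \\
\text{recall} &= \frac{TP}{TP + FN} \\
\text{FPR} &= \frac{FP}{FP + TN} \\
F_1 &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \\
\text{accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}
\end{aligned}
```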
Figure 1 Implementation principle
In the fault prediction field, FDR (Fault Detection Rate) corresponds to recall in machine learning, and FAR (False Alarm Rate) corresponds to FPR (False Positive Rate) in machine learning.
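To make the correspondence concrete, the sketch below computes FDR and FAR directly from confusion matrix counts. The function name fdr_far and the counts are hypothetical and chosen only for illustration. Unlike the code example above, this sketch returns fractions rather than percentages.

```python
def fdr_far(tp, fp, tn, fn):
    """Return (FDR, FAR) as fractions; NaN when a denominator is zero.

    FDR equals recall: tp / (tp + fn).
    FAR equals FPR:    fp / (fp + tn).
    """
    fdr = tp / (tp + fn) if (tp + fn) != 0 else float("nan")
    far = fp / (fp + tn) if (fp + tn) != 0 else float("nan")
    return fdr, far


# Illustrative counts: 10 faulty disks (8 detected), 100 healthy disks (3 alerted).
fdr, far = fdr_far(tp=8, fp=3, tn=97, fn=2)
print(f"FDR (recall): {fdr:.2%}")  # FDR (recall): 80.00%
print(f"FAR (FPR): {far:.2%}")     # FAR (FPR): 3.00%
```

A high FDR with a low FAR means most faulty disks are caught while few healthy disks are flagged for replacement.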