Accuracy Calculation

The following sample code calculates the accuracy metrics of the fault prediction results:

import numpy as np

# Predicted condition: alert_sn_list, non_alert_sn_list
# Actual condition: test_bad_list, test_good_list
def get_alert_result(alert_sn_list, non_alert_sn_list, test_bad_list, test_good_list):
    def extract_same_element(list1, list2):
        set1 = set(list1)
        set2 = set(list2)
        return list(set1.intersection(set2))

    # true positive: faulty disks in the alert list (correct)
    tp_sn_list = extract_same_element(alert_sn_list, test_bad_list)
    tp = len(tp_sn_list)

    # false positive, healthy disks in alert list (wrong)
    fp_sn_list = list(set(alert_sn_list) - set(tp_sn_list))
    fp = len(fp_sn_list)

    # true negative: healthy disks in the non-alert list (correct)
    tn_sn_list = extract_same_element(non_alert_sn_list, test_good_list)
    tn = len(tn_sn_list)

    # false negative: faulty disks in the non-alert list (wrong)
    fn_sn_list = list(set(non_alert_sn_list) - set(tn_sn_list))
    fn = len(fn_sn_list)

    print('confusion_matrix [tn, fp, fn, tp]')
    print([tn, fp, fn, tp])  # confusion matrix

    if (tp + fp) != 0:
        precision = 100 * tp / (tp + fp)
    else:
        precision = np.nan
        print('precision cannot be calculated because tp and fp are both zero!')
    if (tp + fn) != 0:
        recall = 100 * tp / (tp + fn)
    else:
        recall = np.nan
        print('recall cannot be calculated because tp and fn are both zero!')
    if (fp + tn) != 0:
        fpr = 100 * fp / (fp + tn)
    else:
        fpr = np.nan
        print('FPR cannot be calculated because fp and tn are both zero!')
    if (precision + recall) != 0:
        f1 = 2 * (precision * recall) / (precision + recall) / 100  # F1 score
    else:
        f1 = np.nan
        print('F1 score cannot be calculated because precision + recall equals zero!')
    if (tp + tn + fp + fn) != 0:
        accuracy = (tp + tn) / (tp + tn + fp + fn)
    else:
        accuracy = np.nan
        print('accuracy cannot be calculated because tp, tn, fp, and fn are all zero!')

    # Use the % operator so that the values are substituted into the format strings.
    print('precision: %f%%' % precision)
    print('recall: %f%%' % recall)
    print('FPR: %f%%' % fpr)
    print('f1 score: %f' % f1)
    print('accuracy: %f' % accuracy)

The input parameters of the sample code are described as follows:

alert_sn_list indicates the SN list of drives that are predicted to be faulty, that is, the output of the fault_predict interface.
non_alert_sn_list indicates the SN list of drives that are predicted to be normal, that is, the SNs of drives contained in the input data but not output by the fault_predict interface.
test_bad_list indicates the SN list of drives that are actually faulty.
test_good_list indicates the SN list of drives that are actually normal.
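
As an illustration of how these four lists relate to each other, the following minimal sketch derives non_alert_sn_list from the input data and checks that the predicted lists and the actual lists cover the same drives. The SN values, the all_sn_list variable, and the is_consistent helper are hypothetical and are not part of the fault_predict interface.

```python
# Hypothetical sample data. Real SNs come from the input data and from
# the output of the fault_predict interface.
all_sn_list = ["SN001", "SN002", "SN003", "SN004", "SN005"]  # SNs in the input data
alert_sn_list = ["SN002", "SN005"]                            # assumed prediction output

# Drives contained in the input data but not reported as faulty.
alert_set = set(alert_sn_list)
non_alert_sn_list = [sn for sn in all_sn_list if sn not in alert_set]

# Ground-truth labels (hypothetical values).
test_bad_list = ["SN002", "SN003"]
test_good_list = ["SN001", "SN004", "SN005"]


def is_consistent(alert, non_alert, bad, good):
    """Return True if both list pairs split the same drive set without overlap."""
    predicted = set(alert) | set(non_alert)
    actual = set(bad) | set(good)
    no_overlap = not (set(alert) & set(non_alert)) and not (set(bad) & set(good))
    return predicted == actual and no_overlap


print(non_alert_sn_list)
print(is_consistent(alert_sn_list, non_alert_sn_list, test_bad_list, test_good_list))
```

A check of this kind is worthwhile because the sample code counts every SN in non_alert_sn_list that is not in test_good_list as a false negative, and every SN in alert_sn_list that is not in test_bad_list as a false positive. An SN that is missing from both actual lists would therefore distort the results.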

The sample code prints the verification results of the following machine learning metrics:

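confusion_matrix [tn, fp, fn, tp], precision, recall (also called FDR in the fault prediction field), FPR (also called FAR in the fault prediction field), F1 score, and accuracy. Precision, recall, and FPR are printed as percentages. F1 score and accuracy are printed as values between 0 and 1.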

Figure 1 Working principle

Fault detection rate (FDR) in the fault prediction field corresponds to recall in the machine learning field. False alarm rate (FAR) in the fault prediction field corresponds to false positive rate (FPR) in the machine learning field.
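
To make this correspondence concrete, the following minimal sketch computes FDR and FAR from the four confusion matrix counts, using the same formulas as recall and FPR in the sample code. The rates_from_counts function name and the counts are hypothetical and serve only as an illustration.

```python
import math


def rates_from_counts(tn, fp, fn, tp):
    """Return (FDR, FAR) as percentages; a rate is NaN if its denominator is zero."""
    # FDR equals recall: detected faulty drives / all actually faulty drives.
    fdr = 100 * tp / (tp + fn) if (tp + fn) else math.nan
    # FAR equals FPR: falsely alerted drives / all actually healthy drives.
    far = 100 * fp / (fp + tn) if (fp + tn) else math.nan
    return fdr, far


# Hypothetical counts: 1000 drives, 20 of which are actually faulty.
tn, fp, fn, tp = 970, 10, 5, 15
fdr, far = rates_from_counts(tn, fp, fn, tp)
print('FDR (recall): %.2f%%' % fdr)  # FDR (recall): 75.00%
print('FAR (FPR): %.2f%%' % far)     # FAR (FPR): 1.02%
```

In this example, 15 of the 20 faulty drives are detected, so the FDR is 75%, and 10 of the 980 healthy drives trigger an alert, so the FAR is about 1.02%.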
