Table 2 Monitoring performance measures

From: Robust monitoring machine: a machine learning solution for out-of-sample R\(^2\)-hacking in return predictability monitoring

 

Panel A: confusion matrix

|                    | True positive             | True negative             |                  |
|--------------------|---------------------------|---------------------------|------------------|
| Predicted positive | 236                       | 189                       | PPV = 55.5%      |
| Predicted negative | 178                       | 249                       | NPV = 58.3%      |
|                    | TPR = 57.0% (Sensitivity) | TNR = 56.8% (Specificity) | Accuracy = 57.0% |

Panel B: classification performance tests

|                     | Estimate | 95% C.I.     | P-value                |
|---------------------|----------|--------------|------------------------|
| TPR + TNR           | 1.14     | (1.07, 1.21) |                        |
| PPV + NPV           | 1.14     | (1.07, 1.21) |                        |
| Fisher’s exact test |          |              | \(6.84\times 10^{-5}\) |
| Chi-square test     |          |              | \(5.29\times 10^{-5}\) |

This table evaluates the out-of-sample binary classification performance (without look-ahead bias) of the robust monitoring machine over the evaluation period from 01/1947 to 01/2017. Panel A shows a confusion matrix based on predicted binary outcomes: positive (the proposed forecast outperforms the benchmark) or negative (otherwise). Sensitivity, or True Positive Rate (TPR), is the number of true positives divided by the number of actual positives. Specificity, or True Negative Rate (TNR), is the number of true negatives divided by the number of actual negatives. PPV, or Positive Predictive Value, is the number of true positives divided by the number of predicted positives. NPV, or Negative Predictive Value, is the number of true negatives divided by the number of predicted negatives. Accuracy is the number of correct predictions as a fraction of the total number of sample observations. Panel B reports the estimates and 95% confidence intervals for ‘Sensitivity + Specificity’ (TPR + TNR) and ‘PPV + NPV’, together with the p-values of Fisher’s exact test and the chi-square test.
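For concreteness, the following minimal Python sketch (not the authors' code) recomputes the Panel A metrics and the Panel B test p-values from the four confusion-matrix counts. The `scipy.stats` calls and the choice of the uncorrected chi-square statistic (`correction=False`) are assumptions made here to match the reported values.

```python
# Minimal sketch (assumption: not the authors' code) that recomputes
# Table 2's metrics and tests from the four confusion-matrix counts.
from scipy.stats import chi2_contingency, fisher_exact

tp, fn = 236, 178  # actual positives: correctly / incorrectly classified
fp, tn = 189, 249  # actual negatives: incorrectly / correctly classified

tpr = tp / (tp + fn)                   # sensitivity: 236/414 ~ 57.0%
tnr = tn / (tn + fp)                   # specificity: 249/438 ~ 56.8%
ppv = tp / (tp + fp)                   # 236/425 ~ 55.5%
npv = tn / (tn + fn)                   # 249/427 ~ 58.3%
acc = (tp + tn) / (tp + fp + fn + tn)  # 485/852 ~ 56.9% (57.0% in the table)

# 2x2 contingency table: rows = predicted class, columns = actual class
table = [[tp, fp], [fn, tn]]
_, p_fisher = fisher_exact(table)      # ~ 6.84e-05
# correction=False (no Yates continuity correction) is an assumption
# that reproduces the reported p-value of 5.29e-05
chi2, p_chi2, _, _ = chi2_contingency(table, correction=False)

print(f"TPR + TNR = {tpr + tnr:.2f}, PPV + NPV = {ppv + npv:.2f}")
print(f"Fisher p = {p_fisher:.2e}, chi-square p = {p_chi2:.2e}")
```

Note that both sums in Panel B exceed 1 (a no-skill classifier would have TPR + TNR = 1 in expectation), and the confidence intervals exclude 1, consistent with the small p-values of the two independence tests.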