Dimitriadis et al. propose several tests based on comparing the issued predictions with recalibrated versions of them. This yields several possible tests of correct calibration (i.e. that the expected proportion of successes matches the predicted probability).
brier_resampling_test(x, y, alpha = 0.05, B = 10000)
brier_resampling_p(x, y, B = 10000)
binary_miscalibration(x, y)
miscalibration_resampling_p(x, y, B = 10000)
miscalibration_resampling_test(x, y, alpha = 0.05, B = 10000)

x      the predicted success probabilities
y      the actual observed outcomes (0 or 1)
alpha  the type I error rate for the test
B      the number of bootstrap samples for the null distribution
brier_resampling_test and miscalibration_resampling_test return
an object of class htest; brier_resampling_p and miscalibration_resampling_p
return just the p-value (for easier use in automated workflows).
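For example, with illustrative simulated data (the outcomes are drawn from the stated probabilities, so the null of correct calibration holds):

set.seed(1)
x <- runif(200)                       # predicted success probabilities
y <- rbinom(200, 1, x)                # outcomes consistent with calibration
miscalibration_resampling_test(x, y)  # htest object
brier_resampling_p(x, y, B = 2000)    # bare p-value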
binary_miscalibration computes just the miscalibration component using
the pool-adjacent-violators (PAV) algorithm.
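A minimal sketch of the underlying computation, assuming the PAV recalibration is the isotonic regression of the outcomes on the predictions (done here via base R's isoreg) and that miscalibration is measured as the Brier score improvement from recalibrating; the helper name pav_miscalibration is illustrative and the package's internals may differ:

pav_miscalibration <- function(x, y) {
  o <- order(x)               # isotonic regression needs x in increasing order
  x <- x[o]; y <- y[o]
  x_hat <- isoreg(x, y)$yf    # PAV fit: recalibrated probabilities
  # Brier score of raw predictions minus Brier score of recalibrated ones
  mean((x - y)^2) - mean((x_hat - y)^2)
}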
The brier_ functions implement a test based on the Brier score, while
the miscalibration_ functions implement a test based on the miscalibration
component. In both cases we evaluate the null distribution via bootstrapping.
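As a hedged sketch of that bootstrap, one common construction (an assumption here, not a statement about the package's internals) redraws outcomes from the stated probabilities, i.e. simulates under correct calibration, and compares the observed statistic with the resampled ones; all names below are illustrative:

# Monte Carlo p-value for a test statistic `stat`, resampling
# y* ~ Bernoulli(x) under the null that x is calibrated (assumed scheme)
resampling_p_sketch <- function(x, y, stat, B = 10000) {
  observed <- stat(x, y)
  null_stats <- replicate(B, stat(x, rbinom(length(x), 1, x)))
  mean(null_stats >= observed)  # one-sided: large values suggest miscalibration
}

# e.g. a Brier-score statistic:
brier_score <- function(x, y) mean((x - y)^2)
# resampling_p_sketch(x, y, brier_score)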