Dimitriadis et al. propose several tests based on comparing the actual predictions to the predictions as they would be if the probabilities were correctly calibrated. This yields several possible tests of correct calibration (i.e. that the expected proportion of successes matches the predicted probability).
brier_resampling_test(x, y, alpha = 0.05, B = 10000)
brier_resampling_p(x, y, B = 10000)
binary_miscalibration(x, y)
miscalibration_resampling_p(x, y, B = 10000)
miscalibration_resampling_test(x, y, alpha = 0.05, B = 10000)
x: the predicted success probabilities
y: the actual observed outcomes (0 or 1)
alpha: the type I error rate for the test
B: the number of bootstrap samples for the null distribution
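For example (a minimal sketch, assuming the package providing these functions is loaded), the arguments might be supplied as:

set.seed(1)
x <- runif(200)           # predicted success probabilities
y <- rbinom(200, 1, x)    # outcomes drawn from those probabilities, hence calibrated
brier_resampling_test(x, y, alpha = 0.05, B = 10000)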
brier_resampling_test and miscalibration_resampling_test return an object of class htest; brier_resampling_p and miscalibration_resampling_p return just the p-value (for easier use in automated workflows).
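For instance, the p-value-only variants slot directly into simulation loops. A hypothetical sketch (the rejection-rate check is an illustration, not part of the original documentation):

pvals <- replicate(100, {
  x <- runif(100)
  y <- rbinom(100, 1, x)
  brier_resampling_p(x, y, B = 1000)
})
mean(pvals < 0.05)   # empirical rejection rate under correct calibration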
binary_miscalibration computes just the miscalibration component, using the PAV (pool adjacent violators) algorithm.
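The miscalibration component can be sketched with base R's isoreg(), which implements PAV. This is an illustrative reimplementation following the CORP-style decomposition of Dimitriadis et al., and the function name pav_miscalibration is hypothetical, not necessarily the package's actual code:

pav_miscalibration <- function(x, y) {
  ord   <- order(x)                    # PAV needs forecasts in increasing order
  recal <- isoreg(x[ord], y[ord])$yf   # PAV-recalibrated probabilities
  # miscalibration = Brier score of the forecasts minus that of their
  # recalibrated counterparts (nonnegative by construction)
  mean((x[ord] - y[ord])^2) - mean((recal - y[ord])^2)
}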
The brier_ functions implement a test based on the Brier score, while the miscalibration_ functions implement a test based on the miscalibration component. In both cases the null distribution is evaluated via bootstrapping.
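One way to realise this (an assumption about the approach, not the package's internals): under the null hypothesis the forecasts are calibrated, so bootstrap outcomes can be drawn as Bernoulli(x) and the test statistic recomputed on each draw. Both resample_null_p and the pav_miscalibration function sketched earlier are hypothetical names:

resample_null_p <- function(x, y, stat, B = 10000) {
  observed   <- stat(x, y)
  # draw outcomes from the predicted probabilities, i.e. under calibration
  null_stats <- replicate(B, stat(x, rbinom(length(x), 1, x)))
  mean(null_stats >= observed)   # one-sided p-value
}
# e.g. resample_null_p(x, y, stat = pav_miscalibration)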