Performs the main SBC routine given datasets and a backend.
compute_SBC(
  datasets,
  backend,
  cores_per_fit = default_cores_per_fit(length(datasets)),
  keep_fits = TRUE,
  thin_ranks = SBC_backend_default_thin_ranks(backend),
  ensure_num_ranks_divisor = 2,
  chunk_size = default_chunk_size(length(datasets)),
  dquants = NULL,
  cache_mode = "none",
  cache_location = NULL,
  globals = list(),
  gen_quants = NULL
)
datasets: an object of class SBC_datasets.
backend: the model + sampling algorithm. The built-in backends can be constructed
using SBC_backend_cmdstan_sample(), SBC_backend_cmdstan_variational(),
SBC_backend_rstan_sample(), SBC_backend_rstan_optimizing() and SBC_backend_brms()
(more to come: issues 31, 38, 39). The backend is an S3 class supporting at least
the SBC_fit() and SBC_fit_to_draws_matrix() methods.
cores_per_fit: how many cores the backend is allowed to use for a single fit.
Defaults to the maximum number that does not produce more parallel chains
than you have cores. See default_cores_per_fit().
keep_fits: boolean; when FALSE, full fits are discarded from memory. This
reduces memory consumption and increases speed (when processing in parallel), but
prevents you from inspecting the fits and from using recompute_SBC_statistics().
We recommend setting it to TRUE in early phases of the workflow, when you run just
a few fits. Once the model is stable and you want to run many iterations, we
recommend setting it to FALSE (even for quite a simple model, 1000 fits can
easily exhaust 32 GB of RAM).
thin_ranks: how much thinning should be applied to posterior draws before computing ranks for SBC. Should be large enough to avoid any noticeable autocorrelation of the thinned draws. See details below.
ensure_num_ranks_divisor: potentially drop some posterior samples to ensure that this number divides the total number of SBC ranks (see Details).
chunk_size: how many simulations within the datasets shall be processed in one
batch by the same worker. Relevant only when using parallel processing.
The larger the value, the smaller the overhead of parallel processing, but
the less evenly the work may be distributed across workers. We recommend setting
this high enough that a single batch takes at least several seconds; i.e., for
small models, you can often reduce computation time noticeably by increasing this
value. You can use options(SBC.min_chunk_size = value) to set a minimum chunk
size globally. See the documentation of the future.chunk.size argument of
future.apply::future_lapply() for more details.
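For illustration, a global minimum chunk size can be set once per session before running SBC (the specific values below are arbitrary):

```r
# Ensure each parallel batch bundles at least 10 simulations,
# so small models don't drown in scheduling overhead
options(SBC.min_chunk_size = 10)

# chunk_size can also be set explicitly per call (value is illustrative):
# results <- compute_SBC(datasets, backend, chunk_size = 20)
```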
dquants: derived quantities to include in SBC. Use derived_quantities() to construct them.
cache_mode: type of caching of results. Currently the only supported modes are
"none" (do not cache) and "results", where the whole results object is stored
and recomputed only when the hash of the backend or dataset changes.
cache_location: the filesystem location of the cache. For cache_mode = "results"
this should be the name of a single file. If the file name does not end with
.rds, this extension is appended.
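Assuming datasets and backend are already available, caching to a file might look like this (the file name is arbitrary):

```r
# Results are written to "sbc_cache.rds" and reused on subsequent calls
# as long as the hash of the backend and datasets is unchanged
results <- compute_SBC(datasets, backend,
                       cache_mode = "results",
                       cache_location = "sbc_cache")  # ".rds" is appended
```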
globals: a list of names of objects that are defined in the global environment
and need to be present for the backend to work (if they are not already
available in a package). It is added to the globals argument of future::future()
to make those objects available on all workers.
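A sketch of using globals, with a hypothetical helper function my_transform defined in the global environment and needed by the backend:

```r
# my_transform is a user-defined helper (hypothetical, for illustration)
my_transform <- function(x) log1p(x)

# Listing its name makes it available on all parallel workers
results <- compute_SBC(datasets, backend, globals = list("my_transform"))
```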
gen_quants: deprecated, use dquants instead.
An object of class SBC_results().
Parallel processing is supported via the future package. For most uses, it is
most sensible to just call plan(multisession) once in your R session and all
cores of your computer will be used. For more details, refer to the documentation
of the future package.
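A minimal sketch of a typical invocation, assuming datasets and backend have already been created (e.g. via generate_datasets() and one of the SBC_backend_* constructors):

```r
library(SBC)
library(future)

# Use all available cores; call once per R session
plan(multisession)

# Run the SBC routine; fits are distributed across the parallel workers
results <- compute_SBC(datasets, backend)
```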
When using backends based on MCMC, there are two possible moments when
draws may need to be thinned: they can be thinned directly within the backend,
or they may be thinned only to compute the ranks for SBC, as specified by the
thin_ranks argument. The main reason those are separate is that computing the
ranks requires no or negligible autocorrelation, while some autocorrelation
may be easily tolerated for summarising the fit results or assessing convergence.
In fact, thinning too aggressively in the backend may lead to overly noisy
estimates of posterior means, quantiles, and the posterior::rhat() and
posterior::ess_tail() diagnostics. So for well-adapted Hamiltonian Monte Carlo
chains (e.g. Stan-based backends), we recommend no thinning in the backend; a
value of thin_ranks between 6 and 10 is usually sufficient to remove
the residual autocorrelation. For a backend based on Metropolis-Hastings,
it might be sensible to thin quite aggressively already in the backend and
then apply some additional thinning via thin_ranks.
Backends that don't require thinning should implement SBC_backend_iid_draws()
or SBC_backend_default_thin_ranks() to avoid thinning by default.
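For a hypothetical custom backend class my_backend that produces independent draws, opting out of thinning could be sketched as follows (assuming SBC_backend_iid_draws() is an S3 generic dispatched on the backend class and returning a logical):

```r
# Declare that draws from "my_backend" are i.i.d., so compute_SBC()
# will not thin them by default (class name is illustrative)
SBC_backend_iid_draws.my_backend <- function(backend) {
  TRUE
}
```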
Some of the visualizations and post-processing steps we use in the SBC package
(e.g. plot_rank_hist(), empirical_coverage()) work best if the total number of
possible SBC ranks is a "nice" number (lots of divisors).
However, the number of ranks is one plus the number of posterior samples
after thinning; therefore, as long as the number of samples is a "nice"
number, the number of ranks usually will not be. To remedy this, you can
specify ensure_num_ranks_divisor: the method will drop at most
ensure_num_ranks_divisor - 1 samples to make the number of ranks divisible
by ensure_num_ranks_divisor. The default of 2 prevents the most annoying
pathologies while discarding at most a single sample.
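A concrete illustration of the arithmetic: with 1000 posterior samples after thinning, the number of possible ranks is 1001, which is odd; dropping a single sample yields 1000 ranks.

```r
samples_after_thinning <- 1000
n_ranks <- samples_after_thinning + 1            # 1001 -- not divisible by 2

# With ensure_num_ranks_divisor = 2, at most 1 sample is dropped:
kept <- samples_after_thinning - (n_ranks %% 2)  # 999 samples kept
kept + 1                                         # 1000 ranks, divisible by 2
```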