Model selection guide
This guide explains how meridian-tools supports Bayesian model selection
using Leave-One-Out (LOO) cross-validation and the Watanabe-Akaike Information
Criterion (WAIC). It covers when model selection is available, how to interpret
the outputs, and how to compare multiple candidate models.
What model selection provides
Bayesian model selection uses information criteria computed from pointwise log-likelihood values to compare model specifications. Unlike predictive accuracy measured on a held-out set, LOO and WAIC estimate the model’s expected out-of-sample predictive performance from a single fit to all observations, so no separate validation split is required.
meridian-tools wraps ArviZ’s az.loo and az.waic with:
- Automatic log-likelihood reconstruction for fitted Meridian models
- Structured error handling when model selection is not possible
- A compare_models surface for ranking multiple candidates
- Artefact-level compatibility status in every run directory
Compatibility boundary
Model selection is only available for models where holdout_id is None.
This means:
| Run type | Model selection available |
|---|---|
| Full-sample fit (no validation) | Yes |
| Final-fit run (mode: final_fit) | Yes |
| Blocked-tail validation run | No |
| Rolling-origin validation split | No |
| Authored-holdout run | No |
| Bare InferenceData without log_likelihood | No |
This restriction exists because LOO and WAIC require the full observed likelihood surface. A holdout fit has a modified likelihood that does not represent the full data-generating process, so comparing a holdout fit’s ELPD against a full fit’s ELPD would be statistically meaningless.
How it works in the pipeline
When exports.export_model_selection: true is set in the YAML config, the runner’s
30_model_assessment stage attempts model selection after writing diagnostics.
Compatible runs
For compatible models, the stage writes:
- loo_summary.json: LOO summary statistics (ELPD, p_loo, SE, etc.)
- waic_summary.json: WAIC summary statistics
- loo_pointwise.csv: Per-observation LOO values and Pareto k diagnostics
- waic_pointwise.csv: Per-observation WAIC values
- model_comparison.csv: Ranked comparison table (single-model for individual runs)
Incompatible runs
For incompatible models, the stage writes a single status artefact:
model_selection_status.json
Known reason codes:
| Code | Meaning |
|---|---|
| holdout_fit_unsupported | The model was fitted with a holdout mask |
| requires_fitted_meridian_model | Missing posterior samples or ArviZ InferenceData |
| missing_log_likelihood_group | Bare InferenceData without reconstructable likelihood |
| meridian_internal_seam_incompatible | Meridian version lacks required internal reconstruction methods |
Incompatibility is non-fatal. The pipeline completes successfully and records the reason in the artefact.
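As a rough illustration of how a downstream script might tell the two outcomes apart, the sketch below inspects a directory for the artefact names listed above. The helper name model_selection_outcome, the assumption that the artefacts sit directly in the directory you pass in, and the reason_code field in the status payload are all assumptions made for this example; the actual schema of model_selection_status.json is not described in this guide.

```python
import json
import tempfile
from pathlib import Path

# Artefact file names as listed in this guide.
COMPATIBLE_ARTEFACT = "loo_summary.json"
STATUS_ARTEFACT = "model_selection_status.json"


def model_selection_outcome(run_dir):
    """Classify a directory as ("ok", None), ("incompatible", status) or ("missing", None).

    Hypothetical helper: it only looks for the artefact files by name.
    """
    run_dir = Path(run_dir)
    if (run_dir / COMPATIBLE_ARTEFACT).exists():
        return "ok", None
    status_path = run_dir / STATUS_ARTEFACT
    if status_path.exists():
        return "incompatible", json.loads(status_path.read_text())
    return "missing", None


# Demo with a throwaway directory standing in for a run directory.
with tempfile.TemporaryDirectory() as tmp:
    # Assumed payload shape; only the reason code value comes from this guide.
    payload = {"reason_code": "holdout_fit_unsupported"}
    (Path(tmp) / STATUS_ARTEFACT).write_text(json.dumps(payload))
    outcome, status = model_selection_outcome(tmp)
    print(outcome, status["reason_code"])
```

Because incompatibility does not fail the run, a check of this kind is one way for reporting code to decide whether comparison tables are worth loading.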
Using the Python API directly
Compute LOO for a single model
Compute WAIC for a single model
Compare multiple models
The comparison table is ranked by ELPD. The best model has rank 0 and
elpd_diff == 0. The weight column gives stacking weights.
Check log-likelihood availability
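A generic way to test for the log_likelihood group, independent of any meridian-tools helper, is to check for the attribute on the InferenceData object, since ArviZ exposes groups as attributes. The function has_log_likelihood below is a hypothetical name for this sketch, and the SimpleNamespace objects are stand-ins used only so the example runs without a fitted model.

```python
from types import SimpleNamespace


def has_log_likelihood(idata):
    """Return True if the object exposes a log_likelihood group.

    Duck-typed check: works on any object whose groups are attributes.
    """
    return getattr(idata, "log_likelihood", None) is not None


# Stand-ins for InferenceData objects, used only to exercise the check.
bare = SimpleNamespace(posterior={"mu": [0.1, 0.2]})
full = SimpleNamespace(
    posterior={"mu": [0.1, 0.2]},
    log_likelihood={"y": [[-1.2, -0.8]]},
)
print(has_log_likelihood(bare), has_log_likelihood(full))
```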
Log-likelihood reconstruction
Meridian does not store pointwise log-likelihood in its InferenceData by
default. meridian-tools reconstructs it automatically when you pass a
fitted Meridian model to compute_loo, compute_waic, or compare_models.
The reconstruction:
- Recovers unsaved posterior parameters (e.g. geo deviations, tau_g)
- Rebuilds the joint distribution from the posterior samples
- Computes observation-level log-likelihood
- Returns a new InferenceData with the log_likelihood group attached
The original model is never mutated. The reconstruction produces a temporary copy used only for the ArviZ computation.
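To make the "observation-level log-likelihood" step concrete, the sketch below evaluates a log density for every posterior draw and every observation, producing the draws-by-observations matrix that LOO and WAIC consume. It assumes a Normal likelihood with one mean per observation and one scale per draw purely for illustration; it is not Meridian's actual likelihood or the package's reconstruction code, and the function names are hypothetical.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def pointwise_log_likelihood(y, mu_draws, sigma_draws):
    """Return a draws x observations matrix of log p(y_i | draw s).

    mu_draws[s][i] is the predicted mean for observation i under draw s;
    sigma_draws[s] is the residual scale under draw s.
    """
    return [
        [normal_logpdf(y_i, mu_i, sigma) for y_i, mu_i in zip(y, mu_row)]
        for mu_row, sigma in zip(mu_draws, sigma_draws)
    ]


# Toy inputs: three observations and two posterior draws (illustrative values).
y = [1.0, 2.0, 3.0]
mu_draws = [[1.1, 1.9, 3.2], [0.9, 2.1, 2.8]]
sigma_draws = [0.5, 0.6]
log_lik = pointwise_log_likelihood(y, mu_draws, sigma_draws)
print(len(log_lik), len(log_lik[0]))
```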
You can also control this explicitly:
Interpreting the outputs
LOO summary
| Field | Meaning |
|---|---|
| elpd_loo | Expected log pointwise predictive density (higher is better) |
| p_loo | Effective number of parameters |
| se | Standard error of elpd_loo |
| warning | Whether Pareto k diagnostics indicate unreliable estimates |
WAIC summary
| Field | Meaning |
|---|---|
| elpd_waic | Expected log pointwise predictive density (WAIC estimate) |
| p_waic | Effective number of parameters (WAIC estimate) |
| se | Standard error of elpd_waic |
| warning | Whether posterior variance diagnostics indicate unreliable estimates |
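The relationship between the WAIC fields can be sketched from a draws-by-observations log-likelihood matrix. The example below follows the textbook definition: the log pointwise predictive density per observation, minus the posterior variance of the log-likelihood per observation. It uses the sample variance (n - 1 denominator) throughout, which is an assumption of this sketch; ArviZ's az.waic may differ in such numerical details, so treat the code as an explanation of the fields, not a replacement for the library call.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic_summary(log_lik):
    """Compute WAIC fields from log_lik[draw][observation].

    Needs at least two draws and two observations for the variances.
    """
    n_obs = len(log_lik[0])
    columns = [[row[i] for row in log_lik] for i in range(n_obs)]
    lppd_i = [log_mean_exp(col) for col in columns]  # per-observation lppd
    p_i = [variance(col) for col in columns]         # per-observation penalty
    elpd_i = [lppd - p for lppd, p in zip(lppd_i, p_i)]
    return {
        "elpd_waic": sum(elpd_i),
        "p_waic": sum(p_i),
        "se": math.sqrt(n_obs * variance(elpd_i)),
    }


# Toy matrix: three draws, two observations (illustrative values only).
toy = [[-1.0, -2.1], [-1.2, -1.9], [-0.9, -2.0]]
print(waic_summary(toy))
```

When the log-likelihood of an observation varies a lot across draws, its contribution to p_waic grows, which is the situation the warning field is meant to flag.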
Pareto k diagnostics
The pointwise LOO output includes a pareto_k column. Values above 0.7
indicate that the LOO approximation is unreliable for those observations.
ArviZ will emit a warning if any Pareto k values exceed the threshold.
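A quick way to find the affected observations is to filter the pointwise output on the pareto_k column. The sketch below reads CSV text with the standard library and applies the 0.7 threshold quoted above. Only the pareto_k column name comes from this guide; the other column names and all values in the sample are made up for illustration.

```python
import csv
import io

PARETO_K_THRESHOLD = 0.7  # threshold quoted in this guide


def flag_unreliable(rows, threshold=PARETO_K_THRESHOLD):
    """Return the rows whose pareto_k value is strictly above the threshold."""
    return [row for row in rows if float(row["pareto_k"]) > threshold]


# Hypothetical excerpt standing in for loo_pointwise.csv.
sample = io.StringIO(
    "observation,elpd_loo,pareto_k\n"
    "0,-1.21,0.12\n"
    "1,-3.05,0.83\n"
    "2,-0.97,0.70\n"
)
rows = list(csv.DictReader(sample))
flagged = flag_unreliable(rows)
print([row["observation"] for row in flagged])
```

To run the same filter on a real file, replace the StringIO object with an open file handle for the pointwise CSV.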
Model comparison
When comparing two or more models:
- elpd_diff: Difference in ELPD from the best model (0 for the best)
- dse: Standard error of the ELPD difference
- weight: Stacking weight (how much to trust each model)
- Models are ranked by ELPD (rank 0 is best)
A single-model comparison returns a one-row table with rank=0,
elpd_diff=0, and weight=1.0.
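The ranking, elpd_diff, and dse columns can be reproduced in outline from per-observation ELPD values. The sketch below assumes that elpd_diff is reported as best minus candidate (so it is non-negative) and that dse is the square root of n times the population variance of the pointwise differences; both conventions are assumptions of this example and should be checked against the actual output. Stacking weights require an optimisation step and are left out. The function name compare_pointwise and the input values are hypothetical.

```python
import math
from statistics import pvariance


def compare_pointwise(elpd_by_model):
    """Rank models by total ELPD and compute elpd_diff and dse against the best.

    elpd_by_model maps a model name to its per-observation ELPD values;
    all models must cover the same observations in the same order.
    """
    totals = {name: sum(values) for name, values in elpd_by_model.items()}
    ranked = sorted(totals, key=totals.get, reverse=True)
    best = elpd_by_model[ranked[0]]
    n_obs = len(best)
    table = []
    for rank, name in enumerate(ranked):
        diffs = [b - m for b, m in zip(best, elpd_by_model[name])]
        table.append({
            "model": name,
            "rank": rank,
            "elpd": totals[name],
            "elpd_diff": sum(diffs),
            "dse": math.sqrt(n_obs * pvariance(diffs)),
        })
    return table


# Illustrative pointwise values for two candidate models.
candidates = {
    "baseline": [-1.0, -1.0, -1.0],
    "extended": [-1.5, -1.0, -2.0],
}
for row in compare_pointwise(candidates):
    print(row)
```

A useful rule of thumb when reading such a table is to compare elpd_diff with dse: a difference that is small relative to its standard error gives little reason to prefer one model over the other.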
Error handling
All model-selection errors are raised as ModelSelectionError with a
structured reason_code.
In the pipeline, these errors are caught and written to
model_selection_status.json rather than failing the run.
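The catch-and-record pattern described here can be sketched as follows. The ModelSelectionError class in the example is a local stand-in so the code runs on its own; only the class name and its reason_code attribute come from this guide. The compute_loo_stub function, the run_model_selection wrapper, the placeholder summary, and the fields written to the status file are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class ModelSelectionError(Exception):
    """Local stand-in for the package's exception (assumed constructor)."""

    def __init__(self, message, reason_code):
        super().__init__(message)
        self.reason_code = reason_code


def compute_loo_stub(holdout_id):
    """Hypothetical stand-in that mimics the holdout restriction."""
    if holdout_id is not None:
        raise ModelSelectionError(
            "model was fitted with a holdout mask", "holdout_fit_unsupported"
        )
    return {"note": "placeholder summary"}


def run_model_selection(holdout_id, out_dir):
    """Write a summary on success or a status artefact on failure.

    Returns True when model selection ran, False when it was skipped.
    """
    out_dir = Path(out_dir)
    try:
        summary = compute_loo_stub(holdout_id)
    except ModelSelectionError as err:
        # Record the reason instead of failing the surrounding run.
        status = {"reason_code": err.reason_code, "message": str(err)}
        (out_dir / "model_selection_status.json").write_text(json.dumps(status, indent=2))
        return False
    (out_dir / "loo_summary.json").write_text(json.dumps(summary, indent=2))
    return True


with tempfile.TemporaryDirectory() as tmp:
    print(run_model_selection(holdout_id=[0, 1], out_dir=tmp))
```

In your own scripts, branching on reason_code instead of the message text keeps the handling stable if the wording of the error changes.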