Model selection guide

This guide explains how meridian-tools supports Bayesian model selection using Leave-One-Out (LOO) cross-validation and the Watanabe-Akaike Information Criterion (WAIC). It covers when model selection is available, how to interpret the outputs, and how to compare multiple candidate models.

What model selection provides

Bayesian model selection uses information criteria computed from pointwise log-likelihood values to compare model specifications. Unlike predictive accuracy measured on a held-out set, LOO and WAIC estimate the model’s expected out-of-sample predictive performance from the posterior fitted to all observations, without requiring a separate validation split.
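
To make "computed from pointwise log-likelihood values" concrete, here is a minimal sketch of the WAIC estimate using only the standard library. It follows the usual definition (log pointwise predictive density minus the per-observation variance of the log-likelihood across draws). The function name and the toy numbers are illustrative assumptions, and ArviZ's az.waic may differ in numerical details such as the variance denominator.

```python
import math
import statistics


def waic_from_log_lik(log_lik):
    """Return (elpd_waic, p_waic) from log_lik[draw][observation]."""
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    elpd_waic = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        # Log of the mean likelihood over draws, via a stable log-sum-exp.
        peak = max(column)
        lppd_i = peak + math.log(sum(math.exp(v - peak) for v in column) / n_draws)
        # Penalty term: sample variance of the log-likelihood over draws.
        p_i = statistics.variance(column)
        elpd_waic += lppd_i - p_i
        p_waic += p_i
    return elpd_waic, p_waic


# Toy input: 3 posterior draws, 2 observations (made-up numbers).
toy_log_lik = [
    [-1.0, -2.0],
    [-1.2, -1.8],
    [-0.8, -2.2],
]
elpd_waic, p_waic = waic_from_log_lik(toy_log_lik)
print(f"elpd_waic={elpd_waic:.3f}, p_waic={p_waic:.3f}")
```

LOO works from the same draws-by-observations matrix but replaces the variance penalty with importance-sampling weights, which is where the Pareto k diagnostics come from.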

meridian-tools wraps ArviZ’s az.loo and az.waic with:

  • Automatic log-likelihood reconstruction for fitted Meridian models
  • Structured error handling when model selection is not possible
  • A compare_models surface for ranking multiple candidates
  • Artefact-level compatibility status in every run directory

Compatibility boundary

Model selection is available only for models fitted with holdout_id set to None, and only when the pointwise log-likelihood is present or can be reconstructed. In practice:

Run type                                     Model selection available
Full-sample fit (no validation)              Yes
Final-fit run (mode: final_fit)              Yes
Blocked-tail validation run                  No
Rolling-origin validation split              No
Authored-holdout run                         No
Bare InferenceData without log_likelihood    No

This restriction exists because LOO and WAIC require the full observed likelihood surface. A holdout fit has a modified likelihood that does not represent the full data-generating process, so comparing a holdout fit’s ELPD against a full fit’s ELPD would be statistically meaningless.
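
One way to see the problem: ELPD is a sum of per-observation terms, so totals taken over different sets of observations are not on the same scale. The sketch below uses made-up numbers and assumes that held-out observations are simply masked out of the likelihood. In a real holdout fit the per-observation values would also change, because the posterior itself is different.

```python
# Made-up pointwise ELPD contributions for 10 observations.
pointwise_elpd = [-1.1, -0.9, -1.3, -1.0, -0.8, -1.2, -1.0, -0.9, -1.4, -1.1]

# Hypothetical holdout mask: the last 3 observations are excluded from the likelihood.
holdout_mask = [False] * 7 + [True] * 3

elpd_full = sum(pointwise_elpd)
elpd_holdout_fit = sum(
    value for value, held_out in zip(pointwise_elpd, holdout_mask) if not held_out
)

print(f"full-sample ELPD: {elpd_full:.1f} over {len(pointwise_elpd)} observations")
print(f"holdout-fit ELPD: {elpd_holdout_fit:.1f} over {holdout_mask.count(False)} observations")
```

The holdout total looks "better" only because it sums fewer negative terms, not because the model predicts better.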

How it works in the pipeline

When exports.export_model_selection is set to true in the YAML config, the runner’s 30_model_assessment stage attempts model selection after writing diagnostics.

Compatible runs

For compatible models, the stage writes:

  • loo_summary.json — LOO summary statistics (ELPD, p_loo, SE, etc.)
  • waic_summary.json — WAIC summary statistics
  • loo_pointwise.csv — Per-observation LOO values and Pareto k diagnostics
  • waic_pointwise.csv — Per-observation WAIC values
  • model_comparison.csv — Ranked comparison table (single-model for individual runs)

Incompatible runs

For incompatible models, the stage writes a single status artefact:

  • model_selection_status.json
{
  "status": "unavailable",
  "reason_code": "holdout_fit_unsupported",
  "reason": "Model selection requires holdout_id is None ..."
}

Known reason codes:

Code                                   Meaning
holdout_fit_unsupported                The model was fitted with a holdout mask
requires_fitted_meridian_model         Missing posterior samples or ArviZ InferenceData
missing_log_likelihood_group           Bare InferenceData without reconstructable likelihood
meridian_internal_seam_incompatible    Meridian version lacks required internal reconstruction methods

Incompatibility is non-fatal. The pipeline completes successfully and records the reason in the artefact.
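
Because the run itself succeeds, downstream code has to inspect the artefact to learn whether model selection was skipped. The following is a minimal sketch of such a check. It assumes the status file sits directly in the run directory; the helper name unavailable_reason is hypothetical and not part of meridian-tools.

```python
import json
import tempfile
from pathlib import Path


def unavailable_reason(run_dir):
    """Return the reason_code if the status artefact reports "unavailable", else None.

    Assumption: model_selection_status.json lives directly in run_dir.
    A missing file is treated as "no recorded reason", not as proof of success.
    """
    status_path = Path(run_dir) / "model_selection_status.json"
    if not status_path.exists():
        return None
    payload = json.loads(status_path.read_text(encoding="utf-8"))
    if payload.get("status") == "unavailable":
        return payload.get("reason_code")
    return None


# Demo with a throwaway directory standing in for a real run directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_payload = {
        "status": "unavailable",
        "reason_code": "holdout_fit_unsupported",
        "reason": "example text",
    }
    (Path(tmp) / "model_selection_status.json").write_text(
        json.dumps(demo_payload), encoding="utf-8"
    )
    reason_code = unavailable_reason(tmp)

print(reason_code)
```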

Using the Python API directly

Compute LOO for a single model

from meridian_tools.model_selection import compute_loo

result = compute_loo(fitted_model, pointwise=True)

print(result.kind)          # "loo"
print(result.summary)       # {"kind": "loo", "elpd_loo": -123.4, ...}
print(result.pointwise)     # DataFrame with loo_i, pareto_k per observation

Compute WAIC for a single model

from meridian_tools.model_selection import compute_waic

result = compute_waic(fitted_model, pointwise=True)

print(result.kind)          # "waic"
print(result.summary)       # {"kind": "waic", "elpd_waic": -125.1, ...}

Compare multiple models

from meridian_tools.model_selection import compare_models

comparison = compare_models(
    {
        "model_a": fitted_model_a,
        "model_b": fitted_model_b,
    },
    ic="loo",   # or "waic"
)

print(comparison)
# DataFrame with columns: model, rank, elpd_loo, p_loo, elpd_diff, weight, se, dse, warning, scale

The comparison table is ranked by ELPD. The best model has rank 0 and elpd_diff == 0. The weight column gives stacking weights.

Check log-likelihood availability

from meridian_tools.model_selection import compute_loo, has_log_likelihood

if has_log_likelihood(fitted_model):
    result = compute_loo(fitted_model)

Log-likelihood reconstruction

Meridian does not store pointwise log-likelihood in its InferenceData by default. meridian-tools reconstructs it automatically when you pass a fitted Meridian model to compute_loo, compute_waic, or compare_models.

The reconstruction:

  1. Recovers unsaved posterior parameters (e.g. geo deviations, tau_g)
  2. Rebuilds the joint distribution from the posterior samples
  3. Computes observation-level log-likelihood
  4. Returns a new InferenceData with the log_likelihood group attached

When reconstruction is triggered automatically by compute_loo, compute_waic, or compare_models, the original model is not mutated: the reconstruction produces a temporary copy used only for the ArviZ computation.

You can also control this explicitly:

from meridian_tools.log_likelihood import attach_log_likelihood

# Returns new InferenceData with log_likelihood group (original unchanged)
idata_with_ll = attach_log_likelihood(fitted_model, in_place=False)

# Mutates the model's inference_data in place
attach_log_likelihood(fitted_model, in_place=True)
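
As a rough illustration of step 3 of the reconstruction (observation-level log-likelihood), the sketch below evaluates a log density for every combination of posterior draw and observation. The Gaussian observation model, the function names, and the numbers are assumptions chosen for brevity; this is not Meridian's actual likelihood or the meridian-tools implementation.

```python
import math


def normal_log_pdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (y - mu) ** 2 / (2.0 * sigma**2)


def pointwise_log_lik(observed, draws):
    """Return log_lik[draw][observation] for draws given as (mu, sigma) pairs."""
    return [[normal_log_pdf(y, mu, sigma) for y in observed] for mu, sigma in draws]


observed = [0.0, 1.0, 2.0]          # made-up observations
draws = [(0.9, 1.0), (1.1, 1.2)]    # made-up posterior draws of (mu, sigma)
log_lik = pointwise_log_lik(observed, draws)

# One row per draw, one column per observation.
print(len(log_lik), len(log_lik[0]))
```

In an ArviZ InferenceData object the same values would typically carry chain and draw dimensions in front of the observation dimension.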

Interpreting the outputs

LOO summary

Field       Meaning
elpd_loo    Expected log pointwise predictive density (higher is better)
p_loo       Effective number of parameters
se          Standard error of elpd_loo
warning     Whether Pareto k diagnostics indicate unreliable estimates

WAIC summary

Field        Meaning
elpd_waic    Expected log pointwise predictive density (WAIC estimate)
p_waic       Effective number of parameters (WAIC estimate)
se           Standard error of elpd_waic
warning      Whether posterior variance diagnostics indicate unreliable estimates

Pareto k diagnostics

The pointwise LOO output includes a pareto_k column. Values above 0.7 indicate that the LOO approximation is unreliable for those observations. ArviZ will emit a warning if any Pareto k values exceed the threshold.
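
A quick way to act on this is to list the flagged observations from the pointwise output. The sketch below reads an inline stand-in for loo_pointwise.csv with the standard library. The loo_i and pareto_k column names follow the pointwise DataFrame described earlier; the observation column and all values are assumptions for illustration.

```python
import csv
import io

PARETO_K_THRESHOLD = 0.7  # threshold quoted in this guide

# Stand-in for loo_pointwise.csv; values are made up.
sample_csv = """observation,loo_i,pareto_k
0,-1.10,0.12
1,-0.95,0.48
2,-2.40,0.83
3,-1.30,0.71
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
flagged = [
    row["observation"] for row in rows if float(row["pareto_k"]) > PARETO_K_THRESHOLD
]

print(f"{len(flagged)} of {len(rows)} observations exceed k={PARETO_K_THRESHOLD}: {flagged}")
```

If only a handful of observations are flagged, it is worth inspecting them individually before drawing conclusions from elpd_loo.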

Model comparison

When comparing two or more models:

  • elpd_diff — Difference in ELPD from the best model (0 for the best)
  • dse — Standard error of the ELPD difference
  • weight — Stacking weight: the share each model receives in the combined predictive distribution (not a posterior model probability)
  • Models are ranked by ELPD (rank 0 is best)

A single-model comparison returns a one-row table with rank=0, elpd_diff=0, and weight=1.0.
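
When reading a multi-model table, it helps to set elpd_diff against its standard error dse rather than looking at the ranking alone. The sketch below flags models whose difference from the best model is small relative to dse. The helper name and the multiplier of 2 are a heuristic assumption, not a meridian-tools or ArviZ rule, and the rows are made up.

```python
def indistinguishable_from_best(rows, multiplier=2.0):
    """Names of non-best models whose |elpd_diff| is within multiplier * dse."""
    return [
        row["model"]
        for row in rows
        if row["rank"] != 0 and abs(row["elpd_diff"]) <= multiplier * row["dse"]
    ]


# Made-up rows shaped like the comparison table described above.
comparison_rows = [
    {"model": "model_a", "rank": 0, "elpd_diff": 0.0, "dse": 0.0},
    {"model": "model_b", "rank": 1, "elpd_diff": 1.5, "dse": 1.2},
    {"model": "model_c", "rank": 2, "elpd_diff": 9.0, "dse": 2.0},
]

print(indistinguishable_from_best(comparison_rows))
```

Taking the absolute value keeps the check independent of the sign convention used for elpd_diff.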

Error handling

All model-selection errors are raised as ModelSelectionError with a structured reason_code:

from meridian_tools.model_selection import ModelSelectionError, compute_loo

try:
    result = compute_loo(candidate)
except ModelSelectionError as exc:
    print(exc.reason_code)  # e.g. "holdout_fit_unsupported"
    print(str(exc))         # Human-readable explanation

In the pipeline, these errors are caught and written to model_selection_status.json rather than failing the run.
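
The same pattern can be reproduced in your own scripts. The sketch below turns a raised error into a payload shaped like model_selection_status.json. To stay runnable on its own it defines a stand-in ModelSelectionError; the constructor signature, the status_payload helper, and the always_fails function are assumptions for illustration and may not match the real classes in meridian-tools.

```python
import json


class ModelSelectionError(Exception):
    """Stand-in for the meridian-tools exception so this sketch runs on its own."""

    def __init__(self, message, reason_code):
        super().__init__(message)
        self.reason_code = reason_code


def status_payload(compute, candidate):
    """Run compute(candidate); return None on success or a status dict on failure."""
    try:
        compute(candidate)
    except ModelSelectionError as exc:
        return {
            "status": "unavailable",
            "reason_code": exc.reason_code,
            "reason": str(exc),
        }
    return None


def always_fails(candidate):
    # Hypothetical compute function that mimics a holdout-fit rejection.
    raise ModelSelectionError("holdout fits are not supported", "holdout_fit_unsupported")


payload = status_payload(always_fails, candidate=object())
print(json.dumps(payload, indent=2))
```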