Validation guide

This guide explains how to choose and configure validation strategies in meridian-tools. Validation is the process of evaluating a candidate model specification on held-out data before committing to a final production fit.

Why validation matters for MMM

Marketing Mix Models are fitted to time series data. Because observations are ordered in time and typically autocorrelated, naive IID cross-validation (random train/test splits) leaks future information into the training set and is statistically inappropriate. meridian-tools does not implement random shuffling or naive k-fold splits. Instead, it provides two generated time-respecting validation strategies (blocked_tail and rolling_origin), support for a user-authored holdout mask, and a clear separation between validation runs and the final production fit.

Validation strategies

`none` — No validation

validation:
  strategy: none

The model is fitted on the full dataset with no holdout. Use this when you do not need candidate evaluation — for example, when rerunning a previously validated specification.

`blocked_tail` — Single contiguous tail holdout

validation:
  strategy: blocked_tail
  holdout_size: 8

Reserves the last holdout_size time periods as a test block. The model is fitted on all preceding periods. This is the recommended default for short MMM time series where you want one simple candidate evaluation.

When to use: Most standard MMM projects with fewer than 150 weekly observations.

How it works:

Time axis: [t1, t2, t3, t4, t5, t6, t7, t8, t9, t10]
holdout_size: 3

Train: [t1, t2, t3, t4, t5, t6, t7]
Test:  [t8, t9, t10]

The holdout mask is generated automatically and injected into Meridian’s holdout_id parameter. For geo-panel models, the mask is broadcast across all geos.
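
As a rough illustration of the mask described above, the following pure-Python sketch builds a blocked-tail mask and repeats it for each geo. The helper name blocked_tail_mask is hypothetical and is not part of meridian-tools. The (n_geos, n_times) layout is an assumption about how the broadcast mask is shaped.

```python
from typing import List, Optional, Union

Mask = Union[List[bool], List[List[bool]]]


def blocked_tail_mask(
    n_times: int, holdout_size: int, n_geos: Optional[int] = None
) -> Mask:
    """Return a boolean mask where True marks a held-out time period.

    Hypothetical helper for illustration; not the meridian-tools implementation.
    """
    if not 0 < holdout_size < n_times:
        raise ValueError("holdout_size must be between 1 and n_times - 1")
    # False for the leading training block, True for the trailing test block.
    row = [False] * (n_times - holdout_size) + [True] * holdout_size
    if n_geos is None:
        return row  # national model: one flag per time period
    # Geo-panel model (assumed layout): the same tail is held out in every geo.
    return [list(row) for _ in range(n_geos)]


national = blocked_tail_mask(n_times=10, holdout_size=3)
panel = blocked_tail_mask(n_times=10, holdout_size=3, n_geos=2)
print(national)
```

With ten periods and holdout_size: 3, the last three flags are True, matching the Train/Test layout shown above.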

`rolling_origin` — Expanding-window validation

validation:
  strategy: rolling_origin
  initial_train_size: 52
  test_size: 4
  step_size: 4
  max_splits: 3

Creates multiple expanding-window splits where each successive split adds more training data. This provides a more robust evaluation signal than a single blocked tail, but requires enough history to support multiple splits.

When to use: Projects with longer time series (typically 100+ weekly observations) where you want multiple evaluation windows.

How it works:

Time axis: [t1, t2, ..., t52, t53, ..., t56, t57, ..., t60]

Split 1: Train [t1..t52], Test [t53..t56]
Split 2: Train [t1..t56], Test [t57..t60]

Constraints:

step_size must equal test_size (non-overlapping test windows).
max_splits must be at least 2.
initial_train_size + test_size must not exceed the number of observations.
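The plan must yield at least two splits. Because step_size equals test_size, this means initial_train_size + 2 × test_size must not exceed the number of observations.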
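
The expanding-window logic and the constraints above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the planner that build_validation_plan uses internally. The function name rolling_origin_splits and the zero-based index convention are hypothetical.

```python
from typing import List, Tuple

Split = Tuple[range, range]  # (train indices, test indices), zero-based


def rolling_origin_splits(
    n_obs: int,
    initial_train_size: int,
    test_size: int,
    step_size: int,
    max_splits: int,
) -> List[Split]:
    """Enumerate expanding-window splits and enforce the listed constraints."""
    if step_size != test_size:
        raise ValueError("step_size must equal test_size")
    if max_splits < 2:
        raise ValueError("max_splits must be at least 2")
    if initial_train_size + test_size > n_obs:
        raise ValueError("initial_train_size + test_size exceeds n_obs")

    splits: List[Split] = []
    train_end = initial_train_size
    # Grow the training window by step_size until the data or max_splits runs out.
    while len(splits) < max_splits and train_end + test_size <= n_obs:
        splits.append((range(0, train_end), range(train_end, train_end + test_size)))
        train_end += step_size

    if len(splits) < 2:
        raise ValueError("the plan must yield at least two splits")
    return splits


splits = rolling_origin_splits(
    n_obs=60, initial_train_size=52, test_size=4, step_size=4, max_splits=3
)
for number, (train, test) in enumerate(splits, start=1):
    print(f"Split {number}: train t1..t{train.stop}, test t{test.start + 1}..t{test.stop}")
```

With 60 observations, only two splits fit even though max_splits is 3, which reproduces the two-split example shown earlier in this section.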

`authored_holdout` — User-provided holdout mask

This is not a YAML strategy setting. Instead, you provide holdout_id directly in model_spec.kwargs:

model_spec:
  kwargs:
    holdout_id: [false, false, false, true, true]

When the runner detects an authored holdout_id in the YAML, it treats the run as an authored_holdout validation run. The mask is passed through to Meridian verbatim and recorded in the validation spec artefact.

When to use: When you need a specific holdout pattern that does not follow blocked-tail or rolling-origin conventions.
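
For longer series, writing the mask by hand is error-prone. The sketch below derives a mask from a list of held-out periods and formats it as a YAML flow list that could be pasted into model_spec.kwargs. Both helper names and the example dates are hypothetical; meridian-tools is not assumed to ship anything like them.

```python
from typing import Iterable, List, Sequence


def authored_holdout_mask(
    time_index: Sequence[str], held_out: Iterable[str]
) -> List[bool]:
    """Return one flag per period; True means the period is held out."""
    held = set(held_out)
    unknown = held - set(time_index)
    if unknown:
        # Fail loudly rather than silently ignoring a mistyped date.
        raise ValueError(f"held-out periods not in time index: {sorted(unknown)}")
    return [period in held for period in time_index]


def to_yaml_flow_list(mask: Sequence[bool]) -> str:
    """Format the mask as a YAML flow sequence of lowercase booleans."""
    return "[" + ", ".join("true" if flag else "false" for flag in mask) + "]"


# Illustrative weekly time index (made-up dates).
weeks = ["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22", "2024-01-29"]
mask = authored_holdout_mask(weeks, held_out=["2024-01-08", "2024-01-22"])
yaml_line = "holdout_id: " + to_yaml_flow_list(mask)
print(yaml_line)
```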

CLI vs Python API

Blocked tail from the CLI

blocked_tail runs directly from the CLI because it produces a single run:

meridian-tools run --config project.yml --output-dir runs

Rolling origin requires the Python API

rolling_origin is a Python-first planning surface because it produces multiple runs — one per split plus a final fit. The CLI will reject direct rolling_origin execution:

# This will fail:
meridian-tools run --config project.yml  # with strategy: rolling_origin
# ValueError: cannot execute `rolling_origin` directly

Instead, use the Python API:

from pathlib import Path

import pandas as pd

from meridian_tools.config import PipelineRunConfig, load_yaml_config
from meridian_tools.cv import build_validation_plan
from meridian_tools.runner import run_pipeline

config_path = Path("project.yml")
config = load_yaml_config(config_path)

# Read the time index from your data
data_path = config.data.path
if not data_path.is_absolute():
    data_path = (config_path.parent / data_path).resolve()

frame = pd.read_csv(data_path)
time_column = config.data.coord_to_columns["time"]
geo_column = config.data.coord_to_columns.get("geo")

time_index = frame[time_column].drop_duplicates().tolist()
geo_index = None
if geo_column is not None:
    geo_index = frame[geo_column].drop_duplicates().tolist()

# Build the validation plan
validation_plan = build_validation_plan(
    config.validation,
    time_index=time_index,
    geo_index=geo_index,
)

# Execute each validation split. Each split fits only through the end of its
# own test window; future observations are excluded from that fit.
for run_spec in validation_plan.validation_runs:
    run_pipeline(
        PipelineRunConfig(
            config_path=config_path,
            output_dir=Path("runs"),
            validation_spec=run_spec,
        )
    )

Separating validation from the final fit

Validation runs and the final production fit are different jobs. First you evaluate candidate specifications on held-out splits. Then, once you have chosen the specification, you run a separate full-sample fit with no holdout.

Do not reuse a validation fit as the production artefact. The validation fit was trained on a subset of the data and its posterior reflects that subset.

Final fit after blocked tail

For blocked_tail, build_validation_plan provides a final_fit_run spec:

validation_plan = build_validation_plan(
    config.validation,
    time_index=time_index,
    geo_index=geo_index,
)

# Run the final fit on all data
final_result = run_pipeline(
    PipelineRunConfig(
        config_path=config_path,
        output_dir=Path("runs"),
        validation_spec=validation_plan.final_fit_run,
    )
)

Final fit after rolling origin

The same pattern works for rolling origin:

# After running all validation splits...
final_result = run_pipeline(
    PipelineRunConfig(
        config_path=config_path,
        output_dir=Path("runs"),
        validation_spec=validation_plan.final_fit_run,
    )
)

The final_fit_run spec has mode="final_fit", strategy="none", and holdout_id=None. It trains on the full time axis with no holdout.

Run directory naming

The runner automatically appends a validation-aware suffix to the run name:

Scenario	Run name pattern
No validation	`<project_name>_<timestamp>`
Blocked tail	`<project_name>_blocked_tail_<timestamp>`
Rolling origin split 1	`<project_name>_split_01_<timestamp>`
Final fit	`<project_name>_final_fit_<timestamp>`
Authored holdout	`<project_name>_authored_holdout_<timestamp>`

Override the name with --run-name or PipelineRunConfig(run_name=...).
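
When collecting results from several runs, it can help to compute the expected name prefix for each scenario. The sketch below mirrors the table above. The function name, the scenario keys, and the two-digit zero padding for split numbers (inferred from the single split_01 example) are assumptions. The timestamp format is not specified here, so the sketch stops before it.

```python
from typing import Optional

# Suffix per scenario, taken from the naming table; the keys are illustrative labels.
_SUFFIXES = {
    "none": "",
    "blocked_tail": "_blocked_tail",
    "final_fit": "_final_fit",
    "authored_holdout": "_authored_holdout",
}


def run_name_prefix(
    project_name: str, scenario: str, split_number: Optional[int] = None
) -> str:
    """Return the part of the run name that precedes the timestamp."""
    if scenario == "rolling_origin":
        if split_number is None:
            raise ValueError("rolling_origin needs a split_number")
        # Assumed padding: two digits, as in split_01.
        return f"{project_name}_split_{split_number:02d}_"
    if scenario not in _SUFFIXES:
        raise ValueError(f"unknown scenario: {scenario}")
    return f"{project_name}{_SUFFIXES[scenario]}_"


print(run_name_prefix("acme", "rolling_origin", split_number=1))
print(run_name_prefix("acme", "final_fit"))
```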

Validation spec artefact

Every validation-aware run writes a validation_spec.json artefact in the 10_validation/ stage directory. This JSON records:

mode — "validation" or "final_fit"
strategy — the validation strategy used
split_label — human-readable split identifier
holdout_source — "generated_validation", "authored_model_spec", or "none"
generated_holdout — whether the holdout mask was auto-generated
holdout_shape — shape of the holdout mask (without the actual data)
train_indices / test_indices — integer indices into the time axis
train_dates / test_dates — corresponding date values
validation_spec_version — current value 2
data_binding — source/execution coordinate fingerprints used to refresh or reject the split safely

The actual holdout mask is not stored in the JSON artefact (it can be large). It is reconstructed and injected into the model at runtime from the stored indices and the bound execution geometry.
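
To make the reconstruction step concrete, here is a minimal sketch that rebuilds a mask from the stored test_indices and holdout_shape fields. It assumes holdout_shape is a list of integers with the time dimension last, and that test_indices are zero-based. The real artefact layout and the runner's reconstruction code may differ, and the JSON literal below is made up for the example.

```python
import json
from typing import Any, Dict, List, Union

Mask = Union[List[bool], List[List[bool]]]


def rebuild_holdout_mask(spec: Dict[str, Any]) -> Mask:
    """Rebuild a boolean mask from a validation-spec dictionary (illustrative)."""
    shape = spec["holdout_shape"]
    n_times = shape[-1]  # assumption: time is the last dimension
    test = set(spec["test_indices"])
    row = [index in test for index in range(n_times)]
    if len(shape) == 1:
        return row
    # Assumed geo-panel layout: repeat the time mask once per geo.
    return [list(row) for _ in range(shape[0])]


# Made-up artefact content using field names listed in this guide.
spec = json.loads(
    '{"mode": "validation", "strategy": "blocked_tail",'
    ' "holdout_shape": [2, 5], "test_indices": [3, 4]}'
)
mask = rebuild_holdout_mask(spec)
print(mask)
```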

For rolling-origin validation, the execution geometry is intentionally bounded: split 1 fits only through split 1’s test window, split 2 fits only through split 2’s test window, and so on. Later source observations are invisible to earlier validation fits.
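
The bounding rule can be pictured as truncating the time axis at the end of each split's test window. The helper below is a hypothetical sketch of that idea, using zero-based indices, and is not the runner's actual code.

```python
from typing import List, Sequence


def visible_time_axis(time_index: Sequence[str], test_indices: Sequence[int]) -> List[str]:
    """Return the periods a validation fit may see: everything up to the
    last test period of its own split, and nothing after it."""
    last_visible = max(test_indices)
    return list(time_index[: last_visible + 1])


time_index = [f"t{i}" for i in range(1, 61)]  # t1..t60
split_1_axis = visible_time_axis(time_index, test_indices=[52, 53, 54, 55])
split_2_axis = visible_time_axis(time_index, test_indices=[56, 57, 58, 59])
print(split_1_axis[-1], split_2_axis[-1])
```

Here split 1 ends at t56, so periods t57 to t60 never reach its fit, while split 2 sees the axis through t60.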

Interaction with model selection

Bayesian model selection (LOO/WAIC) is only available for runs where holdout_id is None — meaning full-sample fitted models and final-fit runs. Validation fits and authored-holdout runs write a model_selection_status.json artefact instead of LOO/WAIC outputs. See the model selection guide for details.

The design intent is that validation and model selection answer different questions. Validation holds out declared time periods before fitting. LOO/WAIC compare compatible full-sample or final-fit candidates using reconstructed pointwise likelihood. Both controls support the same agency requirement: a model choice should be defensible after the notebook is gone. See why meridian-tools exists for the full rationale.