Validation guide

This guide explains how to choose and configure validation strategies in meridian-tools. Validation is the process of evaluating a candidate model specification on held-out data before committing to a final production fit.

Why validation matters for MMM

Marketing Mix Models are fitted to time series data. Unlike standard supervised learning, the temporal structure of the data makes naive IID cross-validation (random train/test splits) statistically inappropriate. A random split lets the model train on periods that come after the ones it is evaluated on, which leaks future information into training and overstates out-of-sample accuracy. meridian-tools does not implement random shuffling or naive k-fold splits. Instead, it provides two time-respecting validation strategies and a clear separation between validation runs and the final production fit.

Validation strategies

none — No validation

validation:
  strategy: none

The model is fitted on the full dataset with no holdout. Use this when you do not need candidate evaluation — for example, when rerunning a previously validated specification.

blocked_tail — Single contiguous tail holdout

validation:
  strategy: blocked_tail
  holdout_size: 8

Reserves the last holdout_size time periods as a test block. The model is fitted on all preceding periods. This is the recommended default for short MMM time series where you want one simple candidate evaluation.

When to use: Most standard MMM projects with fewer than 150 weekly observations.

How it works:

Time axis: [t1, t2, t3, t4, t5, t6, t7, t8, t9, t10]
holdout_size: 3

Train: [t1, t2, t3, t4, t5, t6, t7]
Test:  [t8, t9, t10]
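The same split can be expressed as index arithmetic. This is a minimal sketch, not the meridian-tools implementation: `blocked_tail_split` is a hypothetical helper, indices are 0-based (so t1 is index 0), and `True` in the mask marks a held-out period.

```python
from typing import List, Tuple


def blocked_tail_split(n_periods: int, holdout_size: int) -> Tuple[List[int], List[int], List[bool]]:
    """Return train indices, test indices, and a boolean holdout mask
    for a single contiguous tail holdout (True = held out)."""
    if not 0 < holdout_size < n_periods:
        raise ValueError("holdout_size must be between 1 and n_periods - 1")
    cutoff = n_periods - holdout_size
    train = list(range(cutoff))              # everything before the tail
    test = list(range(cutoff, n_periods))    # the last holdout_size periods
    mask = [i >= cutoff for i in range(n_periods)]
    return train, test, mask


# The 10-period example above, with holdout_size = 3.
train, test, mask = blocked_tail_split(10, 3)
print(train)  # [0, 1, 2, 3, 4, 5, 6]
print(test)   # [7, 8, 9]
```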

The holdout mask is generated automatically and injected into Meridian’s holdout_id parameter. For geo-panel models, the mask is broadcast across all geos.
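To picture the geo broadcast, here is a NumPy sketch assuming the `(n_geos, n_times)` layout that Meridian uses for a geo-level `holdout_id`. The sizes are illustrative; the point is that every geo shares the same time mask.

```python
import numpy as np

n_geos, n_times, holdout_size = 3, 10, 3

# 1-D time mask: True marks the held-out tail periods.
time_mask = np.zeros(n_times, dtype=bool)
time_mask[n_times - holdout_size:] = True

# Repeat the same time mask for every geo -> shape (n_geos, n_times).
# broadcast_to returns a read-only view, so copy it to get a normal array.
geo_mask = np.broadcast_to(time_mask, (n_geos, n_times)).copy()

print(geo_mask.shape)  # (3, 10)
```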

rolling_origin — Expanding-window validation

validation:
  strategy: rolling_origin
  initial_train_size: 52
  test_size: 4
  step_size: 4
  max_splits: 3

Creates multiple expanding-window splits where each successive split adds more training data. This provides a more robust evaluation signal than a single blocked tail, but requires enough history to support multiple splits.

When to use: Projects with longer time series (typically 100+ weekly observations) where you want multiple evaluation windows.

How it works:

Time axis: [t1, t2, ..., t52, t53, ..., t56, t57, ..., t60]

Split 1: Train [t1..t52], Test [t53..t56]
Split 2: Train [t1..t56], Test [t57..t60]

Constraints:

  • step_size must equal test_size (non-overlapping test windows).
  • max_splits must be at least 2.
  • initial_train_size + test_size must not exceed the number of observations.
  • The plan must yield at least two splits.
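Taken together, these constraints amount to a pre-flight check. The function below is a hypothetical sketch of that check, not the library's validator. Because `step_size` must equal `test_size`, the number of non-overlapping test windows after the initial training block is `(n_periods - initial_train_size) // test_size`, capped at `max_splits`.

```python
def check_rolling_origin(
    n_periods: int,
    initial_train_size: int,
    test_size: int,
    step_size: int,
    max_splits: int,
) -> int:
    """Check the documented rolling_origin constraints; return the split count.

    Raises ValueError on the first violated constraint.
    """
    if step_size != test_size:
        raise ValueError("step_size must equal test_size")
    if max_splits < 2:
        raise ValueError("max_splits must be at least 2")
    if initial_train_size + test_size > n_periods:
        raise ValueError("initial_train_size + test_size exceeds the number of observations")
    # Non-overlapping test windows that fit after the initial training block.
    n_splits = min(max_splits, (n_periods - initial_train_size) // test_size)
    if n_splits < 2:
        raise ValueError("the plan must yield at least two splits")
    return n_splits


print(check_rolling_origin(60, 52, 4, 4, 3))  # 2
```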

authored_holdout — User-provided holdout mask

This is not a YAML strategy setting. Instead, you provide holdout_id directly in model_spec.kwargs:

model_spec:
  kwargs:
    holdout_id: [false, false, false, true, true]

When the runner detects an authored holdout_id in the YAML, it treats the run as an authored_holdout validation run. The mask is passed through to Meridian verbatim and recorded in the validation spec artefact.
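Long masks are tedious to type by hand. One option is to generate the list in Python and paste it into the YAML. This is a sketch with a hypothetical `authored_holdout` helper and an illustrative mid-series window. It assumes a national-level model, where the mask has one entry per time period in the data's time order.

```python
import json


def authored_holdout(n_periods: int, held_out: range) -> list:
    """Boolean mask over the time axis; True marks a held-out period."""
    return [i in held_out for i in range(n_periods)]


# Hypothetical pattern: hold out a mid-series window (0-based periods 4-5)
# instead of the tail, which neither generated strategy produces.
mask = authored_holdout(8, range(4, 6))

# json.dumps writes lowercase true/false, which is also valid YAML flow
# syntax, so the output can be pasted under model_spec.kwargs.holdout_id.
print("holdout_id: " + json.dumps(mask))
# holdout_id: [false, false, false, false, true, true, false, false]
```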

When to use: When you need a specific holdout pattern that does not follow blocked-tail or rolling-origin conventions.

CLI vs Python API

Blocked tail from the CLI

blocked_tail runs directly from the CLI because it produces a single run:

meridian-tools run --config project.yml --output-dir runs

Rolling origin requires the Python API

rolling_origin is a Python-first planning surface because it produces multiple runs — one per split plus a final fit. The CLI will reject direct rolling_origin execution:

# This will fail:
meridian-tools run --config project.yml  # with strategy: rolling_origin
# ValueError: cannot execute `rolling_origin` directly

Instead, use the Python API:

from pathlib import Path

import pandas as pd

from meridian_tools.config import PipelineRunConfig, load_yaml_config
from meridian_tools.cv import build_validation_plan
from meridian_tools.runner import run_pipeline

config_path = Path("project.yml")
config = load_yaml_config(config_path)

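# Read the time index from your data.
# Wrap in Path() in case config.data.path is a plain string, and resolve
# relative paths against the config file's directory.
data_path = Path(config.data.path)
if not data_path.is_absolute():
    data_path = (config_path.parent / data_path).resolve()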

frame = pd.read_csv(data_path)
time_column = config.data.coord_to_columns["time"]
geo_column = config.data.coord_to_columns.get("geo")

time_index = frame[time_column].drop_duplicates().tolist()
geo_index = None
if geo_column is not None:
    geo_index = frame[geo_column].drop_duplicates().tolist()

# Build the validation plan
validation_plan = build_validation_plan(
    config.validation,
    time_index=time_index,
    geo_index=geo_index,
)

# Execute each validation split
for run_spec in validation_plan.validation_runs:
    run_pipeline(
        PipelineRunConfig(
            config_path=config_path,
            output_dir=Path("runs"),
            validation_spec=run_spec,
        )
    )

Separating validation from the final fit

Validation runs and the final production fit are different jobs. First you evaluate candidate specifications on held-out splits. Then, once you have chosen the specification, you run a separate full-sample fit with no holdout.

Do not reuse a validation fit as the production artefact. The validation fit was trained on a subset of the data and its posterior reflects that subset.

Final fit after blocked tail

For blocked_tail, build_validation_plan provides a final_fit_run spec:

validation_plan = build_validation_plan(
    config.validation,
    time_index=time_index,
    geo_index=geo_index,
)

# Run the final fit on all data
final_result = run_pipeline(
    PipelineRunConfig(
        config_path=config_path,
        output_dir=Path("runs"),
        validation_spec=validation_plan.final_fit_run,
    )
)

Final fit after rolling origin

The same pattern works for rolling origin:

# After running all validation splits...
final_result = run_pipeline(
    PipelineRunConfig(
        config_path=config_path,
        output_dir=Path("runs"),
        validation_spec=validation_plan.final_fit_run,
    )
)

The final_fit_run spec has mode="final_fit", strategy="none", and holdout_id=None. It trains on the full time axis with no holdout.
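Because promoting a validation fit by mistake is easy, a guard before publishing a run can help. The sketch below uses a hypothetical `SpecView` dataclass that mirrors only the three fields named above. It is not the meridian-tools spec class, and `assert_production_fit` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SpecView:
    """Hypothetical stand-in for the three validation-spec fields named above."""
    mode: str
    strategy: str
    holdout_id: Optional[Sequence[bool]]


def assert_production_fit(spec: SpecView) -> None:
    """Raise if a spec describes anything other than a full-sample final fit."""
    if spec.mode != "final_fit" or spec.strategy != "none" or spec.holdout_id is not None:
        raise ValueError(f"not a final fit: mode={spec.mode!r}, strategy={spec.strategy!r}")


# A final-fit spec passes silently; a validation spec would raise ValueError.
assert_production_fit(SpecView(mode="final_fit", strategy="none", holdout_id=None))
```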

Run directory naming

The runner automatically appends a validation-aware suffix to the run name:

Scenario                  Run name pattern
No validation             <project_name>_<timestamp>
Blocked tail              <project_name>_blocked_tail_<timestamp>
Rolling origin split 1    <project_name>_split_01_<timestamp>
Final fit                 <project_name>_final_fit_<timestamp>
Authored holdout          <project_name>_authored_holdout_<timestamp>
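These patterns follow a simple composition rule. The helper below is a hypothetical reconstruction from the listed patterns, not the runner's code. The timestamp format in particular is an assumption, because the guide does not specify one.

```python
from datetime import datetime
from typing import Optional


def run_name(project: str, scenario: str, timestamp: str, split: Optional[int] = None) -> str:
    """Compose a run name following the documented naming patterns."""
    if scenario == "none":
        parts = [project]
    elif scenario == "rolling_origin":
        if split is None:
            raise ValueError("rolling_origin runs need a split number")
        parts = [project, f"split_{split:02d}"]  # zero-padded: split_01, split_02, ...
    else:  # "blocked_tail", "final_fit", "authored_holdout"
        parts = [project, scenario]
    return "_".join(parts + [timestamp])


# Assumed timestamp format; the guide does not specify one.
stamp = datetime(2024, 1, 15, 9, 30).strftime("%Y%m%d_%H%M%S")
print(run_name("acme_mmm", "rolling_origin", stamp, split=1))
# acme_mmm_split_01_20240115_093000
```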

Override the name with --run-name or PipelineRunConfig(run_name=...).

Validation spec artefact

Every validation-aware run writes a validation_spec.json artefact in the 10_validation/ stage directory. This JSON records:

  • mode: "validation" or "final_fit"
  • strategy: the validation strategy used
  • split_label: human-readable split identifier
  • holdout_source: "generated_validation", "authored_model_spec", or "none"
  • generated_holdout: whether the holdout mask was auto-generated
  • holdout_shape: shape of the holdout mask (without the actual data)
  • train_indices / test_indices: integer indices into the time axis
  • train_dates / test_dates: corresponding date values

The actual holdout mask is not stored in the JSON artefact (it can be large). It is injected into the model at runtime.
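Because the mask is not stored, you may want to rebuild the test mask over the time axis when inspecting a run. This is a minimal sketch with several assumptions: the JSON uses the field names listed above, `holdout_shape` is a list whose last entry is the number of time periods, and the indices are 0-based. The example values are invented for illustration, and some fields (such as the date lists) are omitted.

```python
import json

# Illustrative validation_spec.json content: field names from the list above,
# values made up for a 10-period blocked_tail run.
raw = """
{
  "mode": "validation",
  "strategy": "blocked_tail",
  "split_label": "blocked_tail",
  "holdout_source": "generated_validation",
  "generated_holdout": true,
  "holdout_shape": [10],
  "train_indices": [0, 1, 2, 3, 4, 5, 6],
  "test_indices": [7, 8, 9]
}
"""
spec = json.loads(raw)

# Assumption: the last dimension of holdout_shape is the time axis.
n_times = spec["holdout_shape"][-1]
test = set(spec["test_indices"])

# True marks a test (held-out) period on the time axis.
time_mask = [i in test for i in range(n_times)]
print(time_mask)
```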

Interaction with model selection

Bayesian model selection (LOO/WAIC) is only available for runs where holdout_id is None — meaning full-sample fitted models and final-fit runs. Validation fits and authored-holdout runs write a model_selection_status.json artefact instead of LOO/WAIC outputs. See the model selection guide for details.