Meridian Tools workflow guide

This guide shows the supported end-to-end agency workflow for meridian-tools. It starts with one YAML config, moves through candidate validation, separates the final full-sample fit from the validation runs, and ends with the artefacts you should hand over or inspect later. The examples in this guide stay inside the implemented package surface. They do not assume notebooks, dashboards, or unpublished helper scripts.

Before you start

Install Meridian first. Then, from the root of a meridian-tools source checkout, install meridian-tools in the same environment:

python -m pip install -c constraints/dev.txt -e ".[dev]"

Use the CLI for ordinary run execution. Use the Python API when you need rolling-origin planning, an explicit final-fit run, or lifecycle compare and refresh operations. Phase 07 does not provide a lifecycle CLI.

If you want packaged reference examples before authoring your own YAML, use the bundled demo guide in demos.md. The packaged demo launcher is meridian-tools demo .... The repo-root python runme.py ... wrapper remains available when you are working from a source checkout.

Author one YAML config

Keep the authored project definition in YAML. Keep runtime-only choices out of the YAML file. In practice, that means your source file owns the project metadata, data path, model specification, fit settings, validation settings, and export switches. Runtime-only values such as output_dir, run_name, and one concrete validation_spec belong in PipelineRunConfig or the CLI call. They are not part of the authored YAML, so they do not appear in config.resolved.yaml either.
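
The split between YAML-owned and runtime-only values can be checked mechanically before a run. The following is a minimal sketch, not a meridian-tools feature: find_runtime_only_keys is a hypothetical helper, and it works on an already-parsed config dictionary so that it needs no YAML library. Only the three key names come from this guide.

```python
# Hypothetical helper for illustration; it is not part of meridian-tools.
# The three key names are the runtime-only values named in this guide.
RUNTIME_ONLY_KEYS = {"output_dir", "run_name", "validation_spec"}


def find_runtime_only_keys(authored_config: dict) -> list[str]:
    """Return dotted paths of runtime-only keys found anywhere in the config."""
    found: list[str] = []

    def walk(node, prefix: str) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                path = f"{prefix}.{key}" if prefix else str(key)
                if key in RUNTIME_ONLY_KEYS:
                    found.append(path)
                walk(value, path)
        elif isinstance(node, list):
            for position, item in enumerate(node):
                walk(item, f"{prefix}[{position}]")

    walk(authored_config, "")
    return sorted(found)


clean = {
    "project": {"name": "client-mmm"},
    "validation": {"strategy": "blocked_tail", "holdout_size": 8},
}
leaky = {
    "project": {"name": "client-mmm", "run_name": "tuesday-rerun"},
    "output_dir": "runs",
}

print(find_runtime_only_keys(clean))  # []
print(find_runtime_only_keys(leaky))  # ['output_dir', 'project.run_name']
```

A check like this could sit in a pre-commit hook or a review script. Whether the package itself rejects such keys at load time is not stated in this guide.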

Here is one complete blocked-tail config:

project:
  name: client-mmm

data:
  path: ./client_dataset.csv
  kpi_type: revenue
  coord_to_columns:
    time: week
    geo: market
    kpi: revenue
    population: population
    media: [impressions_tv, impressions_search]
    media_spend: [spend_tv, spend_search]
    controls: [promo_flag, price_index]

model_spec:
  kwargs:
    max_lag: 8
    media_prior_type: roi

fit:
  n_chains: 4
  n_adapt: 500
  n_burnin: 500
  n_keep: 1000
  seed: 20260331

validation:
  strategy: blocked_tail
  holdout_size: 8

exports:
  export_predictive_accuracy: true
  export_review_summary: true
  export_model_selection: true
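
Before running anything, it can help to confirm that every column named under coord_to_columns actually exists in the dataset. The sketch below uses only the standard library and is an illustration rather than package behaviour: mapped_columns and missing_columns are hypothetical helpers, and the in-memory header stands in for the first row of client_dataset.csv.

```python
import csv
import io


def mapped_columns(coord_to_columns: dict) -> list[str]:
    """Flatten mapping values (a string or a list of strings) into one list."""
    columns: list[str] = []
    for value in coord_to_columns.values():
        if isinstance(value, str):
            columns.append(value)
        else:
            columns.extend(value)
    return columns


def missing_columns(coord_to_columns: dict, csv_file) -> list[str]:
    """Return mapped column names that are absent from the CSV header row."""
    header = set(next(csv.reader(csv_file)))
    return [name for name in mapped_columns(coord_to_columns) if name not in header]


# Same mapping as the config above, written as a plain dictionary.
coord_to_columns = {
    "time": "week",
    "geo": "market",
    "kpi": "revenue",
    "population": "population",
    "media": ["impressions_tv", "impressions_search"],
    "media_spend": ["spend_tv", "spend_search"],
    "controls": ["promo_flag", "price_index"],
}

# Stand-in header for the dataset, with price_index deliberately left out.
header_only = io.StringIO(
    "week,market,revenue,population,impressions_tv,impressions_search,"
    "spend_tv,spend_search,promo_flag\n"
)
print(missing_columns(coord_to_columns, header_only))  # ['price_index']
```

With a real file, pass an open file handle instead of the StringIO object.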

Choose the right validation path

Use blocked_tail when you want one contiguous future block for candidate evaluation. This is often the right default for short MMM time series. Use rolling_origin when you have enough history to evaluate more than one expanding-window split. Do not treat rolling_origin as ordinary k-fold cross-validation. The package does not implement naive IID folds or random shuffling because that is not the right statistical workflow for MMM time series.

Validation runs and the final production fit are different jobs. First, you evaluate candidate specifications on blocked time splits. Then, once you have chosen the specification, you run a separate full-sample fit with no holdout.
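
To make the blocked-tail idea concrete, here is a minimal sketch of the split itself: the last holdout_size periods of an ordered time index are held out and everything before them is used for fitting. blocked_tail_split is a hypothetical helper and the 104-week index is made up. How the package translates holdout_size into Meridian's holdout mask is not shown in this guide.

```python
def blocked_tail_split(time_index: list, holdout_size: int) -> tuple[list, list]:
    """Split an ordered time index into a training head and one contiguous tail."""
    if not 0 < holdout_size < len(time_index):
        raise ValueError("holdout_size must be positive and smaller than the series")
    return time_index[:-holdout_size], time_index[-holdout_size:]


# Made-up index of 104 weekly periods, already in chronological order.
weeks = [f"week-{number:03d}" for number in range(1, 105)]

train, holdout = blocked_tail_split(weeks, holdout_size=8)
print(len(train), len(holdout))   # 96 8
print(holdout[0], holdout[-1])    # week-097 week-104
```

The final full-sample fit corresponds to using all 104 periods with no tail held out, which is why it is run as a separate job.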

Run one blocked-tail candidate from the CLI

Once the YAML file is authored, you can execute a blocked-tail candidate run directly through the CLI:

meridian-tools run --config project.yml --output-dir runs

The same packaged runner surface is available through the thin repo-root wrapper:

python runme.py run --config project.yml --output-dir runs

This command creates a dated run directory under runs/. If you need to change the output location or the visible run name, pass --output-dir or --run-name at execution time. Those are runtime-only overrides. They affect the run directory and manifest, but they do not become part of the authored YAML contract.
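
After a CLI run you often want the path of the run directory that was just created. The sketch below is one way to find it, assuming only what this guide states: each run directory sits under the output directory and contains run_manifest.json. latest_run_dir is a hypothetical helper, and picking the newest manifest by modification time is a choice of this sketch rather than package behaviour.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def latest_run_dir(output_dir: Path) -> Optional[Path]:
    """Return the run directory whose manifest was modified most recently."""
    manifests = list(output_dir.glob("*/run_manifest.json"))
    if not manifests:
        return None
    newest = max(manifests, key=lambda manifest: manifest.stat().st_mtime)
    return newest.parent


# Demonstration against a throwaway directory holding two fake runs.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, mtime in [("older_run", 1_000_000_000), ("newer_run", 2_000_000_000)]:
        run_dir = root / name
        run_dir.mkdir()
        manifest = run_dir / "run_manifest.json"
        manifest.write_text("{}")
        os.utime(manifest, (mtime, mtime))  # force distinct modification times
    print(latest_run_dir(root).name)  # newer_run
```

In real use, call latest_run_dir(Path("runs")) after the command finishes.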

Plan and run rolling-origin validation through the Python API

rolling_origin is a Python-first planning surface because each run executes exactly one concrete split. You plan the splits in Python, then execute them one at a time. Start with an explicit YAML definition:

validation:
  strategy: rolling_origin
  initial_train_size: 52
  test_size: 4
  step_size: 4
  max_splits: 3

Then materialise and execute the validation runs:

from pathlib import Path

import pandas as pd

from meridian_tools.config import PipelineRunConfig, load_yaml_config
from meridian_tools.cv import build_validation_plan
from meridian_tools.runner import run_pipeline

config_path = Path("project.yml")
config = load_yaml_config(config_path)

# Resolve a relative data path against the directory that holds the config file.
data_path = config.data.path
if not data_path.is_absolute():
    data_path = (config_path.parent / data_path).resolve()

frame = pd.read_csv(data_path)
time_column = config.data.coord_to_columns["time"]
geo_column = config.data.coord_to_columns.get("geo")

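# drop_duplicates() keeps first-appearance order, so the time index is only
# chronological if the CSV rows are already sorted by time.
time_index = frame[time_column].drop_duplicates().tolist()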
geo_index = None
if geo_column is not None:
    geo_index = frame[geo_column].drop_duplicates().tolist()

validation_plan = build_validation_plan(
    config.validation,
    time_index=time_index,
    geo_index=geo_index,
)

for run_spec in validation_plan.validation_runs:
    run_pipeline(
        PipelineRunConfig(
            config_path=config_path,
            output_dir=Path("runs"),
            validation_spec=run_spec,
        )
    )

build_validation_plan(...) gives you one concrete ValidationRunSpec per split. run_pipeline(...) remains the primitive that executes one actual run.
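
The arithmetic behind the four rolling_origin settings shown earlier in this section can be sketched in plain Python. This illustrates expanding-window rolling origin in general. It is not the logic inside build_validation_plan(...). In particular, the sketch assumes that the first split trains on the first initial_train_size periods and that each later split moves the origin forward by step_size; the package may anchor or cap splits differently. rolling_origin_splits is a hypothetical helper and the 104-period series length is made up.

```python
def rolling_origin_splits(
    n_periods: int,
    initial_train_size: int,
    test_size: int,
    step_size: int,
    max_splits: int,
) -> list[tuple[range, range]]:
    """Return (train, test) position ranges for expanding-window splits."""
    splits = []
    train_end = initial_train_size
    # Stop when the split budget is used up or the test window would run off the data.
    while len(splits) < max_splits and train_end + test_size <= n_periods:
        train = range(0, train_end)                      # always starts at period 0
        test = range(train_end, train_end + test_size)   # block right after training
        splits.append((train, test))
        train_end += step_size
    return splits


for train, test in rolling_origin_splits(
    n_periods=104, initial_train_size=52, test_size=4, step_size=4, max_splits=3
):
    print(f"train 0-{train[-1]}, test {test[0]}-{test[-1]}")
# train 0-51, test 52-55
# train 0-55, test 56-59
# train 0-59, test 60-63
```

Every test block lies strictly after its training window, which is the property that separates this scheme from shuffled k-fold cross-validation.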

Run the final full-sample fit separately

After you have chosen the winning specification, run the final fit on the full sample. Do not reuse a validation fit as the production artefact.

from pathlib import Path

from meridian_tools.config import PipelineRunConfig
from meridian_tools.runner import run_pipeline

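# validation_plan is the object returned by build_validation_plan(...) in the
# previous section; build it first if you are running this snippet on its own.
final_result = run_pipeline(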
    PipelineRunConfig(
        config_path=Path("project.yml"),
        output_dir=Path("runs"),
        validation_spec=validation_plan.final_fit_run,
    )
)

print(final_result.run_dir)
print(final_result.manifest_path)

For rolling_origin and blocked_tail workflows, validation_plan.final_fit_run is the explicit no-holdout runtime spec. Using it keeps the boundary clear: candidate validation and final production fitting are separate steps.

Know which artefacts matter for handoff

Each successful run directory is the handoff unit. The important files are:

run_manifest.json for stage status, versions, timestamps, and top-level artefact links
00_run_metadata/config.source.yaml for the authored source config
00_run_metadata/config.resolved.yaml for the YAML-owned config after path resolution
00_run_metadata/input_data_provenance.json for the exact dataset identity used by the run
10_validation/validation_spec.json when the run is validation-aware
30_model_assessment/diagnostics_bundle.json for stable diagnostics metadata
30_model_assessment/model_results_summary.html for the wrapped Meridian assessment summary
30_model_assessment/plots/ for assessment PNG plots such as model fit and rhat review
40_decomposition/summary_metrics.csv and summary_metrics.nc for decomposition exports
40_decomposition/plots/ for decomposition PNG plots
60_response_curves/plots/response_curves_plot.png when response-curve export is enabled
70_optimisation/plots/ when optimisation export is enabled
30_model_assessment model-selection outputs when the run is compatible, or 30_model_assessment/model_selection_status.json when it is not

Read those artefacts together. 30_model_assessment/diagnostics_bundle.json tells you whether predictive accuracy and review summary were exported or disabled. The assessment stage either contains the real Bayesian model-selection outputs or one explicit compatibility status artefact.
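
A quick completeness check before handoff can be sketched with pathlib. The relative paths below are copied from the list above. Treating the entries without a "when" condition as expected in every successful run is an assumption of this sketch, and missing_handoff_files is a hypothetical helper, not a package function.

```python
import tempfile
from pathlib import Path

# Paths taken from the artefact list in this guide; entries that the guide
# marks as conditional ("when ...") are deliberately left out.
CORE_HANDOFF_FILES = [
    "run_manifest.json",
    "00_run_metadata/config.source.yaml",
    "00_run_metadata/config.resolved.yaml",
    "00_run_metadata/input_data_provenance.json",
    "30_model_assessment/diagnostics_bundle.json",
    "30_model_assessment/model_results_summary.html",
    "40_decomposition/summary_metrics.csv",
    "40_decomposition/summary_metrics.nc",
]


def missing_handoff_files(run_dir: Path) -> list[str]:
    """Return the core handoff files that are not present under run_dir."""
    return [rel for rel in CORE_HANDOFF_FILES if not (run_dir / rel).is_file()]


# Demonstration against a throwaway directory that only has a manifest.
with tempfile.TemporaryDirectory() as tmp:
    fake_run = Path(tmp)
    (fake_run / "run_manifest.json").write_text("{}")
    for rel in missing_handoff_files(fake_run):
        print("missing:", rel)
```

Against a real run, an empty result means the core files are in place. It says nothing about their contents.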

The supported Bayesian model-selection boundary is narrow and deliberate. The package supports fitted Meridian models where holdout_id is None. That means full-sample fitted models and explicit final-fit runs are compatible. Validation fits and authored holdout fits are not.

Use lifecycle helpers after a run exists

Once you have stored run directories, the lifecycle API lets you reload, compare, and refresh them without going back to notebook state.

from pathlib import Path

from meridian_tools.lifecycle import compare_run_records, load_run_record, refresh_run

validation_run_dir = Path("runs/client-mmm_blocked_tail_20260401_101500")
final_fit_run_dir = Path("runs/client-mmm_final_fit_20260401_114200")

final_fit_record = load_run_record(final_fit_run_dir)
comparison = compare_run_records(validation_run_dir, final_fit_run_dir)
refreshed = refresh_run(final_fit_run_dir, run_name="client-mmm_final_fit_refresh")

print(final_fit_record.manifest.run_name)
print(comparison)
print(refreshed.run_dir)

compare_run_records(...) gives you a metadata-level comparison. It does not attempt a raw-file diff across every output. refresh_run(...) rebuilds a new sibling run from the stored run-local artefacts. It does not overwrite the source run. Phase 07 does not provide lifecycle CLI commands, so use the Python API for these operations.
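
To illustrate what a metadata-level comparison means, as opposed to a raw-file diff, here is a small standalone sketch that compares two dictionaries key by key. It does not reproduce compare_run_records(...), whose output format is not described in this guide. diff_metadata is a hypothetical helper, and apart from run_name the field names in the sample records are invented for the example.

```python
def diff_metadata(left: dict, right: dict) -> dict:
    """Return {key: (left_value, right_value)} for top-level keys that differ."""
    keys = sorted(set(left) | set(right))
    return {
        key: (left.get(key), right.get(key))
        for key in keys
        if left.get(key) != right.get(key)
    }


# Invented metadata records; only run_name is a field this guide mentions.
validation_meta = {
    "run_name": "client-mmm_blocked_tail_20260401_101500",
    "seed": 20260331,
    "has_holdout": True,
}
final_fit_meta = {
    "run_name": "client-mmm_final_fit_20260401_114200",
    "seed": 20260331,
    "has_holdout": False,
}

for key, (before, after) in diff_metadata(validation_meta, final_fit_meta).items():
    print(f"{key}: {before!r} -> {after!r}")
```

Matching fields such as the seed drop out of the result, so the output shows only what changed between the two runs.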

Know the staged output schema

The current run layout is:

<run_dir>/
  run_manifest.json
  00_run_metadata/
  10_validation/
  20_model_fit/
  30_model_assessment/
    plots/
  40_decomposition/
    plots/
  60_response_curves/
    plots/
  70_optimisation/
    plots/

The runner always writes:

00_run_metadata
20_model_fit
30_model_assessment
40_decomposition

The runner writes these only when applicable:

10_validation
60_response_curves
70_optimisation
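
The always-written and applicable-only stage lists lend themselves to a simple directory check. The sketch below is an illustration built only on the stage names above. summarise_stages is a hypothetical helper, and the demonstration uses a throwaway directory rather than a real run.

```python
import tempfile
from pathlib import Path

ALWAYS_WRITTEN = ("00_run_metadata", "20_model_fit", "30_model_assessment", "40_decomposition")
WHEN_APPLICABLE = ("10_validation", "60_response_curves", "70_optimisation")


def summarise_stages(run_dir: Path) -> dict:
    """Classify the stage directories found directly under run_dir."""
    present = {path.name for path in run_dir.iterdir() if path.is_dir()}
    known = set(ALWAYS_WRITTEN) | set(WHEN_APPLICABLE)
    return {
        "missing_required": [stage for stage in ALWAYS_WRITTEN if stage not in present],
        "optional_present": [stage for stage in WHEN_APPLICABLE if stage in present],
        "unexpected": sorted(present - known),
    }


# Demonstration: a fake validation run that is missing its decomposition stage.
with tempfile.TemporaryDirectory() as tmp:
    fake_run = Path(tmp)
    for stage in ("00_run_metadata", "10_validation", "20_model_fit", "30_model_assessment"):
        (fake_run / stage).mkdir()
    print(summarise_stages(fake_run))
```

A non-empty missing_required list is a sign that the run did not complete, and run_manifest.json is the place to look for the stage status.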

For the bundled reference examples and the exact stage-level file set, see demos.md.

A practical analyst sequence

If you want one concrete operating pattern, use this one. Author a YAML file. Run a blocked-tail candidate through the CLI when you need one held-out tail block. Use rolling_origin through build_validation_plan(...) when you need multiple expanding-window validation splits. Choose the modelling specification. Run the final full-sample fit as its own job. Review the run directory artefacts. Then use compare_run_records(...) and refresh_run(...) when you need to inspect or rerun stored work later.