Architecture

meridian-tools is a companion package designed for agency teams that use Google Meridian as their client-facing MMM (Marketing Mix Modelling) engine. It provides a stricter, more reproducible workflow around Meridian without forking the upstream library.

Core philosophy

No forking — meridian-tools strictly wraps Meridian. It does not modify Meridian’s internal code or model implementations.
Bounded reproducibility — Runs are driven by typed YAML configurations, archived source/resolved configs, manifest metadata, and input-data provenance. These records support repeatable execution in the documented dependency environment, but they do not guarantee identical posterior draws across all hardware, dependency versions, random seeds, or Meridian changes.
Structured workflow — The package enforces a staged execution pipeline (validation, model fit, assessment, decomposition, response curves, optimisation).
Lifecycle management — Runs are treated as immutable artefacts with rich metadata, allowing for easy comparison, refreshing, and storage.

Module map

meridian_tools/
├── __init__.py          Lazy-loading package exports
├── artifacts.py         Manifest and JSON helpers
├── cli.py               CLI entry point (argparse)
├── config.py            Pydantic YAML models
├── cv.py                Validation split logic
├── demo.py              Bundled demo discovery
├── diagnostics.py       Diagnostics export
├── exports.py           Meridian analysis surface wrappers
├── launcher.py          Run execution wrapper
├── lifecycle.py         Post-run record management
├── log_likelihood.py    Log-likelihood reconstruction adapter
├── model_selection.py   ArviZ LOO/WAIC wrappers
├── terminal.py          CLI presentation and warning grouping
└── version.py           Static version

Layered import design

Meridian and TensorFlow are never imported at module level in the configuration, validation, or CLI layers. Lightweight operations therefore skip the heavy import cost and load only the dependencies they need:

Operation	Imports loaded
`meridian-tools --help`	`pydantic`, `yaml`
`load_yaml_config(path)`	`pydantic`, `yaml`
`build_validation_plan(...)`	`numpy`
`run_pipeline(...)`	Everything (Meridian, TF, ArviZ, etc.)

The package's __init__.py uses module-level __getattr__ lazy loading, so a bare import meridian_tools does not trigger heavy dependency imports.
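The lazy-loading idea can be illustrated with a small, self-contained sketch. It is not the package's actual __init__.py: `make_lazy_module`, the `exports` mapping, and the `demo_pkg` name are hypothetical, and the standard-library `json` module stands in for a heavy dependency. In a real package the same effect usually comes from defining a `__getattr__(name)` function directly in `__init__.py`.

```python
from __future__ import annotations

import importlib
import types


def make_lazy_module(name: str, exports: dict[str, str]) -> types.ModuleType:
    """Build a module whose exported attributes are imported on first access.

    `exports` maps an attribute name to the module that defines it
    (a hypothetical layout chosen for this illustration).
    """
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        # Only called when normal attribute lookup on the module fails.
        try:
            target = exports[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(target), attr)
        setattr(module, attr, value)  # cache: later lookups bypass __getattr__
        return value

    # A function named __getattr__ in the module namespace is the hook
    # that Python consults for missing module attributes.
    module.__getattr__ = __getattr__
    return module


# `json` stands in for a heavy dependency; it is resolved only when
# `dumps` is first touched.
demo = make_lazy_module("demo_pkg", {"dumps": "json"})
```

Accessing `demo.dumps` triggers the import and caches the result on the module, so the hook runs at most once per exported name.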

Pipeline execution model

The runner executes stages sequentially. Each stage:

Creates a StageRecord and appends it to the in-memory manifest.
Calls the stage function, which returns a dict[str, Path] of artefacts.
Normalises artefact paths to be relative to the run directory.
Writes the updated manifest to disk.

This design means a crash mid-pipeline leaves a readable partial manifest on disk. Because the manifest is written only after a stage function returns, the last entry in the on-disk stages array is the last successfully completed stage.
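The per-stage loop described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the package's runner: the `StageRecord` fields, the `run_stages` function, the `manifest.json` filename, and the `"running"` status value are all hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class StageRecord:
    # Field names are assumptions made for this sketch.
    name: str
    artifacts: dict[str, str] = field(default_factory=dict)


StageFn = Callable[[Path], "dict[str, Path]"]


def run_stages(run_dir: Path, stages: list[tuple[str, StageFn]]) -> None:
    """Run stages in order, rewriting the manifest after each one finishes."""
    records: list[StageRecord] = []
    for name, stage_fn in stages:
        record = StageRecord(name)
        records.append(record)          # 1. record joins the in-memory manifest
        artifacts = stage_fn(run_dir)   # 2. the stage returns its artefact paths
        record.artifacts = {            # 3. store paths relative to the run dir
            key: path.relative_to(run_dir).as_posix()
            for key, path in artifacts.items()
        }
        manifest = {"status": "running", "stages": [asdict(r) for r in records]}
        # 4. persist; if a later stage raises, this file is what survives
        (run_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
```

If a stage function raises, the exception propagates before step 4 runs for that stage, so the file on disk still lists only the stages that finished.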

┌─────────────────────┐
│  00_run_metadata    │  Archive source + resolved configs
├─────────────────────┤
│  10_validation      │  Write validation spec (if applicable)
├─────────────────────┤
│  20_model_fit       │  Build data → build model → sample posterior
├─────────────────────┤
│  30_model_assessment│  Diagnostics + model selection + summary
├─────────────────────┤
│  40_decomposition   │  Summary metrics (NetCDF + CSV)
├─────────────────────┤
│  60_response_curves │  Response curves (if configured)
├─────────────────────┤
│  70_optimisation    │  Budget optimisation (if configured)
└─────────────────────┘

The numbering gap at 50 reserves space for future stages without renumbering.

Configuration architecture

The separation between authored YAML and runtime-only config is strict:

MeridianToolsConfig — Pydantic model for the YAML file. Owns project metadata, data paths, model spec, fit settings, validation strategy, and export switches.
PipelineRunConfig — Frozen dataclass for runtime options. Owns output directory, run name, and concrete validation spec.

The runner writes two config copies to each run directory:

config.source.yaml — Verbatim copy of the input YAML.
config.resolved.yaml — After relative path resolution. Never includes runtime-only fields.
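As a rough illustration of this separation, the sketch below keeps runtime options in a frozen dataclass and resolves relative data paths in a copy of the authored config. It uses plain dicts instead of the real Pydantic model, and everything beyond the names given above is an assumption: the field types, the `data` key, the `resolve_data_paths` helper, and the choice to resolve against the config file's directory.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass(frozen=True)
class PipelineRunConfig:
    """Runtime-only options; never written into config.resolved.yaml."""

    output_dir: Path
    run_name: str
    validation_spec: dict[str, Any] | None = None  # assumed shape


def resolve_data_paths(authored: dict[str, Any], config_file: Path) -> dict[str, Any]:
    """Return a copy of the authored config with data paths made absolute.

    Assumption: relative paths are resolved against the directory that
    contains the YAML file. The authored mapping is left untouched so it
    can still be archived verbatim.
    """
    base = config_file.resolve().parent
    resolved = dict(authored)
    resolved["data"] = {
        key: str((base / value).resolve())
        for key, value in authored.get("data", {}).items()
    }
    return resolved
```

Keeping the runtime options in a separate frozen object means they cannot leak into the resolved config by accident, and cannot be mutated halfway through a run.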

Artefact path normalisation

All artefact paths in manifests are stored relative to the run directory and validated as regular files beneath that directory. Runs written with manifest version 4 reject absolute paths, lexical .. components, paths that resolve outside the run directory, directories, missing paths, and special files. This makes run directories portable while keeping manifest consumers fail-closed.

The lifecycle layer resolves accepted paths back to absolute paths at load time.
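A minimal sketch of these checks, using only pathlib, is shown below. `ArtifactPathError` and `resolve_artifact` are hypothetical names; the real lifecycle layer may order or report the checks differently.

```python
from pathlib import Path


class ArtifactPathError(ValueError):
    """Raised when a manifest path fails validation (hypothetical name)."""


def resolve_artifact(run_dir: Path, relative: str) -> Path:
    """Validate a manifest path and return its absolute location."""
    rel = Path(relative)
    if rel.is_absolute():
        raise ArtifactPathError(f"absolute path not allowed: {relative}")
    if ".." in rel.parts:
        raise ArtifactPathError(f"'..' component not allowed: {relative}")

    root = run_dir.resolve()
    resolved = (root / rel).resolve()  # follows symlinks
    try:
        resolved.relative_to(root)
    except ValueError:
        raise ArtifactPathError(
            f"path escapes the run directory: {relative}"
        ) from None

    if not resolved.exists():
        raise ArtifactPathError(f"missing artefact: {relative}")
    if not resolved.is_file():
        # Covers directories and special files such as FIFOs or sockets.
        raise ArtifactPathError(f"not a regular file: {relative}")
    return resolved
```

The lexical `..` check and the resolved-path check are deliberately separate: the first rejects suspicious input even when it would resolve safely, and the second catches escapes through symlinks.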

Meridian coupling boundaries

Coupling level	Modules	Surface used
Public API	`runner.py`, `exports.py`	`Meridian`, `ModelSpec`, `CsvDataLoader`, `Analyzer`, `Summarizer`, `BudgetOptimizer`
Semi-public	`log_likelihood.py`, `exports.py`	`model_context`, `inference_data`, `input_data`
Private	`log_likelihood.py`	`_get_joint_dist_unpinned`, `_prepare_latents_for_reconstruction`, `_reconstruct_posteriors`

The private-API coupling is confined to log_likelihood.py and wrapped in comprehensive error handling. See Meridian integration for details.
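One common way to contain this kind of coupling is to route every private call through a single guard that turns a missing or failing helper into one explicit error. The sketch below shows the pattern only. `PrivateApiError`, `call_private`, and `FakeModel` are hypothetical, and it makes no claim about the actual signatures of Meridian's private functions.

```python
class PrivateApiError(RuntimeError):
    """Raised when a private upstream helper is missing or fails."""


def call_private(owner: object, name: str, *args, **kwargs):
    """Call a private helper by name, converting failures into one error type."""
    helper = getattr(owner, name, None)
    if not callable(helper):
        raise PrivateApiError(
            f"{type(owner).__name__} has no callable {name!r}; "
            "the upstream private API may have changed"
        )
    try:
        return helper(*args, **kwargs)
    except Exception as exc:
        # Chain the original exception so the root cause stays visible.
        raise PrivateApiError(f"{name} failed: {exc}") from exc


class FakeModel:
    """Stand-in object for the demo; not a Meridian class."""

    def _private_helper(self, x):
        return x * 2


result = call_private(FakeModel(), "_private_helper", 21)
```

Callers then need to handle a single exception type, and an upstream rename surfaces as a clear message instead of a bare AttributeError deep in the pipeline.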

Data flow

Input — A typed YAML file defines the entire run scope.
Initialisation — The runner resolves the config and creates a timestamped run directory.
Execution — The pipeline steps through stages, maintaining a central state dictionary with the fitted model and intermediate results.
Export — Each stage writes specific artefacts to disk within the run directory.
Finalisation — The manifest is completed with status: "completed" and finished_at, locking the run state.
Lifecycle — Downstream processes or analysts consume artefacts or use lifecycle tools to compare, refresh, or audit runs.
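The initialisation and finalisation steps can be sketched as follows. The directory naming scheme, the `manifest.json` filename, and both function names are assumptions made for illustration. Only the `status: "completed"` and `finished_at` fields come from the description above.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path


def create_run_dir(output_dir: Path, run_name: str, now: datetime) -> Path:
    """Create a timestamped run directory (naming scheme is an assumption)."""
    run_dir = output_dir / f"{now:%Y%m%dT%H%M%SZ}_{run_name}"
    run_dir.mkdir(parents=True, exist_ok=False)  # never reuse an existing run
    return run_dir


def finalise_manifest(manifest_path: Path, now: datetime) -> dict:
    """Mark the manifest as completed and record the finish time."""
    manifest = json.loads(manifest_path.read_text())
    manifest["status"] = "completed"
    manifest["finished_at"] = now.isoformat()
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest
```

Passing `now` in explicitly, for example `datetime.now(timezone.utc)`, keeps both helpers deterministic and easy to test.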