Architecture

meridian-tools is a companion package designed for agency teams that use Google Meridian as their client-facing MMM (Marketing Mix Modelling) engine. It provides a stricter, more reproducible workflow around Meridian without forking the upstream library.

Core philosophy

  1. No forking — meridian-tools strictly wraps Meridian. It does not modify Meridian’s internal code or model implementations.
  2. Reproducibility — All runs are driven by typed YAML configurations, so any run can be re-executed from the configuration archived in its run directory.
  3. Structured workflow — The package enforces a staged execution pipeline (validation, model fit, assessment, decomposition, response curves, optimisation).
  4. Lifecycle management — Runs are treated as immutable artefacts with rich metadata, allowing for easy comparison, refreshing, and storage.

Module map

meridian_tools/
├── __init__.py          Lazy-loading package exports
├── artifacts.py         Manifest and JSON helpers
├── cli.py               CLI entry point (argparse)
├── config.py            Pydantic YAML models
├── cv.py                Validation split logic
├── demo.py              Bundled demo discovery
├── diagnostics.py       Diagnostics export
├── exports.py           Meridian analysis surface wrappers
├── launcher.py          Run execution wrapper
├── lifecycle.py         Post-run record management
├── log_likelihood.py    Log-likelihood reconstruction adapter
├── model_selection.py   ArviZ LOO/WAIC wrappers
├── terminal.py          CLI presentation and warning grouping
└── version.py           Static version

Layered import design

Meridian and TensorFlow are never imported at module level in the configuration, validation, or CLI layers, so lightweight operations return without loading the heavy modelling stack:

Operation                    Imports loaded
meridian-tools --help        pydantic, yaml
load_yaml_config(path)       pydantic, yaml
build_validation_plan(...)   numpy
run_pipeline(...)            Everything (Meridian, TensorFlow, ArviZ, etc.)
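One way to check the table above is to start a fresh interpreter, run a single statement, and list the modules it pulled into sys.modules. The helper below is a hypothetical sketch (modules_loaded_by is not part of meridian-tools) that uses only the standard library.

```python
import subprocess
import sys

# Source run in the child process: snapshot sys.modules, execute the
# statement, then print every module name that appeared as a result.
_PROBE = """
import sys
before = set(sys.modules)
exec(compile({code!r}, "<probe>", "exec"))
print("\\n".join(sorted(set(sys.modules) - before)))
"""


def modules_loaded_by(code: str) -> set[str]:
    """Return the modules newly imported by running `code` in a fresh interpreter.

    The statement should not print anything itself, since stdout is parsed
    as one module name per line.
    """
    result = subprocess.run(
        [sys.executable, "-c", _PROBE.format(code=code)],
        capture_output=True,
        text=True,
        check=True,
    )
    return {line for line in result.stdout.splitlines() if line}
```

For example, if the lazy-loading contract holds, `modules_loaded_by("import meridian_tools") & {"meridian", "tensorflow"}` should be empty.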

The __init__.py uses __getattr__-based lazy loading so that import meridian_tools does not trigger heavy dependency imports.

Pipeline execution model

The runner executes stages sequentially. Each stage:

  1. Creates a StageRecord and appends it to the in-memory manifest.
  2. Calls the stage function, which returns a dict[str, Path] of artefacts.
  3. Normalises artefact paths to be relative to the run directory.
  4. Writes the updated manifest to disk.

Because the manifest is written to disk only after a stage function returns, a crash mid-pipeline leaves a readable partial manifest on disk. The last entry in its stages array is the last successfully completed stage.

┌─────────────────────┐
│  00_run_metadata    │  Archive source + resolved configs
├─────────────────────┤
│  10_validation      │  Write validation spec (if applicable)
├─────────────────────┤
│  20_model_fit       │  Build data → build model → sample posterior
├─────────────────────┤
│  30_model_assessment│  Diagnostics + model selection + summary
├─────────────────────┤
│  40_decomposition   │  Summary metrics (NetCDF + CSV)
├─────────────────────┤
│  60_response_curves │  Response curves (if configured)
├─────────────────────┤
│  70_optimisation    │  Budget optimisation (if configured)
└─────────────────────┘

The numbering gap at 50 reserves space for future stages without renumbering.
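The four per-stage steps and the crash-safety property can be sketched with the standard library alone. Everything here is hypothetical: the StageRecord fields, the manifest.json file name, and the write-to-temporary-file-then-rename step (added so a crash during the write itself cannot leave a half-written manifest) are assumptions, not the package's actual code.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable

# A stage receives the shared pipeline state and returns {artefact name: path}.
StageFn = Callable[[dict], dict[str, Path]]


@dataclass
class StageRecord:
    """Hypothetical per-stage manifest entry."""
    name: str
    started_at: str
    artifacts: dict[str, str] = field(default_factory=dict)


def _write_manifest(run_dir: Path, manifest: dict) -> None:
    payload = {**manifest, "stages": [asdict(s) for s in manifest["stages"]]}
    tmp = run_dir / "manifest.json.tmp"
    tmp.write_text(json.dumps(payload, indent=2))
    tmp.replace(run_dir / "manifest.json")  # rename over the previous manifest


def run_stages(run_dir: Path, stages: list[tuple[str, StageFn]], state: dict) -> dict:
    manifest = {"status": "running", "stages": []}
    for name, stage_fn in stages:
        # 1. Create the record and append it to the in-memory manifest only.
        record = StageRecord(name, datetime.now(timezone.utc).isoformat())
        manifest["stages"].append(record)
        # 2. Run the stage; if it raises, the on-disk manifest is untouched.
        produced = stage_fn(state)
        # 3. Store artefact paths relative to the run directory.
        record.artifacts = {
            key: (run_dir / path).relative_to(run_dir).as_posix()
            for key, path in produced.items()
        }
        # 4. Persist the manifest now that the stage has succeeded.
        _write_manifest(run_dir, manifest)
    return manifest
```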

Configuration architecture

The separation between authored YAML and runtime-only config is strict:

  • MeridianToolsConfig — Pydantic model for the YAML file. Owns project metadata, data paths, model spec, fit settings, validation strategy, and export switches.
  • PipelineRunConfig — Frozen dataclass for runtime options. Owns output directory, run name, and concrete validation spec.

The runner writes two config copies to each run directory:

  • config.source.yaml — Verbatim copy of the input YAML.
  • config.resolved.yaml — The same configuration after relative paths are resolved. Runtime-only fields are never included.
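The split between authored and runtime-only configuration can be sketched without Pydantic. Here a plain dict stands in for a parsed MeridianToolsConfig, data.path is a hypothetical path field, and resolving relative paths against the YAML file's directory is an assumption about what relative path resolution means. The PipelineRunConfig fields mirror the description above, but their names and types are guesses.

```python
import copy
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional

# Hypothetical locations of path-valued fields in the authored config.
PATH_FIELDS = [("data", "path")]


@dataclass(frozen=True)
class PipelineRunConfig:
    """Runtime-only options; never written to config.resolved.yaml."""
    output_dir: Path
    run_name: str
    validation_spec: Optional[dict[str, Any]] = None


def resolve_config(authored: dict, config_dir: Path) -> dict:
    """Return a copy of the authored config with relative paths made absolute."""
    resolved = copy.deepcopy(authored)  # the source copy keeps the original text
    for section, key in PATH_FIELDS:
        value = resolved.get(section, {}).get(key)
        if value is not None and not Path(value).is_absolute():
            resolved[section][key] = str((config_dir / value).resolve())
    return resolved
```

Because resolve_config only ever sees the authored dict, runtime fields such as run_name cannot leak into the resolved copy.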

Artefact path normalisation

All artefact paths in manifests are stored relative to the run directory through normalize_artifact_paths. This makes run directories portable across machines. The lifecycle layer resolves them back to absolute paths at load time.
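A minimal sketch of the round trip, assuming normalize_artifact_paths takes a mapping of artefact names to paths plus the run directory (its real signature may differ). resolve_artifact_paths is a hypothetical name for the lifecycle-side inverse. Storing POSIX-style relative strings also keeps manifests readable across operating systems.

```python
from pathlib import Path


def normalize_artifact_paths(artifacts: dict[str, Path], run_dir: Path) -> dict[str, str]:
    """Store each artefact path relative to run_dir, using forward slashes."""
    root = run_dir.resolve()
    normalised = {}
    for key, path in artifacts.items():
        # Relative inputs are treated as already run-relative; absolute ones must
        # live inside the run directory, otherwise relative_to raises ValueError.
        absolute = (root / path).resolve()
        normalised[key] = absolute.relative_to(root).as_posix()
    return normalised


def resolve_artifact_paths(artifacts: dict[str, str], run_dir: Path) -> dict[str, Path]:
    """Inverse used at load time: anchor entries to wherever run_dir now lives."""
    return {key: run_dir / rel for key, rel in artifacts.items()}
```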

Meridian coupling boundaries

Coupling level   Modules                         Surface used
Public API       runner.py, exports.py           Meridian, ModelSpec, CsvDataLoader, Analyzer, Summarizer, BudgetOptimizer
Semi-public      log_likelihood.py, exports.py   model_context, inference_data, input_data
Private          log_likelihood.py               _get_joint_dist_unpinned, _prepare_latents_for_reconstruction, _reconstruct_posteriors

The private-API coupling is confined to log_likelihood.py and wrapped in comprehensive error handling. See Meridian integration for details.
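Confining private-API access to one module makes it possible to route every such call through a single guard that turns an upstream rename or signature change into one clear error. The sketch below is hypothetical: call_private and MeridianCompatibilityError are illustrative names, not part of meridian-tools or Meridian, and the guard shows one possible shape for the error handling rather than the package's implementation.

```python
from typing import Any


class MeridianCompatibilityError(RuntimeError):
    """Hypothetical error for a missing or broken private Meridian hook."""


def call_private(obj: Any, name: str, *args: Any, **kwargs: Any) -> Any:
    """Call a private method by name, converting failures into one error type."""
    method = getattr(obj, name, None)
    if not callable(method):
        raise MeridianCompatibilityError(
            f"{type(obj).__name__}.{name} is not available; the installed "
            "Meridian version may have changed its private API"
        )
    try:
        return method(*args, **kwargs)
    except Exception as exc:  # re-raise the upstream failure with context
        raise MeridianCompatibilityError(
            f"{type(obj).__name__}.{name} failed: {exc}"
        ) from exc
```

A caller in log_likelihood.py could then catch this single exception type and report that log-likelihood reconstruction is unavailable, instead of surfacing an arbitrary upstream traceback.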

Data flow

  1. Input — A typed YAML file defines the entire run scope.
  2. Initialisation — The runner resolves the config and creates a timestamped run directory.
  3. Execution — The pipeline steps through stages, maintaining a central state dictionary with the fitted model and intermediate results.
  4. Export — Each stage writes specific artefacts to disk within the run directory.
  5. Finalisation — The manifest is finalised with status: "completed" and a finished_at timestamp, locking the run state.
  6. Lifecycle — Downstream processes or analysts consume artefacts or use lifecycle tools to compare, refresh, or audit runs.
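Steps 2 and 5 bracket every run. A minimal sketch, assuming a UTC timestamp prefix for run directory names and a manifest.json file (both hypothetical). Refusing to finalise twice is one way to illustrate that a completed run is locked, not necessarily how meridian-tools enforces it.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def create_run_dir(output_dir: Path, run_name: str) -> Path:
    """Create a new, uniquely named run directory (hypothetical naming scheme)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = output_dir / f"{stamp}_{run_name}"
    run_dir.mkdir(parents=True, exist_ok=False)  # never reuse an existing run
    return run_dir


def finalise_manifest(run_dir: Path) -> dict:
    """Mark the run as completed and record when it finished."""
    path = run_dir / "manifest.json"
    manifest = json.loads(path.read_text())
    if manifest.get("status") == "completed":
        raise RuntimeError(f"run {run_dir.name} is already finalised")
    manifest["status"] = "completed"
    manifest["finished_at"] = datetime.now(timezone.utc).isoformat()
    path.write_text(json.dumps(manifest, indent=2))
    return manifest
```

With exist_ok=False, two runs started with the same name within the same second fail loudly rather than sharing a directory.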