Architecture
meridian-tools is a companion package designed for agency teams that use
Google Meridian as their client-facing MMM (Marketing Mix Modelling) engine. It
provides a stricter, more reproducible workflow around Meridian without forking
the upstream library.
Core philosophy
- No forking: `meridian-tools` strictly wraps Meridian. It does not modify Meridian’s internal code or model implementations.
- Reproducibility: All runs are driven by typed YAML configurations, ensuring that models can be perfectly reproduced.
- Structured workflow: The package enforces a staged execution pipeline (validation, model fit, assessment, decomposition, response curves, optimisation).
- Lifecycle management: Runs are treated as immutable artefacts with rich metadata, allowing for easy comparison, refreshing, and storage.
Module map
Layered import design
Meridian and TensorFlow are never imported at module level in the configuration, validation, or CLI layers, so lightweight operations do not pay the import cost of the heavy dependencies:
| Operation | Imports loaded |
|---|---|
| `meridian-tools --help` | pydantic, yaml |
| `load_yaml_config(path)` | pydantic, yaml |
| `build_validation_plan(...)` | numpy |
| `run_pipeline(...)` | Everything (Meridian, TF, ArviZ, etc.) |
The `__init__.py` uses `__getattr__`-based lazy loading so that
`import meridian_tools` does not trigger heavy dependency imports.
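The lazy-loading idea can be illustrated with the module-level `__getattr__` hook (PEP 562). This is a minimal sketch, not the package's actual `__init__.py`: the `_LAZY_ATTRS` mapping and the `make_lazy_module` helper are hypothetical, and standard-library modules stand in for heavy dependencies such as TensorFlow.

```python
import importlib
import types

# Hypothetical mapping: public attribute name -> (module to import, attribute in it).
# Standard-library modules stand in for heavy dependencies.
_LAZY_ATTRS = {
    "dumps": ("json", "dumps"),
    "Fraction": ("fractions", "Fraction"),
}


def make_lazy_module(name: str) -> types.ModuleType:
    """Build a module whose listed attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        # Python calls this hook only when normal lookup on the module fails.
        if attr not in _LAZY_ATTRS:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        target_module, target_attr = _LAZY_ATTRS[attr]
        value = getattr(importlib.import_module(target_module), target_attr)
        # Cache the value so later lookups bypass the hook entirely.
        setattr(module, attr, value)
        return value

    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("lazy_demo")
loaded_before = "dumps" in vars(lazy)   # nothing resolved yet
result = lazy.dumps({"a": 1})           # first access triggers the import
loaded_after = "dumps" in vars(lazy)    # now cached on the module
```

In a real package the `__getattr__` function would sit at the top level of `__init__.py` rather than being attached to a module built at runtime; the helper here only keeps the sketch runnable as a single file.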
Pipeline execution model
The runner executes stages sequentially. For each stage, it:
- Creates a `StageRecord` and appends it to the in-memory manifest.
- Calls the stage function, which returns a `dict[str, Path]` of artefacts.
- Normalises artefact paths to be relative to the run directory.
- Writes the updated manifest to disk.
This design means a crash mid-pipeline leaves a readable partial manifest on
disk. The last entry in the stages array is the last successfully completed
stage.
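The per-stage loop and its crash behaviour can be sketched as follows. This is an illustration under assumptions, not the runner's real code: the `run_stages` function, the `manifest.json` file name, and the two `StageRecord` fields are hypothetical.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable

StageFn = Callable[[Path], dict[str, Path]]


@dataclass
class StageRecord:
    # Hypothetical fields; the real StageRecord may carry more metadata.
    name: str
    artefacts: dict[str, str] = field(default_factory=dict)


def run_stages(run_dir: Path, stages: list[tuple[str, StageFn]]) -> list[StageRecord]:
    manifest: list[StageRecord] = []
    for name, stage_fn in stages:
        record = StageRecord(name=name)
        manifest.append(record)            # in-memory only at this point
        artefacts = stage_fn(run_dir)      # may raise; disk manifest is untouched
        record.artefacts = {
            key: path.relative_to(run_dir).as_posix() for key, path in artefacts.items()
        }
        # Persist after every completed stage so a later crash leaves this on disk.
        payload = {"stages": [asdict(r) for r in manifest]}
        (run_dir / "manifest.json").write_text(json.dumps(payload, indent=2))
    return manifest


def validation_stage(run_dir: Path) -> dict[str, Path]:
    report = run_dir / "validation" / "report.txt"
    report.parent.mkdir(parents=True, exist_ok=True)
    report.write_text("ok")
    return {"report": report}


def failing_stage(run_dir: Path) -> dict[str, Path]:
    raise RuntimeError("simulated crash")


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    try:
        run_stages(run_dir, [("validation", validation_stage), ("model_fit", failing_stage)])
    except RuntimeError:
        pass
    on_disk = json.loads((run_dir / "manifest.json").read_text())
```

After the simulated crash, the manifest on disk lists only the `validation` stage, because the write happens after each stage function returns.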
The numbering gap at 50 reserves space for future stages without renumbering.
Configuration architecture
The separation between authored YAML and runtime-only config is strict:
- `MeridianToolsConfig`: Pydantic model for the YAML file. Owns project metadata, data paths, model spec, fit settings, validation strategy, and export switches.
- `PipelineRunConfig`: Frozen dataclass for runtime options. Owns output directory, run name, and concrete validation spec.
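The runtime side of this split can be sketched with a frozen dataclass. The class name `PipelineRunConfigSketch` and its field names and types are assumptions chosen to mirror the description above; the Pydantic side is left out so the example needs only the standard library.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class PipelineRunConfigSketch:
    # Field names are illustrative guesses, not the package's real attributes.
    output_dir: Path
    run_name: str
    validation_spec: Optional[dict] = None


runtime = PipelineRunConfigSketch(output_dir=Path("runs"), run_name="baseline")

try:
    runtime.run_name = "changed"  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

# A modified copy has to be created explicitly; the original stays unchanged.
refreshed = replace(runtime, run_name="refresh")
```

Freezing the runtime options makes it harder for a stage to alter them mid-run, which fits the goal of keeping runtime-only state out of the authored YAML.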
The runner writes two config copies to each run directory:
- `config.source.yaml`: Verbatim copy of the input YAML.
- `config.resolved.yaml`: After relative path resolution. Never includes runtime-only fields.
Artefact path normalisation
All artefact paths in manifests are stored relative to the run directory
through normalize_artifact_paths. This makes run directories portable across
machines.
The lifecycle layer resolves them back to absolute paths at load time.
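The round trip between relative and absolute paths can be sketched as below. The helpers `to_relative` and `to_absolute` are hypothetical stand-ins; the signature of `normalize_artifact_paths` is not documented here and may differ. `PurePosixPath` is used only so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath


def to_relative(artefacts: dict[str, PurePosixPath], run_dir: PurePosixPath) -> dict[str, str]:
    """Store artefact paths relative to the run directory."""
    return {key: path.relative_to(run_dir).as_posix() for key, path in artefacts.items()}


def to_absolute(artefacts: dict[str, str], run_dir: PurePosixPath) -> dict[str, PurePosixPath]:
    """Resolve stored relative paths against wherever the run directory lives now."""
    return {key: run_dir / rel for key, rel in artefacts.items()}


# The run is written on one machine...
original_run = PurePosixPath("/home/alice/runs/2024-01-01_baseline")
stored = to_relative({"summary": original_run / "exports" / "summary.csv"}, original_run)

# ...and loaded from a different location on another.
moved_run = PurePosixPath("/mnt/shared/runs/2024-01-01_baseline")
resolved = to_absolute(stored, moved_run)
```

Because the manifest holds only `exports/summary.csv`, copying the run directory elsewhere does not invalidate any stored path.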
Meridian coupling boundaries
| Coupling level | Modules | Surface used |
|---|---|---|
| Public API | `runner.py`, `exports.py` | `Meridian`, `ModelSpec`, `CsvDataLoader`, `Analyzer`, `Summarizer`, `BudgetOptimizer` |
| Semi-public | `log_likelihood.py`, `exports.py` | `model_context`, `inference_data`, `input_data` |
| Private | `log_likelihood.py` | `_get_joint_dist_unpinned`, `_prepare_latents_for_reconstruction`, `_reconstruct_posteriors` |
The private-API coupling is confined to log_likelihood.py and wrapped in
comprehensive error handling. See
Meridian integration for details.
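One way to contain this kind of private-API coupling is to route every private call through a single guarded helper. The sketch below is an assumption about the general approach, not the error handling in `log_likelihood.py`: `PrivateApiUnavailable`, `call_private`, and `FakeModel` are hypothetical, and `FakeModel` does not mimic any Meridian signature.

```python
class PrivateApiUnavailable(RuntimeError):
    """Raised when an expected private helper is missing or fails."""


def call_private(obj, method_name: str, *args, **kwargs):
    """Call a private method defensively and surface one clear error type."""
    method = getattr(obj, method_name, None)
    if not callable(method):
        raise PrivateApiUnavailable(
            f"{type(obj).__name__} has no callable {method_name!r}; "
            "the upstream private API may have changed."
        )
    try:
        return method(*args, **kwargs)
    except Exception as exc:
        # Re-raise under one type so callers can degrade gracefully.
        raise PrivateApiUnavailable(f"{method_name!r} failed: {exc}") from exc


class FakeModel:
    # Stand-in object for the demo only.
    def _private_helper(self):
        return "joint-dist"


model = FakeModel()
ok = call_private(model, "_private_helper")

try:
    call_private(model, "_removed_helper")
    missing_detected = False
except PrivateApiUnavailable:
    missing_detected = True
```

With a single exception type, callers can skip the affected export and keep the rest of the pipeline running when an upstream release renames or removes a private helper.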
Data flow
- Input: A typed YAML file defines the entire run scope.
- Initialisation: The runner resolves the config and creates a timestamped run directory.
- Execution: The pipeline steps through stages, maintaining a central state dictionary with the fitted model and intermediate results.
- Export: Each stage writes specific artefacts to disk within the run directory.
- Finalisation: The manifest is completed with `status: "completed"` and `finished_at`, locking the run state.
- Lifecycle: Downstream processes or analysts consume artefacts or use lifecycle tools to compare, refresh, or audit runs.