Background material on architecture, design decisions, and Meridian integration boundaries.
Pages
Architecture — meridian-tools is a companion package designed for agency teams that use Google Meridian as their client-facing MMM (Marketing Mix Modelling) engine. It provides a stricter, more reproducible workflow around Meridian without forking the upstream library.
Design decisions — This document records the key design decisions in meridian-tools and the reasoning behind them. It is intended for maintainers and contributors who need to understand why things are built the way they are.
Meridian integration — This document describes how meridian-tools integrates with Google Meridian, the boundaries of that integration, and the risks associated with different coupling levels.
Architecture
meridian-tools is a companion package designed for agency teams that use
Google Meridian as their client-facing MMM (Marketing Mix Modelling) engine. It
provides a stricter, more reproducible workflow around Meridian without forking
the upstream library.
Core philosophy
No forking — meridian-tools strictly wraps Meridian. It does not modify
Meridian’s internal code or model implementations.
Reproducibility — All runs are driven by typed YAML configurations,
ensuring that models can be perfectly reproduced.
Structured workflow — The package enforces a staged execution pipeline
(validation, model fit, assessment, decomposition, response curves,
optimisation).
Lifecycle management — Runs are treated as immutable artefacts with rich
metadata, allowing for easy comparison, refreshing, and storage.
Meridian and TensorFlow are never imported at module level in the configuration,
validation, or CLI layers. This means lightweight operations respond instantly:
| Operation | Imports loaded |
| --- | --- |
| meridian-tools --help | pydantic, yaml |
| load_yaml_config(path) | pydantic, yaml |
| build_validation_plan(...) | numpy |
| run_pipeline(...) | Everything (Meridian, TF, ArviZ, etc.) |
The __init__.py uses __getattr__-based lazy loading so that
import meridian_tools does not trigger heavy dependency imports.
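The lazy-loading pattern can be sketched as follows. This is a minimal illustration of a module-level __getattr__ (PEP 562), not the package's actual __init__.py: the export table, the demo_pkg module, and the stdlib modules standing in for heavy dependencies are all assumptions made for the example.

```python
import importlib
import types

# Hypothetical export table: public name -> (module to import, attribute).
# Stdlib modules stand in for heavy dependencies such as TensorFlow.
_LAZY_EXPORTS = {
    "sqrt": ("math", "sqrt"),
    "dumps": ("json", "dumps"),
}


def make_lazy_getattr(exports):
    """Build a module-level __getattr__ that imports on first access."""

    def __getattr__(name):
        try:
            module_name, attribute = exports[name]
        except KeyError:
            raise AttributeError(f"module has no attribute {name!r}") from None
        # The import happens here, not when the package itself is imported.
        return getattr(importlib.import_module(module_name), attribute)

    return __getattr__


# Stand-in for a package module; a real package would define __getattr__
# directly at the top level of its __init__.py.
demo_pkg = types.ModuleType("demo_pkg")
demo_pkg.__getattr__ = make_lazy_getattr(_LAZY_EXPORTS)

root = demo_pkg.sqrt(9.0)        # triggers the import of math
encoded = demo_pkg.dumps([1, 2])  # triggers the import of json
```

Python only calls a module's __getattr__ when normal attribute lookup fails, so names defined eagerly in the module are unaffected.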
Pipeline execution model
The runner executes stages sequentially. For each stage, it:
1. Creates a StageRecord and appends it to the in-memory manifest.
2. Calls the stage function, which returns a dict[str, Path] of artefacts.
3. Normalises artefact paths to be relative to the run directory.
4. Writes the updated manifest to disk.
This design means a crash mid-pipeline leaves a readable partial manifest on
disk. The last entry in the stages array is the last successfully completed
stage.
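The per-stage persistence described above can be sketched as follows. This is an illustrative stand-in under stated assumptions, not the package's runner: the StageRecord fields, the stage names 00_validation and 10_fit, and the manifest layout are hypothetical. Only the file name run_manifest.json and the order of steps come from this document.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class StageRecord:
    """Hypothetical stand-in for the package's stage record."""
    name: str
    artifacts: dict = field(default_factory=dict)


def run_stages(run_dir, stages):
    """Run stages in order, writing the manifest after each one completes."""
    manifest_path = run_dir / "run_manifest.json"
    records = []
    for name, stage_fn in stages:
        record = StageRecord(name=name)
        records.append(record)         # step 1: in-memory manifest only
        artifacts = stage_fn(run_dir)  # step 2: may raise mid-pipeline
        record.artifacts = {           # step 3: store run-relative paths
            key: Path(path).relative_to(run_dir).as_posix()
            for key, path in artifacts.items()
        }
        payload = {"stages": [asdict(r) for r in records]}
        manifest_path.write_text(json.dumps(payload, indent=2))  # step 4
    return manifest_path


def ok_stage(run_dir):
    out = run_dir / "validation.json"
    out.write_text("{}")
    return {"validation": out}


def crashing_stage(run_dir):
    raise RuntimeError("simulated crash during model fit")


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    try:
        run_stages(run_dir, [("00_validation", ok_stage), ("10_fit", crashing_stage)])
    except RuntimeError:
        pass  # the partial manifest from the first stage is still on disk
    on_disk = json.loads((run_dir / "run_manifest.json").read_text())

completed = [stage["name"] for stage in on_disk["stages"]]
```

Because the file is only rewritten after a stage function returns, the crashed stage never reaches disk and the last on-disk entry is the last completed stage.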
Stages are numbered 00, 10, 20, 30, 40, 60, and 70. The gap at 50 reserves space for a future stage without renumbering.
Configuration architecture
The separation between authored YAML and runtime-only config is strict:
MeridianToolsConfig — Pydantic model for the YAML file. Owns project
metadata, data paths, model spec, fit settings, validation strategy, and
export switches.
PipelineRunConfig — Frozen dataclass for runtime options. Owns output
directory, run name, and concrete validation spec.
The runner writes two config copies to each run directory:
config.source.yaml — Verbatim copy of the input YAML.
config.resolved.yaml — After relative path resolution. Never includes
runtime-only fields.
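A minimal sketch of this separation, using only the standard library. The field names output_dir, run_name, and validation_spec follow this document; their types, the data.path key, and the resolve_paths helper are hypothetical and are not the package's API.

```python
from dataclasses import FrozenInstanceError, dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class PipelineRunConfig:
    """Runtime-only options. Field names follow the text; types are assumed."""
    output_dir: PurePosixPath
    run_name: str
    validation_spec: Optional[dict] = None


def resolve_paths(authored: dict, config_dir: PurePosixPath) -> dict:
    """Return a copy with the hypothetical data.path entry made absolute."""
    resolved = {**authored, "data": dict(authored["data"])}
    resolved["data"]["path"] = str(config_dir / authored["data"]["path"])
    return resolved


# Authored content (what config.source.yaml would hold, verbatim).
authored = {"project": "demo", "data": {"path": "data/demo.csv"}}
# Resolved content (what config.resolved.yaml would hold).
resolved = resolve_paths(authored, PurePosixPath("/work/configs"))
# Runtime options live in a separate object and are never merged in.
runtime = PipelineRunConfig(PurePosixPath("/runs"), run_name="baseline")

try:
    runtime.run_name = "renamed"  # frozen dataclasses reject reassignment
    was_mutated = True
except FrozenInstanceError:
    was_mutated = False
```

Keeping the runtime options in their own frozen object means they cannot leak into the archived config by accident, and cannot change once a run has started.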
Artefact path normalisation
All artefact paths in manifests are stored relative to the run directory
through normalize_artifact_paths. This makes run directories portable across
machines.
The lifecycle layer resolves them back to absolute paths at load time.
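The round trip between stored relative paths and load-time absolute paths might look like the sketch below. The helper names to_run_relative and to_absolute are hypothetical; they illustrate the idea and do not reproduce the signature of normalize_artifact_paths.

```python
from pathlib import PurePosixPath


def to_run_relative(artifacts, run_dir):
    """Store each artefact path relative to the run directory."""
    root = PurePosixPath(run_dir)
    return {
        key: PurePosixPath(path).relative_to(root).as_posix()
        for key, path in artifacts.items()
    }


def to_absolute(artifacts, run_dir):
    """Resolve stored relative paths against wherever the run now lives."""
    root = PurePosixPath(run_dir)
    return {key: (root / path).as_posix() for key, path in artifacts.items()}


# Paths as written on the machine that produced the run.
stored = to_run_relative(
    {"curves": "/old/machine/run_001/curves/response.csv"},
    "/old/machine/run_001",
)
# The same run directory after being copied to another host.
relocated = to_absolute(stored, "/new/host/archive/run_001")
```

Only the run directory's own location changes between machines, so the manifest contents stay valid after a copy or move.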
The private-API coupling is confined to log_likelihood.py and wrapped in
comprehensive error handling. See
Meridian integration for details.
Data flow
Input — A typed YAML file defines the entire run scope.
Initialisation — The runner resolves the config and creates a timestamped
run directory.
Execution — The pipeline steps through stages, maintaining a central
state dictionary with the fitted model and intermediate results.
Export — Each stage writes specific artefacts to disk within the run
directory.
Finalisation — The manifest is completed with status: "completed" and
finished_at, locking the run state.
Lifecycle — Downstream processes or analysts consume artefacts or use
lifecycle tools to compare, refresh, or audit runs.
Design decisions
This document records the key design decisions in meridian-tools and the
reasoning behind them. It is intended for maintainers and contributors who
need to understand why things are built the way they are.
No IID cross-validation
Decision: meridian-tools does not implement random-shuffle or naive k-fold
cross-validation.
Reasoning: MMM data is time series. Random IID splits break temporal
structure, leading to data leakage where future observations inform training
and past observations appear in the test set. This produces optimistic accuracy
estimates that do not reflect real-world forecasting performance.
The package provides two time-respecting alternatives:
Blocked tail — reserves the most recent observations as a single test
block.
Rolling origin — expanding-window forward-chaining that respects temporal
ordering at every split.
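As an illustration of the simpler of the two strategies, a blocked-tail split can be sketched in a few lines. The function name and signature are hypothetical and are not the package's API; observations are assumed to be sorted in time order.

```python
def blocked_tail_split(n_obs, test_size):
    """Reserve the most recent test_size observations as one test block."""
    if not 0 < test_size < n_obs:
        raise ValueError("test_size must be between 1 and n_obs - 1")
    cut = n_obs - test_size
    # Index ranges into the time-ordered data: train first, test last.
    return range(0, cut), range(cut, n_obs)


train_idx, test_idx = blocked_tail_split(n_obs=10, test_size=3)
```

Every training index precedes every test index, so no future observation can inform training.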
Non-overlapping rolling-origin test windows
Decision: step_size must equal test_size for rolling-origin splits.
Reasoning: Overlapping test windows would mean the same observation appears
in multiple test sets. This violates the independence assumption needed for
comparing validation scores across splits and complicates the interpretation of
aggregate metrics. Non-overlapping windows ensure that each held-out observation
is scored exactly once across the split plan.
Minimum two splits for rolling origin
Decision: build_rolling_origin_splits requires at least two splits.
Reasoning: A single rolling-origin split is functionally identical to a
blocked-tail holdout and provides no comparative signal. If your data only
supports one split, use blocked_tail instead — it communicates the intent
more clearly.
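The two rolling-origin constraints (step equal to test size, and at least two splits) can be sketched as follows. This is an assumed illustration, not the signature or behaviour of build_rolling_origin_splits; the parameter names initial_train_size and n_splits are hypothetical.

```python
def rolling_origin_splits(n_obs, initial_train_size, test_size, n_splits):
    """Expanding-window splits whose test windows never overlap."""
    if n_splits < 2:
        raise ValueError("rolling origin needs at least two splits; "
                         "use a blocked tail for a single holdout")
    if initial_train_size + n_splits * test_size > n_obs:
        raise ValueError("not enough observations for the requested plan")
    splits = []
    for k in range(n_splits):
        # The origin advances by test_size each time (step_size == test_size).
        train_end = initial_train_size + k * test_size
        splits.append((range(0, train_end),
                       range(train_end, train_end + test_size)))
    return splits


plan = rolling_origin_splits(n_obs=12, initial_train_size=6, test_size=2, n_splits=3)
tested = [i for _, test in plan for i in test]
```

Each training window starts at index 0 and grows by one test block per split, so each held-out index appears in exactly one test window.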
Holdout restriction for model selection
Decision: LOO and WAIC are only available for models where
holdout_id is None.
Reasoning: LOO and WAIC estimate expected log predictive density (ELPD)
using the full observed likelihood surface. A model fitted with a holdout mask
has a modified likelihood that excludes held-out observations. Computing LOO on
this truncated likelihood would produce ELPD estimates that are not comparable
to those from full-sample fits.
The correct workflow is:
1. Use validation splits for candidate evaluation.
2. Select the best specification based on holdout performance.
3. Refit the chosen specification on the full dataset.
4. Compute LOO/WAIC on the full-sample fit for model quality reporting.
Separation of validation fits and final fits
Decision: Validation runs and final production fits are separate pipeline
executions that produce separate run directories.
Reasoning: A validation fit is trained on a subset of the data. Its
posterior reflects that subset and should not be used as the production
artefact. Keeping them as separate runs prevents accidental use of a validation
fit for downstream analysis or reporting.
Lazy imports for CLI responsiveness
Decision: Heavy dependencies (TensorFlow, NumPy, Meridian, ArviZ) are not
imported at module level in the config, CLI, or validation layers.
Reasoning: TensorFlow alone takes several seconds to import. The CLI must
respond instantly for --help and --list operations. The __init__.py uses
__getattr__-based lazy loading, and the test suite verifies that
build_parser() only loads pydantic and yaml.
Pydantic extra="forbid" everywhere
Decision: All configuration models reject unexpected keys.
Reasoning: Silent acceptance of unknown keys is a common source of
misconfiguration in YAML-driven tools. A typo like export_pridictive_accuracy
would be silently ignored without extra="forbid", leading to unexpected
default behaviour. Strict rejection catches these errors at config load time
with clear error messages.
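The effect of strict rejection can be shown with a small model. This sketch assumes Pydantic v2 (ConfigDict); the ExportConfig class and its single field are hypothetical and only the extra="forbid" setting mirrors the text.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class ExportConfig(BaseModel):
    """Hypothetical config section used only for this illustration."""
    model_config = ConfigDict(extra="forbid")

    export_predictive_accuracy: bool = True


try:
    # The misspelled key would be silently dropped without extra="forbid".
    ExportConfig(export_pridictive_accuracy=False)
    rejected_keys = []
except ValidationError as exc:
    rejected_keys = [error["loc"][0] for error in exc.errors()]
```

The validation error names the offending key, so the typo surfaces at config load time instead of as an unexplained default later in the run.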
Relative artefact paths in manifests
Decision: All artefact paths in run_manifest.json are stored relative to
the run directory.
Reasoning: Absolute paths would tie run directories to a specific machine
or filesystem layout. Relative paths make run directories portable — they can
be copied, archived, or moved between machines without breaking the manifest
contract.
Non-destructive lifecycle operations
Decision: refresh_run creates a new sibling directory rather than
overwriting the source.
Reasoning: Overwriting a validated production run would destroy the audit
trail. Creating a sibling preserves the original for comparison and rollback.
The lifecycle layer explicitly validates that source directories are not
mutated by refresh operations.
Manifest-per-stage persistence
Decision: The manifest is written to disk after each stage completes, not
only at the end of the pipeline.
Reasoning: MCMC sampling can run for minutes to hours. If the process
crashes or is killed during a later stage, the partial manifest on disk
reflects what completed successfully. This aids debugging and allows partial
runs to be inspected without special tooling.
Stage numbering with gaps
Decision: Pipeline stages use numbers 00, 10, 20, 30, 40, 60, 70 with a
gap at 50.
Reasoning: The gaps allow future stages to be inserted at natural positions
(e.g. a stage 50 for custom analysis) without renumbering existing stages.
Renumbering would break backward compatibility with stored manifests and any
downstream tooling that references stage names.
Config source vs. resolved archival
Decision: Both the verbatim source YAML and the resolved YAML are archived
in every run directory.
Reasoning: The source YAML shows what the analyst authored (including
relative paths). The resolved YAML shows the runtime interpretation (absolute
paths, defaults applied). Both are needed for reproducibility:
The source is needed to understand intent.
The resolved config is needed to reproduce the exact execution.
Runtime-only fields (output_dir, run_name, validation_spec) are
deliberately excluded from the resolved config because they are not part of
the reproducible model specification.
Structured model selection errors
Decision: Model selection failures produce ModelSelectionError with a
machine-readable reason_code rather than generic exceptions.
Reasoning: The pipeline needs to distinguish between “model selection is
not possible for this run type” (expected) and “something is broken”
(unexpected). Structured reason codes allow:
The runner to write model_selection_status.json without failing the run.
The lifecycle layer to compare model selection availability across runs.
Downstream consumers to programmatically handle different failure modes.
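A minimal sketch of how a structured error lets the runner record the outcome without failing the run. The constructor signature, the reason code holdout_fit_not_supported, the JSON layout, and the two helper functions are hypothetical; only the class name, the reason_code attribute, and the file name model_selection_status.json come from this document.

```python
import json
import tempfile
from pathlib import Path


class ModelSelectionError(Exception):
    """Failure with a machine-readable reason code (signature assumed)."""

    def __init__(self, reason_code, message):
        super().__init__(message)
        self.reason_code = reason_code


def compute_model_selection(holdout_id):
    """Stand-in for the real computation; rejects holdout fits."""
    if holdout_id is not None:
        # Hypothetical reason code, chosen for this example only.
        raise ModelSelectionError(
            "holdout_fit_not_supported",
            "LOO/WAIC require a fit with holdout_id=None",
        )
    return {"status": "available"}


def write_model_selection_status(run_dir, holdout_id):
    """Record the outcome either way; expected failures do not propagate."""
    try:
        status = compute_model_selection(holdout_id)
    except ModelSelectionError as exc:
        status = {
            "status": "unavailable",
            "reason_code": exc.reason_code,
            "message": str(exc),
        }
    path = Path(run_dir) / "model_selection_status.json"
    path.write_text(json.dumps(status, indent=2))
    return path


with tempfile.TemporaryDirectory() as tmp:
    status_path = write_model_selection_status(tmp, holdout_id=[True, False])
    recorded = json.loads(status_path.read_text())
```

Any other exception type still propagates, which keeps "not possible for this run type" separate from "something is broken".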
Meridian integration
This document describes how meridian-tools integrates with Google Meridian,
the boundaries of that integration, and the risks associated with different
coupling levels.
Integration philosophy
meridian-tools wraps Meridian without forking it. Meridian remains the
modelling engine; meridian-tools adds workflow orchestration, validation,
diagnostics bundling, model selection, and lifecycle management on top.
This approach means:
Meridian upgrades can be adopted without merging fork changes.
The upstream project’s API stability directly affects meridian-tools.
Any use of Meridian-internal APIs must be explicitly managed.
Coupling levels
Public API (low risk)
These are documented, versioned Meridian surfaces:
These are unlikely to break without a Meridian major version bump. The exact
google-meridian==1.5.3 pin keeps these assumptions aligned with the validated
release baseline.
Semi-public API (medium risk)
These are accessible attributes on Meridian model objects that are used but
not formally documented as stable:
| Surface | Used by | Purpose |
| --- | --- | --- |
| model.inference_data | log_likelihood.py, model_selection.py | Access ArviZ InferenceData |
| model.model_context | log_likelihood.py, exports.py | Access model structure |
| model.input_data | exports.py | Access input data for spend computation |
| model.posterior_sampler_callable | log_likelihood.py | Access posterior sampler |
These are stable in practice (they are used by Meridian’s own analysis
surfaces) but are not guaranteed to be stable across versions.
Private API (high risk)
These are _-prefixed methods on Meridian’s posterior_sampler_callable,
used exclusively in log_likelihood.py for log-likelihood reconstruction:
These methods are Meridian-internal and may change or be removed in any
Meridian release, including patch versions. They are necessary because
Meridian does not provide a public API for pointwise log-likelihood
computation.
Risk mitigation
Compatibility guard
log_likelihood.py checks for the presence of all three private methods
before attempting reconstruction:
If any method is missing, the error is caught and recorded as a
model_selection_status.json artefact with
reason_code: meridian_internal_seam_incompatible. The rest of the pipeline
continues normally.
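A guard of this kind can be sketched with a plain attribute check. The private method names are not listed in this document, so _method_a, _method_b, and _method_c below are placeholders, and FakeSampler is a test double rather than a Meridian class; the reason code is the one quoted above.

```python
class ModelSelectionError(Exception):
    """Failure with a machine-readable reason code (signature assumed)."""

    def __init__(self, reason_code, message):
        super().__init__(message)
        self.reason_code = reason_code


# Placeholder names: the real private method names are not given here.
REQUIRED_PRIVATE_METHODS = ("_method_a", "_method_b", "_method_c")


def check_private_seam(sampler, required=REQUIRED_PRIVATE_METHODS):
    """Raise a structured error if any required private method is absent."""
    missing = [
        name for name in required
        if not callable(getattr(sampler, name, None))
    ]
    if missing:
        raise ModelSelectionError(
            "meridian_internal_seam_incompatible",
            "missing private methods: " + ", ".join(missing),
        )


class FakeSampler:
    """Test double that lacks one of the three required methods."""

    def _method_a(self):
        return None

    def _method_b(self):
        return None


try:
    check_private_seam(FakeSampler())
    reason = None
except ModelSelectionError as exc:
    reason, detail = exc.reason_code, str(exc)
```

Checking up front turns an AttributeError deep inside reconstruction into one well-labelled failure that the runner can record.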
Graceful degradation
Model selection incompatibility is non-fatal at every level:
log_likelihood.py raises ModelSelectionError with a structured code.
model_selection.py propagates the error.
runner.py catches it, writes model_selection_status.json, and continues.
The manifest records the assessment stage as completed.
The lifecycle layer can inspect model_selection_status to understand why
model selection was unavailable.
Version pinning
The pyproject.toml pins Meridian to google-meridian[schema]==1.5.3. Any
Meridian upgrade must refresh the private log-likelihood reconstruction
baseline before the version guard is relaxed.
Integration testing
The test suite includes a gated live Meridian verification command. It covers
two paths: one reduced real pipeline run over bundled demo data, including
stored-run refresh after the original YAML is removed; and the lower-level
live log-likelihood reconstruction path.
The command is excluded from the default test suite because it requires real
MCMC sampling, but it should be run after every Meridian version upgrade.
Constants dependency
log_likelihood.py uses Meridian constants for posterior parameter names:
from meridian import constants  # constants.BETA_GM, constants.TAU_G, constants.ETA_M, etc.
These are stable string constants but are not versioned. A Meridian release
that renames these constants would cause import-time failures.
Unsaved posterior parameter recovery
Meridian does not persist all posterior parameters to InferenceData. The
_recover_unsaved_state function in log_likelihood.py reconstructs:
tau_g_excl_baseline — Recovered from the posterior’s tau_g variable
by slicing out the baseline geo index (concatenating the elements before and
after baseline_geo_idx).
Geo deviations — Recovered from the posterior by solving
deviation = (target - base) / scale for normal effects, or
deviation = (log(target) - base) / scale for log-normal effects, with a
scale == 0 guard that maps to zero.
This recovery is mathematically correct for the supported model families
(log-normal and normal media effects). It is tested against both geo-panel
and national models in test_log_likelihood.py.
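The two recovery steps can be sketched with NumPy. This is an illustration of the formulas above, not the _recover_unsaved_state implementation: the function names, the argument names, and the assumption that the geo axis is the last array axis are all hypothetical.

```python
import numpy as np


def drop_baseline_geo(tau_g, baseline_geo_idx):
    """Remove the baseline geo by joining the slices before and after it.

    Assumes the geo dimension is the last axis.
    """
    return np.concatenate(
        [tau_g[..., :baseline_geo_idx], tau_g[..., baseline_geo_idx + 1:]],
        axis=-1,
    )


def recover_deviation(target, base, scale, log_normal=False):
    """Solve target = base + scale * deviation (on the log scale if log-normal)."""
    target = np.asarray(target, dtype=float)
    scale = np.asarray(scale, dtype=float)
    numerator = (np.log(target) if log_normal else target) - base
    # Divide by a safe stand-in, then map scale == 0 entries to zero.
    safe_scale = np.where(scale == 0, 1.0, scale)
    return np.where(scale == 0, 0.0, numerator / safe_scale)


tau_excl = drop_baseline_geo(np.array([0.1, 0.2, 0.3]), baseline_geo_idx=1)
normal_dev = recover_deviation([1.0, 2.5], base=1.0, scale=[0.0, 0.5])

# Round trip for the log-normal case: build targets from known deviations.
true_dev = np.array([-1.0, 2.0])
log_targets = np.exp(0.3 + 0.5 * true_dev)
lognormal_dev = recover_deviation(log_targets, base=0.3, scale=0.5, log_normal=True)
```

Replacing zero scales before dividing avoids a divide-by-zero warning while still returning zero for those entries.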
What breaks on a Meridian upgrade
| Change type | Impact | Detection |
| --- | --- | --- |
| Public API signature change | runner.py, exports.py break | Default test suite |
| Semi-public attribute rename | log_likelihood.py, exports.py break | Default test suite |
| Private method removal/rename | Model selection disabled | Live smoke test or model_selection_status.json |
| Constant rename | Import-time failure | Default test suite |
| New posterior parameter | Log-likelihood may be incorrect | Manual review + live smoke test |
| Changed likelihood formula | Log-likelihood may be incorrect | Live smoke test |
Recommended upgrade procedure
1. Pin the new Meridian version in a branch.
2. Run the full default test suite: pytest tests/ -v
3. Run the live Meridian verification command:
   MERIDIAN_TOOLS_ENABLE_REAL_FIT=1 pytest tests/test_demo_integration.py::test_real_pipeline_refresh_smoke tests/test_log_likelihood.py::test_compute_log_likelihood_dataset_real_posterior_smoke -m real_fit -v
4. If model selection breaks, check model_selection_status.json for the reason code.
5. If private methods changed, update log_likelihood.py to match the new Meridian internals or accept graceful degradation.
6. Update docs/project/release-baseline.md with the new verified state.