Lifecycle management guide
meridian-tools treats completed runs as immutable artefacts. The lifecycle
module provides tools to load, compare, and refresh past runs without mutating
them. This guide explains each lifecycle operation and when to use it.
Core concepts
Run records
A RunRecord encapsulates a run’s metadata and artefact paths. It is loaded
from a run directory by reading run_manifest.json and resolving all artefact
paths against the directory.
All paths in the record are absolute. Required artefacts (config_source,
config_resolved) are validated at load time and always present.
input_data_provenance is also required for manifest version 3 runs.
Optional artefacts (diagnostics_bundle, validation_spec,
model_selection_status) are None if not present in the manifest.
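As a rough mental model, the record can be pictured as a frozen dataclass of paths. The sketch below is illustrative only: RunRecordSketch and every attribute name except input_data_provenance_path are assumptions, not the actual RunRecord definition.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class RunRecordSketch:
    """Hypothetical stand-in for RunRecord; not the library's class."""

    run_dir: Path
    config_source_path: Path    # required artefact, always present
    config_resolved_path: Path  # required artefact, always present
    # Required for manifest version 3 runs, None for older manifests.
    input_data_provenance_path: Optional[Path] = None
    # Optional artefacts: None when the manifest does not list them.
    diagnostics_bundle_path: Optional[Path] = None
    validation_spec_path: Optional[Path] = None
    model_selection_status_path: Optional[Path] = None


record = RunRecordSketch(
    run_dir=Path("/runs/example"),
    config_source_path=Path("/runs/example/config.source.yaml"),
    config_resolved_path=Path("/runs/example/config.resolved.yaml"),
)
```

Freezing the dataclass mirrors the guide's immutability rule: a loaded record describes a run and is never a handle for editing it.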
Immutability
Lifecycle operations never modify a source run directory. When you refresh a run, the output goes to a new sibling directory. When you compare runs, both source directories remain untouched.
All lifecycle functions raise LifecycleError (a RuntimeError subclass)
when they encounter invalid state.
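Because LifecycleError subclasses RuntimeError, callers can catch it specifically or rely on an existing RuntimeError handler. The sketch below defines a local stand-in class so that it runs on its own; the helper load_or_none and the failing loader are hypothetical and exist only for the demonstration.

```python
class LifecycleError(RuntimeError):
    """Local stand-in mirroring the documented hierarchy."""


def load_or_none(loader, target):
    """Call a lifecycle function, returning None when it reports invalid state."""
    try:
        return loader(target)
    except LifecycleError as exc:
        print(f"could not load {target}: {exc}")
        return None


def always_fails(target):
    # Hypothetical loader that simulates a broken run directory.
    raise LifecycleError("run_manifest.json not found")


result = load_or_none(always_fails, "runs/2024q1")
```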
Loading a run record
From a run directory
From a manifest path
Both forms are accepted: load_run_record detects whether its argument is a run directory or a manifest file.
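One plausible way to implement this kind of detection is shown below. It is a sketch under assumptions, not the library's code: resolve_manifest_path is a hypothetical helper, and the real function may apply different rules or raise LifecycleError instead of a plain RuntimeError.

```python
import tempfile
from pathlib import Path

MANIFEST_NAME = "run_manifest.json"


def resolve_manifest_path(target):
    """Return (run_dir, manifest_path) for a run directory or a manifest file."""
    path = Path(target).resolve()
    if path.is_dir():
        return path, path / MANIFEST_NAME
    if path.is_file() and path.name == MANIFEST_NAME:
        return path.parent, path
    raise RuntimeError(f"not a run directory or {MANIFEST_NAME}: {path}")


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "run_a"
    run_dir.mkdir()
    (run_dir / MANIFEST_NAME).write_text("{}")
    from_dir = resolve_manifest_path(run_dir)
    from_file = resolve_manifest_path(run_dir / MANIFEST_NAME)
    same_result = from_dir == from_file
```

Both call styles resolve to the same pair, so later steps only need to handle one shape.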
Validation at load time
load_run_record validates:
- The manifest JSON is well-formed and has a supported version (0, 1, 2, or 3).
- Required config artefact entries (config_source, config_resolved) exist in the manifest.
- Manifest version 3 runs also include input_data_provenance.
- Required artefact files actually exist on disk.
- No artefact path escapes the run directory (path traversal protection).
- Claimed optional artefacts exist on disk (a manifest that references a missing file is rejected).
If any check fails, a LifecycleError is raised with a descriptive message.
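The path traversal and existence checks can be sketched with pathlib as follows. This is a minimal illustration, assuming artefact paths are stored relative to the run directory; resolve_artefact is a hypothetical helper and LifecycleError is redefined locally so the example runs without meridian-tools.

```python
import tempfile
from pathlib import Path


class LifecycleError(RuntimeError):
    """Local stand-in for the documented error type."""


def resolve_artefact(run_dir, relative_path):
    """Resolve an artefact path, rejecting escapes and missing files."""
    root = Path(run_dir).resolve()
    candidate = (root / relative_path).resolve()
    try:
        candidate.relative_to(root)  # ValueError if candidate lies outside root
    except ValueError:
        raise LifecycleError(
            f"artefact path escapes run directory: {relative_path}"
        ) from None
    if not candidate.is_file():
        raise LifecycleError(f"artefact missing on disk: {relative_path}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "run_a"
    run_dir.mkdir()
    (run_dir / "config.source.yaml").write_text("model: {}\n")
    ok = resolve_artefact(run_dir, "config.source.yaml").name
    errors = []
    for bad in ("../outside.yaml", "config.resolved.yaml"):
        try:
            resolve_artefact(run_dir, bad)
        except LifecycleError as exc:
            errors.append(str(exc))
```

Resolving before comparing matters: a check on the raw string would miss ".." segments and symlinks that point outside the run directory.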
Listing run records
list_run_records discovers all direct child directories that contain a
run_manifest.json and returns them sorted by started_at timestamp
(most recent first), with run directory name as a secondary sort key.
The function requires a directory path (not a file). It raises an error if any discovered run directory contains an invalid manifest; it does not silently skip broken runs.
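The discovery and ordering rules can be approximated with the standard library. In this sketch, list_runs_sketch is a hypothetical helper, started_at is assumed to be an ISO 8601 string, and ties are assumed to fall back to ascending directory name; the guide does not state the tie-break direction.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def list_runs_sketch(parent):
    """Return run directory names, newest first, name order on ties."""
    parent = Path(parent)
    if not parent.is_dir():
        raise RuntimeError(f"expected a directory: {parent}")
    entries = []
    for child in sorted(parent.iterdir()):  # name order becomes the tie-break
        manifest = child / "run_manifest.json"
        if child.is_dir() and manifest.is_file():
            # Malformed JSON raises here; broken runs are not skipped.
            data = json.loads(manifest.read_text())
            started = datetime.fromisoformat(data["started_at"])
            entries.append((started, child.name))
    # Stable sort: newest first, earlier name order kept among equal timestamps.
    entries.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in entries]


with tempfile.TemporaryDirectory() as tmp:
    stamps = {
        "run_b": "2024-04-01T09:00:00+00:00",
        "run_a": "2024-04-01T09:00:00+00:00",
        "run_c": "2024-07-01T09:00:00+00:00",
    }
    for name, started_at in stamps.items():
        run_dir = Path(tmp) / name
        run_dir.mkdir()
        (run_dir / "run_manifest.json").write_text(
            json.dumps({"started_at": started_at})
        )
    (Path(tmp) / "notes").mkdir()  # no manifest, so it is not a run
    ordered = list_runs_sketch(tmp)
```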
Refreshing a run
Refreshing re-executes a run using its stored configuration but writes the output to a new directory. The source run is never modified.
When to refresh
- After a Meridian upgrade — to check whether the new version produces comparable results with the same specification.
- After a code change — to verify that refactoring did not change model outputs.
- After extending the dataset — to refit the model with additional observations using the same validated specification.
How to refresh
build_refresh_run_config reconstructs a PipelineRunConfig from the source
run’s stored configuration:
- The execution config path points to the source run’s config.resolved.yaml.
- The source config path points to the source run’s config.source.yaml, so the refreshed run preserves the original authored YAML in its own metadata.
- The output directory is set to the source run’s parent directory (creating a sibling).
- The run name suffix is stripped to produce a clean refresh name.
- For validation runs, the validation spec is reconstructed from the stored
validation_spec.json.
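The path-related rules in this list can be sketched as a small function. RefreshPlanSketch and its field names are hypothetical and do not reflect the real PipelineRunConfig; the run name handling and validation spec reconstruction are left out because the guide does not specify their details.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RefreshPlanSketch:
    """Hypothetical container; not the real PipelineRunConfig."""

    execution_config_path: Path
    source_config_path: Path
    output_dir: Path


def plan_refresh(source_run_dir):
    """Derive refresh paths from a source run directory without touching it."""
    run_dir = Path(source_run_dir)
    return RefreshPlanSketch(
        execution_config_path=run_dir / "config.resolved.yaml",
        source_config_path=run_dir / "config.source.yaml",
        # Writing into the parent makes the refreshed run a sibling.
        output_dir=run_dir.parent,
    )


plan = plan_refresh(Path("runs") / "2024q1_baseline")
```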
Refresh with overrides
You can override specific settings:
Validation-aware refresh
If the source run was a validation run (blocked tail or rolling origin),
build_refresh_run_config reconstructs the validation spec from the stored
artefact, including the holdout mask geometry. For authored-holdout runs, it
reuses the YAML-owned holdout from the copied config.
For final-fit runs, the refresh produces another final-fit run with the same full-sample training specification.
Comparing runs
compare_run_records accepts run directory paths (not RunRecord objects)
and returns a pandas DataFrame with columns field, left, right,
status, and changed. The compared fields include:
- run_name and status: basic identity.
- meridian_tools_version and meridian_version: version drift.
- has_validation_spec and has_diagnostics_bundle: artefact presence.
- predictive_accuracy_status and review_summary_status: diagnostics.
- has_model_selection_outputs and model_selection_reason_code: model selection.
- input_authored_path, input_resolved_path, input_sha256, input_size_bytes, input_mtime_utc, input_row_count, input_column_count, and input_ordered_columns: dataset identity and shape.
This is useful for auditing whether a refresh or a specification change produced materially different results.
If either run predates manifest version 3, provenance rows are reported with
status == "legacy_unknown" and changed == None. That distinguishes
“no stored provenance exists” from “the dataset definitely changed”.
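The three-valued changed column can be illustrated with plain dictionaries. This sketch is not the library's implementation: compare_field is a hypothetical helper, the manifest_version key and the "compared" status label are assumptions, and only "legacy_unknown" comes from the guide. Rows are kept as dicts so the example needs no pandas.

```python
def compare_field(field, left, right, provenance_known=True):
    """Build one comparison row using the documented column names."""
    if not provenance_known:
        # A pre-v3 run stores no provenance, so "changed" cannot be decided.
        return {"field": field, "left": left, "right": right,
                "status": "legacy_unknown", "changed": None}
    return {"field": field, "left": left, "right": right,
            "status": "compared",  # hypothetical label for a normal row
            "changed": left != right}


left = {"manifest_version": 2, "run_name": "q1", "input_sha256": None}
right = {"manifest_version": 3, "run_name": "q2", "input_sha256": "ab12"}

# Provenance rows are only decidable when both runs are manifest version 3.
both_v3 = min(left["manifest_version"], right["manifest_version"]) >= 3

rows = [
    compare_field("run_name", left["run_name"], right["run_name"]),
    compare_field("input_sha256", left["input_sha256"], right["input_sha256"],
                  provenance_known=both_v3),
]
```

When filtering such rows, test for True explicitly: treating None as "unchanged" would hide the legacy case the status column is meant to expose.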
Lifecycle workflow example
A typical lifecycle workflow for a quarterly model refresh:
Manifest versioning
The lifecycle layer supports manifest versions 0, 1, 2, and 3. Older manifests are handled gracefully with default values for fields that were added in later versions. The current version is 3.
This means you can load run directories created by earlier versions of
meridian-tools without issues. The loaded RunRecord keeps the same shape,
but input_data_provenance_path is None for pre-v3 runs because those
manifests predate provenance capture.
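Default-filling for older manifests might look like the sketch below. The key names manifest_version and artefacts, the helper normalise_manifest, and the choice to treat a missing version as 0 are all assumptions made for illustration; the real manifest schema may differ.

```python
SUPPORTED_VERSIONS = (0, 1, 2, 3)


def normalise_manifest(manifest):
    """Return a manifest dict with later-version fields filled with defaults."""
    version = manifest.get("manifest_version", 0)  # assumed key and default
    if version not in SUPPORTED_VERSIONS:
        raise RuntimeError(f"unsupported manifest version: {version}")
    artefacts = dict(manifest.get("artefacts", {}))  # copy; input is not mutated
    if version < 3:
        # Pre-v3 manifests predate provenance capture.
        artefacts.setdefault("input_data_provenance", None)
    elif "input_data_provenance" not in artefacts:
        raise RuntimeError("manifest version 3 requires input_data_provenance")
    return {"manifest_version": version, "artefacts": artefacts}


legacy = normalise_manifest(
    {"manifest_version": 1, "artefacts": {"config_source": "config.source.yaml"}}
)

try:
    normalise_manifest({"manifest_version": 4})
    rejected = False
except RuntimeError:
    rejected = True
```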