Description
Problem
pivot metrics show fails to parse metrics YAML files that contain Python-specific YAML tags. This happens when stage functions annotate outputs as pivot.metric using types like pd.DataFrame or custom dataclasses.
Affected stages in eval-pipeline
| Stage | Type | Error |
|---|---|---|
| ga_paper/generate_agent_summary:summary | pd.DataFrame | could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:pandas.core.frame.DataFrame' |
| eval_pipeline_horizon/calculate_baseline_statistics | SourceStats dataclass | could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:eval_pipeline.horizon.calculate_baseline_statistics.SourceStats' |
| eval_pipeline_horizon/generate_agent_summary:agent_summary | pd.DataFrame | same as the pd.DataFrame error in the first row |
| mr_time_horizon_1_0/generate_agent_summary:agent_summary | pd.DataFrame | same as the first row |
| mr_time_horizon_1_1/generate_agent_summary:agent_summary | pd.DataFrame | same as the first row |
Current behavior
Pivot writes these metrics using yaml.dump() with default settings, which embeds Python-specific tags (!!python/object:...). When pivot metrics show later reads them with safe YAML loading, it can't parse the tags and emits a warning:
Failed to parse metrics from ga_paper/generate_agent_summary:summary: Failed to parse .../summary.yaml: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:pandas.core.frame.DataFrame'
The metric is then omitted from the output; the warning is the only indication that it is missing.
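The mismatch between the write path and the read path can be reproduced outside Pivot with PyYAML alone. The following is a minimal sketch, not Pivot's actual code: the SourceStats class here is a hypothetical stand-in with made-up fields, and it assumes the write side behaves like a plain yaml.dump() call.

```python
from dataclasses import dataclass

import yaml


@dataclass
class SourceStats:  # hypothetical stand-in; the real class's fields are not shown in this issue
    mean: float
    count: int


# Default yaml.dump() has no plain representation for arbitrary objects,
# so it falls back to a Python-specific tag.
dumped = yaml.dump({"stats": SourceStats(mean=0.5, count=3)})
has_python_tag = "!!python/object:" in dumped

# Safe loading has no constructor registered for that tag and raises.
try:
    yaml.safe_load(dumped)
    error_message = None
except yaml.YAMLError as exc:
    error_message = str(exc)

print(dumped)
print(error_message)
```

The same fallback applies to any object PyYAML does not know how to represent natively, which is why DataFrames and dataclasses both hit it.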
Expected behavior
Pivot should either:
1. Serialize at write time: When a metric value is a DataFrame or dataclass, automatically convert it to a YAML-safe dict before writing (e.g., df.to_dict(), dataclasses.asdict()).
2. Deserialize at read time: Use a restricted set of constructors that can handle common types like DataFrames and dataclasses.
Option 1 is probably better: it keeps the YAML files human-readable and avoids the security concerns of yaml.unsafe_load.
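One possible shape for the write-time conversion is sketched below. The helper name to_yaml_safe is hypothetical and not part of Pivot, and the SourceStats fields are invented for illustration. The sketch detects DataFrame-like values by the presence of a to_dict() method so that it does not need to import pandas; values such as NumPy scalars would need additional cases.

```python
import dataclasses
from typing import Any

import yaml


def to_yaml_safe(value: Any) -> Any:
    """Recursively convert a metric value to plain dicts, lists, and scalars.

    Hypothetical helper for illustration; not Pivot's API.
    """
    # Dataclass instances (not classes) become nested dicts.
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return to_yaml_safe(dataclasses.asdict(value))
    # Duck-typed: pandas DataFrame and Series both provide to_dict().
    if callable(getattr(value, "to_dict", None)):
        return to_yaml_safe(value.to_dict())
    if isinstance(value, dict):
        return {key: to_yaml_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_yaml_safe(item) for item in value]
    return value


@dataclasses.dataclass
class SourceStats:  # hypothetical fields
    mean: float
    count: int


# safe_dump raises on anything left unconverted instead of writing a Python tag.
text = yaml.safe_dump(to_yaml_safe({"stats": SourceStats(mean=0.5, count=3)}))
round_tripped = yaml.safe_load(text)
print(text)
```

Using yaml.safe_dump on the write side means an unsupported type fails loudly at stage execution time, rather than producing a file that the CLI cannot read back later.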
Reproduction
cd eval_pipeline/difficulty
pivot metrics show --all
# Look for "Failed to parse metrics" warnings
Context
Found during E2E smoke testing of the eval-pipeline (199 stages across 7 sub-pipelines). The metrics write correctly at stage execution time but can't be read back by the CLI.