metrics show: support non-primitive metric types (DataFrames, dataclasses) #453

@sjawhar

Description

Problem

`pivot metrics show` fails to parse metrics YAML files that contain Python-specific YAML tags. This happens when stage functions annotate outputs as `pivot.metric` using types like `pd.DataFrame` or custom dataclasses.

Affected stages in eval-pipeline

| Stage | Type | Error |
| --- | --- | --- |
| ga_paper/generate_agent_summary:summary | pd.DataFrame | could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:pandas.core.frame.DataFrame' |
| eval_pipeline_horizon/calculate_baseline_statistics | SourceStats dataclass | could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:eval_pipeline.horizon.calculate_baseline_statistics.SourceStats' |
| eval_pipeline_horizon/generate_agent_summary:agent_summary | pd.DataFrame | same as above |
| mr_time_horizon_1_0/generate_agent_summary:agent_summary | pd.DataFrame | same as above |
| mr_time_horizon_1_1/generate_agent_summary:agent_summary | pd.DataFrame | same as above |

Current behavior

Pivot writes these metrics using `yaml.dump()` with default settings, which embeds Python-specific tags (`!!python/object:...`). When `pivot metrics show` later reads them with safe YAML loading, it cannot parse those tags and emits a warning:

Failed to parse metrics from ga_paper/generate_agent_summary:summary: Failed to parse .../summary.yaml: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:pandas.core.frame.DataFrame'

The metric is then silently omitted from the output.
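The mechanism can be reproduced outside pivot with PyYAML alone. The `SourceStats` dataclass below is a stand-in for the eval-pipeline type, not the real one:

```python
import dataclasses

import yaml  # PyYAML


@dataclasses.dataclass
class SourceStats:  # hypothetical stand-in for the eval-pipeline dataclass
    n: int
    mean: float


# yaml.dump with the default Dumper tags arbitrary Python objects
dumped = yaml.dump(SourceStats(n=3, mean=0.5))
print(dumped.splitlines()[0])  # e.g. !!python/object:__main__.SourceStats

# safe_load refuses the Python-specific tag, mirroring the warning above
try:
    yaml.safe_load(dumped)
except yaml.constructor.ConstructorError as exc:
    print("could not determine a constructor" in str(exc))  # True
```

The same thing happens with a `pd.DataFrame` value: the default Dumper pickles its internal state behind a `!!python/object:pandas.core.frame.DataFrame` tag, which `yaml.safe_load` likewise rejects.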

Expected behavior

Pivot should either:

  1. Serialize at write time: When a metric value is a DataFrame or dataclass, automatically convert to a YAML-safe dict before writing (e.g., df.to_dict(), dataclasses.asdict())
  2. Deserialize at read time: Use a restricted set of constructors that can handle common types like DataFrames and dataclasses

Option 1 is probably preferable: it keeps the YAML files human-readable and avoids the security concerns of `yaml.unsafe_load`.
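A minimal sketch of option 1. The `to_yaml_safe` helper is hypothetical (not part of pivot's actual write path): it recursively converts DataFrames and dataclasses to plain structures so `yaml.safe_dump`/`yaml.safe_load` round-trip cleanly:

```python
import dataclasses
from typing import Any

import pandas as pd
import yaml


def to_yaml_safe(value: Any) -> Any:
    """Recursively convert non-primitive metric values to YAML-safe structures (sketch)."""
    if isinstance(value, pd.DataFrame):
        return value.to_dict(orient="list")
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return dataclasses.asdict(value)
    if isinstance(value, dict):
        return {k: to_yaml_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_yaml_safe(v) for v in value]
    return value


# A DataFrame metric now survives a safe round-trip
df = pd.DataFrame({"horizon": [1, 2], "score": [0.9, 0.8]})
text = yaml.safe_dump(to_yaml_safe(df))
print(yaml.safe_load(text))  # {'horizon': [1, 2], 'score': [0.9, 0.8]}
```

The `orient="list"` choice is one reasonable default; the real fix would need to pick (and document) an orientation so readers of the YAML know how to reconstruct the frame.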

Reproduction

```shell
cd eval_pipeline/difficulty
pivot metrics show --all
# Look for "Failed to parse metrics" warnings
```

Context

Found during E2E smoke testing of the eval-pipeline (199 stages across 7 sub-pipelines). The metrics write correctly at stage execution time but can't be read back by the CLI.
