[bug] Pod-level failure reason (OOMKilled, ImagePullBackOff) not recorded in MLMD execution error details in KFP v2 #13192

@sarika-03

Description

Environment

  • How did you deploy Kubeflow Pipelines (KFP)? Standard KFP v2 with Argo Workflows
  • KFP version: 2.14.x / 2.15.x
  • KFP SDK version: kfp 2.x

Steps to reproduce

  1. Create a pipeline component that will trigger a pod-level failure.
    For OOMKilled: use a component that allocates more memory than the container limit.
    For ImagePullBackOff: set an invalid or non-existent image in the component spec.
  2. Run the pipeline and wait for the task to fail.
  3. Once the run fails, check the MLMD execution record for that task - either via
    the KFP API or by querying the MLMD gRPC server directly.
  4. Observe that the execution's error details contain only a generic launcher error
    string, not the actual pod failure reason (e.g., OOMKilled, exit code 137).
  5. Compare with kubectl describe pod <failed-pod> - the actual reason is visible
    there but is absent from MLMD.
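For the OOMKilled case in step 1, the component body just needs to allocate past the container's memory limit; the kernel then kills the process with SIGKILL (exit code 137 = 128 + 9). A minimal sketch of such an allocator — the helper name and sizes are illustrative, not KFP API:

```python
# Minimal memory hog a component body could run to trigger OOMKilled.
# Allocating past the container's memory limit gets the process killed
# by the kernel with exit code 137 (128 + SIGKILL).
def allocate_mib(total_mib: int, chunk_mib: int = 64) -> list:
    """Allocate roughly total_mib MiB in chunk_mib-sized bytearrays."""
    chunks = []
    allocated = 0
    while allocated < total_mib:
        # bytearray zero-fills, so the pages are actually committed.
        chunks.append(bytearray(chunk_mib * 1024 * 1024))
        allocated += chunk_mib
    return chunks

if __name__ == "__main__":
    # In a real component, set total_mib well above the container limit,
    # e.g. allocate_mib(4096) against a 512Mi limit.
    chunks = allocate_mib(128)
    print(f"allocated ~{len(chunks) * 64} MiB")
```

Wrapped in a `@dsl.component` with a low `memory_limit`, this reliably reproduces the OOMKilled termination.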

Expected result

When a pod fails due to a platform-level reason (OOMKilled, ImagePullBackOff,
CrashLoopBackOff, Evicted, etc.), the MLMD execution record for that task should
include structured error details capturing:

  • The pod failure reason (e.g., OOMKilled)
  • The container exit code where available (e.g., 137 for OOMKilled)
  • Optionally: the pod name and namespace for traceability

This would allow the backend to propagate a meaningful failure reason to the UI
and give users actionable information without requiring access to kubectl.
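The information above is already present in the Pod's status. As a sketch of what "structured error details" could contain, the snippet below extracts the failure reason and exit code from a Pod's JSON (the shape `kubectl get pod -o json` returns). The field names (`status.containerStatuses[*].state.terminated.reason` / `exitCode`, and `waiting.reason` for ImagePullBackOff) come from the Kubernetes Pod API; the helper `pod_failure_details` is hypothetical and not part of the KFP codebase:

```python
import json

def pod_failure_details(pod_json: str) -> dict:
    """Extract a structured failure record from a Pod's JSON status.

    Checks containerStatuses for a terminated state (OOMKilled etc.)
    and falls back to waiting reasons such as ImagePullBackOff.
    """
    pod = json.loads(pod_json)
    details = {
        "pod_name": pod["metadata"]["name"],
        "namespace": pod["metadata"].get("namespace", "default"),
        "reason": None,
        "exit_code": None,
    }
    for cs in pod.get("status", {}).get("containerStatuses", []):
        state = cs.get("state", {})
        if "terminated" in state:
            details["reason"] = state["terminated"].get("reason")
            details["exit_code"] = state["terminated"].get("exitCode")
            break
        if "waiting" in state:
            details["reason"] = state["waiting"].get("reason")
            break
    return details

# Example with an OOMKilled container status:
sample = json.dumps({
    "metadata": {"name": "train-pod", "namespace": "kubeflow"},
    "status": {"containerStatuses": [
        {"state": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}
    ]},
})
print(pod_failure_details(sample))
```

A record of this shape, written into the execution's custom properties or error message, would carry everything the Expected result asks for.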

Current behavior

The launcher (backend/src/v2/component/launcher_v2.go) publishes execution state
on failure, but the error context it records comes only from its own runtime error —
not from the Kubernetes pod status. As a result, the MLMD execution record for a
pod-killed task contains a generic error string while the actual failure reason
(e.g., OOMKilled with exit code 137, or ImagePullBackOff with the image name)
is silently dropped.

kubectl describe pod is currently the only way to see the actual failure reason.

Materials and Reference

Affected code paths:

  • backend/src/v2/component/launcher_v2.go — failure publication path
  • backend/src/v2/driver/driver.go — where pod status could be read and normalized
  • backend/src/v2/metadata/client.go — where execution error details are written to MLMD

Related issues:


Impacted by this bug? Give it a 👍.
