fix(llm): fix OpenAI param and default analysis to failed runs #2615
chmouel merged 2 commits into tektoncd:main
Conversation
Code Review
This pull request refactors the LLM analysis triggering logic by moving the default failure check into the analyzer, which allows for more granular control via CEL expressions. Additionally, the OpenAI client was updated to use the MaxCompletionTokens field to align with current API standards. Review feedback focuses on improving the robustness of the PipelineRun status check by explicitly querying the Succeeded condition instead of relying on slice indexing, and recommends explicitly defining condition types in test cases for better clarity and maintenance.
@theakshaypant Please consider applying this change to use `GetCondition` instead of slice indexing:

diff --git a/pkg/llm/analyzer.go b/pkg/llm/analyzer.go
index 2fe55a27a..a3e824f15 100644
--- a/pkg/llm/analyzer.go
+++ b/pkg/llm/analyzer.go
@@ -16,6 +16,7 @@ import (
tektonv1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1"
"go.uber.org/zap"
corev1 "k8s.io/api/core/v1"
+ apis "knative.dev/pkg/apis"
)
// AnalysisResult represents the result of an LLM analysis.
@@ -300,7 +301,8 @@ func countFailedResults(results []AnalysisResult) int {
// If no on_cel is provided, defaults to triggering only for failed PipelineRuns.
func (a *Analyzer) shouldTriggerRole(role v1alpha1.AnalysisRole, celContext map[string]any, pr *tektonv1.PipelineRun) (bool, error) {
if role.OnCEL == "" {
- return len(pr.Status.Conditions) > 0 && pr.Status.Conditions[0].Status == corev1.ConditionFalse, nil
+ succeededCondition := pr.Status.GetCondition(apis.ConditionSucceeded)
+ return succeededCondition != nil && succeededCondition.Status == corev1.ConditionFalse, nil
}
result, err := cel.Value(role.OnCEL, celContext["body"],
diff --git a/pkg/llm/analyzer_test.go b/pkg/llm/analyzer_test.go
index bb2eb9dca..c6403fa3d 100644
--- a/pkg/llm/analyzer_test.go
+++ b/pkg/llm/analyzer_test.go
@@ -412,7 +412,7 @@ func TestAnalyzer_ShouldTriggerRoleEvaluations(t *testing.T) {
}
failedPR := &tektonv1.PipelineRun{}
- failedPR.Status.Conditions = append(failedPR.Status.Conditions, apis.Condition{Status: "False"})
+ failedPR.Status.Conditions = append(failedPR.Status.Conditions, apis.Condition{Type: apis.ConditionSucceeded, Status: "False"})
tests := []struct {
name string
@@ -464,10 +464,10 @@ func TestAnalyzer_ShouldTriggerRole(t *testing.T) {
analyzer := NewAnalyzer(run, kinteract, logger)
failedPR := &tektonv1.PipelineRun{}
- failedPR.Status.Conditions = append(failedPR.Status.Conditions, apis.Condition{Status: "False"})
+ failedPR.Status.Conditions = append(failedPR.Status.Conditions, apis.Condition{Type: apis.ConditionSucceeded, Status: "False"})
succeededPR := &tektonv1.PipelineRun{}
- succeededPR.Status.Conditions = append(succeededPR.Status.Conditions, apis.Condition{Status: "True"})
+ succeededPR.Status.Conditions = append(succeededPR.Status.Conditions, apis.Condition{Type: apis.ConditionSucceeded, Status: "True"})
tests := []struct {
name string

/lgtm
Congrats @theakshaypant, your PR has been approved 🎉
✅ Pull Request Approved
Approval Status:
- Required Approvals: 1
- Current Approvals: 1
👥 Reviewers Who Approved:
| Reviewer | Permission Level | Approval Status |
|---|---|---|
| @chmouel | admin | ✅ |
📝 Next Steps
- Ensure all required checks pass
- Comply with branch protection rules
- Request a maintainer to merge using the `/merge` command (or merge it directly if you have repository permission).
Automated by the PAC Boussole 🧭
Newer OpenAI models (gpt-5, o1, o3 series) reject `max_tokens` and require `max_completion_tokens` instead. This caused LLM analysis to fail with HTTP 400 errors for these models. The internal `max_tokens` config field is kept provider-agnostic; only the OpenAI API request maps it to `max_completion_tokens`.

Fixes tektoncd#2576

Signed-off-by: Akshay Pant <akpant@redhat.com>
Co-authored-by: BoseKarthikeyan <bkarth17@ford.com>
Remove the hard-coded failed-only check from the reconciler and delegate it to the LLM orchestrator. When no on_cel expression is defined for a role, the orchestrator defaults to triggering only for failed PipelineRuns. An explicit on_cel expression overrides this, allowing roles to run on any pipeline outcome.

Signed-off-by: Akshay Pant <akpant@redhat.com>
Gotcha! I had moved the condition directly from the reconciler, so I did not make the change when requested by Gemini. Added the suggested patch in c664c4e.

// Perform LLM analysis only for failed pipeline runs (best-effort, non-blocking)
// Users can use CEL expressions in role configurations for more fine-grained control
if len(newPr.Status.Conditions) > 0 && newPr.Status.Conditions[0].Status == corev1.ConditionFalse {
📝 Description of the Change
- Replaced `max_tokens` with `max_completion_tokens` in the OpenAI request struct, fixing errors on newer models (gpt-5, o1, o3 series). The internal `max_tokens` config field is kept provider-agnostic.
- Moved the default failed-only check from the reconciler into the LLM orchestrator. Roles now run on all pipeline completions only if `on_cel` is set to `'true'`; omitting `on_cel` preserves the original failed-only behavior.
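The defaulting rule described above can be summarized with a small sketch. This is a simplified, hypothetical helper (`shouldTrigger` and its string parameters are not the actual orchestrator API, which evaluates a real CEL expression via `shouldTriggerRole`): with no `on_cel`, a role fires only when the PipelineRun's `Succeeded` condition is `False`; an explicit `on_cel` of `"true"` fires for every completed run.

```go
package main

import "fmt"

// shouldTrigger sketches the defaulting rule: succeededStatus is the
// PipelineRun's "Succeeded" condition status ("True", "False", or ""
// when the condition is absent).
func shouldTrigger(onCEL, succeededStatus string) bool {
	if onCEL == "" {
		// Default: trigger only for failed PipelineRuns.
		return succeededStatus == "False"
	}
	// The real orchestrator evaluates the CEL expression against the
	// event body; the constant expression "true" always matches.
	return onCEL == "true"
}

func main() {
	fmt.Println(shouldTrigger("", "False"))    // default, failed run: true
	fmt.Println(shouldTrigger("", "True"))     // default, successful run: false
	fmt.Println(shouldTrigger("true", "True")) // explicit on_cel: true
}
```

This is why the `pipeline-summary` role in the test manifest below sets `on_cel: "true"` (comment on every run) while `failure-analysis` omits it (failed runs only).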
🔗 Linked GitHub Issue
Fixes #2576
🧪 Testing Strategy
Repo manifest
$ kg repo akshay-pac-test-repo -n default -oyaml
apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  annotations: ...
  creationTimestamp: "2026-03-27T07:50:37Z"
  generation: 10
  name: akshay-pac-test-repo
  namespace: default
  resourceVersion: "14045"
  uid: 2a398763-205b-4d50-84fe-b1117bc2fecd
pipelinerun_status: ...
spec:
  settings:
    ai:
      enabled: true
      max_tokens: 1000
      provider: openai
      roles:
      - context_items:
          commit_content: true
          container_logs:
            enabled: false
          error_content: true
          pr_content: true
        name: pipeline-summary
        on_cel: "true"
        output: pr-comment
        prompt: "You are a CI/CD assistant providing pipeline run summaries.\n**Generate a brief summary of this pipeline run:**\n## \U0001F4CA Pipeline Run Summary\n### Status\n[Success/Failed/Other - based on context]\n### Key Metrics\n- Duration: [if available]\n- Tasks Run: [count if available]\n### Highlights\n- [Notable successes or issues]\n- [Any warnings or recommendations]\nKeep it brief and informative.\n"
      - context_items:
          container_logs:
            enabled: true
            max_lines: 100
          error_content: true
        model: o3
        name: failure-analysis
        output: pr-comment
        prompt: |
          You are a DevOps expert. Analyze this failed pipeline and:
          1. Identify the root cause
          2. Suggest specific fixes
          3. Recommend preventive measures
      secret_ref:
        key: token
        name: openai-api-key
      timeout_seconds: 30
  url: https://github.com/theakshaypant/akshay-pac-test-repo

Comments when PLR failing


Comment with successful PLR
🤖 AI Assistance
AI assistance can be used for various tasks, such as code generation,
documentation, or testing.
Please indicate whether you have used AI assistance
for this PR and provide details if applicable.
Important
Slop will be simply rejected. If you are using AI assistance, you need to make sure you understand the generated code and that it meets the project's standards. You need to at least know how to run the code and deploy it (if needed). See startpaac to make it easy to deploy and test your code changes.
If the majority of the code in this PR was generated by an AI, please add a `Co-authored-by` trailer to your commit message. For example:
Co-authored-by: Claude <noreply@anthropic.com>
✅ Submitter Checklist
- The commit message type prefix (e.g. `fix:`, `feat:`) matches the "Type of Change" I selected above.
- I have run `make test` and `make lint` locally to check for and fix any issues. For an efficient workflow, I have considered installing pre-commit and running `pre-commit install` to automate these checks.