Skip to content

RUN-38777 - added suspendDefinition and supporting statuses#56

Open
TomB30 wants to merge 15 commits intomainfrom
RUN-38777-add-suspend-definition-and-statuses
Open

RUN-38777 - added suspendDefinition and supporting statuses#56
TomB30 wants to merge 15 commits intomainfrom
RUN-38777-add-suspend-definition-and-statuses

Conversation

@TomB30
Copy link
Copy Markdown

@TomB30 TomB30 commented May 4, 2026

What does this PR do?

Related issue(s)

Fixes #

Checklist

  • All commits are signed off with DCO (git commit -s)
  • New/modified files have SPDX license and copyright headers
  • Documentation updated (if applicable)
  • Tests pass (make check)
  • No proprietary or internal information included

Summary by CodeRabbit

  • New Features

    • Added three lifecycle states: Suspended, Suspending, Resuming
    • Components can declare suspend/resume action lists to drive manifest mutations
    • Status mappings now support matching suspend-related states
  • Validation

    • Validator now checks suspend/resume action expressions for safety, rejecting unsafe or unbounded constructs
  • Tests

    • New unit tests for suspend statuses, action-expression validation, ordering, and apply-suspend/resume behavior

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a suspend/resume lifecycle: three new ResourceStatus values (Resuming, Suspending, Suspended), a Component-level SuspendDefinition with ordered JQ suspend/resume action lists, CRD/schema and deepcopy updates, an action-expression JQ validator, and wiring/tests to apply and validate the actions.

Changes

Suspend/Resume Lifecycle Feature

Layer / File(s) Summary
CRD Schema
charts/karta/crds/run.ai_kartas.yaml
Added resuming, suspending, and suspended arrays to statusMappings for childComponents and rootComponent. Added suspendDefinition with required non-empty resumeActions and suspendActions arrays (atomic lists of JQ strings).
Go API Types
pkg/api/runai/v1alpha1/structure.go
ComponentDefinition gains SuspendDefinition *SuspendDefinition. New SuspendDefinition type with SuspendActions/ResumeActions. ResourceStatus enum extended with Resuming, Suspending, Suspended. StatusMappings adds Resuming, Suspending, Suspended matcher slices; Entries() updated to prioritize these entries first.
Deepcopy / Generated
pkg/api/runai/v1alpha1/zz_generated.deepcopy.go
Autogenerated deepcopy updated to deep-copy SuspendDefinition and new StatusMappings slices; added SuspendDefinition.DeepCopyInto/DeepCopy.
Validation Wiring
pkg/api/runai/v1alpha1/validation.go
KartaValidator.Validate() now calls validateSuspendDefinitions() which iterates components and validates each SuspendActions/ResumeActions entry via jq.ValidateActionExpression(), returning contextualized errors.
JQ Action Safety
pkg/jq/validation.go
New exported ValidateActionExpression(expr string) error and internal validateActionParsedJQ walk gojq AST to allow assignment/update operators while rejecting unsafe constructs (del, range, paths, recurse, walk, repeat, recursive descent ..).
JQ Execution API
pkg/jq/execution/interface.go, pkg/jq/execution/runner.go
Added Mutator interface and embedded it into Runner; implemented runner.Mutate(ctx, expr) delegating to existing assign logic. Mocks and tests updated.
Accessor / Component Wiring
pkg/resource/accessor.go, pkg/resource/component.go, pkg/resource/component_factory.go
Added ApplySuspendActions/ApplyResumeActions on accessor and writer interfaces, and Component.Suspend/Component.Resume plus HasSuspendDefinition() that call accessor methods and handle definition-not-found as a no-op.
Tests
pkg/jq/validation_test.go, pkg/jq/execution/*_test.go, pkg/api/runai/v1alpha1/validation_test.go, pkg/resource/*_test.go
Added coverage for ValidateActionExpression, runner.Mutate, status mapping entries and ordering, ExtractStatus phase mapping for suspend statuses, ApplySuspend/ApplyResume behavior, and component suspend/resume behavior.

Sequence Diagram

sequenceDiagram
  autonumber
  participant Comp as Component
  participant Acc as Accessor
  participant JQ as JQ Runner
  participant Obj as Resource Object
  rect rgba(135,206,235,0.5)
    Comp->>Acc: Suspend(ctx)
    Acc->>JQ: Mutate(ctx, suspendActions[0])
    JQ->>Obj: apply mutation (in-place)
    JQ-->>Acc: success / error
    alt more actions
      loop for each remaining suspendActions
        Acc->>JQ: Mutate(ctx, suspendActions[i])
        JQ->>Obj: apply mutation
        JQ-->>Acc: success / error
      end
    end
    Acc-->>Comp: success / DefinitionNotFoundError treated as nil or propagate error
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • ronlv10
  • yuval-gr
  • rogirun
  • Isan-Rivkin
  • shaked-bouktus

"I hopped through YAML, Go, and JQ delight,
Suspend then resume—soft as moonlight.
Validators guard each careful action,
Statuses ordered, deepcopy in traction.
A rabbit cheers: code naps, then takes flight."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly identifies the main additions: suspendDefinition and supporting statuses for suspend/resume lifecycle management.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch RUN-38777-add-suspend-definition-and-statuses

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/api/runai/v1alpha1/structure.go`:
- Around line 367-374: The Entries() comment promises "first match wins" but the
matching code still appends every match to MatchedStatuses; update the matching
loop so it short-circuits on the first successful StatusMatchEntry match instead
of accumulating all matches. Locate the code that iterates
StatusMappings.Entries() and appends into MatchedStatuses, change it to
break/return immediately after the first entry whose matcher matches (using the
StatusMatchEntry matcher/fields), ensuring suspend-related entries (e.g.,
ResumingStatus, SuspendingStatus, SuspendedStatus) take precedence as intended.
- Around line 65-77: Add a minimum-item constraint and non-empty-entry
validation for SuspendActions and ResumeActions: annotate both SuspendActions
and ResumeActions with +kubebuilder:validation:MinItems=1 to prevent empty
arrays, and update the resource validation methods (e.g., ValidateCreate and
ValidateUpdate for the type declared in this file) to iterate over
SuspendActions and ResumeActions and return an error if any entry is the empty
string (""), rejecting blank entries in the validator.

In `@pkg/jq/validation.go`:
- Around line 220-230: ValidateActionExpression currently accepts plain
selectors; update validation to require at least one assignment/update operator
in the parsed JQ AST before accepting an action. Modify validateActionParsedJQ
(or add a helper it calls) to traverse the gojq parsed AST for assignment/update
nodes (e.g., operators such as =, |=, +=, -=, //=, etc.) and return an error
when none are found; keep the existing parse error handling in
ValidateActionExpression and call the new check after gojq.Parse returns the
parsed AST.
- Around line 235-270: The validator currently only recurses into Func.Args and
binary Op Left/Right inside validateActionParsedJQ; extend it to walk all Term
fields that can contain nested *Query to close AST traversal gaps: when q.Term
!= nil, after handling Func and TermType, recursively call
validateActionParsedJQ on Term.Array elements, Term.Object field values, Term.If
(Cond, Then, Else), Term.Try body, Term.Reduce subqueries (init/accum/body as
present), Term.Foreach subqueries (init/update/result as present), Term.Query
nested Query fields, and Term.SuffixList entries; also ensure Object.Fields and
SuffixList slices are iterated and each contained *Query passed to
validateActionParsedJQ so constructs like If, Array, Object, Try, Reduce,
Foreach, Query and SuffixList cannot bypass the blacklist.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 20b0f8bb-1dc5-4eda-9955-953c17eb03e3

📥 Commits

Reviewing files that changed from the base of the PR and between 87c2ae8 and ea3efc2.

📒 Files selected for processing (8)
  • charts/karta/crds/run.ai_kartas.yaml
  • pkg/api/runai/v1alpha1/structure.go
  • pkg/api/runai/v1alpha1/validation.go
  • pkg/api/runai/v1alpha1/validation_test.go
  • pkg/api/runai/v1alpha1/zz_generated.deepcopy.go
  • pkg/jq/validation.go
  • pkg/jq/validation_test.go
  • pkg/resource/accessor_test.go

Comment thread pkg/api/runai/v1alpha1/structure.go Outdated
Comment thread pkg/api/runai/v1alpha1/structure.go Outdated
Comment thread pkg/jq/validation.go Outdated
Comment thread pkg/jq/validation.go Outdated
TomB30 and others added 2 commits May 5, 2026 08:57
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: TomB30 <82462817+TomB30@users.noreply.github.com>
Extend validateActionParsedJQ to recursively walk every *Query-bearing
field inside gojq Term nodes: Array.Query, Object KeyQuery/Val,
If/Elif/Else branches, Try Body/Catch, Reduce and Foreach sub-expressions,
parenthesized Term.Query, and both Term.Index and SuffixList index
Start/End bounds.

Previously, constructs like [del(.x)], {a: del(.x)}, (recurse),
try del(.x), reduce/foreach bodies, and index expressions such as
.[range(5)] could bypass the blacklist entirely.

Add 25 new test entries that prove each bypass path is now caught.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/jq/validation.go (1)

376-388: 💤 Low value

Add explicit traversal of Term.Bind.Body and Term.Label.Body for complete AST coverage.

While variable bindings (. as $x | …) and labels (label $out | …) typically involve pipes at the top level and are thus covered by binary operator traversal, explicit traversal of these Term fields would close a minor gap and improve defense in depth. gojq v0.12.18 defines these fields with Body sub-queries that should be validated.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/jq/validation.go` around lines 376 - 388, The validateActionParsedJQ
function currently validates binary operator children but misses explicit
traversal of Term.Bind.Body and Term.Label.Body; update validateActionParsedJQ
to detect when q.Term is a Bind or Label and iterate over each sub-query in
Term.Bind.Body and Term.Label.Body respectively, calling validateActionParsedJQ
on each element so those nested bodies are validated as well (referencing
q.Term, Term.Bind.Body, Term.Label.Body and the validateActionParsedJQ
function).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/jq/validation_test.go`:
- Around line 571-572: Add a new test in pkg/jq/validation_test.go to cover
string interpolation bypass after fixing Term.Str traversal: add a DescribeTable
that calls ValidateActionExpression for expressions like `"value: \(del(.x))"`
and `"count: \(range(5))"`, asserting an error occurs and the message contains
the expected substring (e.g., "del function is not allowed" and "function
'range'"), so that ValidateActionExpression rejects dangerous functions embedded
in interpolated strings; reference ValidateActionExpression and Term.Str in case
you need to ensure the validator visits Term.Str nodes.

In `@pkg/jq/validation.go`:
- Around line 245-375: The validator is missing traversal of string
interpolation expressions (q.Term.Str.Queries), so add validation for each
interpolated query by iterating over q.Term.Str.Queries and calling
validateActionParsedJQ on each; place this block after the Term.Index handling
(i.e., after checking q.Term.Index.Start/End) inside the if q.Term != nil
branch, and ensure any returned error is propagated just like other checks in
validateActionParsedJQ.

---

Nitpick comments:
In `@pkg/jq/validation.go`:
- Around line 376-388: The validateActionParsedJQ function currently validates
binary operator children but misses explicit traversal of Term.Bind.Body and
Term.Label.Body; update validateActionParsedJQ to detect when q.Term is a Bind
or Label and iterate over each sub-query in Term.Bind.Body and Term.Label.Body
respectively, calling validateActionParsedJQ on each element so those nested
bodies are validated as well (referencing q.Term, Term.Bind.Body,
Term.Label.Body and the validateActionParsedJQ function).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 5e80d480-5293-4303-ba4f-9770ced1f552

📥 Commits

Reviewing files that changed from the base of the PR and between fdeae82 and 8b9e486.

📒 Files selected for processing (2)
  • pkg/jq/validation.go
  • pkg/jq/validation_test.go

Comment thread pkg/jq/validation_test.go Outdated
Comment thread pkg/jq/validation.go
Tom Bechar and others added 2 commits May 5, 2026 09:35
Layer 1 — pkg/jq/execution:
- Add Mutator interface with Mutate(ctx, expr) for self-contained mutation
  expressions (e.g. ".spec.suspend = true"); Runner now embeds Mutator
- Implement Mutate on runner via assignWithExpression with no variables
- Update MockRunner/add MockMutator; add missing Ginkgo suite bootstrap
- Add tests for Mutate; fix pre-existing Assign error-type assertion

Layer 2 — pkg/resource/accessor:
- Add ApplySuspendActions and ApplyResumeActions; each chains Mutate calls
  over SuspendDefinition.SuspendActions / ResumeActions in order

Layer 3 — pkg/resource/component_factory:
- Extend ComponentWriter interface and regenerate mocks for both new methods

Layer 4 — pkg/resource/component:
- Add HasSuspendDefinition, Suspend, Resume; no-op on DefinitionNotFoundError
- Tests cover happy path, no-definition no-op, and error propagation

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/jq/execution/runner.go`:
- Around line 90-92: Mutate currently forwards any expression to
assignWithExpression which accepts multiple emitted results; change Mutate to
enforce single-result semantics by validating the expression emits exactly one
value and returning an error if it emits zero or more than one. Locate
runner.Mutate and either call a new helper (e.g.,
assignSingleResultWithExpression) or add a flag/branch in assignWithExpression
to collect all emitted values for the given expression, ensure len(results) ==
1, and only then perform the update; return a clear error from Mutate when
result count != 1. Use the existing symbols runner.Mutate and
assignWithExpression to find where to add this validation.

In `@pkg/resource/accessor.go`:
- Line 359: The function declaration for ExtractInstanceIds is malformed because
the opening brace and the if statement are on the same line; update the
declaration so the opening brace is followed by a newline and the if block
starts on the next line (i.e., change "func (a *Accessor) ExtractInstanceIds(...
) {	if definition.InstanceIdPath == nil {" to separate the brace and the if),
ensuring the code around ExtractInstanceIds and the reference to
definition.InstanceIdPath is properly formatted and passes gofmt.

In `@pkg/resource/component.go`:
- Line 340: The declaration of HasPodDefinition has a gofmt violation because
the opening brace and the first statement are on the same line; update the
function signature and body formatting in the HasPodDefinition method so the
opening brace is followed by a newline and the first statement is on its own
indented line (e.g., change "func (c *Component) HasPodDefinition() bool {	spec
:= c.definition.SpecDefinition" to put "spec := c.definition.SpecDefinition" on
the next line with proper indentation), ensuring the function body conforms to
standard Go formatting.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: a6f8f97e-9fda-49fb-8bcb-028f3a2aadfc

📥 Commits

Reviewing files that changed from the base of the PR and between fcfebe3 and 8f18008.

📒 Files selected for processing (11)
  • pkg/jq/execution/interface.go
  • pkg/jq/execution/runner.go
  • pkg/jq/execution/runner_mock.go
  • pkg/jq/execution/runner_test.go
  • pkg/jq/execution/suite_test.go
  • pkg/resource/accessor.go
  • pkg/resource/accessor_mock.go
  • pkg/resource/accessor_test.go
  • pkg/resource/component.go
  • pkg/resource/component_factory.go
  • pkg/resource/component_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/resource/accessor_mock.go

Comment thread pkg/jq/execution/runner.go Outdated
Comment thread pkg/resource/accessor.go Outdated
Comment thread pkg/resource/component.go Outdated
Tom Bechar and others added 2 commits May 5, 2026 10:33
Dangerous functions embedded in interpolated strings like "\(del(.x))"
bypassed validateActionParsedJQ because Term.Str.Queries was never visited.
Add a loop over Str.Queries so each interpolated segment is recursed into,
and add two test entries confirming del and range are now caught.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread pkg/api/runai/v1alpha1/structure.go
Comment thread pkg/api/runai/v1alpha1/structure.go
Comment thread pkg/api/runai/v1alpha1/validation.go Outdated
Comment thread pkg/api/runai/v1alpha1/validation.go Outdated
Comment thread pkg/jq/validation.go Outdated
Comment thread pkg/jq/validation.go Outdated
Copy link
Copy Markdown

@Isan-Rivkin Isan-Rivkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, still reviewing 🙏

Comment thread charts/karta/crds/run.ai_kartas.yaml
Comment thread pkg/jq/execution/interface.go Outdated
Comment thread pkg/api/runai/v1alpha1/structure.go Outdated
Comment thread pkg/api/runai/v1alpha1/structure.go Outdated
Comment thread pkg/api/runai/v1alpha1/structure.go Outdated
Comment thread pkg/api/runai/v1alpha1/structure.go Outdated
Comment thread pkg/api/runai/v1alpha1/structure.go Outdated
Comment thread pkg/jq/execution/runner_test.go Outdated
Comment thread pkg/jq/execution/runner_test.go Outdated
Comment thread pkg/jq/execution/runner_test.go Outdated
Comment thread pkg/resource/accessor.go Outdated
Comment thread pkg/resource/accessor.go Outdated
Comment thread pkg/jq/execution/runner.go Outdated
return r.assignWithExpression(ctx, updateExpression, []string{"$val"}, []any{values})
}

func (r *runner) Mutate(ctx context.Context, expression string) error {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding here for visibility we talked f2f: why Mutate is needed VS using Assign

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Basically, this is exactly what we discussed f2f - it simplifies the use of suspend for users who consume the expression directly and want to execute it.

In my opinion, we should keep it. If you’d prefer to limit the operators we support for the time being, I’m fine with that. If we had opted for the object-based approach with path and value, the assign might have been more convenient since it provides clear separation and spares the user from having to parse it. However, with the current structure, I think it’s more practical to inject the suspendAction directly into the mutate.

I agree that this is relatively "open," but it’s the user's responsibility. Besides, we already have our own safeguards in place against actions that could potentially crash the system.

Comment thread pkg/resource/suspend_resume_integration_test.go Outdated
Comment thread pkg/jq/validation.go Outdated
Comment thread pkg/jq/validation.go Outdated
Comment thread pkg/jq/validation_test.go
@TomB30 TomB30 requested review from Isan-Rivkin, rogirun and yuval-gr May 6, 2026 13:00
…ume actions

- Remove Mutator interface and Mutate method from Runner; suspend/resume
  actions now use Assign after parsing the = expression
- Add parseAssignAction + findAssignOpIndex helpers in accessor.go to split
  a "PATH = VALUE" string into path and Go value via gojq
- Tighten ValidateActionExpression to only accept the = operator; all other
  update operators (|=, +=, etc.) are now rejected at validation time
- Simplify ValidateParsedJQ: drop allowMutation bool parameter, always
  enforce read-only rules
- Remove ValidateParsedJQ allowMutation=true tests; update
  ValidateActionExpression tests to reflect = -only contract
- Remove condition-mutation examples from integration tests (not realistic)
- Make validateParsedJQ public per reviewer request; add FuncDefs traversal
  to prevent blacklisted operations being hidden inside def bodies

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@Isan-Rivkin Isan-Rivkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! asking "changing" purely for notification reason, I want to make sure I don't block you and make sure I review this PR when it's ready so please ask for "re-review" I'll get notified when it's ready for review since it's seems to be still WIP 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add native suspend/resume support via SuspendDefinition manifest patching

4 participants