Skip to content

fix: clean up orphaned instance records on enqueue failure (#686)#687

Open
poyrazK wants to merge 3 commits into
mainfrom
fix/issue-686-orphaned-instance
Open

fix: clean up orphaned instance records on enqueue failure (#686)#687
poyrazK wants to merge 3 commits into
mainfrom
fix/issue-686-orphaned-instance

Conversation

@poyrazK

@poyrazK poyrazK commented May 20, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes database instance records being orphaned in a transient/starting state if the task queue enqueuing fails during instance launch.

Changes

  • Added database rollback (deletion of the created record via s.repo.Delete) in LaunchInstance when enqueuing the provisioning task returns an error.
  • Added similar database deletion rollback in LaunchInstanceWithOptions when enqueuing fails.
  • Fixed minor compilation errors in instance_test.go where repo.List calls were missing the tagFilter parameter.
  • Added unit tests in instance_unit_test.go and integration tests in instance_test.go to cover and verify database cleanup on enqueuing failure.

Test Plan

  • Unit tests added to verify that repo.Delete is invoked when task enqueuing fails.

  • Integration test added to verify that the instance is indeed removed from PostgreSQL database repository when the task queue fails.

  • Unit tests added/updated

  • Integration tests pass

  • Manual testing performed

Breaking Changes

None.

Related Issues

Fixes #686

Checklist

  • Code follows project style guidelines
  • Documentation updated (if needed)
  • Tests added/updated
  • All tests pass (make test for unit tests)
  • Swagger docs updated (if API changed)

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced instance provisioning error handling to prevent orphaned database records. The system now properly cleans up created instances if provisioning task enqueuing encounters errors, improving data consistency and reliability.

Review Change Stack

Copilot AI review requested due to automatic review settings May 20, 2026 11:25
@github-actions github-actions Bot added breaking-change bug Something isn't working documentation Improvements or additions to documentation size/xs area/services labels May 20, 2026
@coderabbitai

coderabbitai Bot commented May 20, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@poyrazK has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 54 minutes and 27 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2904bf6f-4f27-4c90-a240-3ba367955464

📥 Commits

Reviewing files that changed from the base of the PR and between 26f0da5 and eb205a7.

📒 Files selected for processing (2)
  • internal/core/services/instance_test.go
  • internal/core/services/instance_unit_test.go
📝 Walkthrough

Walkthrough

This PR adds compensating cleanup to the instance service: when task enqueue fails after an instance is created in the database, the service now attempts to delete that record and logs any delete errors. Both LaunchInstance and LaunchInstanceWithOptions paths are fixed and covered by unit and integration tests.

Changes

Instance Enqueue Failure Cleanup

Layer / File(s) Summary
Enqueue failure cleanup in service methods
internal/core/services/instance.go
LaunchInstance and LaunchInstanceWithOptions now delete the newly created instance record from the database on enqueue failure, after rolling back quota reservation. Delete errors are logged but do not prevent returning the enqueue error.
Test double enqueue failure support
internal/core/services/instance_test.go
InMemoryTaskQueue test stub gains ShouldFail field to conditionally return a simulated enqueue error, enabling failure path testing in both unit and integration tests.
Integration test coverage
internal/core/services/instance_test.go
New TestInstanceServiceLaunchEnqueueFailure integration test validates both LaunchInstance and LaunchInstanceWithOptions return enqueue errors without creating instance records. Existing concurrency and faulty-repository tests update List calls to pass explicit nil filter for API consistency.
Unit test coverage
internal/core/services/instance_unit_test.go
New LaunchInstance and LaunchInstanceWithOptions unit test subcases verify that enqueue failures trigger instance deletion from the repository and return the expected error message, with mock expectations validating each call sequence.

🐰 When queues enqueue and fail to spin,
The database cleanup starts within,
Lost records find their way back home,
No orphans left to drift and roam! 🏗️

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and concisely describes the main change: adding cleanup of orphaned instance records when task queue enqueue fails.
Linked Issues check ✅ Passed The pull request fully implements the requirements from issue #686: adds compensating transaction (s.repo.Delete) on enqueue failures in both LaunchInstance and LaunchInstanceWithOptions, and includes comprehensive unit and integration tests.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing issue #686; the test infrastructure updates (InMemoryTaskQueue.ShouldFail, repo.List nil filter) are necessary supporting changes for testing the fix.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/issue-686-orphaned-instance

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/core/services/instance_unit_test.go (1)

224-261: ⚡ Quick win

Add one subtest for rollback delete failure.

These new cases validate rollback success, but not the branch where repo.Delete fails after enqueue failure. Add a case asserting the returned error remains the enqueue error (and delete failure is only logged) to prevent regression in compensating behavior.

Also applies to: 285-309

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/core/services/instance_unit_test.go` around lines 224 - 261, Add a
new subtest mirroring "EnqueueFailure" that simulates taskQueue.Enqueue
returning an error and repo.Delete also failing; call svc.LaunchInstance with
the same LaunchParams, set taskQueue.On("Enqueue",
...).Return(errors.New("enqueue error")) and repo.On("Delete",
...).Return(errors.New("delete error")), then assert the returned error contains
the original enqueue error (e.g., "failed to enqueue provisioning task") and not
the delete error, and verify repo.Delete was called but its error is not
propagated (only logged). Apply the same additional subtest pattern where the
other test block exists (the similar block around lines 285-309) to cover both
places.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/core/services/instance_test.go`:
- Around line 729-730: The test is silently ignoring errors from seeding the
instance type; replace the blank identifier on the call to itRepo.Create with
error handling: capture the returned error from itRepo.Create(ctx, defaultType)
and fail the test immediately if non-nil (e.g., use t.Fatalf or require.NoError)
so that failures to create the fixture (defaultType / testInstanceType) are
reported clearly and stop the test.

---

Nitpick comments:
In `@internal/core/services/instance_unit_test.go`:
- Around line 224-261: Add a new subtest mirroring "EnqueueFailure" that
simulates taskQueue.Enqueue returning an error and repo.Delete also failing;
call svc.LaunchInstance with the same LaunchParams, set taskQueue.On("Enqueue",
...).Return(errors.New("enqueue error")) and repo.On("Delete",
...).Return(errors.New("delete error")), then assert the returned error contains
the original enqueue error (e.g., "failed to enqueue provisioning task") and not
the delete error, and verify repo.Delete was called but its error is not
propagated (only logged). Apply the same additional subtest pattern where the
other test block exists (the similar block around lines 285-309) to cover both
places.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f6ebaf28-91d8-4501-b0bb-103879876f40

📥 Commits

Reviewing files that changed from the base of the PR and between f0484e9 and 26f0da5.

📒 Files selected for processing (3)
  • internal/core/services/instance.go
  • internal/core/services/instance_test.go
  • internal/core/services/instance_unit_test.go

Comment thread internal/core/services/instance_test.go Outdated
Copilot AI review requested due to automatic review settings May 20, 2026 11:31

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/services breaking-change bug Something isn't working documentation Improvements or additions to documentation size/xs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CRITICAL: Instance record orphaned when enqueue fails after DB create

2 participants