Skip to content

Race Condition in Artifact Cache Upload/Commit OperationsΒ #6012

@zesty-clawd

Description

@zesty-clawd

Bug: Race Condition in Artifact Cache Upload/Commit Operations

Summary:
A race condition exists between the upload and commit handlers in pkg/artifactcache/handler.go that can lead to data corruption or inconsistent cache state when multiple concurrent operations target the same cache entry.

Affected Code:
pkg/artifactcache/handler.go, lines 259-282 (upload handler)

Description:

The upload handler performs a time-of-check to time-of-use (TOCTOU) vulnerability:

  1. Opens DB, checks if cache.Complete is false
  2. Closes DB (line 269)
  3. Writes data to storage via h.storage.Write()

Meanwhile, the commit handler can:

  1. Mark the same cache as Complete = true
  2. Finalize the cache entry

Race Condition Timeline:

Thread A (upload)         |  Thread B (commit)
--------------------------|---------------------------
Open DB                   |
Check cache.Complete=false|
Close DB                  |
                          | Open DB
                          | Set cache.Complete=true
                          | storage.Commit(cache.ID)
                          | Close DB
storage.Write(cache.ID)   | ← Writes to "completed" cache!

Impact:

  1. Data Corruption: Partial writes may be committed to a cache marked as complete
  2. Invariant Violation: The system assumes completed caches are immutable, but concurrent uploads can modify them
  3. Inconsistent State: Cache metadata shows "complete" while storage files are still being written

Reproduction:

The bug is most likely to manifest when:

  • Multiple workflow steps upload chunks to the same cache simultaneously
  • Network delays cause commit to execute during an in-progress upload
  • High concurrency environments with many parallel actions

Suggested Fix:

Hold the DB connection open (or use proper locking) during the entire upload operation:

// PATCH /_apis/artifactcache/caches/:id
func (h *Handler) upload(w http.ResponseWriter, r *http.Request, params httprouter.Params) {
    id, err := strconv.ParseUint(params.ByName("id"), 10, 64)
    if err != nil {
        h.responseJSON(w, r, 400, err)
        return
    }

    db, err := h.openDB()
    if err != nil {
        h.responseJSON(w, r, 500, err)
        return
    }
    defer db.Close()  // ← Keep open throughout

    cache := &Cache{}
    if err := db.Get(id, cache); err != nil {
        if errors.Is(err, bolthold.ErrNotFound) {
            h.responseJSON(w, r, 400, fmt.Errorf("cache %d: not reserved", id))
            return
        }
        h.responseJSON(w, r, 500, err)
        return
    }

    if cache.Complete {
        h.responseJSON(w, r, 400, fmt.Errorf("cache %v %q: already complete", cache.ID, cache.Key))
        return
    }

    // Re-check after write to catch concurrent commits
    start, _, err := parseContentRange(r.Header.Get("Content-Range"))
    if err != nil {
        h.responseJSON(w, r, 400, err)
        return
    }
    
    if err := h.storage.Write(cache.ID, start, r.Body); err != nil {
        h.responseJSON(w, r, 500, err)
        return
    }

    // Verify cache is still incomplete before finalizing
    if err := db.Get(id, cache); err != nil {
        h.responseJSON(w, r, 500, err)
        return
    }
    if cache.Complete {
        h.responseJSON(w, r, 409, fmt.Errorf("cache %v %q: was committed during upload", cache.ID, cache.Key))
        return
    }

    cache.UsedAt = time.Now().Unix()
    if err := db.Update(cache.ID, cache); err != nil {
        h.responseJSON(w, r, 500, err)
        return
    }

    h.responseJSON(w, r, 200)
}

Alternatively, introduce a mutex per cache ID or use BoltDB's transaction isolation more effectively.

Environment:

  • Analyzed in latest main branch (commit as of 2026-02-26)
  • Applies to all versions using the artifact cache handler

Additional Notes:

This is a classic TOCTOU race condition. While the window may be small, in high-throughput CI/CD environments with concurrent workflow executions, it can lead to subtle corruption that's difficult to debug.

The same pattern should be audited in the commit handler to ensure it doesn't have similar issues with concurrent operations.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions