-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Bug: Race Condition in Artifact Cache Upload/Commit Operations
Summary:
A race condition exists between the upload and commit handlers in pkg/artifactcache/handler.go that can lead to data corruption or inconsistent cache state when multiple concurrent operations target the same cache entry.
Affected Code:
pkg/artifactcache/handler.go, lines 259-282 (upload handler)
Description:
The upload handler performs a time-of-check to time-of-use (TOCTOU) vulnerability:
- Opens DB, checks if
cache.Completeis false - Closes DB (line 269)
- Writes data to storage via
h.storage.Write()
Meanwhile, the commit handler can:
- Mark the same cache as
Complete = true - Finalize the cache entry
Race Condition Timeline:
Thread A (upload) | Thread B (commit)
--------------------------|---------------------------
Open DB |
Check cache.Complete=false|
Close DB |
| Open DB
| Set cache.Complete=true
| storage.Commit(cache.ID)
| Close DB
storage.Write(cache.ID) | β Writes to "completed" cache!
Impact:
- Data Corruption: Partial writes may be committed to a cache marked as complete
- Invariant Violation: The system assumes completed caches are immutable, but concurrent uploads can modify them
- Inconsistent State: Cache metadata shows "complete" while storage files are still being written
Reproduction:
The bug is most likely to manifest when:
- Multiple workflow steps upload chunks to the same cache simultaneously
- Network delays cause commit to execute during an in-progress upload
- High concurrency environments with many parallel actions
Suggested Fix:
Hold the DB connection open (or use proper locking) during the entire upload operation:
// PATCH /_apis/artifactcache/caches/:id
func (h *Handler) upload(w http.ResponseWriter, r *http.Request, params httprouter.Params) {
id, err := strconv.ParseUint(params.ByName("id"), 10, 64)
if err != nil {
h.responseJSON(w, r, 400, err)
return
}
db, err := h.openDB()
if err != nil {
h.responseJSON(w, r, 500, err)
return
}
defer db.Close() // β Keep open throughout
cache := &Cache{}
if err := db.Get(id, cache); err != nil {
if errors.Is(err, bolthold.ErrNotFound) {
h.responseJSON(w, r, 400, fmt.Errorf("cache %d: not reserved", id))
return
}
h.responseJSON(w, r, 500, err)
return
}
if cache.Complete {
h.responseJSON(w, r, 400, fmt.Errorf("cache %v %q: already complete", cache.ID, cache.Key))
return
}
// Re-check after write to catch concurrent commits
start, _, err := parseContentRange(r.Header.Get("Content-Range"))
if err != nil {
h.responseJSON(w, r, 400, err)
return
}
if err := h.storage.Write(cache.ID, start, r.Body); err != nil {
h.responseJSON(w, r, 500, err)
return
}
// Verify cache is still incomplete before finalizing
if err := db.Get(id, cache); err != nil {
h.responseJSON(w, r, 500, err)
return
}
if cache.Complete {
h.responseJSON(w, r, 409, fmt.Errorf("cache %v %q: was committed during upload", cache.ID, cache.Key))
return
}
cache.UsedAt = time.Now().Unix()
if err := db.Update(cache.ID, cache); err != nil {
h.responseJSON(w, r, 500, err)
return
}
h.responseJSON(w, r, 200)
}Alternatively, introduce a mutex per cache ID or use BoltDB's transaction isolation more effectively.
Environment:
- Analyzed in latest main branch (commit as of 2026-02-26)
- Applies to all versions using the artifact cache handler
Additional Notes:
This is a classic TOCTOU race condition. While the window may be small, in high-throughput CI/CD environments with concurrent workflow executions, it can lead to subtle corruption that's difficult to debug.
The same pattern should be audited in the commit handler to ensure it doesn't have similar issues with concurrent operations.