Skip to content

chore(ci): add Docker Hub auth and layer caching#14676

Merged
davidkonigsberg merged 17 commits intomainfrom
devin/1775501837-docker-hub-auth-cache
Apr 7, 2026
Merged

chore(ci): add Docker Hub auth and layer caching#14676
davidkonigsberg merged 17 commits intomainfrom
devin/1775501837-docker-hub-auth-cache

Conversation

@Swimburger
Copy link
Copy Markdown
Member

@Swimburger Swimburger commented Apr 6, 2026

Description

Adds Docker Hub authentication and Docker layer caching across all CI jobs that pull Docker images. This mitigates intermittent failures caused by Docker Hub rate limiting on anonymous pulls.

Changes Made

  • ci.yml: Added docker/login-action@v3 and actions/cache@v4 steps to the test-ete job, with security-hardened cache save/restore (only newly-pulled images are cached)
  • seed.yml: Passed dockerhub-username/dockerhub-token to all 18 seed test jobs via run-seed-process; added direct Docker login + cache steps to the benchmark job
  • update-seed.yml: Passed dockerhub-username/dockerhub-token to all 18 seed update jobs via auto-update-seed
  • Composite actions (cached-seed, run-seed-process, auto-update-seed): Added dockerhub-username and dockerhub-token inputs (optional, default "") and threaded credentials through the call chain. cached-seed performs the actual Docker login, cache restore, and cache save.

Prerequisite: DOCKER_USERNAME_PUBLIC_READONLY and DOCKER_PASSWORD_PUBLIC_READONLY must be configured as repository secrets (public read-only Docker Hub access token).

Security hardening (cache poisoning mitigation)

Addressed review feedback about Docker image cache poisoning on self-hosted runners:

  1. Pre-clean cache dir: rm -rf /tmp/docker-cache before actions/cache@v4 restore, so stale files from prior runs on self-hosted runners can't persist
  2. Load always attempts with file-existence guard: The docker load loop runs unconditionally (no cache-hit gate) so that both exact and restore-keys partial matches are loaded. The [ -f "$f" ] check inside the loop safely handles the case where no tar files exist.
  3. Save only new images: Records image IDs after cache load (so cached images are in the baseline), then uses comm -13 to diff and only saves images that were pulled during the current job — not pre-existing or previously-cached images
  4. Conditional Docker login: Direct login steps in ci.yml and seed.yml are guarded with if: secrets.DOCKER_USERNAME_PUBLIC_READONLY != '' so they won't fail if secrets aren't configured
  5. Guard cache save against missing state file: The save step checks for /tmp/docker-images-before.txt before running comm, gracefully skipping if the earlier record step was skipped or failed (addresses Graphite feedback)

Updates since last revision

  • Deduplicate image IDs with sort -u: docker images -q returns duplicate IDs for multi-tagged images. Changed all sort calls to sort -u to avoid redundant docker save calls for the same image.

  • Per-generator cache keys: Replaced the single shared cache key (runner.os-docker-...) with per-generator keys:

    • cached-seed action: runner.os-docker-{generator-name}-...
    • seed.yml benchmark: runner.os-docker-{matrix.generator}-...
    • ci.yml test-ete: runner.os-docker-ete-...

    This ensures each generator caches only its own Docker images instead of all generators competing over a single cache entry.

  • Fixed step ordering: Moved "Record pre-existing Docker images" to after "Load cached Docker images" in all three inline cache locations. Previously, recording happened before loading, which meant cached images were not in the baseline and would be incorrectly re-saved on every run (addresses Graphite feedback).

  • Removed cache-hit guard on load step: actions/cache@v4 sets cache-hit to true only on exact primary key matches. When restore-keys produces a partial match, files are restored but cache-hit is false — so the previous if: steps.docker-cache.outputs.cache-hit == 'true' guard made restore-keys entirely ineffective. Removed the guard in all 3 locations; the [ -f "$f" ] check inside the loop already handles the no-files case safely (addresses Devin Review feedback).

Step execution order (all 3 locations)

Docker login → Clear stale cache dir → Cache restore → Load cached images → Record image baseline → [job work] → Save only new images

Review checklist for humans

  1. hashFiles('**/Dockerfile*') scope: This glob matches all Dockerfiles across the entire monorepo. Verify this is the right granularity — if unrelated Dockerfile changes bust all generator caches, a narrower glob may be needed.
  2. 10GB GitHub Actions cache limit: With per-generator keys, the number of cache entries is now ~18×. Each entry stores Docker image .tar files which can be large. Monitor cache usage after merging to ensure older entries aren't evicted too aggressively.
  3. Multi-tag image handling: docker inspect --format '{{index .RepoTags 0}}' only saves the first tag per image. If a generator image is referenced by multiple tags, only one will survive a cache round-trip. This is likely fine for pulled (not locally built) images but worth verifying.
  4. Self-hosted runner state: The test-ete job runs on Test runners and some seed-update jobs run on Seed runners. Confirm that /tmp cleanup between jobs behaves as expected — the pre-clean step should handle it, but worth verifying.
  5. Secrets availability: The test-ete and seed jobs did not run on this PR, so Docker login has not been exercised yet. Verify after merge by checking whether the "Log in to Docker Hub" step shows Login Succeeded or is skipped.
  6. No-cache-at-all scenario: With the cache-hit guard removed, the load step always runs. When no cache exists at all, /tmp/docker-cache/ won't exist (it was just rm -rf'd). The bash glob /tmp/docker-cache/*.tar won't match, and the [ -f "$f" ] guard catches the literal unexpanded glob string. Verify this is safe on your runner OS/shell.

Testing

  • YAML syntax validated locally via js-yaml
  • Formatting verified via pnpm format (no changes)
  • Edge cases analyzed locally (comm -13 diffing, sort -u deduplication, multi-tag behavior, restore-keys partial match behavior)
  • Actual Docker auth + caching can only be validated by running CI with the secrets configured

Link to Devin session: https://app.devin.ai/sessions/427c37bbbafa4a03b0bcb0a47dcfeadf
Requested by: @Swimburger


Open with Devin

- Add docker/login-action@v3 with DOCKERHUB_USERNAME/DOCKERHUB_TOKEN secrets
- Add actions/cache@v4 for Docker layer caching keyed on runner.os and Dockerfiles
- Apply to test-ete job in ci.yml
- Apply to all 19 seed test jobs in seed.yml via cached-seed composite action
- Apply to benchmark job in seed.yml
- Apply to all 19 seed update jobs in update-seed.yml via auto-update-seed composite action
- Thread Docker credentials through run-seed-process and auto-update-seed composite actions

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@devin-ai-integration
Copy link
Copy Markdown
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

restore-keys: |
${{ runner.os }}-docker-

- name: Load cached Docker images
Copy link
Copy Markdown

@hex-security-app hex-security-app bot Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Docker image cache poisoning via persistent /tmp/docker-cache on self-hosted runners

The Load cached Docker images step unconditionally loads every .tar file found in /tmp/docker-cache/ via docker load -i "$f" without first clearing that directory. On the self-hosted runners used in this repo (Test for test-ete in ci.yml, Seed for seed update jobs in update-seed.yml), /tmp/docker-cache/ persists across workflow runs. actions/cache@v4 does not wipe the target directory before restoring — on a cache miss the old files remain; on a cache hit only the archived files are extracted, leaving any extra stale files intact.

This means a prior CI run that introduced a Docker image with the same name as a legitimate generator image (e.g. fernapi/fern-java-sdk:latest) would have that image persisted in /tmp/docker-cache/ across all future runs on the same runner. Because the "Save Docker images for cache" step captures all images on the runner (docker images --format ...), not only those pulled during the current job, the set of persisted images grows with every run.

A same-repo PR author (GitHub Actions runs CI workflows from the PR branch for non-fork PRs) can modify the CI workflow to pull or build a malicious image, tag it with the name of any generator image, and let the "Save" step write it to /tmp/docker-cache/. Even after that PR is closed without merging, the .tar file remains on the self-hosted runner. The next legitimate CI run on that runner loads the malicious image via docker load, and when the seed tests execute generator containers they run the attacker's code, gaining access to CI secrets such as FERN_ORG_TOKEN_DEV and DOCKERHUB_TOKEN.

Prompt To Fix With AI
Three changes are needed together:

1. **Pre-clean `/tmp/docker-cache/` before restoring** — add a step before `actions/cache@v4` that removes the directory so only the explicitly-cached content is loaded:
```yaml
- name: Clear stale Docker image cache
  shell: bash
  run: rm -rf /tmp/docker-cache
```

2. **Load only images present in the restored cache, not stale leftovers** — the current guard (`if [ -d /tmp/docker-cache ]`) is not enough; add an explicit check that the cache was actually restored before running `docker load`. One way is to gate the load step on the `cache-hit` output of `actions/cache@v4`:
```yaml
- name: Cache Docker images
  id: docker-cache
  uses: actions/cache@v4
  ...

- name: Load cached Docker images
  if: steps.docker-cache.outputs.cache-hit == 'true'
  shell: bash
  run: |
    for f in /tmp/docker-cache/*.tar; do
      [ -f "$f" ] && docker load -i "$f" || true
    done
```

3. **Save only images pulled during the current job, not all images on the runner** — capture image IDs before the job starts and diff afterward, or restrict saving to a known set of images:
```yaml
- name: Record pre-existing Docker images
  shell: bash
  run: docker images -q > /tmp/docker-images-before.txt

# ... seed test runs here ...

- name: Save Docker images for cache
  if: always()
  shell: bash
  run: |
    mkdir -p /tmp/docker-cache
    # Only save images that weren't present before this job
    comm -13 <(sort /tmp/docker-images-before.txt) <(docker images -q | sort) | while read -r id; do
      img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true)
      [ -z "$img" ] && continue
      fname=$(echo "$img" | tr '/:' '_')
      docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true
    done
```

Apply the same three changes to the inline steps in `.github/workflows/ci.yml` (`test-ete` job) and `.github/workflows/seed.yml` (`benchmark` job).

Severity: medium | Confidence: 72%

Fixed in 7146ddbd.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed all three recommendations in commit 7146ddb:

  1. Pre-clean /tmp/docker-cache/ — added rm -rf /tmp/docker-cache step before actions/cache@v4 restore
  2. Gate load on cache-hit — load step now only runs when steps.docker-cache.outputs.cache-hit == 'true'
  3. Save only newly-pulled images — record image IDs before the job runs, then comm -13 to diff and only save images that weren't pre-existing

Also added a conditional guard (if: secrets.DOCKER_USERNAME_PUBLIC_READONLY != '') on the direct Docker login steps in ci.yml and seed.yml so they won't fail if secrets aren't configured.

Swimburger and others added 2 commits April 6, 2026 19:11
…BLIC_READONLY secrets

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- Clear /tmp/docker-cache before restoring to prevent stale files
- Gate docker load on cache-hit output to skip on cache misses
- Record pre-existing images and only save newly-pulled images
- Add conditional guard on Docker login for missing secrets

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Comment on lines +99 to +109
- name: Save Docker images for cache
if: always()
shell: bash
run: |
mkdir -p /tmp/docker-cache
comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do
img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true)
[ -z "$img" ] && continue
fname=$(echo "$img" | tr '/:' '_')
docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true
done
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache save step runs with if: always() but depends on /tmp/docker-images-before.txt which may not exist if the "Record pre-existing Docker images" step was skipped or failed. This will cause comm to fail with "No such file or directory", potentially failing the entire job during cleanup.

Fix:

- name: Save Docker images for cache
  if: always()
  shell: bash
  run: |
    if [ ! -f /tmp/docker-images-before.txt ]; then
      echo "Skipping cache save: docker-images-before.txt not found"
      exit 0
    fi
    mkdir -p /tmp/docker-cache
    comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do
      img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true)
      [ -z "$img" ] && continue
      fname=$(echo "$img" | tr '/:' '_')
      docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true
    done
Suggested change
- name: Save Docker images for cache
if: always()
shell: bash
run: |
mkdir -p /tmp/docker-cache
comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do
img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true)
[ -z "$img" ] && continue
fname=$(echo "$img" | tr '/:' '_')
docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true
done
- name: Save Docker images for cache
if: always()
shell: bash
run: |
if [ ! -f /tmp/docker-images-before.txt ]; then
echo "Skipping cache save: docker-images-before.txt not found"
exit 0
fi
mkdir -p /tmp/docker-cache
comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do
img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true)
[ -z "$img" ] && continue
fname=$(echo "$img" | tr '/:' '_')
docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true
done

Spotted by Graphite

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ebad0ea — added the guard to check for /tmp/docker-images-before.txt before running comm in all three locations (cached-seed action, ci.yml test-ete, seed.yml benchmark).

Comment on lines +249 to +258
- name: Save Docker images for cache
if: always()
run: |
mkdir -p /tmp/docker-cache
comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do
img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true)
[ -z "$img" ] && continue
fname=$(echo "$img" | tr '/:' '_')
docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true
done
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache save step runs with if: always() but depends on /tmp/docker-images-before.txt which may not exist if the "Record pre-existing Docker images" step was skipped or failed. This will cause comm to fail, potentially failing the entire job during cleanup.

Fix:

- name: Save Docker images for cache
  if: always()
  run: |
    if [ ! -f /tmp/docker-images-before.txt ]; then
      echo "Skipping cache save: docker-images-before.txt not found"
      exit 0
    fi
    mkdir -p /tmp/docker-cache
    comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do
      img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true)
      [ -z "$img" ] && continue
      fname=$(echo "$img" | tr '/:' '_')
      docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true
    done
Suggested change
- name: Save Docker images for cache
if: always()
run: |
mkdir -p /tmp/docker-cache
comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do
img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true)
[ -z "$img" ] && continue
fname=$(echo "$img" | tr '/:' '_')
docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true
done
- name: Save Docker images for cache
if: always()
run: |
if [ ! -f /tmp/docker-images-before.txt ]; then
echo "Skipping cache save: docker-images-before.txt not found"
exit 0
fi
mkdir -p /tmp/docker-cache
comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do
img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true)
[ -z "$img" ] && continue
fname=$(echo "$img" | tr '/:' '_')
docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true
done

Spotted by Graphite

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ebad0ea — added the same guard to all three locations.

Swimburger and others added 2 commits April 6, 2026 19:27
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…he keys

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Comment on lines +182 to +202
- name: Record pre-existing Docker images
run: docker images -q | sort -u > /tmp/docker-images-before.txt

- name: Clear stale Docker image cache
run: rm -rf /tmp/docker-cache

- name: Cache Docker images
id: docker-cache
uses: actions/cache@v4
with:
path: /tmp/docker-cache
key: ${{ runner.os }}-docker-ete-${{ hashFiles('**/Dockerfile*') }}
restore-keys: |
${{ runner.os }}-docker-ete-

- name: Load cached Docker images
if: steps.docker-cache.outputs.cache-hit == 'true'
run: |
for f in /tmp/docker-cache/*.tar; do
[ -f "$f" ] && docker load -i "$f" || true
done
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Cache logic will re-save cached images on every run

The "Record pre-existing Docker images" step runs BEFORE "Load cached Docker images". This means cached images loaded in line 197-202 will NOT be in /tmp/docker-images-before.txt, causing them to be incorrectly identified as "new" images and re-saved in the cache save step (line 249-262).

Impact: Wastes CI time and storage re-saving the same cached images on every run.

Fix: Move the "Record pre-existing Docker images" step to AFTER the "Load cached Docker images" step:

- name: Clear stale Docker image cache
  run: rm -rf /tmp/docker-cache

- name: Cache Docker images
  id: docker-cache
  uses: actions/cache@v4
  with:
    path: /tmp/docker-cache
    key: ${{ runner.os }}-docker-ete-${{ hashFiles('**/Dockerfile*') }}
    restore-keys: |
      ${{ runner.os }}-docker-ete-

- name: Load cached Docker images
  if: steps.docker-cache.outputs.cache-hit == 'true'
  run: |
    for f in /tmp/docker-cache/*.tar; do
      [ -f "$f" ] && docker load -i "$f" || true
    done

- name: Record pre-existing Docker images
  run: docker images -q | sort -u > /tmp/docker-images-before.txt
Suggested change
- name: Record pre-existing Docker images
run: docker images -q | sort -u > /tmp/docker-images-before.txt
- name: Clear stale Docker image cache
run: rm -rf /tmp/docker-cache
- name: Cache Docker images
id: docker-cache
uses: actions/cache@v4
with:
path: /tmp/docker-cache
key: ${{ runner.os }}-docker-ete-${{ hashFiles('**/Dockerfile*') }}
restore-keys: |
${{ runner.os }}-docker-ete-
- name: Load cached Docker images
if: steps.docker-cache.outputs.cache-hit == 'true'
run: |
for f in /tmp/docker-cache/*.tar; do
[ -f "$f" ] && docker load -i "$f" || true
done
- name: Clear stale Docker image cache
run: rm -rf /tmp/docker-cache
- name: Cache Docker images
id: docker-cache
uses: actions/cache@v4
with:
path: /tmp/docker-cache
key: ${{ runner.os }}-docker-ete-${{ hashFiles('**/Dockerfile*') }}
restore-keys: |
${{ runner.os }}-docker-ete-
- name: Load cached Docker images
if: steps.docker-cache.outputs.cache-hit == 'true'
run: |
for f in /tmp/docker-cache/*.tar; do
[ -f "$f" ] && docker load -i "$f" || true
done
- name: Record pre-existing Docker images
run: docker images -q | sort -u > /tmp/docker-images-before.txt

Spotted by Graphite

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in a9a0dfd. Moved "Record pre-existing Docker images" to after "Load cached Docker images" in all three locations (cached-seed action, ci.yml test-ete, seed.yml benchmark) so that cached images are included in the baseline and won't be re-saved on every run.

Swimburger and others added 3 commits April 6, 2026 20:15
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…ached images

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@Swimburger Swimburger enabled auto-merge (squash) April 6, 2026 20:45
Swimburger and others added 2 commits April 6, 2026 20:54
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
devin-ai-integration[bot]

This comment was marked as resolved.

@davidkonigsberg davidkonigsberg disabled auto-merge April 7, 2026 16:07
@davidkonigsberg davidkonigsberg merged commit 7ecb9ee into main Apr 7, 2026
6 checks passed
@davidkonigsberg davidkonigsberg deleted the devin/1775501837-docker-hub-auth-cache branch April 7, 2026 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants