chore(ci): add Docker Hub auth and layer caching#14676
chore(ci): add Docker Hub auth and layer caching#14676davidkonigsberg merged 17 commits intomainfrom
Conversation
- Add docker/login-action@v3 with DOCKERHUB_USERNAME/DOCKERHUB_TOKEN secrets - Add actions/cache@v4 for Docker layer caching keyed on runner.os and Dockerfiles - Apply to test-ete job in ci.yml - Apply to all 19 seed test jobs in seed.yml via cached-seed composite action - Apply to benchmark job in seed.yml - Apply to all 19 seed update jobs in update-seed.yml via auto-update-seed composite action - Thread Docker credentials through run-seed-process and auto-update-seed composite actions Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
| restore-keys: | | ||
| ${{ runner.os }}-docker- | ||
|
|
||
| - name: Load cached Docker images |
There was a problem hiding this comment.
Docker image cache poisoning via persistent /tmp/docker-cache on self-hosted runners
The Load cached Docker images step unconditionally loads every .tar file found in /tmp/docker-cache/ via docker load -i "$f" without first clearing that directory. On the self-hosted runners used in this repo (Test for test-ete in ci.yml, Seed for seed update jobs in update-seed.yml), /tmp/docker-cache/ persists across workflow runs. actions/cache@v4 does not wipe the target directory before restoring — on a cache miss the old files remain; on a cache hit only the archived files are extracted, leaving any extra stale files intact.
This means a prior CI run that introduced a Docker image with the same name as a legitimate generator image (e.g. fernapi/fern-java-sdk:latest) would have that image persisted in /tmp/docker-cache/ across all future runs on the same runner. Because the "Save Docker images for cache" step captures all images on the runner (docker images --format ...), not only those pulled during the current job, the set of persisted images grows with every run.
A same-repo PR author (GitHub Actions runs CI workflows from the PR branch for non-fork PRs) can modify the CI workflow to pull or build a malicious image, tag it with the name of any generator image, and let the "Save" step write it to /tmp/docker-cache/. Even after that PR is closed without merging, the .tar file remains on the self-hosted runner. The next legitimate CI run on that runner loads the malicious image via docker load, and when the seed tests execute generator containers they run the attacker's code, gaining access to CI secrets such as FERN_ORG_TOKEN_DEV and DOCKERHUB_TOKEN.
Prompt To Fix With AI
Three changes are needed together:
1. **Pre-clean `/tmp/docker-cache/` before restoring** — add a step before `actions/cache@v4` that removes the directory so only the explicitly-cached content is loaded:
```yaml
- name: Clear stale Docker image cache
shell: bash
run: rm -rf /tmp/docker-cache
```
2. **Load only images present in the restored cache, not stale leftovers** — the current guard (`if [ -d /tmp/docker-cache ]`) is not enough; add an explicit check that the cache was actually restored before running `docker load`. One way is to gate the load step on the `cache-hit` output of `actions/cache@v4`:
```yaml
- name: Cache Docker images
id: docker-cache
uses: actions/cache@v4
...
- name: Load cached Docker images
if: steps.docker-cache.outputs.cache-hit == 'true'
shell: bash
run: |
for f in /tmp/docker-cache/*.tar; do
[ -f "$f" ] && docker load -i "$f" || true
done
```
3. **Save only images pulled during the current job, not all images on the runner** — capture image IDs before the job starts and diff afterward, or restrict saving to a known set of images:
```yaml
- name: Record pre-existing Docker images
shell: bash
run: docker images -q > /tmp/docker-images-before.txt
# ... seed test runs here ...
- name: Save Docker images for cache
if: always()
shell: bash
run: |
mkdir -p /tmp/docker-cache
# Only save images that weren't present before this job
comm -13 <(sort /tmp/docker-images-before.txt) <(docker images -q | sort) | while read -r id; do
img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true)
[ -z "$img" ] && continue
fname=$(echo "$img" | tr '/:' '_')
docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true
done
```
Apply the same three changes to the inline steps in `.github/workflows/ci.yml` (`test-ete` job) and `.github/workflows/seed.yml` (`benchmark` job).Severity: medium | Confidence: 72%
Fixed in 7146ddbd.
There was a problem hiding this comment.
Addressed all three recommendations in commit 7146ddb:
- Pre-clean
/tmp/docker-cache/— addedrm -rf /tmp/docker-cachestep beforeactions/cache@v4restore - Gate load on
cache-hit— load step now only runs whensteps.docker-cache.outputs.cache-hit == 'true' - Save only newly-pulled images — record image IDs before the job runs, then
comm -13to diff and only save images that weren't pre-existing
Also added a conditional guard (if: secrets.DOCKER_USERNAME_PUBLIC_READONLY != '') on the direct Docker login steps in ci.yml and seed.yml so they won't fail if secrets aren't configured.
…BLIC_READONLY secrets Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- Clear /tmp/docker-cache before restoring to prevent stale files - Gate docker load on cache-hit output to skip on cache misses - Record pre-existing images and only save newly-pulled images - Add conditional guard on Docker login for missing secrets Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
| - name: Save Docker images for cache | ||
| if: always() | ||
| shell: bash | ||
| run: | | ||
| mkdir -p /tmp/docker-cache | ||
| comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do | ||
| img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true) | ||
| [ -z "$img" ] && continue | ||
| fname=$(echo "$img" | tr '/:' '_') | ||
| docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true | ||
| done |
There was a problem hiding this comment.
The cache save step runs with if: always() but depends on /tmp/docker-images-before.txt which may not exist if the "Record pre-existing Docker images" step was skipped or failed. This will cause comm to fail with "No such file or directory", potentially failing the entire job during cleanup.
Fix:
- name: Save Docker images for cache
if: always()
shell: bash
run: |
if [ ! -f /tmp/docker-images-before.txt ]; then
echo "Skipping cache save: docker-images-before.txt not found"
exit 0
fi
mkdir -p /tmp/docker-cache
comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do
img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true)
[ -z "$img" ] && continue
fname=$(echo "$img" | tr '/:' '_')
docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true
done| - name: Save Docker images for cache | |
| if: always() | |
| shell: bash | |
| run: | | |
| mkdir -p /tmp/docker-cache | |
| comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do | |
| img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true) | |
| [ -z "$img" ] && continue | |
| fname=$(echo "$img" | tr '/:' '_') | |
| docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true | |
| done | |
| - name: Save Docker images for cache | |
| if: always() | |
| shell: bash | |
| run: | | |
| if [ ! -f /tmp/docker-images-before.txt ]; then | |
| echo "Skipping cache save: docker-images-before.txt not found" | |
| exit 0 | |
| fi | |
| mkdir -p /tmp/docker-cache | |
| comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do | |
| img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true) | |
| [ -z "$img" ] && continue | |
| fname=$(echo "$img" | tr '/:' '_') | |
| docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true | |
| done | |
Spotted by Graphite
Is this helpful? React 👍 or 👎 to let us know.
There was a problem hiding this comment.
Fixed in ebad0ea — added the guard to check for /tmp/docker-images-before.txt before running comm in all three locations (cached-seed action, ci.yml test-ete, seed.yml benchmark).
| - name: Save Docker images for cache | ||
| if: always() | ||
| run: | | ||
| mkdir -p /tmp/docker-cache | ||
| comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do | ||
| img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true) | ||
| [ -z "$img" ] && continue | ||
| fname=$(echo "$img" | tr '/:' '_') | ||
| docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true | ||
| done |
There was a problem hiding this comment.
The cache save step runs with if: always() but depends on /tmp/docker-images-before.txt which may not exist if the "Record pre-existing Docker images" step was skipped or failed. This will cause comm to fail, potentially failing the entire job during cleanup.
Fix:
- name: Save Docker images for cache
if: always()
run: |
if [ ! -f /tmp/docker-images-before.txt ]; then
echo "Skipping cache save: docker-images-before.txt not found"
exit 0
fi
mkdir -p /tmp/docker-cache
comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do
img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true)
[ -z "$img" ] && continue
fname=$(echo "$img" | tr '/:' '_')
docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true
done| - name: Save Docker images for cache | |
| if: always() | |
| run: | | |
| mkdir -p /tmp/docker-cache | |
| comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do | |
| img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true) | |
| [ -z "$img" ] && continue | |
| fname=$(echo "$img" | tr '/:' '_') | |
| docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true | |
| done | |
| - name: Save Docker images for cache | |
| if: always() | |
| run: | | |
| if [ ! -f /tmp/docker-images-before.txt ]; then | |
| echo "Skipping cache save: docker-images-before.txt not found" | |
| exit 0 | |
| fi | |
| mkdir -p /tmp/docker-cache | |
| comm -13 /tmp/docker-images-before.txt <(docker images -q | sort) | while read -r id; do | |
| img=$(docker inspect --format '{{index .RepoTags 0}}' "$id" 2>/dev/null || true) | |
| [ -z "$img" ] && continue | |
| fname=$(echo "$img" | tr '/:' '_') | |
| docker save "$img" -o "/tmp/docker-cache/${fname}.tar" 2>/dev/null || true | |
| done | |
Spotted by Graphite
Is this helpful? React 👍 or 👎 to let us know.
There was a problem hiding this comment.
Fixed in ebad0ea — added the same guard to all three locations.
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…he keys Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
.github/workflows/ci.yml
Outdated
| - name: Record pre-existing Docker images | ||
| run: docker images -q | sort -u > /tmp/docker-images-before.txt | ||
|
|
||
| - name: Clear stale Docker image cache | ||
| run: rm -rf /tmp/docker-cache | ||
|
|
||
| - name: Cache Docker images | ||
| id: docker-cache | ||
| uses: actions/cache@v4 | ||
| with: | ||
| path: /tmp/docker-cache | ||
| key: ${{ runner.os }}-docker-ete-${{ hashFiles('**/Dockerfile*') }} | ||
| restore-keys: | | ||
| ${{ runner.os }}-docker-ete- | ||
|
|
||
| - name: Load cached Docker images | ||
| if: steps.docker-cache.outputs.cache-hit == 'true' | ||
| run: | | ||
| for f in /tmp/docker-cache/*.tar; do | ||
| [ -f "$f" ] && docker load -i "$f" || true | ||
| done |
There was a problem hiding this comment.
Bug: Cache logic will re-save cached images on every run
The "Record pre-existing Docker images" step runs BEFORE "Load cached Docker images". This means cached images loaded in line 197-202 will NOT be in /tmp/docker-images-before.txt, causing them to be incorrectly identified as "new" images and re-saved in the cache save step (line 249-262).
Impact: Wastes CI time and storage re-saving the same cached images on every run.
Fix: Move the "Record pre-existing Docker images" step to AFTER the "Load cached Docker images" step:
- name: Clear stale Docker image cache
run: rm -rf /tmp/docker-cache
- name: Cache Docker images
id: docker-cache
uses: actions/cache@v4
with:
path: /tmp/docker-cache
key: ${{ runner.os }}-docker-ete-${{ hashFiles('**/Dockerfile*') }}
restore-keys: |
${{ runner.os }}-docker-ete-
- name: Load cached Docker images
if: steps.docker-cache.outputs.cache-hit == 'true'
run: |
for f in /tmp/docker-cache/*.tar; do
[ -f "$f" ] && docker load -i "$f" || true
done
- name: Record pre-existing Docker images
run: docker images -q | sort -u > /tmp/docker-images-before.txt| - name: Record pre-existing Docker images | |
| run: docker images -q | sort -u > /tmp/docker-images-before.txt | |
| - name: Clear stale Docker image cache | |
| run: rm -rf /tmp/docker-cache | |
| - name: Cache Docker images | |
| id: docker-cache | |
| uses: actions/cache@v4 | |
| with: | |
| path: /tmp/docker-cache | |
| key: ${{ runner.os }}-docker-ete-${{ hashFiles('**/Dockerfile*') }} | |
| restore-keys: | | |
| ${{ runner.os }}-docker-ete- | |
| - name: Load cached Docker images | |
| if: steps.docker-cache.outputs.cache-hit == 'true' | |
| run: | | |
| for f in /tmp/docker-cache/*.tar; do | |
| [ -f "$f" ] && docker load -i "$f" || true | |
| done | |
| - name: Clear stale Docker image cache | |
| run: rm -rf /tmp/docker-cache | |
| - name: Cache Docker images | |
| id: docker-cache | |
| uses: actions/cache@v4 | |
| with: | |
| path: /tmp/docker-cache | |
| key: ${{ runner.os }}-docker-ete-${{ hashFiles('**/Dockerfile*') }} | |
| restore-keys: | | |
| ${{ runner.os }}-docker-ete- | |
| - name: Load cached Docker images | |
| if: steps.docker-cache.outputs.cache-hit == 'true' | |
| run: | | |
| for f in /tmp/docker-cache/*.tar; do | |
| [ -f "$f" ] && docker load -i "$f" || true | |
| done | |
| - name: Record pre-existing Docker images | |
| run: docker images -q | sort -u > /tmp/docker-images-before.txt | |
Spotted by Graphite
Is this helpful? React 👍 or 👎 to let us know.
There was a problem hiding this comment.
Good catch — fixed in a9a0dfd. Moved "Record pre-existing Docker images" to after "Load cached Docker images" in all three locations (cached-seed action, ci.yml test-ete, seed.yml benchmark) so that cached images are included in the baseline and won't be re-saved on every run.
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…ached images Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…er images Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Description
Adds Docker Hub authentication and Docker layer caching across all CI jobs that pull Docker images. This mitigates intermittent failures caused by Docker Hub rate limiting on anonymous pulls.
Changes Made
ci.yml: Addeddocker/login-action@v3andactions/cache@v4steps to thetest-etejob, with security-hardened cache save/restore (only newly-pulled images are cached)seed.yml: Passeddockerhub-username/dockerhub-tokento all 18 seed test jobs viarun-seed-process; added direct Docker login + cache steps to thebenchmarkjobupdate-seed.yml: Passeddockerhub-username/dockerhub-tokento all 18 seed update jobs viaauto-update-seedcached-seed,run-seed-process,auto-update-seed): Addeddockerhub-usernameanddockerhub-tokeninputs (optional, default"") and threaded credentials through the call chain.cached-seedperforms the actual Docker login, cache restore, and cache save.Prerequisite:
DOCKER_USERNAME_PUBLIC_READONLYandDOCKER_PASSWORD_PUBLIC_READONLYmust be configured as repository secrets (public read-only Docker Hub access token).Security hardening (cache poisoning mitigation)
Addressed review feedback about Docker image cache poisoning on self-hosted runners:
rm -rf /tmp/docker-cachebeforeactions/cache@v4restore, so stale files from prior runs on self-hosted runners can't persistdocker loadloop runs unconditionally (nocache-hitgate) so that both exact andrestore-keyspartial matches are loaded. The[ -f "$f" ]check inside the loop safely handles the case where no tar files exist.comm -13to diff and only saves images that were pulled during the current job — not pre-existing or previously-cached imagesci.ymlandseed.ymlare guarded withif: secrets.DOCKER_USERNAME_PUBLIC_READONLY != ''so they won't fail if secrets aren't configured/tmp/docker-images-before.txtbefore runningcomm, gracefully skipping if the earlier record step was skipped or failed (addresses Graphite feedback)Updates since last revision
Deduplicate image IDs with
sort -u:docker images -qreturns duplicate IDs for multi-tagged images. Changed allsortcalls tosort -uto avoid redundantdocker savecalls for the same image.Per-generator cache keys: Replaced the single shared cache key (
runner.os-docker-...) with per-generator keys:cached-seedaction:runner.os-docker-{generator-name}-...seed.ymlbenchmark:runner.os-docker-{matrix.generator}-...ci.ymltest-ete:runner.os-docker-ete-...This ensures each generator caches only its own Docker images instead of all generators competing over a single cache entry.
Fixed step ordering: Moved "Record pre-existing Docker images" to after "Load cached Docker images" in all three inline cache locations. Previously, recording happened before loading, which meant cached images were not in the baseline and would be incorrectly re-saved on every run (addresses Graphite feedback).
Removed
cache-hitguard on load step:actions/cache@v4setscache-hittotrueonly on exact primary key matches. Whenrestore-keysproduces a partial match, files are restored butcache-hitisfalse— so the previousif: steps.docker-cache.outputs.cache-hit == 'true'guard maderestore-keysentirely ineffective. Removed the guard in all 3 locations; the[ -f "$f" ]check inside the loop already handles the no-files case safely (addresses Devin Review feedback).Step execution order (all 3 locations)
Review checklist for humans
hashFiles('**/Dockerfile*')scope: This glob matches all Dockerfiles across the entire monorepo. Verify this is the right granularity — if unrelated Dockerfile changes bust all generator caches, a narrower glob may be needed..tarfiles which can be large. Monitor cache usage after merging to ensure older entries aren't evicted too aggressively.docker inspect --format '{{index .RepoTags 0}}'only saves the first tag per image. If a generator image is referenced by multiple tags, only one will survive a cache round-trip. This is likely fine for pulled (not locally built) images but worth verifying.test-etejob runs onTestrunners and some seed-update jobs run onSeedrunners. Confirm that/tmpcleanup between jobs behaves as expected — the pre-clean step should handle it, but worth verifying.test-eteand seed jobs did not run on this PR, so Docker login has not been exercised yet. Verify after merge by checking whether the "Log in to Docker Hub" step showsLogin Succeededor is skipped.cache-hitguard removed, the load step always runs. When no cache exists at all,/tmp/docker-cache/won't exist (it was justrm -rf'd). The bash glob/tmp/docker-cache/*.tarwon't match, and the[ -f "$f" ]guard catches the literal unexpanded glob string. Verify this is safe on your runner OS/shell.Testing
js-yamlpnpm format(no changes)comm -13diffing,sort -udeduplication, multi-tag behavior,restore-keyspartial match behavior)Link to Devin session: https://app.devin.ai/sessions/427c37bbbafa4a03b0bcb0a47dcfeadf
Requested by: @Swimburger