Summary
Helios's CI/CD deploy action dispatched the prod-like-deployment.yml workflow on ls1intum/Artemis with a stale commit_sha input — the previous develop HEAD, not the current one. The workflow obediently fetched the build artifact for that older commit and deployed it, so the staging1 nodes rolled forward to yesterday's WAR instead of the just-merged PR. The deploy reported success; the smoke tests caught it because the running JVM's git.commit.id did not match the SHA we asked Helios to deploy.
Concrete instance (today, 2026-05-25)
- PR ls1intum/Artemis#12711 was merged into develop at 10:45:16Z. Merge commit on develop: 1258484214d47fe42b9a370b1dc860a6aba836fd.
- The build.yml workflow for that commit completed with conclusion success at 10:57:07Z and produced an Artemis.war artifact (run 26396508993).
- UI signal: when opening Helios' CI/CD page for the Artemis repo, the new build for 12584842 was not visible for several minutes (suspected list caching).
- Triggering "Deploy to staging1" from Helios at 11:01:09Z produced Artemis deploy run 26397117774. From its "Print inputs" step log:
    Branch: develop
    Commit SHA: 01be782943782cb39d644d5c529223e873c8c51e ← previous develop HEAD (PR #12778), NOT 12584842
    Environment: staging1.artemis.cit.tum.de
    Triggered by: krusche
- The workflow's check-build-status job resolved this SHA to build run 26371588020 (yesterday's build of 01be782), downloaded that artifact, and Ansible rolled it out to all three staging1 core nodes.
- After the deploy:
  - systemctl is-active artemis → active on all 6 hosts.
  - /management/health → 200 UP.
  - /management/info → git.commit.id → 01be782943782cb39d644d5c529223e873c8c51e (yesterday's commit, NOT the requested SHA).
  - /management/info → activeModuleFeatures → missing athena, apollon, ldap (because the running code predates the migration in #12711).
Re-dispatching the same workflow manually via gh workflow run … -f commit_sha=1258484214d47fe42b9a370b1dc860a6aba836fd … worked correctly, confirming the workflow itself is fine — only the SHA Helios passed was wrong.
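For context, the lookup that check-build-status performs can be approximated as a filter over GitHub's workflow-run list. The sketch below is an assumption about its behavior, not the workflow's actual code. pick_build_run is a hypothetical name, and the run dicts are assumed to carry the id, head_sha, conclusion and created_at fields of the workflow_runs entries returned by GitHub's "List workflow runs" API. The sketch illustrates why the workflow cannot notice a stale input: the old SHA has a perfectly valid green build of its own.

```python
from typing import Optional


def pick_build_run(runs: list[dict], commit_sha: str) -> Optional[int]:
    """Return the id of the newest successful build run for commit_sha, or None.

    Hypothetical helper: `runs` mimics the `workflow_runs` array returned by
    GitHub's "List workflow runs" endpoint for build.yml.
    """
    green = [
        run for run in runs
        if run.get("head_sha") == commit_sha and run.get("conclusion") == "success"
    ]
    if not green:
        return None
    # created_at is an ISO 8601 UTC string ("...Z"), so string order is time order.
    return max(green, key=lambda run: run["created_at"])["id"]
```

Given the stale SHA 01be782943782cb39d644d5c529223e873c8c51e, a lookup like this resolves to yesterday's run 26371588020, which matches what was observed. Nothing at this stage knows that develop has moved on.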
Why this is a real problem
The deploy reported success even though the wrong code was rolled out. There is no automatic check that the deployed JVM's git.commit.id matches the requested commit_sha, so the only feedback the operator gets is the green checkmark. A more dangerous variant of the same bug would silently deploy a recent-but-not-current build to production.
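The missing check is small. The minimal sketch below rests on two assumptions. First, /management/info exposes the commit under the nested keys git → commit → id, which is the layout Spring Boot's git info contributor uses by default. Second, that id may be abbreviated. Both helper names are hypothetical.

```python
from typing import Any, Optional


def deployed_commit_id(info: dict[str, Any]) -> Optional[str]:
    """Return git.commit.id from a /management/info payload, or None if absent."""
    git = info.get("git")
    commit = git.get("commit") if isinstance(git, dict) else None
    commit_id = commit.get("id") if isinstance(commit, dict) else None
    return commit_id if isinstance(commit_id, str) else None


def matches_requested_sha(info: dict[str, Any], requested_sha: str,
                          min_prefix: int = 7) -> bool:
    """True only if the running build reports the commit that was requested."""
    deployed = deployed_commit_id(info)
    # An absent or implausibly short id is treated as a mismatch, never as a pass.
    if deployed is None or len(deployed) < min_prefix:
        return False
    # Accept an abbreviated id as long as it is a prefix of the full requested SHA.
    return requested_sha.lower().startswith(deployed.lower())
```

Take today's payload, where git.commit.id is 01be782943782cb39d644d5c529223e873c8c51e, and commit_sha = 1258484214d47fe42b9a370b1dc860a6aba836fd. For these inputs matches_requested_sha returns False, which is exactly the signal the green checkmark hid.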
Concretely on staging1 today, the migration from Spring profiles to module-feature toggles (Artemis #12711) requires a YAML pre-edit to add three *.enabled: true flags; without it, the new WAR refuses to start (getPropertyOrExitArtemis). Because Helios deployed the OLD WAR, which predates those toggles, the JVM came up fine. The operator would only notice the regression by specifically inspecting activeModuleFeatures or the embedded git.commit.id. In a different scenario (e.g. a security fix), the operator would believe the fix was live while the old, unpatched build kept running.
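The module-feature symptom can also be turned into an automatic check. This sketch assumes activeModuleFeatures is a top-level list of feature-name strings in the /management/info response; the report does not show its exact shape. The expected names are the three reported missing today. The function and constant names are hypothetical.

```python
from typing import Any, Iterable


def missing_module_features(info: dict[str, Any], expected: Iterable[str]) -> set[str]:
    """Return the expected features that the running JVM does not report as active."""
    active = info.get("activeModuleFeatures")
    # Anything other than a list (absent key, null, unexpected type) counts as "nothing active";
    # non-string entries are ignored because their shape is unknown.
    active_names = {a for a in active if isinstance(a, str)} if isinstance(active, list) else set()
    return {name for name in expected if name not in active_names}


# Hypothetical constant: the features a post-#12711 build on staging1 is expected to report.
EXPECTED_AFTER_12711 = ("athena", "apollon", "ldap")
```

Under that assumption, today's staging1 payload would have produced {"athena", "apollon", "ldap"}. That flags the old WAR without looking at the commit id at all.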
Suggested fixes
Possibly more than one of these:
- Fix the source of the stale SHA: investigate whether Helios is caching the workflow_runs / branch HEAD response from the GitHub API (e.g. an in-memory or DB cache that isn't invalidated when a new build completes for that branch). The window between "build green for 12584842" (10:57Z) and "Helios still offers 01be782 as the deploy target" (11:01Z) suggests a cache TTL on the build listing.
- Add a freshness check on dispatch: before dispatching, re-fetch the current HEAD of the target branch and warn / refuse if commit_sha is older than HEAD by more than N minutes (or simply does not match HEAD when deploying develop).
- Show the actual SHA in the dispatch confirmation modal so the operator can spot a stale value before clicking through. Currently the UI lists builds by other identifiers, not the SHA itself.
- Optional defense-in-depth on the Artemis side: extend .github/workflows/prod-like-deployment.yml with a post-deploy check that curls /management/info and asserts git.commit.id matches the input commit_sha. Then a stale-SHA deploy would at least fail loudly. (Tracked separately on the Artemis repo if you want — happy to open the PR.)
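The freshness check from the second suggestion could look roughly like the sketch below. It is written in Python for brevity and assumes nothing about Helios's internals. The function and class names are hypothetical, max_lag stands in for the unspecified N minutes, and refusing any mismatch on develop follows the parenthetical in that bullet. The HEAD lookup calls GitHub's "Get a branch" REST endpoint directly, so a stale build-list cache cannot feed it.

```python
import json
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class FreshnessVerdict:
    ok: bool      # False means: refuse to dispatch
    message: str  # shown to the operator, e.g. in the confirmation modal


def fetch_branch_head(owner: str, repo: str, branch: str,
                      token: Optional[str] = None) -> str:
    """Read the branch HEAD straight from the GitHub REST API (no local cache)."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["commit"]["sha"]


def check_dispatch_freshness(
    requested_sha: str,
    head_sha: str,
    branch: str,
    requested_at: Optional[datetime] = None,  # commit time of requested_sha
    head_at: Optional[datetime] = None,       # commit time of head_sha
    max_lag: timedelta = timedelta(minutes=30),
    strict_branches: frozenset[str] = frozenset({"develop"}),
) -> FreshnessVerdict:
    """Decide whether dispatching requested_sha is safe given a freshly fetched HEAD."""
    if requested_sha == head_sha:
        return FreshnessVerdict(True, f"deploying {branch} HEAD {head_sha[:12]}")
    mismatch = (f"commit_sha {requested_sha[:12]} is not the current "
                f"{branch} HEAD {head_sha[:12]}")
    # On branches that should always deploy their tip, any mismatch is a hard stop.
    if branch in strict_branches:
        return FreshnessVerdict(False, mismatch)
    # Elsewhere, tolerate an older commit only if its lag behind HEAD is known and small.
    if requested_at is None or head_at is None or head_at - requested_at > max_lag:
        return FreshnessVerdict(False, mismatch + " (too old or age unknown)")
    # Allowed, but the message still surfaces the mismatch as a warning.
    return FreshnessVerdict(True, mismatch + " (within allowed lag)")
```

In this incident, the check would have compared 01be782… against the fresh develop HEAD 12584842… and refused. Its message names both SHAs, which also covers the suggestion to show the actual SHA before the operator clicks through.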
Environment
- Helios: helios.aet.cit.tum.de
- Affected Artemis repo: ls1intum/Artemis
- Affected GitHub environment: staging1.artemis.cit.tum.de
- Deploy workflow: .github/workflows/prod-like-deployment.yml