Deploy dispatches stale commit_sha after a fresh build (UI cache out of sync) #1048

## Summary

Helios's CI/CD deploy action dispatched the `prod-like-deployment.yml` workflow on `ls1intum/Artemis` with a **stale `commit_sha` input**: the previous `develop` HEAD instead of the current one. The workflow obediently fetched the build artifact for that older commit and deployed it, so the staging1 nodes were rolled to **yesterday's WAR instead of the just-merged PR**. The deploy reported `success`; the smoke tests caught it only because the running JVM's `git.commit.id` did not match the SHA we asked Helios to deploy.

## Concrete instance (today, 2026-05-25)

1. PR [ls1intum/Artemis#12711](https://github.com/ls1intum/Artemis/pull/12711) was merged into `develop` at `10:45:16Z`. Merge commit on develop: `1258484214d47fe42b9a370b1dc860a6aba836fd`.
2. The `build.yml` workflow for that commit completed `success` at `10:57:07Z` and produced an `Artemis.war` artifact ([run 26396508993](https://github.com/ls1intum/Artemis/actions/runs/26396508993)).
3. UI signal: on Helios's CI/CD page for the Artemis repo, the new build for `12584842` was **not visible** for several minutes (we suspect the build list is cached).
4. Triggering "Deploy to staging1" from Helios at `11:01:09Z` produced [Artemis deploy run 26397117774](https://github.com/ls1intum/Artemis/actions/runs/26397117774). From its `Print inputs` step log:
   ```
   Branch: develop
   Commit SHA: 01be782943782cb39d644d5c529223e873c8c51e   ← previous develop HEAD (PR #12778), NOT 12584842
   Environment: staging1.artemis.cit.tum.de
   Triggered by: krusche
   ```
5. The workflow's `check-build-status` job resolved this SHA to [build run 26371588020](https://github.com/ls1intum/Artemis/actions/runs/26371588020) (yesterday's build of `01be782`), downloaded that artifact, and Ansible rolled it out to all three staging1 core nodes.
6. After the deploy:
   - `systemctl is-active artemis` → `active` on all 6 hosts.
   - `/management/health` → `200 UP`.
   - `/management/info → git.commit.id` → `01be782943782cb39d644d5c529223e873c8c51e` (yesterday's commit, NOT the requested SHA).
   - `/management/info → activeModuleFeatures` → missing `athena`, `apollon`, `ldap` (because the running code predates the migration in #12711).
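The caching suspected in step 3 can be reproduced with a toy model. The sketch below is a hypothetical per-branch cache of the latest green build SHA with a fixed TTL; whether Helios has such a cache, its TTL (10 minutes here), and the 10:55Z fill time are all assumptions, not observed facts. It shows how a lookup at 11:01:09Z can still return `01be782` after `12584842` went green at 10:57:07Z, and how invalidating the entry when a build completes avoids that.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Tuple


@dataclass
class BuildListCache:
    """Hypothetical per-branch cache of the latest green build SHA (not Helios's actual code)."""

    ttl: timedelta
    fetch: Callable[[str], str]  # branch -> latest green build SHA, e.g. via the GitHub API
    _entries: Dict[str, Tuple[str, datetime]] = field(default_factory=dict, repr=False)

    def latest_sha(self, branch: str, now: datetime) -> str:
        cached = self._entries.get(branch)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]  # served from cache, even if a newer build finished meanwhile
        sha = self.fetch(branch)
        self._entries[branch] = (sha, now)
        return sha

    def invalidate(self, branch: str) -> None:
        """Drop the entry, e.g. when a workflow_run for this branch completes."""
        self._entries.pop(branch, None)


# Replay of the timeline with an assumed 10-minute TTL and an assumed 10:55Z cache fill.
UTC = timezone.utc
upstream = {"develop": "01be782943782cb39d644d5c529223e873c8c51e"}
cache = BuildListCache(ttl=timedelta(minutes=10), fetch=lambda branch: upstream[branch])

cache.latest_sha("develop", datetime(2026, 5, 25, 10, 55, tzinfo=UTC))  # fills the cache
upstream["develop"] = "1258484214d47fe42b9a370b1dc860a6aba836fd"  # build.yml green at 10:57:07Z
offered = cache.latest_sha("develop", datetime(2026, 5, 25, 11, 1, 9, tzinfo=UTC))
print(offered)  # still 01be782..., because the entry is only 6m09s old

cache.invalidate("develop")  # what a build-completed hook could do
print(cache.latest_sha("develop", datetime(2026, 5, 25, 11, 1, 9, tzinfo=UTC)))  # now 1258484...
```

Invalidating on the build-completed event, rather than shortening the TTL, removes the stale window entirely instead of just narrowing it.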

Re-dispatching the same workflow manually via `gh workflow run … -f commit_sha=1258484214d47fe42b9a370b1dc860a6aba836fd …` deployed the intended build, confirming that the workflow itself is fine and only the SHA Helios passed was wrong.
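For reference, the manual re-dispatch corresponds to GitHub's "create a workflow dispatch event" REST endpoint, which Helios presumably calls as well. The sketch below only builds the request (it does not send it) and rejects anything but a full 40-character `commit_sha`. The helper name and token handling are illustrative, and the workflow's other inputs (the `…` in the command above) are deliberately not filled in; the caller has to pass whatever the workflow actually requires.

```python
import json
import re
import urllib.request
from typing import Dict

GITHUB_API = "https://api.github.com"
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def build_dispatch_request(owner: str, repo: str, workflow_file: str, ref: str,
                           inputs: Dict[str, str], token: str) -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for the given workflow file."""
    sha = inputs.get("commit_sha", "")
    if not FULL_SHA.fullmatch(sha):
        # Refuse abbreviated or missing SHAs so the workflow never has to guess.
        raise ValueError(f"commit_sha must be a full 40-char hex SHA, got {sha!r}")
    url = f"{GITHUB_API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then `urllib.request.urlopen(req)`. The point of the sketch is that `commit_sha` is just a string chosen by the caller: the workflow cannot tell a stale value from a fresh one.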

## Why this is a real problem

The deploy reported `success` even though the wrong code was rolled out. There is no automatic check that the deployed JVM's `git.commit.id` matches the requested `commit_sha`, so the only feedback the operator gets is the green checkmark. A more dangerous variant of the same bug would silently deploy a recent-but-not-current build to production.
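A minimal version of that missing check, as a sketch: read `/management/info` from a node and compare `git.commit.id` with the requested SHA. It assumes the JSON nests the value as `{"git": {"commit": {"id": ...}}}` (matching the `git.commit.id` path above) and accepts an abbreviated id as long as it is a prefix of the requested SHA, since depending on configuration the actuator may report only a short id. All helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def reported_commit_id(info: Dict[str, Any]) -> Optional[str]:
    """Extract git.commit.id from an actuator /info payload (assumed nesting)."""
    commit_id = ((info.get("git") or {}).get("commit") or {}).get("id")
    return commit_id if isinstance(commit_id, str) else None


def commit_matches(requested_sha: str, reported_id: Optional[str], min_len: int = 7) -> bool:
    """True if the reported id equals the requested SHA or is an abbreviation of it."""
    if not reported_id or len(reported_id) < min_len:
        return False
    return requested_sha.lower().startswith(reported_id.lower())


def fetch_info(base_url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """GET <base_url>/management/info and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/management/info", timeout=timeout) as resp:
        return json.load(resp)


def verify_deploy(requested_sha: str, info: Dict[str, Any]) -> None:
    """Raise if the running build is not the one that was requested."""
    reported = reported_commit_id(info)
    if not commit_matches(requested_sha, reported):
        raise RuntimeError(f"deployed {reported!r}, but {requested_sha!r} was requested")
```

Run against each node rather than the load-balanced hostname, so a single node left on the old WAR is not masked by the others.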

Concretely, on `staging1` today, the migration from Spring profiles to module-feature toggles (Artemis #12711) requires a YAML pre-edit that adds three `*.enabled: true` flags; without it, the new WAR refuses to start (`getPropertyOrExitArtemis`). Because Helios deployed the OLD WAR, the JVM came up fine, and the operator would only have noticed the regression by specifically inspecting `activeModuleFeatures` or the embedded `git.commit.id`. In a different scenario (e.g. a security fix), this could be considerably worse: the fix would appear deployed while the vulnerable build kept running.

## Suggested fixes

More than one of these may be needed:

- **Fix the source of the stale SHA**: investigate whether Helios is caching the workflow_runs / branch HEAD response from the GitHub API (e.g. an in-memory or DB cache that isn't invalidated when a new build completes for that branch). The window between "build green for 12584842" (10:57:07Z) and "Helios still offers 01be782 as the deploy target" (11:01:09Z) suggests a cache TTL of more than four minutes on the build listing.
- **Add a freshness check on dispatch**: before dispatching, re-fetch the current HEAD of the target branch and warn / refuse if `commit_sha` is older than HEAD by more than N minutes (or simply does not match HEAD when deploying `develop`).
- **Show the actual SHA in the dispatch confirmation modal** so the operator can spot a stale value before clicking through. Currently the UI lists builds by other identifiers, not the SHA itself.
- **Optional defense-in-depth on the Artemis side**: extend `.github/workflows/prod-like-deployment.yml` with a post-deploy check that curls `/management/info` and asserts `git.commit.id` matches the input `commit_sha`. Then a stale-SHA deploy would at least fail loudly. (This can be tracked separately on the Artemis repo if you want; happy to open the PR.)
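The freshness check on dispatch could look roughly like the sketch below. It assumes Helios re-reads the branch tip right before dispatching, bypassing any cache, for example from GitHub's `GET /repos/{owner}/{repo}/branches/{branch}` endpoint (`commit.sha` and `commit.commit.committer.date`). The 30-minute threshold, the set of strict branches, and all names are placeholders, not existing Helios settings.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import AbstractSet


class Verdict(Enum):
    OK = "ok"          # requested SHA is the branch tip
    WARN = "warn"      # behind the tip, but only slightly
    REFUSE = "refuse"  # behind the tip on a strict branch, or by more than max_lag


def dispatch_verdict(
    branch: str,
    requested_sha: str,
    requested_time: datetime,
    head_sha: str,
    head_time: datetime,
    max_lag: timedelta = timedelta(minutes=30),
    strict_branches: AbstractSet[str] = frozenset({"develop"}),
) -> Verdict:
    """Decide whether dispatching requested_sha is safe given the freshly fetched branch tip."""
    if requested_sha == head_sha:
        return Verdict.OK
    if branch in strict_branches:
        return Verdict.REFUSE  # on develop, anything but HEAD is almost certainly stale
    if head_time - requested_time > max_lag:
        return Verdict.REFUSE  # commit time of requested SHA lags HEAD by too much
    return Verdict.WARN
```

A `WARN` verdict fits naturally into the confirmation modal from the third bullet: show both SHAs and let the operator decide.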

## Environment

- Helios: `helios.aet.cit.tum.de`
- Affected Artemis repo: `ls1intum/Artemis`
- Affected GitHub environment: `staging1.artemis.cit.tum.de`
- Deploy workflow: `.github/workflows/prod-like-deployment.yml`

