catalog: redirect to /login on AUTH_REFRESH_FAILED; guard first-login race#4923
catalog: redirect to /login on AUTH_REFRESH_FAILED; guard first-login race#4923drernie wants to merge 2 commits into
Conversation
…first login
Companion to enterprise PR that types the registry's SSO refresh-failure
401 with `AUTH_REFRESH_FAILED` (and the no-Authorization-header 401 with
`NOT_LOGGED_IN`).
The urql auth exchange previously only redirected on a single legacy
message string ("Token invalid."). Now it inspects
`graphQLErrors[].extensions.code` for either of the new typed codes and
also keeps the legacy message matchers for compatibility. On a hit it
dispatches `authLost` — the existing path that clears tokens and routes
to /login — instead of falling through to a generic error banner.
To avoid making the post-OAuth-callback hydration race worse, suppress
NOT_LOGGED_IN when redux is in a SIGNING_IN / REFRESHING state: a GraphQL
query firing before `POST /api/login` settles would otherwise clear
tokens mid-handshake and bounce the user back to /login. AUTH_REFRESH_FAILED
is not race-prone (it implies a token was sent and rejected) so it is
always treated as auth lost.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4923 +/- ##
==========================================
+ Coverage 46.55% 46.56% +0.01%
==========================================
Files 832 832
Lines 34120 34138 +18
Branches 5833 5838 +5
==========================================
+ Hits 15884 15897 +13
- Misses 16237 16241 +4
- Partials 1999 2000 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| if (isOnlyNotLoggedIn(r.error) && waitingRef.current) { | ||
| return r | ||
| } |
There was a problem hiding this comment.
Race suppression won't fire for legacy
Not logged in messages
isOnlyNotLoggedIn checks extensions.code === 'NOT_LOGGED_IN', so it returns false when the registry sends the old-format [GraphQL] Not logged in message (without an extension code). In that case, even with waitingRef.current === true, the suppression branch is skipped and authLost is dispatched, bouncing the user to /login mid-handshake. The fix only kicks in for registries that have already deployed the new typed codes from the sibling enterprise PR. If there are staging stacks on the old registry version, the first-login race can still trigger during the rollout window.
There was a problem hiding this comment.
Done in 18b2fbe — isOnlyNotLoggedIn now also returns true for the pre-typed [GraphQL] Not logged in message, so the race suppression applies during the rollout window when a stack is still on the old registry build.
| const err = combinedError([{ message: 'Failed to refresh SSO access token' }]) | ||
| expect(isAuthLostError(err)).toBe(true) | ||
| }) | ||
|
|
||
| it('does not flag unrelated errors', () => { |
There was a problem hiding this comment.
Missing test for legacy
Not logged in message path
LEGACY_AUTH_LOST_MESSAGES contains '[GraphQL] Not logged in' but the spec only explicitly exercises the Token invalid. and Failed to refresh SSO access token legacy paths. Adding a case for the Not logged in message (without an extensions object) would confirm the third entry in the set is also reachable and documents the interaction with isOnlyNotLoggedIn returning false for it.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Done in 18b2fbe — added spec cases for the legacy Not logged in message (no extension) covering both isAuthLostError (still a redirect trigger) and isOnlyNotLoggedIn (suppressed while waiting).
Addresses greptile review on #4923: - isOnlyNotLoggedIn also matches the pre-typed `[GraphQL] Not logged in` message; otherwise older registry builds (during the rollout) would still bounce users mid-handshake. - Adds spec coverage for the legacy message in both isAuthLostError (kept as redirect-trigger) and isOnlyNotLoggedIn (suppressed while waiting). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
@drernie can we add tests for this behavior across all SSO providers? |
Summary
AUTH_REFRESH_FAILED,NOT_LOGGED_IN) on GraphQL responses viagraphQLErrors[].extensions.codeand dispatchauthLostso the user is bounced to/logininstead of seeing a generic error banner.Token invalid.,Not logged in,Failed to refresh SSO access token) for compatibility while the registry change rolls out.NOT_LOGGED_INwhile a sign-in handshake is in flight (redux stateSIGNING_IN/REFRESHING) — fixes the post-OAuth-callback hydration race that the KB workarounds didn't fully address.Context
Vir reported a "session goes dead and only a manual sign-in recovers" symptom: catalog held a session whose Okta refresh token had been rejected, every GraphQL call 401'd
Failed to refresh SSO access token, and the catalog rendered a generic error instead of guiding the user back to login.Sibling registry PR: https://github.com/quiltdata/enterprise/pull/1073
Files
app/containers/Auth/urqlExchange.tsx— newisAuthLostError/isOnlyNotLoggedInhelpers, subscribed-towaitingflag via a ref to gate the race-prone code.app/containers/Auth/urqlExchange.spec.ts— unit coverage for both helpers.Test plan
npx vitest run app/containers/Auth/urqlExchange.spec.tspasses (8 tests)tsc --noEmit— no new errors in changed files🤖 Generated with Claude Code
Greptile Summary
This PR broadens auth-error handling in the urql exchange: it detects typed registry error codes (
AUTH_REFRESH_FAILED,NOT_LOGGED_IN) in addition to legacy message strings, dispatchesauthLostto redirect expired sessions to/login, and suppressesNOT_LOGGED_INduringSIGNING_IN/REFRESHINGto prevent false redirects from the post-OAuth-callback race.isAuthLostErrorunifies typed extension codes and legacy message strings into a single predicate;isOnlyNotLoggedInidentifies the specific race-prone error shape so it can be gated bywaitingRef.current.waitingRefpattern (ref updated every render, read synchronously in the async callback) is the correct way to avoid stale-closure issues in this exchange.NOT_LOGGED_INextension code; legacy[GraphQL] Not logged inmessages (no extension) will still dispatchauthLostduring a handshake until the sibling registry PR is fully deployed.Confidence Score: 4/5
Safe to merge; the auth-redirect path is preserved for all production cases and the race guard adds a narrow, well-commented suppression window.
The core logic is correct: the waitingRef pattern avoids stale closures, authLost is still dispatched for every non-suppressed auth error, and existing legacy message matching is preserved. The only gap is that the first-login race fix is silently ineffective against legacy-format registries during the rollout window.
urqlExchange.tsx lines 125-127 — the race-suppression branch — deserves a second look to confirm expected behavior during the registry upgrade window.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[GraphQL response lands in handleResult] --> B{r.error?} B -- No --> Z[return r] B -- Yes --> C{handleInvalidToken?} C -- No --> Z C -- Yes --> D{isAuthLostError?} D -- No --> Z D -- Yes --> E{isOnlyNotLoggedIn?} E -- No --> G[dispatch authLost] E -- Yes --> F{waitingRef.current?} F -- Yes --> H[Suppress - return r] F -- No --> GReviews (1): Last reviewed commit: "catalog: redirect to /login on AUTH_REFR..." | Re-trigger Greptile