
promote main #1284

Open
hubsmoke wants to merge 43 commits into main from develop

Conversation

@hubsmoke
Member

promote main

hubsmoke and others added 30 commits April 4, 2026 05:42
The importer was crash-looping because OpenAlex returned mesh entries
with null primary key columns. After dedup filtering removed all rows,
an empty array was passed to pgp.helpers.insert() which throws.

- Add empty-array guards after dedup in updateWorksMesh, updateWorksConcepts,
  updateWorksTopics
- Make cron error handler resilient (log + retry instead of crash)
- Add npm test gate to Dockerfile so broken code can't deploy
- Add realistic Work fixtures covering null PKs, empty arrays, missing
  locations, and other real API edge cases
- Add transformer tests (13) and saveData integration tests (8)
- Add dedup edge-case tests for all-null-PK scenario (3)

Made-with: Cursor
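
For illustration, a minimal TypeScript sketch of the guard described above (the table and column names are assumptions, not the importer's real schema):

  import pgPromise, { ITask } from 'pg-promise';

  const pgp = pgPromise();

  // Illustrative column set; the real importer schema may differ.
  const meshColumns = new pgp.helpers.ColumnSet(
    ['work_id', 'descriptor_ui', 'descriptor_name'],
    { table: { schema: 'openalex', table: 'works_mesh' } },
  );

  async function updateWorksMesh(tx: ITask<object>, rows: Record<string, unknown>[]) {
    // Drop rows whose primary-key columns came back null from the API.
    const deduped = rows.filter((r) => r.work_id != null && r.descriptor_ui != null);

    // Guard: pgp.helpers.insert() throws when given an empty array.
    if (deduped.length === 0) return;

    await tx.none(pgp.helpers.insert(deduped, meshColumns));
  }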
…Data tests

Remove dead insertCalls array and unnecessary mock wrapper per CR feedback.
The tests use the real pgp.helpers for SQL generation with a mock transaction.

Made-with: Cursor
- Daily digest cron (default 9:00 UTC) reports sync position, days
  imported, duration, failed batches, and days-behind status
- Error notifications rate-limited to 1/hour to avoid crash-loop spam
- "Caught up" notification fires only on state transition (importing →
  idle), not on every cron tick
- Digest queries the batch table directly so stats survive pod restarts
- Gracefully no-ops when TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID are not
  set (opt-in via SOPS secret)
- 19 new tests covering send, rate limiting, dedup, error fallback

Env vars to configure:
  TELEGRAM_BOT_TOKEN  — Bot API token from @BotFather
  TELEGRAM_CHAT_ID    — Chat/group ID (e.g. -1002207868111)
  TELEGRAM_THREAD_ID  — Optional topic thread ID
  DIGEST_SCHEDULE     — Cron expression (default: '0 9 * * *')
Uses long-polling (getUpdates) — outbound HTTPS only, no ingress needed.
Bot responds to /status with sync position, last 24h stats, pod uptime.
Shared buildDigestMessage between daily digest and /status command.
Polling stops cleanly on SIGTERM/SIGINT via AbortController.
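
A rough sketch of the polling loop under those constraints (function and variable names here are illustrative, not the actual module):

  // Stops cleanly when SIGTERM/SIGINT abort the controller.
  const controller = new AbortController();
  process.once('SIGTERM', () => controller.abort());
  process.once('SIGINT', () => controller.abort());

  async function pollTelegram(token: string, onText: (text: string, chatId: number) => void) {
    let offset = 0;
    while (!controller.signal.aborted) {
      try {
        const res = await fetch(
          `https://api.telegram.org/bot${token}/getUpdates?offset=${offset}&timeout=30`,
          { signal: controller.signal },
        );
        const body = (await res.json()) as { result?: Array<any> };
        for (const update of body.result ?? []) {
          offset = update.update_id + 1;
          if (update.message?.text) onText(update.message.text, update.message.chat.id);
        }
      } catch {
        if (controller.signal.aborted) break;           // clean shutdown, not an error
        await new Promise((r) => setTimeout(r, 5_000)); // back off on transient failures
      }
    }
  }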
- Add explicit radix to parseInt(threadId, 10)
- Remove duplicate JSDoc on buildDigestMessage
- Log error in /status command catch block
- Fix cooldown test: use Date.now spy instead of resetModules
…k-crash

fix(openalex-importer): guard against empty arrays after dedup filtering
Builds Docker image on push to main/develop, pushes to ECR with
sha-timestamp tags for Flux image automation.
…ings

vitest 4.x uses rolldown which requires Node ^20.19.0 || >=22.12.0.
The previous 20.18.1 caused npm ci to skip the platform-specific
optional dependency, failing the test step in Docker builds.
…ehind

Startup message now includes full status digest (sync date, days behind,
last successful import timestamp, 24h stats, pod uptime).
/status command shows the same enriched data.
… queries

- Total works via pg_class.reltuples (instant, no table scan)
- 24h works via works_batch + batch join (small table, indexed)
- Avoids COUNT(*) on openalex.works which would full-scan millions of rows
- Records section now shows 24h / 30d / total breakdown
- Last import shows both timestamp and relative time (e.g. "3 hours ago")
- Total works prefixed with ~ to indicate approximate count
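
A hypothetical shape of those queries (table and column names assumed from the description above):

  import { IDatabase } from 'pg-promise';

  async function fetchWorkCounts(db: IDatabase<object>) {
    // Planner estimate from pg_class: instant, but approximate (hence the ~ prefix).
    const { estimate } = await db.one<{ estimate: string }>(
      `SELECT reltuples::bigint AS estimate
         FROM pg_class
        WHERE oid = 'openalex.works'::regclass`,
    );

    // Exact 24h figure from the small, indexed batch tables; no scan of works.
    const { recent } = await db.one<{ recent: number }>(
      `SELECT COUNT(*)::int AS recent
         FROM openalex.works_batch wb
         JOIN openalex.batch b ON b.id = wb.batch_id
        WHERE b.finished_at > now() - interval '24 hours'`,
    );

    return { approximateTotal: Number(estimate), last24h: recent };
  }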
…eptions

- Unified shutdown handler sends reason to Telegram before exiting
- Uncaught exceptions now send error + shutdown notification
- SIGTERM shows "K8s rollout or scale-down" so deploys are visible
- All Telegram sends use .catch() to never block shutdown
- SIGTERM/SIGINT exit with 0 (clean shutdown, not error)
- Add 5s timeout on all shutdown Telegram sends and pool.end()
  so a hung network call doesn't stall until SIGKILL
- Add unhandledRejection handler (Node 15+ exits on these)
- Remove dead beforeExit handler (never fires with cron jobs)
- Make SIGTERM/SIGINT handlers synchronous, fire-and-forget
  shutdown via void (avoids fragile async signal handlers)
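
Roughly, the shutdown pattern looks like this (pool and notifyTelegram stand in for the importer's real PG pool and Telegram client):

  import { Pool } from 'pg';

  const pool = new Pool();
  const notifyTelegram = async (text: string): Promise<void> => { /* Bot API send, omitted */ };

  // Cap any awaited step at 5s so a hung network call can't stall until SIGKILL.
  const withTimeout = <T>(p: Promise<T>, ms = 5_000) =>
    Promise.race([p, new Promise<undefined>((resolve) => setTimeout(resolve, ms))]);

  async function shutdown(reason: string, exitCode: number) {
    await withTimeout(notifyTelegram(`Shutting down: ${reason}`).catch(() => undefined));
    await withTimeout(pool.end());
    process.exit(exitCode);
  }

  // Handlers stay synchronous; shutdown is fired and forgotten via `void`.
  process.once('SIGTERM', () => void shutdown('K8s rollout or scale-down', 0));
  process.once('SIGINT', () => void shutdown('interrupted', 0));
  process.once('uncaughtException', (err) => void shutdown(`uncaught exception: ${err.message}`, 1));
  process.once('unhandledRejection', (reason) => void shutdown(`unhandled rejection: ${String(reason)}`, 1));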
- Point PUBLIC_IPFS_RESOLVER to test IPFS node instead of pub.desci.com
- Test now adds content to test IPFS first, then uses that CID
- Eliminates dependency on external IPFS gateway availability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a /pipelines Telegram bot command that queries the Prefect API
and export_metadata to show real-time health of all downstream
pipelines (ES, novelty, Qdrant). Shows overall verdict, per-pipeline
status (healthy/lagging/stalled/failing), batch progress with
percentages, and stall warnings. Gracefully degrades if Prefect is
unreachable.

Made-with: Cursor
feat(openalex-importer): add /pipelines command for downstream health
fix(openalex-importer): simplify pipeline health — healthy if ran + d…
…ommits

The entire day's import was wrapped in one PG transaction. For days with
millions of updated works (e.g. Jan 13), this runs for 10+ hours and if
the pod crashes, all progress rolls back and it restarts from scratch.

- Each 1000-work chunk now saves in its own transaction
- Batch record created/finalized independently outside the stream
- getNextDayToImport only considers finalized batches (finished_at IS NOT NULL)
- Cleanup of orphaned batch records on crash recovery
- Startup message distinguishes crash recovery (🔄) from clean start (🟢)
- Daily digest prefixed with 📅 Daily Update
- /stopupdate and /startupdate commands to pause/resume daily digest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
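
Schematically (table and column names are assumptions, and saveChunk stands in for the real per-chunk writer):

  import { IDatabase, ITask } from 'pg-promise';

  async function importDay(
    db: IDatabase<object>,
    day: string,
    chunks: AsyncIterable<unknown[]>,
    saveChunk: (tx: ITask<object>, chunk: unknown[], batchId: number) => Promise<void>,
  ) {
    // Batch record created up front, outside the stream.
    const { id: batchId } = await db.one<{ id: number }>(
      `INSERT INTO batch (query_type, query_from, query_to)
       VALUES ('updated', $1, $1) RETURNING id`,
      [day],
    );

    for await (const chunk of chunks) {
      // One transaction per ~1000-work chunk: a crash loses at most the chunk in flight.
      await db.tx((tx) => saveChunk(tx, chunk, batchId));
    }

    // Finalize; getNextDayToImport only counts batches with finished_at IS NOT NULL.
    await db.none(`UPDATE batch SET finished_at = now() WHERE id = $1`, [batchId]);
  }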
…ilure

- Log batchId and queryInfo when the stream pipeline fails, making it
  easier to trace which batch was left unfinished
- Add comment documenting intentional per-day scope of cleanupUnfinishedBatches

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eanup safety, batch index

- Add TELEGRAM_ADMIN_IDS auth check on /stopupdate and /startupdate commands
- Use FOR UPDATE SKIP LOCKED + 5-min staleness guard in cleanupUnfinishedBatches
  to prevent deleting live batches during rolling restarts
- Add composite index on batch(query_type, query_from, query_to, finished_at)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
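
A rough sketch of the guarded cleanup (the 5-minute window comes from the description above; the created_at column and exact schema are assumptions):

  import { IDatabase } from 'pg-promise';

  async function cleanupUnfinishedBatches(db: IDatabase<object>, day: string) {
    // SKIP LOCKED leaves rows another pod currently holds; the staleness window
    // keeps a just-created batch from being deleted during a rolling restart.
    await db.none(
      `DELETE FROM batch
        WHERE id IN (
          SELECT id FROM batch
           WHERE finished_at IS NULL
             AND query_from = $1
             AND created_at < now() - interval '5 minutes'
             FOR UPDATE SKIP LOCKED
        )`,
      [day],
    );
  }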
…o syntax, comments

- Add migration 0011 for batch_cleanup_idx composite index
- Fix pino logger syntax: pass object-first in sendDailyDigest
- Add clarifying comment on intentional open-access when TELEGRAM_ADMIN_IDS unset
- Document global scope of hasUnfinishedBatches vs day-scoped cleanup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix(openalex-importer): break single-day tx into per-chunk commits
Today's ipfs.desci.com outage caused a second-order outage in
desci-server: every replica crashed in lockstep on each request that
hit the IPFS gateway during the ~3 minute window it was down. Restart
counts on prod-desci-server pods hit 3 in 18 minutes.

Root cause:

1. controllers/raw/resolve.ts had two unprotected `axios.get` calls to
   ${ipfsResolver}/${cidString} (the version-by-cid/index branch and
   the zip-streaming branch). When ipfs.desci.com returned 503, axios
   threw, the rejection was unhandled in the request handler, and...

2. ...index.ts wired BOTH `uncaughtException` and `unhandledRejection`
   into the same `handleFatalError` path, which calls cleanup() and
   process.exit(1). So a single failing upstream HTTP call took down
   the entire pod.

This was a design bug masquerading as a probe failure. A long-running
web server must survive a bad request — the offending handler should
be fixed, but the process must keep serving traffic.

Changes:

- index.ts: split unhandledRejection off from handleFatalError into a
  log-only handler. uncaughtException still exits (correct: the process
  state is unknown after an uncaught sync exception). Sentry still
  picks up unhandled rejections via its global integration, so we
  don't lose visibility.
- controllers/raw/resolve.ts: wrap the two unprotected axios.get calls
  in try/catch and return a 502 to the client when the IPFS uplink
  fails, matching the existing pattern in the latest-version branch
  at line 75.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
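
In outline (route shape and variable names are assumed from the description; only the pattern matters):

  import axios from 'axios';
  import pino from 'pino';
  import type { Request, Response } from 'express';

  const logger = pino();

  // index.ts side: unhandled rejections are logged, not fatal; uncaughtException still exits.
  process.on('unhandledRejection', (reason) => {
    logger.error({ reason }, 'unhandledRejection (non-fatal, process keeps serving)');
  });

  // resolve.ts side: the gateway call is wrapped so an upstream 503 becomes a 502
  // to the client instead of an unhandled rejection that kills the pod.
  export async function resolveByCid(req: Request, res: Response) {
    const ipfsResolver = process.env.PUBLIC_IPFS_RESOLVER;
    const cidString = req.params.cid;
    try {
      const response = await axios.get(`${ipfsResolver}/${cidString}`, { responseType: 'stream' });
      response.data.pipe(res); // the follow-up commit hardens this with stream.pipeline
    } catch (err) {
      logger.error({ err, cidString }, 'IPFS uplink failed');
      res.status(502).json({ error: 'IPFS gateway unavailable' });
    }
  }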
- resolve.ts: replace bare response.data.pipe(res) with
  stream.pipeline so mid-stream errors (source aborts, client
  disconnects) are forwarded and both streams are torn down. Add a
  headersSent guard so we don't attempt to send a JSON 502 after
  streaming has already begun.
- index.ts: add a .catch() to server.ready() that calls
  handleFatalError. Now that unhandledRejection is non-fatal, a
  startup failure here would otherwise be silently logged and the
  process would limp along half-initialized.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
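
A sketch of that streaming path (handler name and route assumed):

  import { pipeline } from 'node:stream';
  import axios from 'axios';
  import type { Request, Response } from 'express';

  export async function streamFromGateway(req: Request, res: Response) {
    try {
      const upstream = await axios.get(
        `${process.env.PUBLIC_IPFS_RESOLVER}/${req.params.cid}`,
        { responseType: 'stream' },
      );
      // pipeline forwards mid-stream errors and tears down both streams.
      pipeline(upstream.data, res, (err) => {
        if (err && !res.headersSent) {
          res.status(502).json({ error: 'IPFS stream failed' });
        }
      });
    } catch (err) {
      // headersSent guard: never try to send JSON after streaming has begun.
      if (!res.headersSent) res.status(502).json({ error: 'IPFS uplink failed' });
    }
  }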
…o-crash

fix(desci-server): don't crash the pod on unhandled promise rejections
ogbanugot and others added 11 commits April 14, 2026 18:08
Root cause: /v1/pub/versions calls dpid.org/api/v2/query/history which
takes 5-8 seconds per request (queries Ceramic for version anchoring).
This blocks every dpid page SSR render.

Fix: wrap getIndexedResearchObjects in getOrCache with 1-day TTL.
- First request: 5-8s (cold, fetches from dpid.org)
- Subsequent requests: <100ms (Redis hit)
- Cache invalidated on publish (delFromCache in publish controller)

Uses existing Redis infrastructure (getOrCache, ONE_DAY_TTL).

Trace data:
  POST dpid.org/api/v2/query/history {ids:["1077"]} = 7.06s TTFB
  GET /v1/pub/versions/{uuid} = 5.27s TTFB (99% is the above call)
  All other SSR steps combined = <500ms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
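
The repo's getOrCache/ONE_DAY_TTL helpers are referenced above; a minimal stand-in showing the shape of the wrapper (the real signature may differ):

  import Redis from 'ioredis';

  const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');
  const ONE_DAY_TTL = 60 * 60 * 24; // seconds

  async function getOrCache<T>(key: string, fn: () => Promise<T>, ttl = ONE_DAY_TTL): Promise<T> {
    const hit = await redis.get(key);
    if (hit) return JSON.parse(hit) as T; // warm path: <100ms
    const value = await fn();             // cold path: 5-8s dpid.org history query
    await redis.set(key, JSON.stringify(value), 'EX', ttl);
    return value;
  }

  // Versions controller usage (per the message above); publish invalidates the key:
  //   const history = await getOrCache(`indexed-versions-${uuid}`, () => getIndexedResearchObjects([uuid]));
  //   await redis.del(`indexed-versions-${uuid}`);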
Queries all published nodes from DB and calls getIndexedResearchObjects
for each, populating the indexed-versions-{uuid} cache keys.

Usage:
  npx ts-node src/scripts/warm-versions-cache.ts
  CONCURRENCY=5 npx ts-node src/scripts/warm-versions-cache.ts

Runs with concurrency=3 by default to avoid overwhelming dpid.org.
Can be run as a one-time migration after deploy, or scheduled as a
daily cron to keep caches warm.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
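
A rough outline of the script's concurrency loop (helper names are illustrative):

  const CONCURRENCY = Number(process.env.CONCURRENCY ?? 3);

  async function warmAll(uuids: string[], warmOne: (uuid: string) => Promise<void>) {
    for (let i = 0; i < uuids.length; i += CONCURRENCY) {
      // Each slice runs in parallel; individual failures are tolerated so one bad
      // node doesn't abort the whole warm-up run.
      await Promise.allSettled(uuids.slice(i, i + CONCURRENCY).map(warmOne));
    }
  }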
Chore: remove legacy desci-server load balancers
perf: cache /v1/pub/versions in Redis (7s → <100ms)
@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8ba10c7d-a553-4eef-8def-095e69475c25

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

hubsmoke and others added 2 commits May 2, 2026 21:41
Same pattern as #1281 (which cached /v1/pub/versions). The dpid SSR path
in nodes-web-v2 hits both endpoints on every cold render:

  resolvePublishedManifest → GET /<uuid>/<version>     (handled by raw/resolve.ts)
  loadDriveTree           → GET /v1/data/pubTree/...  (handled by data/retrieve.ts)

The first is the slow one — it calls getIndexedResearchObjects (theGraph,
~5-8s) plus an IPFS gateway fetch. The second varies but is typically
1-3s. Together they dominate the dpid cold-load TTFB.

Both responses are content-addressed:
- pubTree by (uuid, manifestCid, rootCid, dataPath, depth) — manifestCid
  is itself a content hash, so the tuple is fully immutable. No
  invalidation needed; new publishes mint a new manifestCid and write
  a fresh cache entry.
- resolve by (uuid, firstParam) where firstParam is "" / index / CID.
  Index- and CID-keyed entries are immutable. The "latest" key is
  invalidated in publish.ts alongside the existing `indexed-versions`
  invalidation.

Cache safety: only success paths cache. Component (PDF/code) responses
in resolve.ts and 4xx/5xx responses are NOT cached.

Combined with the Vercel edge cache shipped in nodes-web-v2#1540, cold
SSR drops from 12-16s → ~1.5s; warm SSR is unchanged at ~150ms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
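
For the pubTree side, the cache key might be composed roughly like this (the exact key format is an assumption; the invariant that matters is that manifestCid makes the tuple immutable):

  const pubTreeCacheKey = (
    uuid: string,
    manifestCid: string,
    rootCid: string,
    dataPath: string,
    depth?: number,
  ) =>
    // manifestCid is itself a content hash, so this tuple never needs invalidation;
    // a new publish mints a new manifestCid and therefore a fresh key.
    `pub-tree-${uuid}-${manifestCid}-${rootCid}-${dataPath}-${depth ?? 'full'}`;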
perf: cache /v1/data/pubTree and manifest resolve in Redis
