Conversation
…pleted results and not sessions
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughReferee Recommender flow shifts from CID-based to URL/hash-based. Environment variables updated across environments. Prisma schema gains queryingData and new indexes with migrations. Controllers add DB-backed caching/validation. Service switches to external presigned URL API, URL-hash sessions, and new helpers. SQS handler and message types updated for URL-based events. OpenAPI docs added. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant API as Server API
participant DB as Prisma/DB
participant Svc as RefereeRecommenderService
participant Ext as Presign/Trigger API
rect rgb(245,248,255)
note over User,API: Generate Presigned URL (cache-aware)
User->>API: POST /ai/referee-recommender/presigned-url { fileHash, fileUrl? }
API->>DB: Find ExternalApiUsage where queryingData.path=['fileHash']
alt Cache hit
DB-->>API: cachedResult
API-->>User: { cached:true, resultKey, fileHash }
else Miss
API->>Svc: generatePresignedUploadUrl(fileUrl or hash)
Svc->>Ext: GET ML_REFEREE_GET_PRESIGNED_URL
Ext-->>Svc: { upload_url, download_url, s3_file_name }
Svc-->>API: { presignedUrl, downloadUrl, fileName }
API-->>User: { cached:false, presignedUrl, downloadUrl, fileName, fileHash }
end
end
rect rgb(245,255,245)
note over User,API: Trigger Recommendation (cache-aware)
User->>API: POST /ai/referee-recommender/trigger { fileUrl, fileHash?, ... }
API->>DB: Lookup cached REFEREE_FINDER by queryingData.fileHash
alt Cached result
DB-->>API: cachedResult
API-->>User: { cached:true, resultKey, createdAt }
else Not cached
API->>Svc: triggerRefereeRecommendation({ file_url, ... })
Svc->>Ext: POST ML_REFEREE_TRIGGER_START
Ext-->>Svc: { UploadedFileName, info, ... }
Svc-->>API: response
API-->>User: { ...response, cached: inferred }
end
end
rect rgb(255,250,240)
note over API,DB: Get Results (DB-gated)
User->>API: GET /ai/referee-recommender/results?fileName=...
API->>DB: Find latest completed REFEREE_FINDER by queryingData.fileName for user
alt Found
DB-->>API: usage record
API->>Svc: getRefereeResults(UploadedFileName)
Svc-->>API: results
API-->>User: results
else Not found
API-->>User: 404
end
end
sequenceDiagram
autonumber
participant SQS as SQS Handler
participant DB as Prisma/DB
participant WS as Websocket Hub
participant Svc as RefereeRecommenderService
SQS->>SQS: Parse event { file_url, file_hash?, data{...} }
SQS->>Svc: prepareSessionCacheKey(file_url)
Svc-->>SQS: fileUrlHashed
SQS->>DB: Upsert ExternalApiUsage with queryingData{ fileName, eventType, fileHash? } and data{ metadata... }
SQS->>WS: Emit { eventType, fileUrlHashed, paperTitle?, progress_percent? }
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 12
🧹 Nitpick comments (16)
.env.example (1)
205-209: Additions are good; please reorder keys alphabetically to satisfy dotenv-linter.dotenv-linter warnings indicate ordering issues. Alphabetizing this block will silence warnings and reduce churn in future diffs.
Apply this diff to reorder the Referee Finder keys:
-ML_REFEREE_GET_PRESIGNED_URL= -ML_REFEREE_TRIGGER_START= -ML_REFEREE_FINDER_RESULT= -ML_REFEREE_FILE_PREFIX= +ML_REFEREE_FILE_PREFIX= +ML_REFEREE_FINDER_RESULT= +ML_REFEREE_GET_PRESIGNED_URL= +ML_REFEREE_TRIGGER_START=Optionally add short inline comments for the two new keys to clarify intended values (e.g., FILE_PREFIX expected format and TRIGGER_START endpoint semantics).
desci-server/prisma/migrations/20250818192759_add_gin_index_external_api_querying_data/migration.sql (1)
4-5: If you keep this index, tune the operator class to match query patterns; otherwise, remove churn.
- If you’re primarily doing JSON containment queries (
@>), preferjsonb_path_opsfor a smaller/faster GIN index.- This index is dropped later in the same PR. Consider squashing migrations to avoid creating then immediately dropping a heavy index in one deploy.
Option A (keep, tuned index):
-CREATE INDEX CONCURRENTLY IF NOT EXISTS "idx_external_api_usage_querying_data_gin" -ON "ExternalApiUsage" USING GIN ("queryingData"); +CREATE INDEX CONCURRENTLY IF NOT EXISTS "idx_external_api_usage_querying_data_gin" +ON "ExternalApiUsage" USING GIN ("queryingData" jsonb_path_ops);Option B (remove churn):
- Drop this migration from the PR and keep only the later “DROP INDEX …” if the index is no longer desired.
desci-server/prisma/schema.prisma (1)
1438-1445: Field addition matches migration; update the comment to reflect current indexing.
queryingData Json?matches the migration. The comment says “GIN indexed,” but this PR later drops that index—update to avoid confusion.Apply this diff:
- queryingData Json? // GIN indexed, important data thats used for querying goes here, avoid dumping random metadata. + queryingData Json? // Important, query-focused data (e.g., keys used in lookups). Avoid dumping arbitrary metadata. Index as needed.desci-server/src/services/sqs/SqsMessageTypes.ts (2)
27-35: Consider versioning the message schema.Add a
schema_version(number) on the top-level message so consumers can branch logic safely as the payload evolves.Example:
export interface ExternalApiSqsMessage extends BaseSqsMessage { messageType: SqsMessageType.EXTERNAL_REFEREE_RECOMMENDER_API; eventType: 'PROCESSING_STARTED' | 'PROCESSING_COMPLETED' | 'PROCESSING_FAILED' | 'PROCESSING_PROGRESS'; + /** Increment when SQS message shape changes in a backward-incompatible way */ + schema_version?: 1; file_name: string; // e.g., "referee_rec_v0.1.3_{original_file.pdf}" // Move to hashes later for security. ... }
37-44: Unify hash field names and clarify timestamp units.
- You have
file_hashtop-level andhash_valueinsidedata. This is easy to misuse. Either mirror the same name or add JSDoc explaining which is which.completed_atshould specify seconds vs milliseconds.export type ExternalApiSqsMessageData = { file_url: string; - hash_value?: string; + /** Duplicate of top-level `file_hash`; prefer `file_hash` */ + hash_value?: string; + /** Preferred explicit file content hash (mirrors top-level) */ + file_hash?: string; - completed_at?: number; // Unix TS + /** Unix timestamp in seconds */ + completed_at?: number; };desci-server/src/controllers/externalApi/RefereeRecommender/issuePresignedUrl.ts (3)
41-55: Cache check should verify completion, not just existence.
findFirstonexternalApiUsagebyfileHashmay return in-progress or failed runs, yet you immediately return “Results for this file are already available”.At minimum, gate on a completion indicator (e.g., row status,
completedAtcolumn, orqueryingData.completed_at != null).Would you confirm which field denotes completion on
ExternalApiUsage? If it’sstatus = 'COMPLETED'orcompletedAt IS NOT NULL, add it to thewhereclause. If stored under JSON, consider:queryingData: { path: ['completed_at'], not: null as any }I can propose a concrete diff once you confirm the column/JSON shape.
75-77: Prefer 415 (Unsupported Media Type) for non-PDF uploads.The check is right, but 400 is generic. 415 better communicates why the request is rejected.
- return sendError(res, 'Only PDF files are supported', 400); + return sendError(res, 'Only PDF files are supported', 415);
29-31: ValidatefileHashformat (64-hex) to avoid cache misses and collisions.Right now it’s just non-empty. A malformed hash could poison cache lookups.
If you keep a local schema, enforce:
- fileHash: z.string().min(1, 'fileHash is required'), + fileHash: z + .string() + .regex(/^[A-Fa-f0-9]{64}$/, 'fileHash must be a 64-char hex SHA-256'),Otherwise, add this to the centralized schema.
desci-server/src/controllers/externalApi/RefereeRecommender/getResults.ts (2)
60-62: Comment drift: not strictly “from lambda API” yet.The comment says “now fetch actual results from lambda API” (true), but the earlier log says “Fetching referee recommendation results” generically. Not an issue, just ensure consistency.
78-86: Log message says “from database” but the fetch is from lambda.Consider changing the message to reflect the actual source of the payload to avoid confusion during debugging.
- 'Successfully fetched referee recommendation results from database', + 'Successfully fetched referee recommendation results from lambda API',desci-server/src/docs/aiServices.ts (1)
20-53: RelaxfileUrlvalidation to support non-HTTP URLs (s3, ipfs).The PR description says “supports any links (s3, cid, external link)”, but
.url()will rejects3://andipfs://. Use.min(1)with a custom refinement or documented acceptable schemes.- fileUrl: z.string().url().describe('URL to the PDF file'), + fileUrl: z + .string() + .min(1) + .describe('URL (http(s), s3, ipfs, etc.) to the PDF file'),desci-server/src/controllers/externalApi/RefereeRecommender/triggerRecommendation.ts (2)
40-52: Add pagination to the cache query for scalabilityThe current implementation uses
findFirstwhich may not be ideal if there are multiple cached results. Consider adding pagination or at least a limit to avoid potential performance issues with large datasets.- cachedResult = await prisma.externalApiUsage.findFirst({ + cachedResult = await prisma.externalApiUsage.findFirst({ where: { userId: user.id, apiType: ExternalApi.REFEREE_FINDER, queryingData: { path: ['fileHash'], equals: fileHash, }, }, orderBy: { createdAt: 'desc', // Get the most recent result }, + take: 1, // Explicitly limit to 1 result });
119-121: Consider making the cache detection logic more robustThe current implementation checks for both "already exists" and "ready to be polled" in the info message. This string-based approach might be fragile if the external API changes its response format.
Consider defining constants for these status messages or using a more structured response format:
+ const CACHE_INDICATORS = { + EXISTS: 'already exists', + READY: 'ready to be polled' + }; + // Check if results are already available based on info message const isResultReady = - responseData.info?.toLowerCase().includes('already exists') && - responseData.info?.toLowerCase().includes('ready to be polled'); + responseData.info?.toLowerCase().includes(CACHE_INDICATORS.EXISTS) && + responseData.info?.toLowerCase().includes(CACHE_INDICATORS.READY);desci-server/src/services/externalApi/RefereeRecommenderService.ts (3)
145-147: Consider caching the environment variable checkThe environment variable check happens on every request. Consider checking this during service initialization to fail fast.
Move the environment variable validation to module initialization:
+// Validate required environment variables at module initialization +if (!ML_REFEREE_GET_PRESIGNED_URL) { + throw new Error('ML_REFEREE_GET_PRESIGNED_URL environment variable not configured'); +} + async function generatePresignedUploadUrl(request: PresignedUrlRequest): Promise<Result<PresignedUrlResponse, Error>> { try { // ... existing code ... - if (!ML_REFEREE_GET_PRESIGNED_URL) { - return err(new Error('ML_REFEREE_GET_PRESIGNED_URL environment variable not configured')); - }
265-267: Optimize Redis calls by using pipeline for multiple hash computationsThe function computes the hash twice when calling
prepareSessionCacheKeyfor the same fileUrl. Consider computing it once and reusing.async function getSessionsByFileUrl(fileUrl: string): Promise<Result<RefereeRecommenderSession[], Error>> { try { const hashedFileUrl = prepareSessionCacheKey(fileUrl); const filenameCacheKey = `referee-recommender-file-url:${hashedFileUrl}`;
572-586: Consider using Redis SET operations for better performanceThe current implementation uses arrays to track active sessions. Consider using Redis SET operations (SADD, SREM, SMEMBERS) for better performance and atomicity.
This would eliminate the need to:
- Parse JSON arrays
- Check for duplicates manually
- Filter arrays in memory
Redis SET operations handle all of this automatically and atomically.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (17)
.env.example(1 hunks)desci-server/kubernetes/deployment_dev.yaml(1 hunks)desci-server/kubernetes/deployment_prod.yaml(1 hunks)desci-server/kubernetes/deployment_staging.yaml(1 hunks)desci-server/prisma/migrations/20250818192731_add_querying_data_external_api_table/migration.sql(1 hunks)desci-server/prisma/migrations/20250818192759_add_gin_index_external_api_querying_data/migration.sql(1 hunks)desci-server/prisma/migrations/20250818194118_gin_index_ext_api_querying_data/migration.sql(1 hunks)desci-server/prisma/schema.prisma(1 hunks)desci-server/src/controllers/externalApi/RefereeRecommender/getResults.ts(4 hunks)desci-server/src/controllers/externalApi/RefereeRecommender/issuePresignedUrl.ts(5 hunks)desci-server/src/controllers/externalApi/RefereeRecommender/triggerRecommendation.ts(3 hunks)desci-server/src/docs/aiServices.ts(1 hunks)desci-server/src/docs/openai.ts(3 hunks)desci-server/src/schemas/externalApi.schema.ts(1 hunks)desci-server/src/services/externalApi/RefereeRecommenderService.ts(15 hunks)desci-server/src/services/externalApi/RefereeRecommenderSqsHandler.ts(11 hunks)desci-server/src/services/sqs/SqsMessageTypes.ts(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
desci-server/src/docs/**/*
📄 CodeRabbit inference engine (.cursorrules)
When asked to create API documentation or update API documentation for an API route/controller/service, follow the pattern already established in the desci-server/src/docs folder
Files:
desci-server/src/docs/aiServices.tsdesci-server/src/docs/openai.ts
🧬 Code graph analysis (6)
desci-server/src/docs/openai.ts (1)
desci-server/src/docs/aiServices.ts (1)
aiServicesPaths(326-339)
desci-server/src/controllers/externalApi/RefereeRecommender/issuePresignedUrl.ts (3)
desci-server/src/client.ts (1)
prisma(3-7)desci-server/src/core/api.ts (1)
sendSuccess(5-8)desci-server/src/services/externalApi/RefereeRecommenderService.ts (1)
RefereeRecommenderService(702-717)
desci-server/src/controllers/externalApi/RefereeRecommender/getResults.ts (2)
desci-server/src/client.ts (1)
prisma(3-7)desci-server/src/core/api.ts (1)
sendError(10-18)
desci-server/src/controllers/externalApi/RefereeRecommender/triggerRecommendation.ts (3)
desci-server/src/client.ts (1)
prisma(3-7)desci-server/src/core/api.ts (1)
sendSuccess(5-8)desci-server/src/services/externalApi/RefereeRecommenderService.ts (1)
RefereeRecommenderService(702-717)
desci-server/src/services/externalApi/RefereeRecommenderSqsHandler.ts (1)
desci-server/src/services/externalApi/RefereeRecommenderService.ts (1)
RefereeRecommenderService(702-717)
desci-server/src/services/externalApi/RefereeRecommenderService.ts (2)
desci-server/src/schemas/externalApi.schema.ts (1)
PresignedUrlResponse(118-118)desci-server/src/redisClient.ts (2)
setToCache(88-96)getFromCache(58-76)
🪛 dotenv-linter (3.3.0)
.env.example
[warning] 207-207: [UnorderedKey] The ML_REFEREE_FINDER_RESULT key should go before the ML_REFEREE_GET_PRESIGNED_URL key
(UnorderedKey)
[warning] 208-208: [UnorderedKey] The ML_REFEREE_FILE_PREFIX key should go before the ML_REFEREE_FINDER_RESULT key
(UnorderedKey)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-and-test
🔇 Additional comments (21)
desci-server/kubernetes/deployment_prod.yaml (2)
116-119: Env var rename/addition looks correct and consistent with the PR intent.Replacing ML_REFEREE_TRIGGER_CID with ML_REFEREE_TRIGGER_START and introducing ML_REFEREE_FILE_PREFIX align with the new URL/hash-based flow. Quoting the values is fine for strings.
116-119: Verify Kubernetes env vars and Vault secret keysAll code references to the ML_REFEREE environment variables are correctly aligned across dev, staging, and prod, and there are no lingering references to the old
ML_REFEREE_TRIGGER_CID. Please complete the final step of ensuring the corresponding Vault secret contains the new keys; otherwise the service will start with empty values.• No occurrences of
ML_REFEREE_TRIGGER_CIDfound indesci-server—old key fully removed
•ML_REFEREE_TRIGGER_START,ML_REFEREE_FINDER_RESULT, andML_REFEREE_FILE_PREFIXare present in:
desci-server/kubernetes/deployment_dev.yaml(lines 117–119)desci-server/kubernetes/deployment_staging.yaml(lines 123–125)desci-server/kubernetes/deployment_prod.yaml(lines 116–118)Next step (manual):
• Verify that the Vault pathsecrets/desci-server/production/appincludes these exact keys:
ML_REFEREE_TRIGGER_STARTML_REFEREE_FINDER_RESULTML_REFEREE_FILE_PREFIX.env.example (1)
205-209: .env.example aligns with k8s templates.New variables here match the dev/staging/prod deployment templates, helping local dev stay in sync.
desci-server/kubernetes/deployment_staging.yaml (2)
123-126: Staging env updates match the new API contract.The rename to ML_REFEREE_TRIGGER_START and addition of ML_REFEREE_FILE_PREFIX mirror dev/prod. Good consistency across environments.
123-126: Confirm Vault staging keys exist.Double-check that ML_REFEREE_TRIGGER_START and ML_REFEREE_FILE_PREFIX are defined under secrets/desci-server/staging/app to avoid empty exports at runtime.
Use the same script shared in the prod comment; it checks all three environments together.
desci-server/kubernetes/deployment_dev.yaml (2)
117-120: Dev env changes look good and consistent.The switch to ML_REFEREE_TRIGGER_START plus ML_REFEREE_FILE_PREFIX is correctly applied and quoted.
117-120: Sanity-check local assumptions and remove legacy references.If any local scripts/manifests still reference ML_REFEREE_TRIGGER_CID, they’ll drift from dev. Use the repository-wide scan to ensure everything is updated.
Re-run the script in the prod comment to confirm no stale references remain.
desci-server/prisma/migrations/20250818192731_add_querying_data_external_api_table/migration.sql (2)
2-2: LGTM: Nullable JSONB column addition is safe and fast.Adding a nullable JSONB without a default is metadata-only in Postgres and won’t rewrite the table. Match in schema as
Json?looks correct.
2-2: No null-safety guards needed for queryingDataAfter inspecting all uses of
queryingDatain the codebase, I found that:
- It’s only ever used in Prisma-level JSONB filters (
where: { queryingData: { path: […], equals: … } }), which safely exclude rows where the column isNULLrather than accessing its value directly.- New records always write a non-null JSON object (
queryingData as Prisma.JsonValue), so only legacy rows would haveNULL, and those are already filtered out by the queries.Since there are no code paths that de-reference or read
queryingDatawithout a filter first, no additional COALESCE or null-handling is required at this time.Likely an incorrect or invalid review comment.
desci-server/src/docs/openai.ts (3)
98-98: New “AI Services” tag added—align path naming/versioning.Most existing paths are versioned (
/v1/...), while AI Services use/services/ai/.... If intentional, all good; if not, consider/v1/services/ai/...for consistency.Would you like me to prepare a patch to version these paths and update routers accordingly?
142-142: Merging aiServicesPaths into the OpenAPI document is correct.Assuming each operation defines security and request/response schemas, this will publish the endpoints as expected.
33-33: Docs wiring confirmed; all AI service operations include the['AI Services']tag.Each of the imported operations in
aiServicesPaths(generatePresignedUrlOperation, triggerRefereeRecommendationOperation, getRefereeResultsOperation, getRefereeUsageOperation) already declarestags: ['AI Services']– no further changes needed.
desci-server/prisma/schema.prisma (1)
1449-1452: Good call on practical composite indexes.The added indexes on
[userId],[userId, apiType], and[userId, apiType, createdAt]will cover the common filters and pagination by recency.desci-server/src/services/sqs/SqsMessageTypes.ts (1)
27-35: Good move adding a typed metadata payload.Introducing
ExternalApiSqsMessageDatais a solid step toward safer consumers and clearer contracts.desci-server/src/controllers/externalApi/RefereeRecommender/issuePresignedUrl.ts (1)
89-105: Nice: expanded response includesdownloadUrlandfileHash.This should help clients correlate uploads and subsequent triggers.
desci-server/src/controllers/externalApi/RefereeRecommender/getResults.ts (2)
49-58: Good auth gate with clear 404 when no record exists.The early return keeps the lambda call path clean.
34-47: Ignore DB key alignment suggestion—different keys serve distinct purposes
The pre-check inissuePresignedUrl.tsdeliberately usesfileHashto dedupe identical uploads, whilegetResults.tslooks up by the generatedfileNameto fetch the specific session’s results. The SQS handler’ssaveToDatabasemethod always writes bothfileName(required) and, when available,fileHashintoqueryingData, so:
- The dedupe‐by‐hash check finds any prior usage of the same content.
- The session‐lookup by filename retrieves the exact record created for that upload.
Because these are intentional and complementary, the suggestion to “align on a single key” is not applicable here—please disregard it.
Likely an incorrect or invalid review comment.
desci-server/src/docs/aiServices.ts (2)
239-291: Nice coverage and parameterization for GET results.The
UploadedFileNameparameter is clearly documented and matches the controller’s expectations.
326-339: Good path registration and security.All endpoints are grouped under “AI Services” and protected by bearer auth.
desci-server/src/services/externalApi/RefereeRecommenderSqsHandler.ts (2)
264-269: LGTM! Good separation of concerns for query optimizationThe separation of queryable data (with GIN index) from metadata is a smart architectural decision that will improve query performance.
173-173: Verified: Server-side handlers updated to useprogress_percent
I’ve searched the repository and confirmed there are no remaining references to the oldprogressfield in the WebSocket event handlers:
- No occurrences of
eventData.data?.progressor object literals using"progress"in thedesci-serverservice code.- The only
progressreference found is indesci-server/src/services/DataMigration/DataMigrationWorker.ts, which is unrelated to WebSocket events.Please manually verify the following before merging:
- Frontend WebSocket consumers (in the client repository) are updated to read
progress_percentinstead ofprogress.- Any tests or documentation still mentioning
progresshave been updated toprogress_percent.
| @@ -0,0 +1,2 @@ | |||
| -- DropIndex | |||
| DROP INDEX "idx_external_api_usage_querying_data_gin"; | |||
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Drop the index CONCURRENTLY (and defensively).
Dropping a large GIN index with DROP INDEX can take an ACCESS EXCLUSIVE lock and block writes. Prefer DROP INDEX CONCURRENTLY IF EXISTS to avoid long blocking in production.
Apply this diff:
-DROP INDEX "idx_external_api_usage_querying_data_gin";
+DROP INDEX CONCURRENTLY IF EXISTS "idx_external_api_usage_querying_data_gin";Note: CONCURRENTLY cannot run inside a transaction. Ensure this migration is executed without a surrounding transaction.
Verify other migrations with CONCURRENTLY (create/drop) and ensure none are wrapped in transactions:
🏁 Script executed:
#!/bin/bash
rg -nP 'CREATE\s+INDEX\s+CONCURRENTLY|DROP\s+INDEX\s+CONCURRENTLY' desci-server/prisma/migrations -C2Length of output: 830
Drop the GIN index without blocking writes
To avoid an ACCESS EXCLUSIVE lock when removing a large GIN index, switch to DROP INDEX CONCURRENTLY IF EXISTS.
• File (desci-server/prisma/migrations/20250818194118_gin_index_ext_api_querying_data/migration.sql), line 2
-DROP INDEX "idx_external_api_usage_querying_data_gin";
+DROP INDEX CONCURRENTLY IF EXISTS "idx_external_api_usage_querying_data_gin";• Note: CONCURRENTLY cannot run inside a transaction—ensure this migration is executed in QUERY_RUN_IN_TRANSACTION = false mode or otherwise unwrapped.
I’ve verified that the only other GIN-index migration uses CREATE INDEX CONCURRENTLY IF NOT EXISTS (20250818192759_add_gin_index_external_api_querying_data), and no other drops are currently concurrent.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| DROP INDEX "idx_external_api_usage_querying_data_gin"; | |
| DROP INDEX CONCURRENTLY IF EXISTS "idx_external_api_usage_querying_data_gin"; |
🤖 Prompt for AI Agents
In
desci-server/prisma/migrations/20250818194118_gin_index_ext_api_querying_data/migration.sql
around line 2, the migration currently does a blocking DROP INDEX; replace it
with a non-blocking DROP INDEX CONCURRENTLY IF EXISTS
"idx_external_api_usage_querying_data_gin" and ensure the migration runs outside
a transaction (set QUERY_RUN_IN_TRANSACTION = false or otherwise unwrap the
migration) because CONCURRENTLY cannot run inside a transaction.
| const generatePresignedUrlSchema = z.object({ | ||
| body: z.object({ | ||
| fileHash: z.string().min(1, 'fileHash is required'), | ||
| fileName: z.string().min(1, 'fileName is required'), | ||
| }), | ||
| }); |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Avoid duplicating request schemas; centralize and add strict hash validation.
This file introduces its own generatePresignedUrlSchema, which diverges from src/schemas/externalApi.schema.ts. Duplication risks drift with docs and other controllers.
- Reuse the central schema and update it to include
fileHashwith a strict hex pattern (64 chars). - Import the central schema here instead of redefining.
-import { z } from 'zod';
+import { z } from 'zod';
+import { generatePresignedUrlSchema } from '../../../schemas/externalApi.schema.js';
-const generatePresignedUrlSchema = z.object({
- body: z.object({
- fileHash: z.string().min(1, 'fileHash is required'),
- fileName: z.string().min(1, 'fileName is required'),
- }),
-});Companion change in src/schemas/externalApi.schema.ts (see that file’s comment) to add fileHash and the regex.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const generatePresignedUrlSchema = z.object({ | |
| body: z.object({ | |
| fileHash: z.string().min(1, 'fileHash is required'), | |
| fileName: z.string().min(1, 'fileName is required'), | |
| }), | |
| }); | |
| import { z } from 'zod'; | |
| import { generatePresignedUrlSchema } from '../../../schemas/externalApi.schema.js'; |
🤖 Prompt for AI Agents
In
desci-server/src/controllers/externalApi/RefereeRecommender/issuePresignedUrl.ts
around lines 13-18, a local generatePresignedUrlSchema is being defined which
duplicates schema logic; remove this local definition and import and use the
centralized schema from src/schemas/externalApi.schema.ts instead. Update
src/schemas/externalApi.schema.ts to add the fileHash field under the
appropriate request body schema with a strict hex regex /^[a-fA-F0-9]{64}$/ and
retain the existing fileName validation, export the updated schema, then replace
the local schema usage in this controller with the imported centralized one and
adjust imports/types accordingly.
| return sendSuccess(res, { | ||
| cached: true, | ||
| message: 'Results for this file are already available', | ||
| resultKey: RefereeRecommenderService.prepareFormattedFileName(fileHash), | ||
| }); | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Return an UploadedFileName (or equivalent) alongside resultKey for GET compatibility.
/services/ai/referee-recommender/results expects UploadedFileName. Here you return only resultKey derived from fileHash. Unless resultKey === UploadedFileName, clients won’t be able to call GET without extra knowledge.
- return sendSuccess(res, {
+ const resultKey = RefereeRecommenderService.prepareFormattedFileName(fileHash);
+ return sendSuccess(res, {
cached: true,
message: 'Results for this file are already available',
- resultKey: RefereeRecommenderService.prepareFormattedFileName(fileHash),
+ resultKey,
+ // For compatibility with GET /results
+ UploadedFileName: resultKey,
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return sendSuccess(res, { | |
| cached: true, | |
| message: 'Results for this file are already available', | |
| resultKey: RefereeRecommenderService.prepareFormattedFileName(fileHash), | |
| }); | |
| } | |
| const resultKey = RefereeRecommenderService.prepareFormattedFileName(fileHash); | |
| return sendSuccess(res, { | |
| cached: true, | |
| message: 'Results for this file are already available', | |
| resultKey, | |
| // For compatibility with GET /results | |
| UploadedFileName: resultKey, | |
| }); |
🤖 Prompt for AI Agents
In
desci-server/src/controllers/externalApi/RefereeRecommender/issuePresignedUrl.ts
around lines 66 to 71, the response returns only resultKey (prepared from
fileHash) but the downstream endpoint /services/ai/referee-recommender/results
expects UploadedFileName; add an uploadedFileName property to the success
payload (either use the original uploaded filename from the request/body or
derive/format it the same way the results service expects) so clients can call
the GET endpoint without extra knowledge, and update the response type/interface
to include UploadedFileName.
| const PresignedUrlResponseSchema = z.object({ | ||
| cached: z.boolean().describe('Whether results are cached'), | ||
| presignedUrl: z.string().optional().describe('S3 presigned upload URL'), | ||
| downloadUrl: z.string().optional().describe('S3 download URL'), | ||
| fileName: z.string().optional().describe('Generated S3 filename'), | ||
| fileHash: z.string().optional().describe('File hash'), | ||
| resultKey: z.string().optional().describe('Formatted cache key for results'), | ||
| message: z.string().optional().describe('Cache message'), | ||
| }); | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
200 responses should include the { ok, data, message? } envelope.
Your controllers use sendSuccess, which wraps payloads. RefereeResultsSchema models this, but PresignedUrlResponseSchema (and TriggerRefereeResponseSchema) don’t. Docs will be wrong for clients.
Introduce a reusable success envelope:
const PresignedUrlResponseSchema = z.object({
cached: z.boolean().describe('Whether results are cached'),
presignedUrl: z.string().optional().describe('S3 presigned upload URL'),
downloadUrl: z.string().optional().describe('S3 download URL'),
fileName: z.string().optional().describe('Generated S3 filename'),
fileHash: z.string().optional().describe('File hash'),
resultKey: z.string().optional().describe('Formatted cache key for results'),
message: z.string().optional().describe('Cache message'),
});
+
+const ApiSuccess = <T extends z.ZodTypeAny>(schema: T) =>
+ z.object({
+ ok: z.literal(true),
+ data: schema,
+ message: z.string().optional(),
+ });📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const PresignedUrlResponseSchema = z.object({ | |
| cached: z.boolean().describe('Whether results are cached'), | |
| presignedUrl: z.string().optional().describe('S3 presigned upload URL'), | |
| downloadUrl: z.string().optional().describe('S3 download URL'), | |
| fileName: z.string().optional().describe('Generated S3 filename'), | |
| fileHash: z.string().optional().describe('File hash'), | |
| resultKey: z.string().optional().describe('Formatted cache key for results'), | |
| message: z.string().optional().describe('Cache message'), | |
| }); | |
| const PresignedUrlResponseSchema = z.object({ | |
| cached: z.boolean().describe('Whether results are cached'), | |
| presignedUrl: z.string().optional().describe('S3 presigned upload URL'), | |
| downloadUrl: z.string().optional().describe('S3 download URL'), | |
| fileName: z.string().optional().describe('Generated S3 filename'), | |
| fileHash: z.string().optional().describe('File hash'), | |
| resultKey: z.string().optional().describe('Formatted cache key for results'), | |
| message: z.string().optional().describe('Cache message'), | |
| }); | |
| const ApiSuccess = <T extends z.ZodTypeAny>(schema: T) => | |
| z.object({ | |
| ok: z.literal(true), | |
| data: schema, | |
| message: z.string().optional(), | |
| }); |
🤖 Prompt for AI Agents
In desci-server/src/docs/aiServices.ts around lines 10 to 19, the
PresignedUrlResponseSchema models the raw payload but the API always returns the
sendSuccess envelope ({ ok, data, message? }), so update the file to define and
export a reusable success envelope (e.g. SuccessEnvelope = <generic zod wrapper
that enforces ok: true, data: T, message?: string>) and then wrap
PresignedUrlResponseSchema (and TriggerRefereeResponseSchema) with that envelope
(exporting the wrapped schemas for docs). Ensure types and exports are adjusted
so downstream docs/controllers use the enveloped schema.
| export const generatePresignedUrlOperation: ZodOpenApiOperationObject = { | ||
| operationId: 'generatePresignedUrl', | ||
| tags: ['AI Services'], | ||
| summary: 'Generate presigned URL for referee recommender file upload', | ||
| description: | ||
| 'Get a presigned S3 URL for uploading a PDF file for referee recommendation analysis. Checks for cached results first.', | ||
| security: [{ bearerAuth: [] }], | ||
| requestBody: { | ||
| content: { | ||
| 'application/json': { | ||
| schema: FileHashSchema, | ||
| }, | ||
| }, | ||
| }, | ||
| responses: { | ||
| '200': { | ||
| description: 'Presigned URL generated successfully or cached results found', | ||
| content: { | ||
| 'application/json': { | ||
| schema: PresignedUrlResponseSchema, | ||
| }, | ||
| }, | ||
| }, | ||
| '400': { | ||
| description: 'Invalid request parameters', | ||
| content: { | ||
| 'application/json': { | ||
| schema: ErrorResponseSchema, | ||
| }, | ||
| }, | ||
| }, | ||
| '401': { | ||
| description: 'Authentication required', | ||
| content: { | ||
| 'application/json': { | ||
| schema: ErrorResponseSchema, | ||
| }, | ||
| }, | ||
| }, | ||
| '500': { | ||
| description: 'Internal server error', | ||
| content: { | ||
| 'application/json': { | ||
| schema: ErrorResponseSchema, | ||
| }, | ||
| }, | ||
| }, | ||
| }, | ||
| }; |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Wrap generatePresignedUrl 200-response in the success envelope.
- 'application/json': {
- schema: PresignedUrlResponseSchema,
- },
+ 'application/json': {
+ schema: ApiSuccess(PresignedUrlResponseSchema),
+ },📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export const generatePresignedUrlOperation: ZodOpenApiOperationObject = { | |
| operationId: 'generatePresignedUrl', | |
| tags: ['AI Services'], | |
| summary: 'Generate presigned URL for referee recommender file upload', | |
| description: | |
| 'Get a presigned S3 URL for uploading a PDF file for referee recommendation analysis. Checks for cached results first.', | |
| security: [{ bearerAuth: [] }], | |
| requestBody: { | |
| content: { | |
| 'application/json': { | |
| schema: FileHashSchema, | |
| }, | |
| }, | |
| }, | |
| responses: { | |
| '200': { | |
| description: 'Presigned URL generated successfully or cached results found', | |
| content: { | |
| 'application/json': { | |
| schema: PresignedUrlResponseSchema, | |
| }, | |
| }, | |
| }, | |
| '400': { | |
| description: 'Invalid request parameters', | |
| content: { | |
| 'application/json': { | |
| schema: ErrorResponseSchema, | |
| }, | |
| }, | |
| }, | |
| '401': { | |
| description: 'Authentication required', | |
| content: { | |
| 'application/json': { | |
| schema: ErrorResponseSchema, | |
| }, | |
| }, | |
| }, | |
| '500': { | |
| description: 'Internal server error', | |
| content: { | |
| 'application/json': { | |
| schema: ErrorResponseSchema, | |
| }, | |
| }, | |
| }, | |
| }, | |
| }; | |
| export const generatePresignedUrlOperation: ZodOpenApiOperationObject = { | |
| operationId: 'generatePresignedUrl', | |
| tags: ['AI Services'], | |
| summary: 'Generate presigned URL for referee recommender file upload', | |
| description: | |
| 'Get a presigned S3 URL for uploading a PDF file for referee recommendation analysis. Checks for cached results first.', | |
| security: [{ bearerAuth: [] }], | |
| requestBody: { | |
| content: { | |
| 'application/json': { | |
| schema: FileHashSchema, | |
| }, | |
| }, | |
| }, | |
| responses: { | |
| '200': { | |
| description: 'Presigned URL generated successfully or cached results found', | |
| content: { | |
| 'application/json': { | |
| schema: ApiSuccess(PresignedUrlResponseSchema), | |
| }, | |
| }, | |
| }, | |
| '400': { | |
| description: 'Invalid request parameters', | |
| content: { | |
| 'application/json': { | |
| schema: ErrorResponseSchema, | |
| }, | |
| }, | |
| }, | |
| '401': { | |
| description: 'Authentication required', | |
| content: { | |
| 'application/json': { | |
| schema: ErrorResponseSchema, | |
| }, | |
| }, | |
| }, | |
| '500': { | |
| description: 'Internal server error', | |
| content: { | |
| 'application/json': { | |
| schema: ErrorResponseSchema, | |
| }, | |
| }, | |
| }, | |
| }, | |
| }; |
🤖 Prompt for AI Agents
In desci-server/src/docs/aiServices.ts around lines 131 to 179, the 200 response
for generatePresignedUrl currently returns PresignedUrlResponseSchema directly;
wrap that response in the project's success envelope schema (e.g.,
SuccessResponseSchema or the equivalent wrapper that has a "data" field) so the
OpenAPI response shape matches other endpoints. Replace the 200 response content
schema with the success-envelope schema that contains PresignedUrlResponseSchema
as its data payload (or call the helper that composes a success schema around
PresignedUrlResponseSchema), and ensure the description/context remains
accurate.
| function prepareSessionCacheKey(fileUrl: string) { | ||
| // We use the file URLs here instead of the file hash, because it's the only | ||
| // deterministic link that we have available early on for every upload method (S3, CID, External pdf url) | ||
| // for us to be able to retrieve a result back. | ||
| // string -> SHA256 -> Base64 | ||
| return createHash('sha256').update(fileUrl).digest('base64'); // ~44 chars with '=' padding | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Consider adding input validation for the fileUrl parameter
The function should validate that fileUrl is not empty or undefined before computing the hash.
function prepareSessionCacheKey(fileUrl: string) {
+ if (!fileUrl || typeof fileUrl !== 'string') {
+ throw new Error('Invalid fileUrl provided for session cache key');
+ }
// We use the file URLs here instead of the file hash, because it's the only
// deterministic link that we have available early on for every upload method (S3, CID, External pdf url)
// for us to be able to retrieve a result back.
// string -> SHA256 -> Base64
return createHash('sha256').update(fileUrl).digest('base64'); // ~44 chars with '=' padding
}🤖 Prompt for AI Agents
In desci-server/src/services/externalApi/RefereeRecommenderService.ts around
lines 80 to 86, the prepareSessionCacheKey function computes a SHA256 hash of
fileUrl without validating input; add an early validation that fileUrl is a
non-empty string (e.g. check typeof fileUrl === 'string' and
fileUrl.trim().length > 0), normalize or trim it if needed, and throw a clear
TypeError or return a rejected result when invalid so you never call createHash
with undefined/empty input; then proceed to compute and return the base64 hash
as before.
| function prepareFormattedFileName(fileHash: string) { | ||
| if (!ML_REFEREE_FILE_PREFIX) { | ||
| console.error('ML_REFEREE_FILE_PREFIX is not set'); | ||
| throw new Error('ML_REFEREE_FILE_PREFIX env is not set'); | ||
| } | ||
| return ML_REFEREE_FILE_PREFIX + fileHash; | ||
| } |
There was a problem hiding this comment.
Use proper error handling instead of console.error
Using console.error directly is inconsistent with the logging strategy used throughout the rest of the file.
function prepareFormattedFileName(fileHash: string) {
if (!ML_REFEREE_FILE_PREFIX) {
- console.error('ML_REFEREE_FILE_PREFIX is not set');
+ logger.error('ML_REFEREE_FILE_PREFIX is not set');
throw new Error('ML_REFEREE_FILE_PREFIX env is not set');
}
return ML_REFEREE_FILE_PREFIX + fileHash;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function prepareFormattedFileName(fileHash: string) { | |
| if (!ML_REFEREE_FILE_PREFIX) { | |
| console.error('ML_REFEREE_FILE_PREFIX is not set'); | |
| throw new Error('ML_REFEREE_FILE_PREFIX env is not set'); | |
| } | |
| return ML_REFEREE_FILE_PREFIX + fileHash; | |
| } | |
| function prepareFormattedFileName(fileHash: string) { | |
| if (!ML_REFEREE_FILE_PREFIX) { | |
| logger.error('ML_REFEREE_FILE_PREFIX is not set'); | |
| throw new Error('ML_REFEREE_FILE_PREFIX env is not set'); | |
| } | |
| return ML_REFEREE_FILE_PREFIX + fileHash; | |
| } |
🤖 Prompt for AI Agents
In desci-server/src/services/externalApi/RefereeRecommenderService.ts around
lines 92 to 98, replace the direct console.error call with the service's
standard logger (e.g., logger.error or this.logger.error as used elsewhere in
this file) and log a clear message (optionally include fileHash) before throwing
the Error; do not change the thrown Error text, just use the consistent logger
API to emit the error instead of console.error.
| if (response.data.uploaded_file_name) { | ||
| // Use prepareCacheKey on the fileUrl for consistent hashing | ||
| const hashedFileUrl = prepareSessionCacheKey(request.file_url); | ||
|
|
||
| const session: RefereeRecommenderSession = { | ||
| userId, | ||
| originalFileName: `${request.cid}`, | ||
| hashedFileUrl, | ||
| createdAt: Date.now(), | ||
| }; | ||
|
|
||
| await storeSession(response.data.uploaded_file_name, session); | ||
| await markSessionAsActive(userId, response.data.uploaded_file_name); | ||
| await storeSession(request.file_url, session); | ||
| await markSessionAsActive(userId, request.file_url); | ||
|
|
There was a problem hiding this comment.
Potential race condition when storing sessions
There's a potential race condition between storing the session and marking it as active. If the second operation fails, the session will be stored but not marked as active, leading to an inconsistent state.
Consider using a transaction or combining these operations:
if (response.data.uploaded_file_name) {
// Use prepareCacheKey on the fileUrl for consistent hashing
const hashedFileUrl = prepareSessionCacheKey(request.file_url);
const session: RefereeRecommenderSession = {
userId,
hashedFileUrl,
createdAt: Date.now(),
};
- await storeSession(request.file_url, session);
- await markSessionAsActive(userId, request.file_url);
+ const storeResult = await storeSession(request.file_url, session);
+ if (storeResult.isOk()) {
+ await markSessionAsActive(userId, request.file_url);
+ } else {
+ logger.error({ error: storeResult.error }, 'Failed to store session');
+ // Consider whether to continue or fail the request
+ }Committable suggestion skipped: line range outside the PR's diff.
| '[SQS] Processing referee recommender event', | ||
| ); | ||
|
|
||
| // debugger; |
There was a problem hiding this comment.
Remove debugger statement
The debugger statement should be removed before merging to production.
- // debugger;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // debugger; |
🤖 Prompt for AI Agents
In desci-server/src/services/externalApi/RefereeRecommenderSqsHandler.ts around
line 83, there is a leftover debugger; statement that should be removed before
merging to production; delete the debugger; line and ensure no other debugger
calls remain in the file, then run the linter/tests to confirm no unused or
stray debugging statements remain.
| file_hash?: string; // SHA-256 hash of the file content | ||
| file_url: string; // Source URL | ||
| data?: ExternalApiSqsMessageData | any; | ||
| fileUrlHashed: string; |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Make fileUrlHashed optional and avoid any in data (align naming).
- Requiring
fileUrlHashedwill break existing producers until all are updated. Make it optional. - Avoid
anyto preserve type-safety; preferRecord<string, unknown>or keep it strictlyExternalApiSqsMessageData. - Mixed casing across fields (
file_url,file_hash,fileUrlHashed) is brittle. Either stick to snake_case for wire payloads or keep both keys during transition.
Apply:
file_name: string; // e.g., "referee_rec_v0.1.3_{original_file.pdf}" // Move to hashes later for security.
- file_hash?: string; // SHA-256 hash of the file content
- file_url: string; // Source URL
- data?: ExternalApiSqsMessageData | any;
- fileUrlHashed: string;
+ /** SHA-256 hash of the file content */
+ file_hash?: string;
+ /** Source URL */
+ file_url: string;
+ /** Structured metadata payload */
+ data?: ExternalApiSqsMessageData | Record<string, unknown>;
+ /** @deprecated Prefer snake_case wire key `file_url_hashed` */
+ fileUrlHashed?: string;
+ /** SHA-256 hash of the normalized file URL */
+ file_url_hashed?: string;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| file_hash?: string; // SHA-256 hash of the file content | |
| file_url: string; // Source URL | |
| data?: ExternalApiSqsMessageData | any; | |
| fileUrlHashed: string; | |
| file_name: string; // e.g., "referee_rec_v0.1.3_{original_file.pdf}" // Move to hashes later for security. | |
| /** SHA-256 hash of the file content */ | |
| file_hash?: string; | |
| /** Source URL */ | |
| file_url: string; | |
| /** Structured metadata payload */ | |
| data?: ExternalApiSqsMessageData | Record<string, unknown>; | |
| /** @deprecated Prefer snake_case wire key `file_url_hashed` */ | |
| fileUrlHashed?: string; | |
| /** SHA-256 hash of the normalized file URL */ | |
| file_url_hashed?: string; |
🤖 Prompt for AI Agents
In desci-server/src/services/sqs/SqsMessageTypes.ts around lines 31–34, make
fileUrlHashed optional (fileUrlHashed?: string) so existing producers don't
break, replace the loose `any` in `data` with a safer union type such as
`ExternalApiSqsMessageData | Record<string, unknown>` to preserve type-safety,
and normalize field naming for the wire format by either using snake_case (e.g.,
file_url, file_hash, file_url_hashed) or explicitly keeping both variants during
a transition (document and populate both keys) so consumers/producers remain
compatible.
Updated to support the lambda API changes, supports any links now (s3, cid, external link)
Updated proxy API endpoints
Summary by CodeRabbit