Skip to content

Ref API changes#1070

Draft
kadamidev wants to merge 11 commits intodevelopfrom
ref-api-changes
Draft

Ref API changes#1070
kadamidev wants to merge 11 commits intodevelopfrom
ref-api-changes

Conversation

@kadamidev
Copy link
Copy Markdown
Contributor

@kadamidev kadamidev commented Aug 22, 2025

Updated to support the lambda API changes, supports any links now (s3, cid, external link)
Updated proxy API endpoints

Summary by CodeRabbit

  • New Features
    • Referee Recommender now supports caching by fileHash and returns downloadUrl alongside presignedUrl.
    • Endpoints accept fileUrl (required) and optional fileHash; responses include a cached flag and enhanced result metadata.
  • Refactor
    • Migrated from CID-based to URL/hash-based workflow and session handling.
    • Added queryable JSON field and new indexes for faster usage lookups.
  • Documentation
    • OpenAPI docs updated with AI Services endpoints and detailed schemas.
  • Chores
    • Environment variables updated across environments to align with new trigger and file prefix settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Aug 22, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Referee Recommender flow shifts from CID-based to URL/hash-based. Environment variables updated across environments. Prisma schema gains queryingData and new indexes with migrations. Controllers add DB-backed caching/validation. Service switches to external presigned URL API, URL-hash sessions, and new helpers. SQS handler and message types updated for URL-based events. OpenAPI docs added.

Changes

Cohort / File(s) Summary of Changes
Env vars: Referee Recommender
\.env.example, desci-server/kubernetes/deployment_*.yaml
Replace ML_REFEREE_TRIGGER_CID with ML_REFEREE_TRIGGER_START; add ML_REFEREE_FILE_PREFIX; propagate to dev/staging/prod deployment templates.
Prisma schema & indexes
desci-server/prisma/schema.prisma
Add ExternalApiUsage.queryingData Json?; add indexes @@index([userId]), @@index([userId, apiType]), @@index([userId, apiType, createdAt]); comments updated.
DB migrations
desci-server/prisma/migrations/20250818192731_*/migration.sql, .../20250818192759_*/migration.sql, .../20250818194118_*/migration.sql
Add queryingData column; add then drop GIN index on queryingData (idx_external_api_usage_querying_data_gin).
Request schema
desci-server/src/schemas/externalApi.schema.ts
Update TriggerReferee schema: replace cid/external with fileUrl (req’d) and fileHash (opt); add classify, coi_filter{...}, meta_data_only, exclude_* fields.
Controllers: Referee Recommender
desci-server/src/controllers/externalApi/RefereeRecommender/issuePresignedUrl.ts, .../triggerRecommendation.ts, .../getResults.ts
Add Prisma-backed cache lookups; require/propagate fileHash; map new request/response fields; early returns on cache hits; DB validation before results fetch; enhanced logging; minor signature type fix (Response).
Service: Referee Recommender
desci-server/src/services/externalApi/RefereeRecommenderService.ts
Replace S3 presign with external API (ML_REFEREE_GET_PRESIGNED_URL); rename trigger env to ML_REFEREE_TRIGGER_START; introduce URL-hash session keys; add prepareFormattedFileName, prepareSessionCacheKey; update public interfaces and method names to fileUrl/hash semantics; return downloadUrl.
SQS handling & message types
desci-server/src/services/externalApi/RefereeRecommenderSqsHandler.ts, desci-server/src/services/sqs/SqsMessageTypes.ts
Split queryingData vs metadata in ExternalApiUsage; query by queryingData.fileName; compute/emit fileUrlHashed; include paperTitle; rename progress to progress_percent; add typed ExternalApiSqsMessageData; extend ExternalApiSqsMessage with file_url, file_hash, fileUrlHashed.
API docs
desci-server/src/docs/aiServices.ts, desci-server/src/docs/openai.ts
Add OpenAPI operations and schemas for AI services (presign, trigger, results, usage); register AI Services tag; merge paths into documentation.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant API as Server API
  participant DB as Prisma/DB
  participant Svc as RefereeRecommenderService
  participant Ext as Presign/Trigger API

  rect rgb(245,248,255)
  note over User,API: Generate Presigned URL (cache-aware)
  User->>API: POST /ai/referee-recommender/presigned-url { fileHash, fileUrl? }
  API->>DB: Find ExternalApiUsage where queryingData.path=['fileHash']
  alt Cache hit
    DB-->>API: cachedResult
    API-->>User: { cached:true, resultKey, fileHash }
  else Miss
    API->>Svc: generatePresignedUploadUrl(fileUrl or hash)
    Svc->>Ext: GET ML_REFEREE_GET_PRESIGNED_URL
    Ext-->>Svc: { upload_url, download_url, s3_file_name }
    Svc-->>API: { presignedUrl, downloadUrl, fileName }
    API-->>User: { cached:false, presignedUrl, downloadUrl, fileName, fileHash }
  end
  end

  rect rgb(245,255,245)
  note over User,API: Trigger Recommendation (cache-aware)
  User->>API: POST /ai/referee-recommender/trigger { fileUrl, fileHash?, ... }
  API->>DB: Lookup cached REFEREE_FINDER by queryingData.fileHash
  alt Cached result
    DB-->>API: cachedResult
    API-->>User: { cached:true, resultKey, createdAt }
  else Not cached
    API->>Svc: triggerRefereeRecommendation({ file_url, ... })
    Svc->>Ext: POST ML_REFEREE_TRIGGER_START
    Ext-->>Svc: { UploadedFileName, info, ... }
    Svc-->>API: response
    API-->>User: { ...response, cached: inferred }
  end
  end

  rect rgb(255,250,240)
  note over API,DB: Get Results (DB-gated)
  User->>API: GET /ai/referee-recommender/results?fileName=...
  API->>DB: Find latest completed REFEREE_FINDER by queryingData.fileName for user
  alt Found
    DB-->>API: usage record
    API->>Svc: getRefereeResults(UploadedFileName)
    Svc-->>API: results
    API-->>User: results
  else Not found
    API-->>User: 404
  end
  end
Loading
sequenceDiagram
  autonumber
  participant SQS as SQS Handler
  participant DB as Prisma/DB
  participant WS as Websocket Hub
  participant Svc as RefereeRecommenderService

  SQS->>SQS: Parse event { file_url, file_hash?, data{...} }
  SQS->>Svc: prepareSessionCacheKey(file_url)
  Svc-->>SQS: fileUrlHashed
  SQS->>DB: Upsert ExternalApiUsage with queryingData{ fileName, eventType, fileHash? } and data{ metadata... }
  SQS->>WS: Emit { eventType, fileUrlHashed, paperTitle?, progress_percent? }
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

A hop, a hash, a whisk of URL thread,
I stash my cache where presigned trails are led.
Drums thump: SQS hums, results take flight—
Carrots of data crunch just right.
With paws on logs and ears alert,
I bound through queries, quick and pert. 🥕✨

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ref-api-changes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@kadamidev kadamidev marked this pull request as draft August 22, 2025 11:30
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🧹 Nitpick comments (16)
.env.example (1)

205-209: Additions are good; please reorder keys alphabetically to satisfy dotenv-linter.

dotenv-linter warnings indicate ordering issues. Alphabetizing this block will silence warnings and reduce churn in future diffs.

Apply this diff to reorder the Referee Finder keys:

-ML_REFEREE_GET_PRESIGNED_URL=
-ML_REFEREE_TRIGGER_START=
-ML_REFEREE_FINDER_RESULT=
-ML_REFEREE_FILE_PREFIX=
+ML_REFEREE_FILE_PREFIX=
+ML_REFEREE_FINDER_RESULT=
+ML_REFEREE_GET_PRESIGNED_URL=
+ML_REFEREE_TRIGGER_START=

Optionally add short inline comments for the two new keys to clarify intended values (e.g., FILE_PREFIX expected format and TRIGGER_START endpoint semantics).

desci-server/prisma/migrations/20250818192759_add_gin_index_external_api_querying_data/migration.sql (1)

4-5: If you keep this index, tune the operator class to match query patterns; otherwise, remove churn.

  • If you’re primarily doing JSON containment queries (@>), prefer jsonb_path_ops for a smaller/faster GIN index.
  • This index is dropped later in the same PR. Consider squashing migrations to avoid creating then immediately dropping a heavy index in one deploy.

Option A (keep, tuned index):

-CREATE INDEX CONCURRENTLY IF NOT EXISTS "idx_external_api_usage_querying_data_gin" 
-ON "ExternalApiUsage" USING GIN ("queryingData");
+CREATE INDEX CONCURRENTLY IF NOT EXISTS "idx_external_api_usage_querying_data_gin"
+ON "ExternalApiUsage" USING GIN ("queryingData" jsonb_path_ops);

Option B (remove churn):

  • Drop this migration from the PR and keep only the later “DROP INDEX …” if the index is no longer desired.
desci-server/prisma/schema.prisma (1)

1438-1445: Field addition matches migration; update the comment to reflect current indexing.

queryingData Json? matches the migration. The comment says “GIN indexed,” but this PR later drops that index—update to avoid confusion.

Apply this diff:

-  queryingData Json? // GIN indexed, important data thats used for querying goes here, avoid dumping random metadata.
+  queryingData Json? // Important, query-focused data (e.g., keys used in lookups). Avoid dumping arbitrary metadata. Index as needed.
desci-server/src/services/sqs/SqsMessageTypes.ts (2)

27-35: Consider versioning the message schema.

Add a schema_version (number) on the top-level message so consumers can branch logic safely as the payload evolves.

Example:

 export interface ExternalApiSqsMessage extends BaseSqsMessage {
   messageType: SqsMessageType.EXTERNAL_REFEREE_RECOMMENDER_API;
   eventType: 'PROCESSING_STARTED' | 'PROCESSING_COMPLETED' | 'PROCESSING_FAILED' | 'PROCESSING_PROGRESS';
+  /** Increment when SQS message shape changes in a backward-incompatible way */
+  schema_version?: 1;
   file_name: string; // e.g., "referee_rec_v0.1.3_{original_file.pdf}" // Move to hashes later for security.
   ...
 }

37-44: Unify hash field names and clarify timestamp units.

  • You have file_hash top-level and hash_value inside data. This is easy to misuse. Either mirror the same name or add JSDoc explaining which is which.
  • completed_at should specify seconds vs milliseconds.
 export type ExternalApiSqsMessageData = {
   file_url: string;
-  hash_value?: string;
+  /** Duplicate of top-level `file_hash`; prefer `file_hash` */
+  hash_value?: string;
+  /** Preferred explicit file content hash (mirrors top-level) */
+  file_hash?: string;
-  completed_at?: number; // Unix TS
+  /** Unix timestamp in seconds */
+  completed_at?: number;
 };
desci-server/src/controllers/externalApi/RefereeRecommender/issuePresignedUrl.ts (3)

41-55: Cache check should verify completion, not just existence.

findFirst on externalApiUsage by fileHash may return in-progress or failed runs, yet you immediately return “Results for this file are already available”.

At minimum, gate on a completion indicator (e.g., row status, completedAt column, or queryingData.completed_at != null).

Would you confirm which field denotes completion on ExternalApiUsage? If it’s status = 'COMPLETED' or completedAt IS NOT NULL, add it to the where clause. If stored under JSON, consider:

queryingData: { path: ['completed_at'], not: null as any }

I can propose a concrete diff once you confirm the column/JSON shape.


75-77: Prefer 415 (Unsupported Media Type) for non-PDF uploads.

The check is right, but 400 is generic. 415 better communicates why the request is rejected.

-  return sendError(res, 'Only PDF files are supported', 400);
+  return sendError(res, 'Only PDF files are supported', 415);

29-31: Validate fileHash format (64-hex) to avoid cache misses and collisions.

Right now it’s just non-empty. A malformed hash could poison cache lookups.

If you keep a local schema, enforce:

- fileHash: z.string().min(1, 'fileHash is required'),
+ fileHash: z
+  .string()
+  .regex(/^[A-Fa-f0-9]{64}$/, 'fileHash must be a 64-char hex SHA-256'),

Otherwise, add this to the centralized schema.

desci-server/src/controllers/externalApi/RefereeRecommender/getResults.ts (2)

60-62: Comment drift: not strictly “from lambda API” yet.

The comment says “now fetch actual results from lambda API” (true), but the earlier log says “Fetching referee recommendation results” generically. Not an issue, just ensure consistency.


78-86: Log message says “from database” but the fetch is from lambda.

Consider changing the message to reflect the actual source of the payload to avoid confusion during debugging.

-  'Successfully fetched referee recommendation results from database',
+  'Successfully fetched referee recommendation results from lambda API',
desci-server/src/docs/aiServices.ts (1)

20-53: Relax fileUrl validation to support non-HTTP URLs (s3, ipfs).

The PR description says “supports any links (s3, cid, external link)”, but .url() will reject s3:// and ipfs://. Use .min(1) with a custom refinement or documented acceptable schemes.

-  fileUrl: z.string().url().describe('URL to the PDF file'),
+  fileUrl: z
+    .string()
+    .min(1)
+    .describe('URL (http(s), s3, ipfs, etc.) to the PDF file'),
desci-server/src/controllers/externalApi/RefereeRecommender/triggerRecommendation.ts (2)

40-52: Add pagination to the cache query for scalability

The current implementation uses findFirst which may not be ideal if there are multiple cached results. Consider adding pagination or at least a limit to avoid potential performance issues with large datasets.

-      cachedResult = await prisma.externalApiUsage.findFirst({
+      cachedResult = await prisma.externalApiUsage.findFirst({
         where: {
           userId: user.id,
           apiType: ExternalApi.REFEREE_FINDER,
           queryingData: {
             path: ['fileHash'],
             equals: fileHash,
           },
         },
         orderBy: {
           createdAt: 'desc', // Get the most recent result
         },
+        take: 1, // Explicitly limit to 1 result
       });

119-121: Consider making the cache detection logic more robust

The current implementation checks for both "already exists" and "ready to be polled" in the info message. This string-based approach might be fragile if the external API changes its response format.

Consider defining constants for these status messages or using a more structured response format:

+    const CACHE_INDICATORS = {
+      EXISTS: 'already exists',
+      READY: 'ready to be polled'
+    };
+
     // Check if results are already available based on info message
     const isResultReady =
-      responseData.info?.toLowerCase().includes('already exists') &&
-      responseData.info?.toLowerCase().includes('ready to be polled');
+      responseData.info?.toLowerCase().includes(CACHE_INDICATORS.EXISTS) &&
+      responseData.info?.toLowerCase().includes(CACHE_INDICATORS.READY);
desci-server/src/services/externalApi/RefereeRecommenderService.ts (3)

145-147: Consider caching the environment variable check

The environment variable check happens on every request. Consider checking this during service initialization to fail fast.

Move the environment variable validation to module initialization:

+// Validate required environment variables at module initialization
+if (!ML_REFEREE_GET_PRESIGNED_URL) {
+  throw new Error('ML_REFEREE_GET_PRESIGNED_URL environment variable not configured');
+}
+
 async function generatePresignedUploadUrl(request: PresignedUrlRequest): Promise<Result<PresignedUrlResponse, Error>> {
   try {
     // ... existing code ...
-    if (!ML_REFEREE_GET_PRESIGNED_URL) {
-      return err(new Error('ML_REFEREE_GET_PRESIGNED_URL environment variable not configured'));
-    }

265-267: Optimize Redis calls by using pipeline for multiple hash computations

The function computes the hash twice when calling prepareSessionCacheKey for the same fileUrl. Consider computing it once and reusing.

 async function getSessionsByFileUrl(fileUrl: string): Promise<Result<RefereeRecommenderSession[], Error>> {
   try {
     const hashedFileUrl = prepareSessionCacheKey(fileUrl);
     const filenameCacheKey = `referee-recommender-file-url:${hashedFileUrl}`;

572-586: Consider using Redis SET operations for better performance

The current implementation uses arrays to track active sessions. Consider using Redis SET operations (SADD, SREM, SMEMBERS) for better performance and atomicity.

This would eliminate the need to:

  1. Parse JSON arrays
  2. Check for duplicates manually
  3. Filter arrays in memory

Redis SET operations handle all of this automatically and atomically.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 31fc268 and 687ef41.

📒 Files selected for processing (17)
  • .env.example (1 hunks)
  • desci-server/kubernetes/deployment_dev.yaml (1 hunks)
  • desci-server/kubernetes/deployment_prod.yaml (1 hunks)
  • desci-server/kubernetes/deployment_staging.yaml (1 hunks)
  • desci-server/prisma/migrations/20250818192731_add_querying_data_external_api_table/migration.sql (1 hunks)
  • desci-server/prisma/migrations/20250818192759_add_gin_index_external_api_querying_data/migration.sql (1 hunks)
  • desci-server/prisma/migrations/20250818194118_gin_index_ext_api_querying_data/migration.sql (1 hunks)
  • desci-server/prisma/schema.prisma (1 hunks)
  • desci-server/src/controllers/externalApi/RefereeRecommender/getResults.ts (4 hunks)
  • desci-server/src/controllers/externalApi/RefereeRecommender/issuePresignedUrl.ts (5 hunks)
  • desci-server/src/controllers/externalApi/RefereeRecommender/triggerRecommendation.ts (3 hunks)
  • desci-server/src/docs/aiServices.ts (1 hunks)
  • desci-server/src/docs/openai.ts (3 hunks)
  • desci-server/src/schemas/externalApi.schema.ts (1 hunks)
  • desci-server/src/services/externalApi/RefereeRecommenderService.ts (15 hunks)
  • desci-server/src/services/externalApi/RefereeRecommenderSqsHandler.ts (11 hunks)
  • desci-server/src/services/sqs/SqsMessageTypes.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
desci-server/src/docs/**/*

📄 CodeRabbit inference engine (.cursorrules)

When asked to create API documentation or update API documentation for an API route/controller/service, follow the pattern already established in the desci-server/src/docs folder

Files:

  • desci-server/src/docs/aiServices.ts
  • desci-server/src/docs/openai.ts
🧬 Code graph analysis (6)
desci-server/src/docs/openai.ts (1)
desci-server/src/docs/aiServices.ts (1)
  • aiServicesPaths (326-339)
desci-server/src/controllers/externalApi/RefereeRecommender/issuePresignedUrl.ts (3)
desci-server/src/client.ts (1)
  • prisma (3-7)
desci-server/src/core/api.ts (1)
  • sendSuccess (5-8)
desci-server/src/services/externalApi/RefereeRecommenderService.ts (1)
  • RefereeRecommenderService (702-717)
desci-server/src/controllers/externalApi/RefereeRecommender/getResults.ts (2)
desci-server/src/client.ts (1)
  • prisma (3-7)
desci-server/src/core/api.ts (1)
  • sendError (10-18)
desci-server/src/controllers/externalApi/RefereeRecommender/triggerRecommendation.ts (3)
desci-server/src/client.ts (1)
  • prisma (3-7)
desci-server/src/core/api.ts (1)
  • sendSuccess (5-8)
desci-server/src/services/externalApi/RefereeRecommenderService.ts (1)
  • RefereeRecommenderService (702-717)
desci-server/src/services/externalApi/RefereeRecommenderSqsHandler.ts (1)
desci-server/src/services/externalApi/RefereeRecommenderService.ts (1)
  • RefereeRecommenderService (702-717)
desci-server/src/services/externalApi/RefereeRecommenderService.ts (2)
desci-server/src/schemas/externalApi.schema.ts (1)
  • PresignedUrlResponse (118-118)
desci-server/src/redisClient.ts (2)
  • setToCache (88-96)
  • getFromCache (58-76)
🪛 dotenv-linter (3.3.0)
.env.example

[warning] 207-207: [UnorderedKey] The ML_REFEREE_FINDER_RESULT key should go before the ML_REFEREE_GET_PRESIGNED_URL key

(UnorderedKey)


[warning] 208-208: [UnorderedKey] The ML_REFEREE_FILE_PREFIX key should go before the ML_REFEREE_FINDER_RESULT key

(UnorderedKey)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-and-test
🔇 Additional comments (21)
desci-server/kubernetes/deployment_prod.yaml (2)

116-119: Env var rename/addition looks correct and consistent with the PR intent.

Replacing ML_REFEREE_TRIGGER_CID with ML_REFEREE_TRIGGER_START and introducing ML_REFEREE_FILE_PREFIX align with the new URL/hash-based flow. Quoting the values is fine for strings.


116-119: Verify Kubernetes env vars and Vault secret keys

All code references to the ML_REFEREE environment variables are correctly aligned across dev, staging, and prod, and there are no lingering references to the old ML_REFEREE_TRIGGER_CID. Please complete the final step of ensuring the corresponding Vault secret contains the new keys; otherwise the service will start with empty values.

• No occurrences of ML_REFEREE_TRIGGER_CID found in desci-server—old key fully removed
ML_REFEREE_TRIGGER_START, ML_REFEREE_FINDER_RESULT, and ML_REFEREE_FILE_PREFIX are present in:

  • desci-server/kubernetes/deployment_dev.yaml (lines 117–119)
  • desci-server/kubernetes/deployment_staging.yaml (lines 123–125)
  • desci-server/kubernetes/deployment_prod.yaml (lines 116–118)

Next step (manual):
• Verify that the Vault path secrets/desci-server/production/app includes these exact keys:

  • ML_REFEREE_TRIGGER_START
  • ML_REFEREE_FINDER_RESULT
  • ML_REFEREE_FILE_PREFIX
.env.example (1)

205-209: .env.example aligns with k8s templates.

New variables here match the dev/staging/prod deployment templates, helping local dev stay in sync.

desci-server/kubernetes/deployment_staging.yaml (2)

123-126: Staging env updates match the new API contract.

The rename to ML_REFEREE_TRIGGER_START and addition of ML_REFEREE_FILE_PREFIX mirror dev/prod. Good consistency across environments.


123-126: Confirm Vault staging keys exist.

Double-check that ML_REFEREE_TRIGGER_START and ML_REFEREE_FILE_PREFIX are defined under secrets/desci-server/staging/app to avoid empty exports at runtime.

Use the same script shared in the prod comment; it checks all three environments together.

desci-server/kubernetes/deployment_dev.yaml (2)

117-120: Dev env changes look good and consistent.

The switch to ML_REFEREE_TRIGGER_START plus ML_REFEREE_FILE_PREFIX is correctly applied and quoted.


117-120: Sanity-check local assumptions and remove legacy references.

If any local scripts/manifests still reference ML_REFEREE_TRIGGER_CID, they’ll drift from dev. Use the repository-wide scan to ensure everything is updated.

Re-run the script in the prod comment to confirm no stale references remain.

desci-server/prisma/migrations/20250818192731_add_querying_data_external_api_table/migration.sql (2)

2-2: LGTM: Nullable JSONB column addition is safe and fast.

Adding a nullable JSONB without a default is metadata-only in Postgres and won’t rewrite the table. Match in schema as Json? looks correct.


2-2: No null-safety guards needed for queryingData

After inspecting all uses of queryingData in the codebase, I found that:

  • It’s only ever used in Prisma-level JSONB filters (where: { queryingData: { path: […], equals: … } }), which safely exclude rows where the column is NULL rather than accessing its value directly.
  • New records always write a non-null JSON object (queryingData as Prisma.JsonValue), so only legacy rows would have NULL, and those are already filtered out by the queries.

Since there are no code paths that de-reference or read queryingData without a filter first, no additional COALESCE or null-handling is required at this time.

Likely an incorrect or invalid review comment.

desci-server/src/docs/openai.ts (3)

98-98: New “AI Services” tag added—align path naming/versioning.

Most existing paths are versioned (/v1/...), while AI Services use /services/ai/.... If intentional, all good; if not, consider /v1/services/ai/... for consistency.

Would you like me to prepare a patch to version these paths and update routers accordingly?


142-142: Merging aiServicesPaths into the OpenAPI document is correct.

Assuming each operation defines security and request/response schemas, this will publish the endpoints as expected.


33-33: Docs wiring confirmed; all AI service operations include the ['AI Services'] tag.

Each of the imported operations in aiServicesPaths (generatePresignedUrlOperation, triggerRefereeRecommendationOperation, getRefereeResultsOperation, getRefereeUsageOperation) already declares

tags: ['AI Services']

– no further changes needed.

desci-server/prisma/schema.prisma (1)

1449-1452: Good call on practical composite indexes.

The added indexes on [userId], [userId, apiType], and [userId, apiType, createdAt] will cover the common filters and pagination by recency.

desci-server/src/services/sqs/SqsMessageTypes.ts (1)

27-35: Good move adding a typed metadata payload.

Introducing ExternalApiSqsMessageData is a solid step toward safer consumers and clearer contracts.

desci-server/src/controllers/externalApi/RefereeRecommender/issuePresignedUrl.ts (1)

89-105: Nice: expanded response includes downloadUrl and fileHash.

This should help clients correlate uploads and subsequent triggers.

desci-server/src/controllers/externalApi/RefereeRecommender/getResults.ts (2)

49-58: Good auth gate with clear 404 when no record exists.

The early return keeps the lambda call path clean.


34-47: Ignore DB key alignment suggestion—different keys serve distinct purposes
The pre-check in issuePresignedUrl.ts deliberately uses fileHash to dedupe identical uploads, while getResults.ts looks up by the generated fileName to fetch the specific session’s results. The SQS handler’s saveToDatabase method always writes both fileName (required) and, when available, fileHash into queryingData, so:

  • The dedupe‐by‐hash check finds any prior usage of the same content.
  • The session‐lookup by filename retrieves the exact record created for that upload.

Because these are intentional and complementary, the suggestion to “align on a single key” is not applicable here—please disregard it.

Likely an incorrect or invalid review comment.

desci-server/src/docs/aiServices.ts (2)

239-291: Nice coverage and parameterization for GET results.

The UploadedFileName parameter is clearly documented and matches the controller’s expectations.


326-339: Good path registration and security.

All endpoints are grouped under “AI Services” and protected by bearer auth.

desci-server/src/services/externalApi/RefereeRecommenderSqsHandler.ts (2)

264-269: LGTM! Good separation of concerns for query optimization

The separation of queryable data (with GIN index) from metadata is a smart architectural decision that will improve query performance.


173-173: Verified: Server-side handlers updated to use progress_percent
I’ve searched the repository and confirmed there are no remaining references to the old progress field in the WebSocket event handlers:

  • No occurrences of eventData.data?.progress or object literals using "progress" in the desci-server service code.
  • The only progress reference found is in desci-server/src/services/DataMigration/DataMigrationWorker.ts, which is unrelated to WebSocket events.

Please manually verify the following before merging:

  • Frontend WebSocket consumers (in the client repository) are updated to read progress_percent instead of progress.
  • Any tests or documentation still mentioning progress have been updated to progress_percent.

@@ -0,0 +1,2 @@
-- DropIndex
DROP INDEX "idx_external_api_usage_querying_data_gin";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Drop the index CONCURRENTLY (and defensively).

Dropping a large GIN index with DROP INDEX can take an ACCESS EXCLUSIVE lock and block writes. Prefer DROP INDEX CONCURRENTLY IF EXISTS to avoid long blocking in production.

Apply this diff:

-DROP INDEX "idx_external_api_usage_querying_data_gin";
+DROP INDEX CONCURRENTLY IF EXISTS "idx_external_api_usage_querying_data_gin";

Note: CONCURRENTLY cannot run inside a transaction. Ensure this migration is executed without a surrounding transaction.

Verify other migrations with CONCURRENTLY (create/drop) and ensure none are wrapped in transactions:


🏁 Script executed:

#!/bin/bash
rg -nP 'CREATE\s+INDEX\s+CONCURRENTLY|DROP\s+INDEX\s+CONCURRENTLY' desci-server/prisma/migrations -C2

Length of output: 830


Drop the GIN index without blocking writes

To avoid an ACCESS EXCLUSIVE lock when removing a large GIN index, switch to DROP INDEX CONCURRENTLY IF EXISTS.

• File (desci-server/prisma/migrations/20250818194118_gin_index_ext_api_querying_data/migration.sql), line 2

-DROP INDEX "idx_external_api_usage_querying_data_gin";
+DROP INDEX CONCURRENTLY IF EXISTS "idx_external_api_usage_querying_data_gin";

• Note: CONCURRENTLY cannot run inside a transaction—ensure this migration is executed in QUERY_RUN_IN_TRANSACTION = false mode or otherwise unwrapped.

I’ve verified that the only other GIN-index migration uses CREATE INDEX CONCURRENTLY IF NOT EXISTS (20250818192759_add_gin_index_external_api_querying_data), and no other drops are currently concurrent.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
DROP INDEX "idx_external_api_usage_querying_data_gin";
DROP INDEX CONCURRENTLY IF EXISTS "idx_external_api_usage_querying_data_gin";
🤖 Prompt for AI Agents
In
desci-server/prisma/migrations/20250818194118_gin_index_ext_api_querying_data/migration.sql
around line 2, the migration currently does a blocking DROP INDEX; replace it
with a non-blocking DROP INDEX CONCURRENTLY IF EXISTS
"idx_external_api_usage_querying_data_gin" and ensure the migration runs outside
a transaction (set QUERY_RUN_IN_TRANSACTION = false or otherwise unwrap the
migration) because CONCURRENTLY cannot run inside a transaction.

Comment on lines 13 to 18
const generatePresignedUrlSchema = z.object({
body: z.object({
fileHash: z.string().min(1, 'fileHash is required'),
fileName: z.string().min(1, 'fileName is required'),
}),
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid duplicating request schemas; centralize and add strict hash validation.

This file introduces its own generatePresignedUrlSchema, which diverges from src/schemas/externalApi.schema.ts. Duplication risks drift with docs and other controllers.

  • Reuse the central schema and update it to include fileHash with a strict hex pattern (64 chars).
  • Import the central schema here instead of redefining.
-import { z } from 'zod';
+import { z } from 'zod';
+import { generatePresignedUrlSchema } from '../../../schemas/externalApi.schema.js';
 
-const generatePresignedUrlSchema = z.object({
-  body: z.object({
-    fileHash: z.string().min(1, 'fileHash is required'),
-    fileName: z.string().min(1, 'fileName is required'),
-  }),
-});

Companion change in src/schemas/externalApi.schema.ts (see that file’s comment) to add fileHash and the regex.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const generatePresignedUrlSchema = z.object({
body: z.object({
fileHash: z.string().min(1, 'fileHash is required'),
fileName: z.string().min(1, 'fileName is required'),
}),
});
import { z } from 'zod';
import { generatePresignedUrlSchema } from '../../../schemas/externalApi.schema.js';
🤖 Prompt for AI Agents
In
desci-server/src/controllers/externalApi/RefereeRecommender/issuePresignedUrl.ts
around lines 13-18, a local generatePresignedUrlSchema is being defined which
duplicates schema logic; remove this local definition and import and use the
centralized schema from src/schemas/externalApi.schema.ts instead. Update
src/schemas/externalApi.schema.ts to add the fileHash field under the
appropriate request body schema with a strict hex regex /^[a-fA-F0-9]{64}$/ and
retain the existing fileName validation, export the updated schema, then replace
the local schema usage in this controller with the imported centralized one and
adjust imports/types accordingly.

Comment on lines +66 to +71
return sendSuccess(res, {
cached: true,
message: 'Results for this file are already available',
resultKey: RefereeRecommenderService.prepareFormattedFileName(fileHash),
});
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Return an UploadedFileName (or equivalent) alongside resultKey for GET compatibility.

/services/ai/referee-recommender/results expects UploadedFileName. Here you return only resultKey derived from fileHash. Unless resultKey === UploadedFileName, clients won’t be able to call GET without extra knowledge.

-      return sendSuccess(res, {
+      const resultKey = RefereeRecommenderService.prepareFormattedFileName(fileHash);
+      return sendSuccess(res, {
         cached: true,
         message: 'Results for this file are already available',
-        resultKey: RefereeRecommenderService.prepareFormattedFileName(fileHash),
+        resultKey,
+        // For compatibility with GET /results
+        UploadedFileName: resultKey,
       });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return sendSuccess(res, {
cached: true,
message: 'Results for this file are already available',
resultKey: RefereeRecommenderService.prepareFormattedFileName(fileHash),
});
}
const resultKey = RefereeRecommenderService.prepareFormattedFileName(fileHash);
return sendSuccess(res, {
cached: true,
message: 'Results for this file are already available',
resultKey,
// For compatibility with GET /results
UploadedFileName: resultKey,
});
🤖 Prompt for AI Agents
In
desci-server/src/controllers/externalApi/RefereeRecommender/issuePresignedUrl.ts
around lines 66 to 71, the response returns only resultKey (prepared from
fileHash) but the downstream endpoint /services/ai/referee-recommender/results
expects UploadedFileName; add an uploadedFileName property to the success
payload (either use the original uploaded filename from the request/body or
derive/format it the same way the results service expects) so clients can call
the GET endpoint without extra knowledge, and update the response type/interface
to include UploadedFileName.

Comment on lines +10 to +19
const PresignedUrlResponseSchema = z.object({
cached: z.boolean().describe('Whether results are cached'),
presignedUrl: z.string().optional().describe('S3 presigned upload URL'),
downloadUrl: z.string().optional().describe('S3 download URL'),
fileName: z.string().optional().describe('Generated S3 filename'),
fileHash: z.string().optional().describe('File hash'),
resultKey: z.string().optional().describe('Formatted cache key for results'),
message: z.string().optional().describe('Cache message'),
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

200 responses should include the { ok, data, message? } envelope.

Your controllers use sendSuccess, which wraps payloads. RefereeResultsSchema models this, but PresignedUrlResponseSchema (and TriggerRefereeResponseSchema) don’t. Docs will be wrong for clients.

Introduce a reusable success envelope:

 const PresignedUrlResponseSchema = z.object({
   cached: z.boolean().describe('Whether results are cached'),
   presignedUrl: z.string().optional().describe('S3 presigned upload URL'),
   downloadUrl: z.string().optional().describe('S3 download URL'),
   fileName: z.string().optional().describe('Generated S3 filename'),
   fileHash: z.string().optional().describe('File hash'),
   resultKey: z.string().optional().describe('Formatted cache key for results'),
   message: z.string().optional().describe('Cache message'),
 });
+
+const ApiSuccess = <T extends z.ZodTypeAny>(schema: T) =>
+  z.object({
+    ok: z.literal(true),
+    data: schema,
+    message: z.string().optional(),
+  });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const PresignedUrlResponseSchema = z.object({
cached: z.boolean().describe('Whether results are cached'),
presignedUrl: z.string().optional().describe('S3 presigned upload URL'),
downloadUrl: z.string().optional().describe('S3 download URL'),
fileName: z.string().optional().describe('Generated S3 filename'),
fileHash: z.string().optional().describe('File hash'),
resultKey: z.string().optional().describe('Formatted cache key for results'),
message: z.string().optional().describe('Cache message'),
});
const PresignedUrlResponseSchema = z.object({
cached: z.boolean().describe('Whether results are cached'),
presignedUrl: z.string().optional().describe('S3 presigned upload URL'),
downloadUrl: z.string().optional().describe('S3 download URL'),
fileName: z.string().optional().describe('Generated S3 filename'),
fileHash: z.string().optional().describe('File hash'),
resultKey: z.string().optional().describe('Formatted cache key for results'),
message: z.string().optional().describe('Cache message'),
});
const ApiSuccess = <T extends z.ZodTypeAny>(schema: T) =>
z.object({
ok: z.literal(true),
data: schema,
message: z.string().optional(),
});
🤖 Prompt for AI Agents
In desci-server/src/docs/aiServices.ts around lines 10 to 19, the
PresignedUrlResponseSchema models the raw payload but the API always returns the
sendSuccess envelope ({ ok, data, message? }), so update the file to define and
export a reusable success envelope (e.g. SuccessEnvelope = <generic zod wrapper
that enforces ok: true, data: T, message?: string>) and then wrap
PresignedUrlResponseSchema (and TriggerRefereeResponseSchema) with that envelope
(exporting the wrapped schemas for docs). Ensure types and exports are adjusted
so downstream docs/controllers use the enveloped schema.

Comment on lines +131 to +179
export const generatePresignedUrlOperation: ZodOpenApiOperationObject = {
operationId: 'generatePresignedUrl',
tags: ['AI Services'],
summary: 'Generate presigned URL for referee recommender file upload',
description:
'Get a presigned S3 URL for uploading a PDF file for referee recommendation analysis. Checks for cached results first.',
security: [{ bearerAuth: [] }],
requestBody: {
content: {
'application/json': {
schema: FileHashSchema,
},
},
},
responses: {
'200': {
description: 'Presigned URL generated successfully or cached results found',
content: {
'application/json': {
schema: PresignedUrlResponseSchema,
},
},
},
'400': {
description: 'Invalid request parameters',
content: {
'application/json': {
schema: ErrorResponseSchema,
},
},
},
'401': {
description: 'Authentication required',
content: {
'application/json': {
schema: ErrorResponseSchema,
},
},
},
'500': {
description: 'Internal server error',
content: {
'application/json': {
schema: ErrorResponseSchema,
},
},
},
},
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Wrap generatePresignedUrl 200-response in the success envelope.

-        'application/json': {
-          schema: PresignedUrlResponseSchema,
-        },
+        'application/json': {
+          schema: ApiSuccess(PresignedUrlResponseSchema),
+        },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export const generatePresignedUrlOperation: ZodOpenApiOperationObject = {
operationId: 'generatePresignedUrl',
tags: ['AI Services'],
summary: 'Generate presigned URL for referee recommender file upload',
description:
'Get a presigned S3 URL for uploading a PDF file for referee recommendation analysis. Checks for cached results first.',
security: [{ bearerAuth: [] }],
requestBody: {
content: {
'application/json': {
schema: FileHashSchema,
},
},
},
responses: {
'200': {
description: 'Presigned URL generated successfully or cached results found',
content: {
'application/json': {
schema: PresignedUrlResponseSchema,
},
},
},
'400': {
description: 'Invalid request parameters',
content: {
'application/json': {
schema: ErrorResponseSchema,
},
},
},
'401': {
description: 'Authentication required',
content: {
'application/json': {
schema: ErrorResponseSchema,
},
},
},
'500': {
description: 'Internal server error',
content: {
'application/json': {
schema: ErrorResponseSchema,
},
},
},
},
};
export const generatePresignedUrlOperation: ZodOpenApiOperationObject = {
operationId: 'generatePresignedUrl',
tags: ['AI Services'],
summary: 'Generate presigned URL for referee recommender file upload',
description:
'Get a presigned S3 URL for uploading a PDF file for referee recommendation analysis. Checks for cached results first.',
security: [{ bearerAuth: [] }],
requestBody: {
content: {
'application/json': {
schema: FileHashSchema,
},
},
},
responses: {
'200': {
description: 'Presigned URL generated successfully or cached results found',
content: {
'application/json': {
schema: ApiSuccess(PresignedUrlResponseSchema),
},
},
},
'400': {
description: 'Invalid request parameters',
content: {
'application/json': {
schema: ErrorResponseSchema,
},
},
},
'401': {
description: 'Authentication required',
content: {
'application/json': {
schema: ErrorResponseSchema,
},
},
},
'500': {
description: 'Internal server error',
content: {
'application/json': {
schema: ErrorResponseSchema,
},
},
},
},
};
🤖 Prompt for AI Agents
In desci-server/src/docs/aiServices.ts around lines 131 to 179, the 200 response
for generatePresignedUrl currently returns PresignedUrlResponseSchema directly;
wrap that response in the project's success envelope schema (e.g.,
SuccessResponseSchema or the equivalent wrapper that has a "data" field) so the
OpenAPI response shape matches other endpoints. Replace the 200 response content
schema with the success-envelope schema that contains PresignedUrlResponseSchema
as its data payload (or call the helper that composes a success schema around
PresignedUrlResponseSchema), and ensure the description/context remains
accurate.

Comment on lines +80 to +86
function prepareSessionCacheKey(fileUrl: string) {
// We use the file URLs here instead of the file hash, because it's the only
// deterministic link that we have available early on for every upload method (S3, CID, External pdf url)
// for us to be able to retrieve a result back.
// string -> SHA256 -> Base64
return createHash('sha256').update(fileUrl).digest('base64'); // ~44 chars with '=' padding
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding input validation for the fileUrl parameter

The function should validate that fileUrl is not empty or undefined before computing the hash.

 function prepareSessionCacheKey(fileUrl: string) {
+  if (!fileUrl || typeof fileUrl !== 'string') {
+    throw new Error('Invalid fileUrl provided for session cache key');
+  }
   // We use the file URLs here instead of the file hash, because it's the only
   // deterministic link that we have available early on for every upload method (S3, CID, External pdf url)
   // for us to be able to retrieve a result back.
   // string -> SHA256 -> Base64
   return createHash('sha256').update(fileUrl).digest('base64'); // ~44 chars with '=' padding
 }
🤖 Prompt for AI Agents
In desci-server/src/services/externalApi/RefereeRecommenderService.ts around
lines 80 to 86, the prepareSessionCacheKey function computes a SHA256 hash of
fileUrl without validating input; add an early validation that fileUrl is a
non-empty string (e.g. check typeof fileUrl === 'string' and
fileUrl.trim().length > 0), normalize or trim it if needed, and throw a clear
TypeError or return a rejected result when invalid so you never call createHash
with undefined/empty input; then proceed to compute and return the base64 hash
as before.

Comment on lines +92 to +98
function prepareFormattedFileName(fileHash: string) {
if (!ML_REFEREE_FILE_PREFIX) {
console.error('ML_REFEREE_FILE_PREFIX is not set');
throw new Error('ML_REFEREE_FILE_PREFIX env is not set');
}
return ML_REFEREE_FILE_PREFIX + fileHash;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Use proper error handling instead of console.error

Using console.error directly is inconsistent with the logging strategy used throughout the rest of the file.

 function prepareFormattedFileName(fileHash: string) {
   if (!ML_REFEREE_FILE_PREFIX) {
-    console.error('ML_REFEREE_FILE_PREFIX is not set');
+    logger.error('ML_REFEREE_FILE_PREFIX is not set');
     throw new Error('ML_REFEREE_FILE_PREFIX env is not set');
   }
   return ML_REFEREE_FILE_PREFIX + fileHash;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function prepareFormattedFileName(fileHash: string) {
if (!ML_REFEREE_FILE_PREFIX) {
console.error('ML_REFEREE_FILE_PREFIX is not set');
throw new Error('ML_REFEREE_FILE_PREFIX env is not set');
}
return ML_REFEREE_FILE_PREFIX + fileHash;
}
function prepareFormattedFileName(fileHash: string) {
if (!ML_REFEREE_FILE_PREFIX) {
logger.error('ML_REFEREE_FILE_PREFIX is not set');
throw new Error('ML_REFEREE_FILE_PREFIX env is not set');
}
return ML_REFEREE_FILE_PREFIX + fileHash;
}
🤖 Prompt for AI Agents
In desci-server/src/services/externalApi/RefereeRecommenderService.ts around
lines 92 to 98, replace the direct console.error call with the service's
standard logger (e.g., logger.error or this.logger.error as used elsewhere in
this file) and log a clear message (optionally include fileHash) before throwing
the Error; do not change the thrown Error text, just use the consistent logger
API to emit the error instead of console.error.

Comment on lines 375 to 387
if (response.data.uploaded_file_name) {
// Use prepareCacheKey on the fileUrl for consistent hashing
const hashedFileUrl = prepareSessionCacheKey(request.file_url);

const session: RefereeRecommenderSession = {
userId,
originalFileName: `${request.cid}`,
hashedFileUrl,
createdAt: Date.now(),
};

await storeSession(response.data.uploaded_file_name, session);
await markSessionAsActive(userId, response.data.uploaded_file_name);
await storeSession(request.file_url, session);
await markSessionAsActive(userId, request.file_url);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potential race condition when storing sessions

There's a potential race condition between storing the session and marking it as active. If the second operation fails, the session will be stored but not marked as active, leading to an inconsistent state.

Consider using a transaction or combining these operations:

     if (response.data.uploaded_file_name) {
       // Use prepareCacheKey on the fileUrl for consistent hashing
       const hashedFileUrl = prepareSessionCacheKey(request.file_url);

       const session: RefereeRecommenderSession = {
         userId,
         hashedFileUrl,
         createdAt: Date.now(),
       };

-      await storeSession(request.file_url, session);
-      await markSessionAsActive(userId, request.file_url);
+      const storeResult = await storeSession(request.file_url, session);
+      if (storeResult.isOk()) {
+        await markSessionAsActive(userId, request.file_url);
+      } else {
+        logger.error({ error: storeResult.error }, 'Failed to store session');
+        // Consider whether to continue or fail the request
+      }

Committable suggestion skipped: line range outside the PR's diff.

'[SQS] Processing referee recommender event',
);

// debugger;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove debugger statement

The debugger statement should be removed before merging to production.

-      // debugger;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// debugger;
🤖 Prompt for AI Agents
In desci-server/src/services/externalApi/RefereeRecommenderSqsHandler.ts around
line 83, there is a leftover debugger; statement that should be removed before
merging to production; delete the debugger; line and ensure no other debugger
calls remain in the file, then run the linter/tests to confirm no unused or
stray debugging statements remain.

Comment on lines +31 to +34
file_hash?: string; // SHA-256 hash of the file content
file_url: string; // Source URL
data?: ExternalApiSqsMessageData | any;
fileUrlHashed: string;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Make fileUrlHashed optional and avoid any in data (align naming).

  • Requiring fileUrlHashed will break existing producers until all are updated. Make it optional.
  • Avoid any to preserve type-safety; prefer Record<string, unknown> or keep it strictly ExternalApiSqsMessageData.
  • Mixed casing across fields (file_url, file_hash, fileUrlHashed) is brittle. Either stick to snake_case for wire payloads or keep both keys during transition.

Apply:

   file_name: string; // e.g., "referee_rec_v0.1.3_{original_file.pdf}" // Move to hashes later for security.
-  file_hash?: string; // SHA-256 hash of the file content
-  file_url: string; // Source URL
-  data?: ExternalApiSqsMessageData | any;
-  fileUrlHashed: string;
+  /** SHA-256 hash of the file content */
+  file_hash?: string;
+  /** Source URL */
+  file_url: string;
+  /** Structured metadata payload */
+  data?: ExternalApiSqsMessageData | Record<string, unknown>;
+  /** @deprecated Prefer snake_case wire key `file_url_hashed` */
+  fileUrlHashed?: string;
+  /** SHA-256 hash of the normalized file URL */
+  file_url_hashed?: string;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
file_hash?: string; // SHA-256 hash of the file content
file_url: string; // Source URL
data?: ExternalApiSqsMessageData | any;
fileUrlHashed: string;
file_name: string; // e.g., "referee_rec_v0.1.3_{original_file.pdf}" // Move to hashes later for security.
/** SHA-256 hash of the file content */
file_hash?: string;
/** Source URL */
file_url: string;
/** Structured metadata payload */
data?: ExternalApiSqsMessageData | Record<string, unknown>;
/** @deprecated Prefer snake_case wire key `file_url_hashed` */
fileUrlHashed?: string;
/** SHA-256 hash of the normalized file URL */
file_url_hashed?: string;
🤖 Prompt for AI Agents
In desci-server/src/services/sqs/SqsMessageTypes.ts around lines 31–34, make
fileUrlHashed optional (fileUrlHashed?: string) so existing producers don't
break, replace the loose `any` in `data` with a safer union type such as
`ExternalApiSqsMessageData | Record<string, unknown>` to preserve type-safety,
and normalize field naming for the wire format by either using snake_case (e.g.,
file_url, file_hash, file_url_hashed) or explicitly keeping both variants during
a transition (document and populate both keys) so consumers/producers remain
compatible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant