Feature #44

Merged
ssrajadh merged 5 commits into master from feature
Apr 18, 2026

@ssrajadh
Owner

Retry failed embeddings automatically and add a dead-letter queue (DLQ).

ssrajadh and others added 5 commits April 18, 2026 16:02
Replaces the batched per-file write with a per-chunk upsert, so a crash
mid-file no longer discards already-embedded chunks. On resume, each
chunk's deterministic ID is checked via the new has_chunk() and skipped
if present.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
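
The per-chunk resume described in this commit could look roughly like the sketch below. Only `has_chunk()` is named in the commit; `chunk_id`, `InMemoryStore`, `upsert_chunk`, and `index_file` are hypothetical names, and hashing the file path plus span is just one assumed way to get a deterministic ID.

```python
import hashlib


def chunk_id(source_file: str, start: float, end: float) -> str:
    """Hypothetical deterministic ID: the same file and span always hash the same."""
    key = f"{source_file}:{start:.3f}:{end:.3f}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


class InMemoryStore:
    """Stand-in for the real store; only has_chunk() comes from the commit."""

    def __init__(self):
        self._rows = {}

    def has_chunk(self, cid: str) -> bool:
        return cid in self._rows

    def upsert_chunk(self, cid: str, embedding) -> None:
        # Hypothetical per-chunk write: each chunk is persisted on its own.
        self._rows[cid] = embedding


def index_file(store, source_file, spans, embed) -> int:
    """Embed and write one chunk at a time, skipping chunks already stored.

    Returns the number of chunks embedded in this run.
    """
    embedded = 0
    for start, end in spans:
        cid = chunk_id(source_file, start, end)
        if store.has_chunk(cid):
            continue  # already written by an earlier (possibly crashed) run
        store.upsert_chunk(cid, embed(source_file, start, end))
        embedded += 1
    return embedded
```

Because every chunk is written as soon as it is embedded, a crash on the third chunk of a file leaves the first two in the store, and the next run only embeds what is missing.
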
Persistent JSON-backed record at ~/.sentrysearch/dlq.json. Each entry
stores the chunk ID, source file, start/end time, truncated error
message, attempt count, and timestamp. Atomic writes via a .tmp rename.

Not yet wired into the indexing pipeline — follow-up commit adds the
retry wrapper that feeds this queue.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
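
A minimal sketch of a JSON-backed queue with atomic writes, assuming the fields listed in this commit. The class name `DeadLetterQueue`, its method names, the key names, and the 500-character truncation limit are all assumptions for illustration, not the code in this PR.

```python
import json
import os
import time
from pathlib import Path

MAX_ERROR_CHARS = 500  # assumed truncation length; the commit does not state one


class DeadLetterQueue:
    """Hypothetical JSON-backed DLQ keyed by chunk ID."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self) -> dict:
        if not self.path.exists():
            return {}
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def _save(self, entries: dict) -> None:
        # Write to a sibling .tmp file, then rename over the real file.
        # os.replace is atomic on the same filesystem, so readers never
        # observe a half-written dlq.json.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        tmp = self.path.with_name(self.path.name + ".tmp")
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(entries, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def add(self, chunk_id, source_file, start, end, error, attempts) -> None:
        entries = self.load()
        entries[chunk_id] = {
            "source_file": source_file,
            "start": start,
            "end": end,
            "error": str(error)[:MAX_ERROR_CHARS],
            "attempts": attempts,
            "timestamp": time.time(),
        }
        self._save(entries)

    def remove(self, chunk_id) -> None:
        entries = self.load()
        if entries.pop(chunk_id, None) is not None:
            self._save(entries)
```

In this sketch the queue would be constructed as `DeadLetterQueue(Path.home() / ".sentrysearch" / "dlq.json")` to match the location named in the commit.
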
Wraps embed_video_chunk with exponential-backoff retry (default 3
attempts). Failures are classified:
  - Permanent (OOM, missing file, decode error): routed to DLQ
    immediately — retrying would just fail the same way.
  - Transient: retried with 2/4/8s backoff before DLQ.
  - Gemini quota/auth errors: bubble up so the user stops and fixes.

DLQ'd chunks are skipped on subsequent runs so one poisoned chunk
doesn't block the rest of the library. Summary line now reports DLQ
counts and points users at `--retry-failed` for re-attempts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
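
The three-way classification above can be sketched as follows. The exception classes, `classify`, and `embed_with_retry` are hypothetical names; how the real wrapper detects OOM, decode, or Gemini quota/auth errors is not shown in this PR, so plain exception types stand in for that. The commit lists 2/4/8s backoff; this sketch sleeps only between attempts, so with the default 3 attempts it waits 2s and then 4s.

```python
import time


class PermanentError(Exception):
    """Stand-in for failures such as decode errors that will never succeed."""


class FatalError(Exception):
    """Stand-in for quota/auth errors that the user has to fix."""


def classify(exc: Exception) -> str:
    if isinstance(exc, FatalError):
        return "fatal"
    if isinstance(exc, (MemoryError, FileNotFoundError, PermanentError)):
        return "permanent"
    return "transient"


def embed_with_retry(embed, chunk, dlq_add, max_attempts=3, base_delay=2.0,
                     sleep=time.sleep):
    """Return the embedding, or None if the chunk was routed to the DLQ.

    dlq_add(chunk, exc, attempts) is a hypothetical callback that records
    the failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return embed(chunk)
        except Exception as exc:
            kind = classify(exc)
            if kind == "fatal":
                raise  # bubble up so the whole run stops
            if kind == "permanent" or attempt == max_attempts:
                dlq_add(chunk, exc, attempt)
                return None
            # Transient failure: exponential backoff (2s, 4s, 8s, ...).
            sleep(base_delay * 2 ** (attempt - 1))
    return None
```

Passing `sleep` as a parameter is a design choice of the sketch so the backoff can be exercised without real waiting.
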
  sentrysearch index <dir> --retry-failed
      Pulls entries back out of the DLQ and re-attempts them. Successful
      retries remove the entry; failures get recorded again.

  sentrysearch dlq list
      Prints each failed chunk with source file, timestamp, attempt
      count, and the recorded error.

  sentrysearch dlq clear
      Empties the DLQ (with confirmation).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
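
The `--retry-failed` behavior (successes leave the queue, failures are recorded again) could be sketched as below. `retry_failed`, its callbacks, and the entry keys are hypothetical; the real command's wiring into the CLI is not shown here, and the 500-character truncation is an assumed value.

```python
def retry_failed(entries: dict, embed, upsert) -> dict:
    """Re-attempt every DLQ entry and return the entries that are still failing.

    entries maps chunk ID -> entry dict; embed(entry) and upsert(cid, embedding)
    are hypothetical callbacks supplied by the caller.
    """
    remaining = {}
    for cid, entry in entries.items():
        try:
            embedding = embed(entry)
        except Exception as exc:
            # Failure: record the entry again with a bumped attempt count.
            updated = dict(entry)
            updated["attempts"] = entry.get("attempts", 0) + 1
            updated["error"] = str(exc)[:500]  # assumed truncation length
            remaining[cid] = updated
            continue
        # Success: store the embedding and drop the entry from the queue.
        upsert(cid, embedding)
    return remaining
```
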
Extracts the start/end-time math from chunk_video into a pure
expected_chunk_spans() helper. The index command now probes the
video's duration, computes the chunk IDs it would produce, and skips
the file entirely if all of them are present in the store.

Restores the fast path that was lost when is_indexed() was removed
in favor of per-chunk resume. Files with skipped still-frame chunks
won't hit the fast path (stored count < expected) and will fall
through to the normal per-chunk loop, same as before.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
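
A sketch of the fast path under stated assumptions: `expected_chunk_spans()` is the helper named in this commit, but its signature, the 30s chunk length, and the 5s overlap used here are assumptions, as are `file_fully_indexed` and the `make_id` callback.

```python
def expected_chunk_spans(duration: float, chunk_duration: float = 30.0,
                         overlap: float = 5.0) -> list:
    """Pure start/end-time math: the spans a video of this duration would yield.

    The default chunk length and overlap are assumed values.
    """
    if chunk_duration <= overlap:
        raise ValueError("chunk_duration must be greater than overlap")
    if duration <= 0:
        return []
    step = chunk_duration - overlap
    spans = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_duration, duration)
        spans.append((start, end))
        if end >= duration:
            break  # the last chunk reached the end of the video
        start += step
    return spans


def file_fully_indexed(spans, has_chunk, make_id) -> bool:
    """True only if every expected chunk ID is already in the store.

    An empty span list is treated as not indexed so the file falls through
    to the normal per-chunk loop.
    """
    return bool(spans) and all(has_chunk(make_id(s, e)) for s, e in spans)
```

A file whose still-frame chunks were skipped has fewer stored chunks than expected, so `file_fully_indexed` returns False and the per-chunk loop runs, matching the fall-through described in the commit.
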
@ssrajadh ssrajadh merged commit 7a0cda7 into master Apr 18, 2026
9 checks passed