Skip to content

feat: add multi-document support to retrieval and client API#216

Open
Shreyansh1729 wants to merge 4 commits intoVectifyAI:mainfrom
Shreyansh1729:feat/multi-doc-support
Open

feat: add multi-document support to retrieval and client API#216
Shreyansh1729 wants to merge 4 commits intoVectifyAI:mainfrom
Shreyansh1729:feat/multi-doc-support

Conversation

@Shreyansh1729
Copy link
Copy Markdown

Summary

Added the ability to query multiple documents simultaneously, addressing Issue #187. This allows for cross-document data retrieval and combined reasoning in RAG applications.

Changes Made

  • retrieve.py: Refactored get_document, get_document_structure, and get_page_content to accept a single string or a list of strings for doc_id.
  • client.py: Updated PageIndexClient methods to support Union[str, List[str]] for batch querying.
  • tests: Added tests/test_multi_doc.py with 5 tests verifying batch metadata, structure, and content retrieval, along with error handling and backward compatibility.

Verification

Run tests: export PYTHONPATH=. && pytest tests/test_multi_doc.py
Result: 5 passed.

Closes #187

Your Name added 3 commits March 28, 2026 00:16
- Use .get() with safe defaults for all LLM response dict accesses
- Optimize extract_toc_content retry loop to grow chat_history
  incrementally instead of rebuilding with full accumulated response
- Optimize toc_transformer retry loop to use chat_history instead of
  re-embedding the entire raw TOC and incomplete JSON in each prompt
- Return best-effort results on max retries instead of raising
- Add 14 mock-based tests covering all fix scenarios

Closes VectifyAI#163
- Restore explicit Exception on max retries instead of silent warning
- Move truncation logic before the retry loop so it only runs once
  on the initial incomplete response, not on every iteration
- Add explicit None guard for physical_index before passing to
  convert_physical_index_to_int to prevent potential TypeError
- Update test to expect Exception on max retries
- Update retrieve.py functions to support Union[str, List[str]] for doc_id
- If a list of IDs is provided, return a JSON object mapping IDs to results
- Update PageIndexClient methods to support batch querying
- Add 5 comprehensive unit tests for multi-doc support
- Maintain 100% backward compatibility for single-doc requests
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multiple Document Chat

1 participant