feat: add multi-document support to retrieval and client API#216
Open
Shreyansh1729 wants to merge 4 commits intoVectifyAI:mainfrom
Open
feat: add multi-document support to retrieval and client API#216Shreyansh1729 wants to merge 4 commits intoVectifyAI:mainfrom
Shreyansh1729 wants to merge 4 commits intoVectifyAI:mainfrom
Conversation
added 3 commits
March 28, 2026 00:16
- Use .get() with safe defaults for all LLM response dict accesses - Optimize extract_toc_content retry loop to grow chat_history incrementally instead of rebuilding with full accumulated response - Optimize toc_transformer retry loop to use chat_history instead of re-embedding the entire raw TOC and incomplete JSON in each prompt - Return best-effort results on max retries instead of raising - Add 14 mock-based tests covering all fix scenarios Closes VectifyAI#163
- Restore explicit Exception on max retries instead of silent warning - Move truncation logic before the retry loop so it only runs once on the initial incomplete response, not on every iteration - Add explicit None guard for physical_index before passing to convert_physical_index_to_int to prevent potential TypeError - Update test to expect Exception on max retries
- Update retrieve.py functions to support Union[str, List[str]] for doc_id - If a list of IDs is provided, return a JSON object mapping IDs to results - Update PageIndexClient methods to support batch querying - Add 5 comprehensive unit tests for multi-doc support - Maintain 100% backward compatibility for single-doc requests
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Added the ability to query multiple documents simultaneously, addressing Issue #187. This allows for cross-document data retrieval and combined reasoning in RAG applications.
Changes Made
get_document,get_document_structure, andget_page_contentto accept a single string or a list of strings fordoc_id.PageIndexClientmethods to supportUnion[str, List[str]]for batch querying.tests/test_multi_doc.pywith 5 tests verifying batch metadata, structure, and content retrieval, along with error handling and backward compatibility.Verification
Run tests:
export PYTHONPATH=. && pytest tests/test_multi_doc.pyResult: 5 passed.
Closes #187