Document processing #53

Open
DanTen317 wants to merge 24 commits into ostis-apps:main from DanTen317:document-processing

@DanTen317 (Contributor)

Text Processing Module

A module for processing text documents with the following key features:

  • Document Loading: Load and parse PDF documents
  • Text Cleaning: Remove artifacts, normalize spacing and formatting
  • Structure Analysis: Automatically detect chapters, sections and headers
  • Content Processing: Split documents into structured chunks while preserving hierarchy and context
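The cleaning, structure-analysis and chunking steps listed above can be illustrated with a minimal, stdlib-only Python sketch. It is not the code from this PR: `clean_text`, `heading_level`, `chunk_document`, `Chunk`, `HEADING_RE` and the `max_chars` limit are hypothetical names. The heading rule assumes numbered headings such as `1 Introduction`, `2.3. Results` or `Chapter 4 Methods`, and PDF text extraction is assumed to have already happened.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed heading formats: "Chapter 4 Methods", "1 Introduction", "2.3. Results".
HEADING_RE = re.compile(r"^(?:Chapter\s+(\d+)|(\d+(?:\.\d+)*))\.?\s+(\S.*)$")


def clean_text(raw: str) -> str:
    """Remove common extraction artifacts and normalize spacing."""
    text = raw.replace("\u00ad", "")                          # soft hyphens
    text = re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)  # re-join words hyphenated at line ends
    text = re.sub(r"[ \t]+", " ", text)                       # collapse runs of spaces/tabs
    text = "\n".join(line.strip() for line in text.splitlines())
    text = re.sub(r"\n{3,}", "\n\n", text)                    # keep at most one blank line
    return text.strip()


@dataclass
class Chunk:
    path: Tuple[str, ...]  # heading hierarchy, e.g. ("1 Introduction", "1.1 Scope")
    text: str


def heading_level(line: str) -> Optional[int]:
    """Return 1 for "Chapter N", the number of components for "1.2.3", else None."""
    m = HEADING_RE.match(line)
    if m is None:
        return None
    if m.group(1) is not None:
        return 1
    return m.group(2).count(".") + 1


def _split(body: str, max_chars: int) -> List[str]:
    """Greedily pack whole words into pieces of at most max_chars characters
    (a single word longer than max_chars becomes its own piece)."""
    pieces: List[str] = []
    current = ""
    for word in body.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) > max_chars and current:
            pieces.append(current)
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def chunk_document(text: str, max_chars: int = 500) -> List[Chunk]:
    """Split cleaned text into chunks tagged with their heading path."""
    chunks: List[Chunk] = []
    path: List[str] = []  # current stack of headings
    body: List[str] = []  # body lines of the current section

    def flush() -> None:
        for piece in _split(" ".join(body), max_chars):
            chunks.append(Chunk(tuple(path), piece))
        body.clear()

    for line in text.splitlines():
        level = heading_level(line)
        if level is None:
            body.append(line)
            continue
        flush()               # close the previous section
        del path[level - 1:]  # drop headings at this level or deeper
        path.append(line)
    flush()
    return chunks
```

Each chunk carries the full heading path of the section it came from, which is one way to preserve hierarchy and context across chunk boundaries. The regex heuristic will misfire on numbered list items or lines that start with a year, and the hyphen re-joining will also merge genuine compounds split at a line end, so a production implementation would probably need additional cues.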

The module targets academic and technical documentation with a hierarchical content organization.

It is also designed to be modular, so that new document formats and processing strategies can be integrated easily.
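One way to get the modularity described above is a small registry that maps file extensions to loaders and names to chunking strategies, so that a new format or strategy is added by registering an object rather than by editing the pipeline. The sketch below only assumes this design: `DocumentLoader`, `register_loader`, `LOADERS`, `STRATEGIES` and `process` are hypothetical names, and only a plain-text loader is included (a PDF loader would wrap whichever PDF library the module actually uses).

```python
from pathlib import Path
from typing import Callable, Dict, List, Protocol


class DocumentLoader(Protocol):
    """Hypothetical interface: anything that turns a file into raw text."""

    def load(self, path: Path) -> str: ...


class PlainTextLoader:
    """Example loader for UTF-8 text files."""

    def load(self, path: Path) -> str:
        return path.read_text(encoding="utf-8")


# Loaders keyed by lowercase file extension; new formats are registered here.
LOADERS: Dict[str, DocumentLoader] = {".txt": PlainTextLoader()}

# A chunking strategy maps raw text to a list of chunks.
ChunkStrategy = Callable[[str], List[str]]


def paragraph_strategy(text: str) -> List[str]:
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


# Strategies keyed by name; new strategies are registered here.
STRATEGIES: Dict[str, ChunkStrategy] = {"paragraph": paragraph_strategy}


def register_loader(extension: str, loader: DocumentLoader) -> None:
    LOADERS[extension.lower()] = loader


def load_document(path: Path) -> str:
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"no loader registered for {path.suffix!r}")
    return loader.load(path)


def process(path: Path, strategy: str = "paragraph") -> List[str]:
    """Load a document with the matching loader, then apply a chunking strategy."""
    return STRATEGIES[strategy](load_document(path))
```

Using `typing.Protocol` keeps loaders structurally typed: any class with a matching `load` method qualifies without inheriting from a shared base class.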


The PR also includes some fixes and additional features, such as a CallbackHandler for the LLM.

DanTen317 force-pushed the document-processing branch from b5ef9cb to 7554b72 on May 11, 2025 23:07
DanTen317 force-pushed the document-processing branch from 61f5683 to cf211cf on May 11, 2025 23:25