Name	Name	Last commit message	Last commit date
parent directory ..
src	src
.gitignore	.gitignore
Dockerfile	Dockerfile
README.md	README.md
docker-compose.yml	docker-compose.yml
pom.xml	pom.xml
run.sh	run.sh

Name

Last commit message

Last commit date

Answering Questions About a Q4 Earnings Report

A user uploads a 24-page Q4 Earnings Report PDF and asks "What was the total revenue?" Instead of reading all 24 pages, the system ingests the document, splits it into 42 semantic chunks, builds a vector index, retrieves the top relevant chunks, and synthesizes a natural-language answer with a confidence score.

Workflow

documentUrl, question
       │
       ▼
┌──────────────┐
│ dqa_ingest   │  Fetch and parse document
└──────┬───────┘
       │  document {title, pages: 24, wordCount: 8500, format: "pdf"}
       ▼
┌──────────────┐
│ dqa_chunk    │  Split into 42 semantic chunks
└──────┬───────┘
       │  chunks[], chunkCount: 42
       ▼
┌──────────────┐
│ dqa_index    │  Create vector index with embeddings
└──────┬───────┘
       │  indexId: "IDX-<base36-timestamp>", dimensions: 1536
       ▼
┌──────────────┐
│ dqa_query    │  Retrieve top relevant chunks
└──────┬───────┘
       │  relevantChunks [{id: 5, text: "Total revenue Q4: $2.4B", score: 0.96}]
       ▼
┌──────────────┐
│ dqa_answer   │  Synthesize answer from chunks
└──────────────┘
       │
       ▼
  chunkCount: 42, answer, confidence: 0.95

Workers

IngestWorker (dqa_ingest) -- Logs the documentUrl from input and returns a document metadata map: title: "Q4 Earnings Report", pages: 24, wordCount: 8500, format: "pdf".

ChunkWorker (dqa_chunk) -- Returns a fixed chunk list with one sample entry {id: 0, text: "Revenue grew 15%"}, reports chunkCount: 42 and strategy: "semantic".

IndexWorker (dqa_index) -- Generates a unique index ID using "IDX-" + Long.toString(System.currentTimeMillis(), 36).toUpperCase(). Reports embeddingModel: "text-embedding-3-small" and dimensions: 1536.

QueryWorker (dqa_query) -- Returns a list of relevant chunks, with the top result being {id: 5, text: "Total revenue Q4: $2.4B", score: 0.96}.

AnswerWorker (dqa_answer) -- Produces the answer "Q4 total revenue was $2.4 billion, a 15% YoY increase." with confidence: 0.95 and sourceChunks: [5, 8, 12] indicating which chunk IDs contributed.

Tests

10 tests across 5 test files cover each stage of the document QA pipeline.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Answering Questions About a Q4 Earnings Report

Workflow

Workers

Tests

Further Reading

FilesExpand file tree

document-qa

Directory actions

More options

Directory actions

More options

Latest commit

History

document-qa

Folders and files

parent directory

README.md

Answering Questions About a Q4 Earnings Report

Workflow

Workers

Tests

Further Reading