Name	Name	Last commit message	Last commit date
parent directory ..
src	src
.gitignore	.gitignore
Dockerfile	Dockerfile
README.md	README.md
docker-compose.yml	docker-compose.yml
pom.xml	pom.xml
run.sh	run.sh

Name

Last commit message

Last commit date

Ocr Pipeline

An invoice processing system receives scanned documents as image files. Each image needs preprocessing (deskew, contrast enhancement), OCR text extraction, field parsing (invoice number, date, line items, total), and validation that the extracted total matches the sum of line items.

Pipeline

[oc_preprocess_image]
     |
     v
[oc_extract_text]
     |
     v
[oc_validate_text]
     |
     v
[oc_structure_output]

Workflow inputs: imageUrl, documentType, language

Workers

ExtractTextWorker (task: oc_extract_text)

Extracts text from a preprocessed image using OCR.

Writes rawText, confidence, characterCount

PreprocessImageWorker (task: oc_preprocess_image)

Preprocesses an image for OCR: deskew, binarize, contrast-enhance.

Writes processedImage, deskewAngle, enhancementsApplied

StructureOutputWorker (task: oc_structure_output)

Organizes validated fields into a structured document.

Writes structured, fieldCount

ValidateTextWorker (task: oc_validate_text)

Validates extracted OCR text and identifies structured fields.

Trims whitespace
Writes cleanText, fields, validationScore

17 tests | Workflow: ocr_pipeline | Timeout: 60s

See RUNNING.md for setup and usage.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Ocr Pipeline

Pipeline

Workers

FilesExpand file tree

ocr-pipeline

Directory actions

More options

Directory actions

More options

Latest commit

History

ocr-pipeline

Folders and files

parent directory

README.md

Ocr Pipeline

Pipeline

Workers