workglow-dev
diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/developers/03_extending.md b/docs/developers/03_extending.md
Lines changed: 21 additions & 26 deletions
diff --git a/docs/technical/08-knowledge-base-and-rag.md b/docs/technical/08-knowledge-base-and-rag.md
Lines changed: 26 additions & 29 deletions
diff --git a/package.json b/package.json
Lines changed: 2 additions & 2 deletions
diff --git a/packages/ai/README.md b/packages/ai/README.md
Lines changed: 11 additions & 15 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -162,7 +162,7 @@ Model system: `ModelRepository`, `ModelRegistry`, `AiProviderRegistry`.
 
 Task categories: text generation/embedding/summary/translation/rewriting/classification, image classification/embedding/segmentation, RAG (chunking, vector search, retrieval, reranking), vision/pose detection.
 
-RAG tasks: `ChunkToVectorTask` (input: `vector` + `chunks` → output: `vectors`), `ChunkVectorUpsertTask` (input: `knowledgeBase` + `vectors`), `ChunkRetrievalTask` (input: `knowledgeBase` + `query` + `model`), `ChunkVectorSearchTask`, `ChunkVectorHybridSearchTask`, `HierarchyJoinTask`.
+RAG tasks: `ChunkVectorUpsertTask` (input: `knowledgeBase` + `chunks` + `vector`, optional `doc_title`), `ChunkRetrievalTask` (input: `knowledgeBase` + `query` + `model`, with `method: "similarity" | "hybrid"`), `HierarchyJoinTask`, `RerankerTask`, `QueryExpanderTask`, `TextChunkerTask`, `HierarchicalChunkerTask`.
 
 ### `@workglow/ai-provider` — provider implementations
 
 
--- a/docs/developers/03_extending.md
+++ b/docs/developers/03_extending.md
@@ -278,26 +278,23 @@ The `workglow` package provides a comprehensive set of tasks for building RAG (R
 
 ### Vector and Embedding Tasks
 
-| Task                    | Description                                    |
-| ----------------------- | ---------------------------------------------- |
-| `TextEmbeddingTask`     | Generates embeddings using configurable models |
-| `ChunkToVectorTask`     | Transforms chunks to vector store format       |
-| `ChunkVectorUpsertTask` | Stores vectors in a repository                 |
-| `ChunkVectorSearchTask` | Searches vectors by similarity                 |
-| `VectorQuantizeTask`    | Quantizes vectors for storage efficiency       |
+| Task                    | Description                                                                 |
+| ----------------------- | --------------------------------------------------------------------------- |
+| `TextEmbeddingTask`     | Generates embeddings using configurable models                              |
+| `ChunkVectorUpsertTask` | Stores chunks + embeddings in a KnowledgeBase (1:1 aligned)                 |
+| `VectorQuantizeTask`    | Quantizes vectors for storage efficiency                                    |
 
 ### Retrieval and Generation Tasks
 
-| Task                          | Description                                   |
-| ----------------------------- | --------------------------------------------- |
-| `QueryExpanderTask`           | Expands queries for better retrieval coverage |
-| `ChunkVectorHybridSearchTask` | Combines vector and full-text search          |
-| `RerankerTask`                | Reranks search results for relevance          |
-| `HierarchyJoinTask`           | Enriches results with parent context          |
-| `ContextBuilderTask`          | Builds context for LLM prompts                |
-| `ChunkRetrievalTask`          | Orchestrates end-to-end retrieval             |
-| `TextQuestionAnswerTask`      | Generates answers from context                |
-| `TextGenerationTask`          | General text generation                       |
+| Task                     | Description                                                              |
+| ------------------------ | ------------------------------------------------------------------------ |
+| `ChunkRetrievalTask`     | End-to-end retrieval: embeds query, runs similarity or hybrid search     |
+| `QueryExpanderTask`      | Rule-based query expansion (multi-query / synonyms)                      |
+| `RerankerTask`           | Reranks results (simple heuristic or reciprocal-rank-fusion)             |
+| `HierarchyJoinTask`      | Enriches retrieval metadata with parent context                          |
+| `ContextBuilderTask`     | Builds formatted context for LLM prompts                                 |
+| `TextQuestionAnswerTask` | Generates answers from context                                           |
+| `TextGenerationTask`     | General text generation                                                  |
 
 ### Chainable RAG Pipeline Example
 
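`QueryExpanderTask` is described as rule-based expansion (multi-query / synonyms). To make that concrete, the sketch rewrites the query once per known synonym, one word at a time, and keeps the original query first. The `expandQuery` helper, its synonym table, and the `maxQueries` cap are hypothetical; the task's actual rules and parameters are not shown in this diff.

```typescript
// Hypothetical synonym table: lowercase term -> alternative terms.
type SynonymTable = Map<string, string[]>;

// Returns the original query plus one variant per single-word synonym swap.
function expandQuery(query: string, synonyms: SynonymTable, maxQueries = 5): string[] {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  // A Set keeps insertion order (original first) and drops duplicate variants.
  const variants = new Set<string>([words.join(" ")]);
  words.forEach((word, i) => {
    for (const alternative of synonyms.get(word) ?? []) {
      const swapped = [...words];
      swapped[i] = alternative;
      variants.add(swapped.join(" "));
    }
  });
  return Array.from(variants).slice(0, maxQueries);
}

const synonyms: SynonymTable = new Map([
  ["car", ["automobile", "vehicle"]],
  ["fast", ["quick"]],
]);
// Each variant can be searched separately and the result lists fused afterwards.
console.log(expandQuery("Fast car", synonyms));
```

Capping the number of variants matters because every extra query is another embedding call and another search.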
@@ -332,7 +329,6 @@ await new Workflow()
   .textEmbedding({
     model: "Xenova/all-MiniLM-L6-v2",
   })
-  .chunkToVector()
   .chunkVectorUpsert({
     knowledgeBase: "my-kb",
   })
@@ -393,13 +389,12 @@ interface BaseNode {
 
 Each task passes through what the next task needs:
 
-| Task                  | Passes Through           | Adds                                  |
-| --------------------- | ------------------------ | ------------------------------------- |
-| `structuralParser`    | -                        | `doc_id`, `documentTree`, `nodeCount` |
-| `documentEnricher`    | `doc_id`, `documentTree` | `summaryCount`, `entityCount`         |
-| `hierarchicalChunker` | `doc_id`                 | `chunks`, `text[]`, `count`           |
-| `textEmbedding`       | (implicit)               | `vector[]`                            |
-| `chunkToVector`       | -                        | `ids[]`, `vectors[]`, `metadata[]`    |
-| `chunkVectorUpsert`   | -                        | `count`, `ids`                        |
+| Task                  | Passes Through           | Adds                                       |
+| --------------------- | ------------------------ | ------------------------------------------ |
+| `structuralParser`    | -                        | `doc_id`, `documentTree`, `nodeCount`      |
+| `documentEnricher`    | `doc_id`, `documentTree` | `summaryCount`, `entityCount`              |
+| `hierarchicalChunker` | `doc_id`                 | `chunks: ChunkRecord[]`, `text[]`, `count` |
+| `textEmbedding`       | (implicit)               | `vector[]`                                 |
+| `chunkVectorUpsert`   | -                        | `count`, `doc_id`, `chunk_ids`             |
 
 This design eliminates the need for external loops - the entire pipeline chains together naturally.
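The 1:1 alignment that `chunkVectorUpsert` depends on (entry `i` of `chunks` is stored with entry `i` of `vector`) can be pictured with a small standalone sketch. It is not the library's implementation: `ChunkSketch`, `StoredEntry`, `UpsertSketchResult`, and `alignChunksAndVectors` are hypothetical names, and the rule that every chunk in one call shares a single `doc_id` is an assumption inferred from the single `doc_id` in the task's output.

```typescript
// Hypothetical, simplified chunk shape; the real ChunkRecord carries more fields.
interface ChunkSketch {
  chunk_id: string;
  doc_id: string;
  text: string;
}

// Hypothetical storage entry: one vector plus the chunk it was computed from.
interface StoredEntry {
  chunk_id: string;
  doc_id: string;
  vector: Float32Array;
  metadata: ChunkSketch;
}

interface UpsertSketchResult {
  entries: StoredEntry[];
  count: number;
  doc_id: string;
  chunk_ids: string[];
}

// Pairs chunks[i] with vectors[i] and refuses inputs that cannot be aligned.
function alignChunksAndVectors(chunks: ChunkSketch[], vectors: Float32Array[]): UpsertSketchResult {
  if (chunks.length !== vectors.length) {
    throw new Error(`expected 1:1 alignment, got ${chunks.length} chunks and ${vectors.length} vectors`);
  }
  if (chunks.length === 0) {
    throw new Error("nothing to upsert");
  }
  // Assumption: one upsert call covers exactly one document.
  const doc_id = chunks[0].doc_id;
  if (chunks.some((c) => c.doc_id !== doc_id)) {
    throw new Error("all chunks in one upsert are assumed to share a doc_id");
  }
  const entries = chunks.map((chunk, i) => ({
    chunk_id: chunk.chunk_id,
    doc_id,
    vector: vectors[i],
    metadata: chunk,
  }));
  return { entries, count: entries.length, doc_id, chunk_ids: entries.map((e) => e.chunk_id) };
}

// Usage: two chunks, two 2-dimensional embeddings.
const result = alignChunksAndVectors(
  [
    { chunk_id: "c1", doc_id: "doc-1", text: "Intro" },
    { chunk_id: "c2", doc_id: "doc-1", text: "Methods" },
  ],
  [new Float32Array([1, 0]), new Float32Array([0, 1])],
);
console.log(result.count, result.doc_id, result.chunk_ids);
```

Failing loudly on a length mismatch is the point of the sketch: with positional pairing, a dropped or extra embedding would otherwise attach every later vector to the wrong chunk.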
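--- a/docs/technical/08-knowledge-base-and-rag.md
+++ b/docs/technical/08-knowledge-base-and-rag.md
@@ -429,7 +429,7 @@ StructuralParser -----> DocumentRootNode tree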
 Chunking Task --------> ChunkRecord[]
     |
     v
-ChunkToVectorTask ----> Float32Array[] (embeddings)
+TextEmbeddingTask ----> Float32Array[] (embeddings)
     |
     v
 ChunkVectorUpsertTask -> Stored in KnowledgeBase vector storage
@@ -443,31 +443,36 @@ AI Generation Task ---> Answer with context
 
 ### Key RAG Tasks
 
-**ChunkToVectorTask** -- Generates embeddings for an array of chunks using a specified model:
+**TextEmbeddingTask** -- Generates embeddings for a string or an array of strings (for example, chunk texts) using a specified model:
 
 ```typescript
-// Input: { vector: TypedArray, chunks: ChunkRecord[] }
-// Output: { vectors: Float32Array[] }
+// Input: { text: string | string[], model: "..." }
+// Output: { vector: TypedArray | TypedArray[] }
 ```
 
-**ChunkVectorUpsertTask** -- Stores chunk vectors in a knowledge base:
+**ChunkVectorUpsertTask** -- Stores chunks + their embeddings in a knowledge base (1:1 aligned):
 
 ```typescript
-// Input: { knowledgeBase: "my-kb", vectors: ChunkVectorEntity[] }
+// Input: { knowledgeBase: "my-kb", chunks: ChunkRecord[], vector: TypedArray[] }
+// Output: { count: number, doc_id: string, chunk_ids: string[] }
 ```
 
-**ChunkRetrievalTask** -- Retrieves relevant chunks from a knowledge base given a query:
+**ChunkRetrievalTask** -- End-to-end retrieval. Embeds the query (if a string),
+then runs similarity or hybrid search:
 
 ```typescript
-// Input: { knowledgeBase: "my-kb", query: "What is...", model: "..." }
-// Output: { chunks: ChunkSearchResult[] }
+// Input: { knowledgeBase, query, model?, method?: "similarity" | "hybrid",
+//          topK?, filter?, scoreThreshold?, vectorWeight?, returnVectors? }
+// Output: { chunks, chunk_ids, metadata, scores, count, query, vectors? }
 ```
 
-**ChunkVectorSearchTask** -- Direct vector similarity search against a knowledge base.
+**HierarchyJoinTask** -- Given retrieved metadata, walks the document tree to
+reconstruct section context and enrich the metadata with ancestor information:
 
-**ChunkVectorHybridSearchTask** -- Combined vector + full-text search.
-
-**HierarchyJoinTask** -- Given chunk search results, walks the document tree to reconstruct section context and enrich the results with ancestor information.
+```typescript
+// Input: { knowledgeBase, metadata, includeParentSummaries?, includeEntities? }
+// Output: { metadata: ChunkRecord[], count }
+```
 
 ### Example Workflow
 
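`ChunkRetrievalTask` accepts `method: "hybrid"` alongside `vectorWeight`, `scoreThreshold`, and `topK`, but how the two scores are combined is not spelled out here. A common approach is a weighted linear blend of a normalized similarity score and a normalized full-text score; the sketch assumes exactly that, and `Candidate`, `HybridOptions`, and `hybridRank` are hypothetical names rather than the task's API.

```typescript
// Hypothetical candidate produced by running both searches over the same chunks.
interface Candidate {
  chunk_id: string;
  vectorScore: number; // similarity score, assumed already normalized to [0, 1]
  textScore: number; // full-text score, assumed already normalized to [0, 1]
}

interface HybridOptions {
  vectorWeight: number; // 1 = pure vector similarity, 0 = pure full-text
  topK: number;
  scoreThreshold: number;
}

// Assumed blend: vectorWeight * vectorScore + (1 - vectorWeight) * textScore.
function hybridRank(candidates: Candidate[], opts: HybridOptions): { chunk_ids: string[]; scores: number[] } {
  const { vectorWeight, topK, scoreThreshold } = opts;
  if (vectorWeight < 0 || vectorWeight > 1) {
    throw new RangeError("vectorWeight must be between 0 and 1");
  }
  const ranked = candidates
    .map((c) => ({
      id: c.chunk_id,
      score: vectorWeight * c.vectorScore + (1 - vectorWeight) * c.textScore,
    }))
    .filter((r) => r.score >= scoreThreshold) // drop weak matches before truncating
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
  return { chunk_ids: ranked.map((r) => r.id), scores: ranked.map((r) => r.score) };
}

const candidates: Candidate[] = [
  { chunk_id: "a", vectorScore: 0.9, textScore: 0.1 },
  { chunk_id: "b", vectorScore: 0.5, textScore: 0.9 },
  { chunk_id: "c", vectorScore: 0.1, textScore: 0.2 },
];
// An even blend lets the strong keyword match in "b" outrank "a".
console.log(hybridRank(candidates, { vectorWeight: 0.5, topK: 2, scoreThreshold: 0.2 }));
```

Applying `scoreThreshold` before `topK` means a query with few good matches returns fewer than `topK` results instead of being padded with weak ones.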
@@ -481,22 +486,14 @@ const kb = await createKnowledgeBase({
   vectorDimensions: 1024,
 });
 
-// Build an ingestion pipeline
-const workflow = new Workflow("ingest");
-const parseTask = workflow.addTask("StructuralParseTask", {
-  text: documentText,
-  title: "My Paper",
-});
-const chunkTask = workflow.addTask("ChunkingTask", {});
-const embedTask = workflow.addTask("ChunkToVectorTask", {
-  model: "text-embedding-3-small",
-});
-const upsertTask = workflow.addTask("ChunkVectorUpsertTask", {
-  knowledgeBase: "research-papers",
-});
-
-workflow.pipe(parseTask, chunkTask, embedTask, upsertTask);
-await workflow.run();
+// Build an ingestion pipeline — five chained tasks, no transform step needed
+await new Workflow()
+  .structuralParser({ text: documentText, title: "My Paper" })
+  .documentEnricher({ generateSummaries: true, extractEntities: true })
+  .hierarchicalChunker({ maxTokens: 512 })
+  .textEmbedding({ model: "text-embedding-3-small" }) // output size must match vectorDimensions above
+  .chunkVectorUpsert({ knowledgeBase: "research-papers" })
+  .run();
 ```
 
 ## Global Registry
 
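`HierarchyJoinTask` walks the document tree to attach ancestor context to each retrieved chunk. The tree's node shape is not part of this change, so the sketch below assumes a flat map of nodes with hypothetical `parentId`, `title`, and `summary` fields and a hypothetical `node_id` on each chunk; only the `includeParentSummaries` flag name is taken from the task's documented input.

```typescript
// Hypothetical flattened view of the document tree: each node points at its parent.
interface TreeNodeSketch {
  id: string;
  parentId?: string;
  title?: string;
  summary?: string;
}

interface RetrievedChunk {
  chunk_id: string;
  node_id: string; // hypothetical: the tree node the chunk was cut from
}

interface JoinedChunk extends RetrievedChunk {
  sectionTitles: string[]; // root-first path of ancestor titles
  parentSummaries: string[]; // root-first ancestor summaries, if requested
}

function joinAncestors(
  chunk: RetrievedChunk,
  nodes: Map<string, TreeNodeSketch>,
  includeParentSummaries: boolean,
): JoinedChunk {
  const sectionTitles: string[] = [];
  const parentSummaries: string[] = [];
  const visited = new Set<string>(); // guards against a malformed tree with a cycle
  let currentId = nodes.get(chunk.node_id)?.parentId;
  while (currentId !== undefined && !visited.has(currentId)) {
    visited.add(currentId);
    const node = nodes.get(currentId);
    if (node === undefined) break; // dangling parent pointer: stop walking
    // Walking upward, so prepend to keep the lists ordered root-first.
    if (node.title) sectionTitles.unshift(node.title);
    if (includeParentSummaries && node.summary) parentSummaries.unshift(node.summary);
    currentId = node.parentId;
  }
  return { ...chunk, sectionTitles, parentSummaries };
}

const tree = new Map<string, TreeNodeSketch>([
  ["root", { id: "root", title: "My Paper", summary: "A study of X." }],
  ["sec-2", { id: "sec-2", parentId: "root", title: "Methods", summary: "How X was measured." }],
  ["para-7", { id: "para-7", parentId: "sec-2" }],
]);
console.log(joinAncestors({ chunk_id: "c7", node_id: "para-7" }, tree, true));
```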
--- a/package.json
+++ b/package.json
@@ -1,7 +1,7 @@
 {
-  "name": "workglow",
+  "name": "@workglow-dev/libs",
   "type": "module",
-  "version": "0.0.10",
+  "version": "0.2.15",
   "repository": {
     "type": "git",
     "url": "https://github.com/workglow-dev/workglow.git"
 
--- a/packages/ai/README.md
+++ b/packages/ai/README.md
@@ -516,23 +516,20 @@ The AI package provides a comprehensive set of tasks for building RAG pipelines.
 
 ### Vector and Storage Tasks
 
-| Task                    | Description                                                                              |
-| ----------------------- | ---------------------------------------------------------------------------------------- |
-| `ChunkToVectorTask`     | Transforms chunks to vector store format (input: `vector` + `chunks`, output: `vectors`) |
-| `ChunkVectorUpsertTask` | Stores vectors in a KnowledgeBase (input: `knowledgeBase` + `vectors`)                   |
-| `ChunkVectorSearchTask` | Searches vectors by similarity                                                           |
-| `VectorQuantizeTask`    | Quantizes vectors for storage efficiency                                                 |
+| Task                    | Description                                                                       |
+| ----------------------- | --------------------------------------------------------------------------------- |
+| `ChunkVectorUpsertTask` | Stores chunks + their embeddings in a KnowledgeBase (input: `chunks` + `vector`)  |
+| `VectorQuantizeTask`    | Quantizes vectors for storage efficiency                                          |
 
 ### Retrieval and Generation Tasks
 
-| Task                          | Description                                   |
-| ----------------------------- | --------------------------------------------- |
-| `QueryExpanderTask`           | Expands queries for better retrieval coverage |
-| `ChunkVectorHybridSearchTask` | Combines vector and full-text search          |
-| `RerankerTask`                | Reranks search results for relevance          |
-| `HierarchyJoinTask`           | Enriches results with parent context          |
-| `ContextBuilderTask`          | Builds context for LLM prompts                |
-| `ChunkRetrievalTask`          | Orchestrates end-to-end retrieval             |
+| Task                 | Description                                                                 |
+| -------------------- | --------------------------------------------------------------------------- |
+| `ChunkRetrievalTask` | End-to-end retrieval: embeds the query, runs similarity or hybrid search    |
+| `QueryExpanderTask`  | Expands queries (multi-query / synonyms) for better retrieval coverage      |
+| `RerankerTask`       | Reranks search results (simple heuristic or reciprocal-rank-fusion)         |
+| `HierarchyJoinTask`  | Enriches retrieved metadata with parent summaries, section titles, entities |
+| `ContextBuilderTask` | Builds formatted context for LLM prompts                                    |
 
 ### Complete RAG Workflow Example
 
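The `RerankerTask` row mentions reciprocal-rank-fusion. Standard RRF gives each result the sum of `1 / (k + rank)` over every ranked list it appears in, using 1-based ranks; `k = 60` is the constant proposed with the original method and is only an assumption here, since the diff does not say which constant `RerankerTask` uses. `reciprocalRankFusion` is a hypothetical helper, not the task's API.

```typescript
// Fuses several ranked lists of chunk ids (best first) with reciprocal rank fusion.
function reciprocalRankFusion(rankings: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // RRF uses 1-based ranks
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; ties broken by id so the output is deterministic.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score || a.id.localeCompare(b.id),
  );
}

// For example, one list from vector similarity and one from full-text search.
const vectorHits = ["a", "b", "c"];
const keywordHits = ["b", "c", "a"];
console.log(reciprocalRankFusion([vectorHits, keywordHits]));
```

Because RRF looks only at ranks, it needs no normalization between retrievers whose raw scores live on different scales, which is why it is often paired with hybrid vector plus full-text search.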
@@ -600,7 +597,6 @@ await new Workflow()
   .textEmbedding({
     model: "onnx:Xenova/all-MiniLM-L6-v2:q8",
   })
-  .chunkToVector()
   .chunkVectorUpsert({
     knowledgeBase: "my-kb",
   })