diff --git a/.rubocop.yml b/.rubocop.yml index dbcc4fae..dfadfcd4 100644 --- a/.rubocop.yml +++ b/.rubocop.yml @@ -5,6 +5,10 @@ plugins: Layout/LeadingCommentSpace: AllowRBSInlineAnnotation: true +# @rbs type annotations can be long +Layout/LineLength: + Max: 140 + AllCops: TargetRubyVersion: 3.1 NewCops: enable diff --git a/README.md b/README.md index 19fc6d41..bbca62ab 100644 --- a/README.md +++ b/README.md @@ -4,531 +4,138 @@ [![CI](https://github.com/cardmagic/classifier/actions/workflows/ruby.yml/badge.svg)](https://github.com/cardmagic/classifier/actions/workflows/ruby.yml) [![License: LGPL](https://img.shields.io/badge/License-LGPL_2.1-blue.svg)](https://opensource.org/licenses/LGPL-2.1) -A Ruby library for text classification using Bayesian, Logistic Regression, LSI (Latent Semantic Indexing), k-Nearest Neighbors (kNN), and TF-IDF algorithms. +Text classification in Ruby. Five algorithms, native performance, streaming support. -**[Documentation](https://rubyclassifier.com/docs)** · **[Tutorials](https://rubyclassifier.com/docs/tutorials)** · **[Guides](https://rubyclassifier.com/docs/guides)** - -> **Note:** This is the original `classifier` gem, maintained for 20 years since 2005. After a quieter period, active development resumed in 2025 with major new features. If you're choosing between this and a fork, this is the canonical, most actively-developed version. +**[Documentation](https://rubyclassifier.com/docs)** · **[Tutorials](https://rubyclassifier.com/docs/tutorials)** · **[API Reference](https://rubyclassifier.com/docs/api)** ## Why This Library? 
-This gem has features no fork provides: - -| | This Gem | Forks | +| | This Gem | Other Forks | |:--|:--|:--| | **Algorithms** | 5 classifiers | 2 only | -| **LSI Performance** | Native C (5-50x faster) | Pure Ruby, or requires GSL/Numo + system libs | -| **Persistence** | Pluggable (file, Redis, S3, SQL) | Marshal only | -| **Thread Safety** | ✅ | ❌ | -| **Type Annotations** | RBS throughout | ❌ | -| **Laplace Smoothing** | Numerically stable | ❌ Unstable | -| **Probability Calibration** | ✅ | ❌ | -| **Feature Weights** | ✅ Interpretable | ❌ | - -### Recent Developments (Late 2025) - -- Added Logistic Regression classifier with SGD and L2 regularization -- Added k-Nearest Neighbors classifier with distance-weighted voting -- Added TF-IDF vectorizer with n-gram support -- Built zero-dependency native C extension (replaces GSL or Numo requirement) -- Added pluggable storage backends for persistence -- Made all classifiers thread-safe -- Fixed Laplace smoothing for numerical stability -- Added RBS type signatures throughout -- Modernized for new Ruby coding standards - -## Table of Contents - -- [Installation](#installation) -- [Algorithms](#bayesian-classifier) - - [Bayesian Classifier](#bayesian-classifier) - - [Logistic Regression](#logistic-regression) - - [LSI (Latent Semantic Indexing)](#lsi-latent-semantic-indexing) - - [k-Nearest Neighbors (kNN)](#k-nearest-neighbors-knn) - - [TF-IDF Vectorizer](#tf-idf-vectorizer) -- [Persistence](#persistence) -- [Performance](#performance) -- [Development](#development) -- [Contributing](#contributing) -- [License](#license) +| **Incremental LSI** | Brand's algorithm (no rebuild) | Full SVD rebuild on every add | +| **LSI Performance** | Native C extension (5-50x faster) | Pure Ruby or requires GSL | +| **Streaming** | Train on multi-GB datasets | Must load all data in memory | +| **Persistence** | Pluggable (file, Redis, S3) | Marshal only | ## Installation -Add to your Gemfile: - ```ruby gem 'classifier' ``` -Then run: 
- -```bash -bundle install -``` - -Or install directly: - -```bash -gem install classifier -``` - -### Native C Extension +## Quick Start -The gem includes a zero-dependency native C extension for fast LSI operations (5-50x faster than pure Ruby). It compiles automatically during installation. - -To verify the native extension is active: +### Bayesian ```ruby -require 'classifier' -puts Classifier::LSI.backend # => :native -``` - -To force pure Ruby mode (for debugging): - -```bash -NATIVE_VECTOR=true ruby your_script.rb -``` - -## Bayesian Classifier - -Fast, accurate classification with modest memory requirements. Ideal for spam filtering, sentiment analysis, and content categorization. - -### Quick Start - -```ruby -require 'classifier' - classifier = Classifier::Bayes.new(:spam, :ham) - -# Train with keyword arguments -classifier.train(spam: "Buy cheap v1agra now! Limited offer!") -classifier.train(ham: "Meeting scheduled for tomorrow at 10am") - -# Train multiple items at once -classifier.train( - spam: ["You've won a million dollars!", "Free money!!!"], - ham: ["Please review the document", "Lunch tomorrow?"] -) - -# Classify new text -classifier.classify "Congratulations! You've won a prize!" -# => "Spam" -``` - -### Learn More - -- [Bayes Basics Guide](https://rubyclassifier.com/docs/guides/bayes/basics) - In-depth documentation -- [Build a Spam Filter Tutorial](https://rubyclassifier.com/docs/tutorials/spam-filter) - Step-by-step guide -- [Paul Graham: A Plan for Spam](http://www.paulgraham.com/spam.html) - -## Logistic Regression - -Linear classifier using Stochastic Gradient Descent (SGD). Often more accurate than Naive Bayes while remaining fast and interpretable. Provides well-calibrated probability estimates and feature weights for understanding which words drive predictions. 
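
The training rule described above — stochastic gradient descent on a bag-of-words score with an L2 penalty — can be sketched in a few lines. This is an illustrative standalone sketch, not the gem's internal implementation; `sigmoid` and `sgd_step` are hypothetical helper names, and the defaults mirror the `learning_rate` and `regularization` hyperparameters documented here:

```ruby
# Illustrative sketch of one binary logistic-regression SGD step with
# L2 regularization. Not the gem's implementation; names are hypothetical.
def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# weights:  Hash[Symbol, Float]   — current feature weights
# features: Hash[Symbol, Numeric] — word counts for one document
# label:    1 (e.g. spam) or 0 (e.g. ham)
def sgd_step(weights, features, label, learning_rate: 0.1, regularization: 0.01)
  score = features.sum { |word, count| weights.fetch(word, 0.0) * count }
  error = sigmoid(score) - label
  features.each_key do |word|
    # Gradient of log-loss plus the L2 penalty term for this weight
    gradient = error * features[word] + regularization * weights.fetch(word, 0.0)
    weights[word] = weights.fetch(word, 0.0) - learning_rate * gradient
  end
  weights
end

w = sgd_step({}, { free: 2, prize: 1 }, 1)          # spam example
w = sgd_step(w, { meeting: 1, tomorrow: 1 }, 0)     # ham example
# w now has positive weights for spam-indicative words (:free, :prize)
# and negative weights for ham-indicative words (:meeting, :tomorrow)
```

Repeating this step over shuffled training examples for `max_iterations` passes, and stopping when the weight change falls below `tolerance`, corresponds to the remaining hyperparameters.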
- -### Key Features - -- **More Accurate**: No independence assumption means better accuracy on many text problems -- **Calibrated Probabilities**: Unlike Bayes, probabilities reflect true confidence -- **Interpretable**: Feature weights show which words matter for each class -- **Fast**: Linear time prediction, efficient SGD training -- **L2 Regularization**: Prevents overfitting on small datasets - -### Quick Start - -```ruby -require 'classifier' - -classifier = Classifier::LogisticRegression.new(:spam, :ham) - -# Train with keyword arguments (same API as Bayes) -classifier.train(spam: ["Buy now! Free money!", "Click here for prizes!"]) -classifier.train(ham: ["Meeting tomorrow", "Please review the document"]) - -# Classify new text -classifier.classify "Claim your free prize!" -# => "Spam" - -# Get calibrated probabilities -classifier.probabilities "Claim your free prize!" -# => {"Spam" => 0.92, "Ham" => 0.08} - -# See which words matter most -classifier.weights(:spam, limit: 5) -# => {:free => 2.3, :prize => 1.9, :claim => 1.5, ...} +classifier.train(spam: "Buy cheap viagra now!", ham: "Meeting at 3pm tomorrow") +classifier.classify "You've won a prize!" # => "Spam" ``` +[Bayesian Guide →](https://rubyclassifier.com/docs/guides/bayes/basics) -### Hyperparameters +### Logistic Regression ```ruby -classifier = Classifier::LogisticRegression.new( - :positive, :negative, - learning_rate: 0.1, # SGD step size (default: 0.1) - regularization: 0.01, # L2 penalty strength (default: 0.01) - max_iterations: 100, # Training iterations (default: 100) - tolerance: 1e-4 # Convergence threshold (default: 1e-4) -) +classifier = Classifier::LogisticRegression.new(:positive, :negative) +classifier.train(positive: "Great product!", negative: "Terrible experience") +classifier.classify "Loved it!" 
# => "Positive" ``` +[Logistic Regression Guide →](https://rubyclassifier.com/docs/guides/logistic-regression/basics) -### When to Use Logistic Regression vs Bayes - -| Aspect | Logistic Regression | Naive Bayes | -|--------|---------------------|-------------| -| **Accuracy** | Often higher | Good baseline | -| **Probabilities** | Well-calibrated | Tend to be extreme | -| **Training** | Needs to fit model | Instant (just counting) | -| **Interpretability** | Feature weights | Word frequencies | -| **Small datasets** | Use higher regularization | Works well | - -Use Logistic Regression when you need accurate probabilities or want to understand feature importance. Use Bayes when you need instant updates or have very small training sets. - -## LSI (Latent Semantic Indexing) - -Semantic analysis using Singular Value Decomposition (SVD). More flexible than Bayesian classifiers, providing search, clustering, and classification based on meaning rather than just keywords. - -### Quick Start +### LSI (Latent Semantic Indexing) ```ruby -require 'classifier' - lsi = Classifier::LSI.new - -# Add documents (category: item(s)) -lsi.add(pets: "Dogs are loyal pets that love to play fetch") -lsi.add(pets: "Cats are independent and love to nap") -lsi.add(programming: "Ruby is a dynamic programming language") - -# Add multiple items with the same category -lsi.add(programming: ["Python is great for data science", "JavaScript runs in browsers"]) - -# Batch operations with multiple categories -lsi.add( - pets: ["Hamsters are small furry pets", "Birds can be great companions"], - programming: "Go is fast and concurrent" -) - -# Classify new text -lsi.classify "My puppy loves to run around" -# => "Pets" - -# Get classification with confidence score -lsi.classify_with_confidence "Learning to code in Ruby" -# => ["Programming", 0.89] +lsi.add(pets: "Dogs are loyal", tech: "Ruby is elegant") +lsi.classify "My puppy is playful" # => "pets" ``` +[LSI Guide 
→](https://rubyclassifier.com/docs/guides/lsi/basics) -### Search and Discovery - -```ruby -# Find similar documents -lsi.find_related "Dogs are great companions", 2 -# => ["Dogs are loyal pets that love to play fetch", "Cats are independent..."] - -# Search by keyword -lsi.search "programming", 3 -# => ["Ruby is a dynamic programming language", "Python is great for..."] -``` - -### Text Summarization - -LSI can extract key sentences from text: - -```ruby -text = "First sentence about dogs. Second about cats. Third about birds." -text.summary(2) # Extract 2 most relevant sentences -``` - -For better sentence boundary detection (handles abbreviations like "Dr.", decimals, etc.), install the optional `pragmatic_segmenter` gem: - -```ruby -gem 'pragmatic_segmenter' -``` - -### Learn More - -- [LSI Basics Guide](https://rubyclassifier.com/docs/guides/lsi/basics) - In-depth documentation -- [Wikipedia: Latent Semantic Analysis](http://en.wikipedia.org/wiki/Latent_semantic_analysis) - -## k-Nearest Neighbors (kNN) - -Instance-based classification that stores examples and classifies by finding the most similar ones. No training phase required—just add examples and classify. - -### Key Features - -- **No Training Required**: Uses instance-based learning—store examples and classify by similarity -- **Interpretable Results**: Returns neighbors that contributed to the decision -- **Incremental Updates**: Easy to add or remove examples without retraining -- **Distance-Weighted Voting**: Optional weighting by similarity score -- **Built on LSI**: Leverages LSI's semantic similarity for better matching - -### Quick Start +### k-Nearest Neighbors ```ruby -require 'classifier' - knn = Classifier::KNN.new(k: 3) - -# Add labeled examples -knn.add(spam: ["Buy now! Limited offer!", "You've won a million dollars!"]) -knn.add(ham: ["Meeting at 3pm tomorrow", "Please review the document"]) - -# Classify new text -knn.classify "Congratulations! Claim your prize!" 
-# => "spam" -``` - -### Detailed Classification - -Get neighbor information for interpretable results: - -```ruby -result = knn.classify_with_neighbors "Free money offer" - -result[:category] # => "spam" -result[:confidence] # => 0.85 -result[:neighbors] # => [{item: "Buy now!...", category: "spam", similarity: 0.92}, ...] -result[:votes] # => {"spam" => 2.0, "ham" => 1.0} +knn.train(spam: "Free money!", ham: "Quarterly report attached") # or knn.add() +knn.classify "Claim your prize" # => "spam" ``` +[k-Nearest Neighbors Guide →](https://rubyclassifier.com/docs/guides/knn/basics) -### Distance-Weighted Voting - -Weight votes by similarity score for more accurate classification: +### TF-IDF ```ruby -knn = Classifier::KNN.new(k: 5, weighted: true) - -knn.add( - positive: ["Great product!", "Loved it!", "Excellent service"], - negative: ["Terrible experience", "Would not recommend"] -) - -# Closer neighbors have more influence on the result -knn.classify "This was amazing!" -# => "positive" -``` - -### Updating the Classifier - -```ruby -# Add more examples anytime -knn.add(neutral: "It was okay, nothing special") - -# Remove examples -knn.remove_item "Buy now! Limited offer!" - -# Change k value -knn.k = 7 - -# List all categories -knn.categories -# => ["spam", "ham", "neutral"] -``` - -### When to Use kNN vs Bayes vs LSI - -| Classifier | Best For | -|------------|----------| -| **Bayes** | Fast classification, any training size (stores only word counts) | -| **LSI** | Semantic similarity, document clustering, search | -| **kNN** | <1000 examples, interpretable results, incremental updates | - -**Why the size difference?** Bayes stores aggregate statistics—adding 10,000 documents just increments counters. kNN stores every example and compares against all of them during classification, so performance degrades with size. - -## TF-IDF Vectorizer - -Transform text documents into TF-IDF (Term Frequency-Inverse Document Frequency) weighted feature vectors. 
TF-IDF downweights common words and upweights discriminative terms—the foundation for most classic text classification approaches. - -### Quick Start - -```ruby -require 'classifier' - tfidf = Classifier::TFIDF.new -tfidf.fit(["Dogs are great pets", "Cats are independent", "Birds can fly"]) - -# Transform text to TF-IDF vector (L2 normalized) -vector = tfidf.transform("Dogs are loyal") -# => {:dog=>0.7071..., :loyal=>0.7071...} - -# Fit and transform in one step -vectors = tfidf.fit_transform(documents) -``` - -### Options - -```ruby -tfidf = Classifier::TFIDF.new( - min_df: 2, # Minimum document frequency (Integer or Float 0.0-1.0) - max_df: 0.95, # Maximum document frequency (filters very common terms) - ngram_range: [1, 2], # Extract unigrams and bigrams - sublinear_tf: true # Use 1 + log(tf) instead of raw term frequency -) +tfidf.fit(["Dogs are pets", "Cats are independent"]) +tfidf.transform("Dogs are loyal") # => {:dog => 0.707, :loyal => 0.707} ``` +[TF-IDF Guide →](https://rubyclassifier.com/docs/guides/tfidf/basics) -### Vocabulary Inspection - -```ruby -tfidf.fit(documents) +## Key Features -tfidf.vocabulary # => {:dog=>0, :cat=>1, :bird=>2, ...} -tfidf.idf # => {:dog=>1.405, :cat=>1.405, ...} -tfidf.feature_names # => [:dog, :cat, :bird, ...] -tfidf.num_documents # => 3 -tfidf.fitted? 
# => true -``` - -### N-gram Support - -```ruby -# Extract bigrams only -tfidf = Classifier::TFIDF.new(ngram_range: [2, 2]) -tfidf.fit(["quick brown fox", "lazy brown dog"]) -tfidf.vocabulary.keys -# => [:quick_brown, :brown_fox, :lazi_brown, :brown_dog] - -# Unigrams through trigrams -tfidf = Classifier::TFIDF.new(ngram_range: [1, 3]) -``` +### Incremental LSI -### Serialization +Add documents without rebuilding the entire index—400x faster for streaming data: ```ruby -# Save to JSON -json = tfidf.to_json -File.write("tfidf.json", json) - -# Load from JSON -loaded = Classifier::TFIDF.from_json(File.read("tfidf.json")) +lsi = Classifier::LSI.new(incremental: true) +lsi.add(tech: ["Ruby is elegant", "Python is popular"]) +lsi.build_index -# Or use Marshal -data = Marshal.dump(tfidf) -loaded = Marshal.load(data) +# These use Brand's algorithm—no full rebuild +lsi.add(tech: "Go is fast") +lsi.add(tech: "Rust is safe") ``` -## Persistence - -Save and load classifiers with pluggable storage backends. Works with Bayes, LogisticRegression, LSI, and kNN classifiers. +[Learn more →](https://rubyclassifier.com/docs/guides/lsi/incremental) -### File Storage +### Persistence ```ruby -require 'classifier' - -classifier = Classifier::Bayes.new(:spam, :ham) -classifier.train(spam: "Buy now! Limited offer!") -classifier.train(ham: "Meeting tomorrow at 3pm") - -# Configure storage and save -classifier.storage = Classifier::Storage::File.new(path: "spam_filter.json") +classifier.storage = Classifier::Storage::File.new(path: "model.json") classifier.save -# Load later loaded = Classifier::Bayes.load(storage: classifier.storage) -loaded.classify "Claim your prize now!" 
-# => "Spam" ``` -### Custom Storage Backends +[Learn more →](https://rubyclassifier.com/docs/guides/persistence) -Create backends for Redis, PostgreSQL, S3, or any storage system: +### Streaming Training ```ruby -class RedisStorage < Classifier::Storage::Base - def initialize(redis:, key:) - super() - @redis, @key = redis, key - end - - def write(data) = @redis.set(@key, data) - def read = @redis.get(@key) - def delete = @redis.del(@key) - def exists? = @redis.exists?(@key) -end - -# Use it -classifier.storage = RedisStorage.new(redis: Redis.new, key: "classifier:spam") -classifier.save +classifier.train_from_stream(:spam, File.open("spam_corpus.txt")) ``` -### Learn More - -- [Persistence Guide](https://rubyclassifier.com/docs/guides/persistence/basics) - Full documentation with examples +[Learn more →](https://rubyclassifier.com/docs/guides/streaming) ## Performance -### Native C Extension vs Pure Ruby - -The native C extension provides dramatic speedups for LSI operations, especially `build_index` (SVD computation): +Native C extension provides 5-50x speedup for LSI operations: -| Documents | build_index | Overall | -|-----------|-------------|---------| -| 5 | 7x faster | 2.6x | -| 10 | 25x faster | 4.6x | -| 15 | 112x faster | 14.5x | -| 20 | 385x faster | 48.7x | - -
-Detailed benchmark (20 documents) - -``` -Operation Pure Ruby Native C Speedup ----------------------------------------------------------- -build_index 0.5540 0.0014 384.5x -classify 0.0190 0.0060 3.2x -search 0.0145 0.0037 3.9x -find_related 0.0098 0.0011 8.6x ----------------------------------------------------------- -TOTAL 0.5973 0.0123 48.7x -``` -
- -### Running Benchmarks +| Documents | Speedup | +|-----------|---------| +| 10 | 25x | +| 20 | 50x | ```bash -rake benchmark # Run with current configuration -rake benchmark:compare # Compare native C vs pure Ruby +rake benchmark:compare # Run your own comparison ``` ## Development -### Setup - ```bash -git clone https://github.com/cardmagic/classifier.git -cd classifier bundle install -rake compile # Compile native C extension +rake compile # Build native extension +rake test # Run tests ``` -### Running Tests - -```bash -rake test # Run all tests (compiles first) -ruby -Ilib test/bayes/bayesian_test.rb # Run specific test file - -# Test with pure Ruby (no native extension) -NATIVE_VECTOR=true rake test -``` - -### Console - -```bash -rake console -``` - -## Contributing - -1. Fork the repository -2. Create your feature branch (`git checkout -b feature/amazing-feature`) -3. Commit your changes (`git commit -am 'Add amazing feature'`) -4. Push to the branch (`git push origin feature/amazing-feature`) -5. Open a Pull Request - ## Authors -- **Lucas Carlson** - *Original author* - lucas@rufy.com -- **David Fayram II** - *LSI implementation* - dfayram@gmail.com +- **Lucas Carlson** - lucas@rufy.com +- **David Fayram II** - dfayram@gmail.com - **Cameron McBride** - cameron.mcbride@gmail.com - **Ivan Acosta-Rubio** - ivan@softwarecriollo.com ## License -This library is released under the [GNU Lesser General Public License (LGPL) 2.1](LICENSE). 
+[LGPL 2.1](LICENSE) diff --git a/ext/classifier/classifier_ext.c b/ext/classifier/classifier_ext.c index f6096933..45ebad40 100644 --- a/ext/classifier/classifier_ext.c +++ b/ext/classifier/classifier_ext.c @@ -22,4 +22,5 @@ void Init_classifier_ext(void) Init_vector(); Init_matrix(); Init_svd(); + Init_incremental_svd(); } diff --git a/ext/classifier/incremental_svd.c b/ext/classifier/incremental_svd.c new file mode 100644 index 00000000..ab390d98 --- /dev/null +++ b/ext/classifier/incremental_svd.c @@ -0,0 +1,393 @@ +/* + * incremental_svd.c + * Native C implementation of Brand's incremental SVD operations + * + * Provides fast matrix operations for: + * - Matrix column extension + * - Vertical stacking (vstack) + * - Vector subtraction + * - Batch document projection + */ + +#include "linalg.h" + +/* + * Extend a matrix with a new column + * Returns a new matrix [M | col] with one additional column + */ +CMatrix *cmatrix_extend_column(CMatrix *m, CVector *col) +{ + if (m->rows != col->size) { + rb_raise(rb_eArgError, + "Matrix rows (%ld) must match vector size (%ld)", + (long)m->rows, (long)col->size); + } + + CMatrix *result = cmatrix_alloc(m->rows, m->cols + 1); + + for (size_t i = 0; i < m->rows; i++) { + memcpy(&MAT_AT(result, i, 0), &MAT_AT(m, i, 0), m->cols * sizeof(double)); + MAT_AT(result, i, m->cols) = col->data[i]; + } + + return result; +} + +/* + * Vertically stack two matrices + * Returns a new matrix [top; bottom] + */ +CMatrix *cmatrix_vstack(CMatrix *top, CMatrix *bottom) +{ + if (top->cols != bottom->cols) { + rb_raise(rb_eArgError, + "Matrices must have same column count: %ld vs %ld", + (long)top->cols, (long)bottom->cols); + } + + size_t new_rows = top->rows + bottom->rows; + CMatrix *result = cmatrix_alloc(new_rows, top->cols); + + memcpy(result->data, top->data, top->rows * top->cols * sizeof(double)); + memcpy(result->data + top->rows * top->cols, + bottom->data, + bottom->rows * bottom->cols * sizeof(double)); + + return result; +} + +/* 
+ * Vector subtraction: a - b + */ +CVector *cvector_subtract(CVector *a, CVector *b) +{ + if (a->size != b->size) { + rb_raise(rb_eArgError, + "Vector sizes must match: %ld vs %ld", + (long)a->size, (long)b->size); + } + + CVector *result = cvector_alloc(a->size); + for (size_t i = 0; i < a->size; i++) { + result->data[i] = a->data[i] - b->data[i]; + } + return result; +} + +/* + * Batch project multiple vectors onto U matrix + * Computes lsi_vector = U^T * raw_vector for each vector + * This is the most performance-critical operation for incremental updates + */ +void cbatch_project(CMatrix *u, CVector **raw_vectors, size_t num_vectors, + CVector **lsi_vectors_out) +{ + size_t m = u->rows; /* vocabulary size */ + size_t k = u->cols; /* rank */ + + for (size_t v = 0; v < num_vectors; v++) { + CVector *raw = raw_vectors[v]; + if (raw->size != m) { + rb_raise(rb_eArgError, + "Vector %ld size (%ld) must match matrix rows (%ld)", + (long)v, (long)raw->size, (long)m); + } + + CVector *lsi = cvector_alloc(k); + + /* Compute U^T * raw (project onto k-dimensional space) */ + for (size_t j = 0; j < k; j++) { + double sum = 0.0; + for (size_t i = 0; i < m; i++) { + sum += MAT_AT(u, i, j) * raw->data[i]; + } + lsi->data[j] = sum; + } + + lsi_vectors_out[v] = lsi; + } +} + +/* + * Build the K matrix for Brand's algorithm when rank grows + * K = | diag(s) m_vec | + * | 0 p_norm | + */ +static CMatrix *build_k_matrix_with_growth(CVector *s, CVector *m_vec, double p_norm) +{ + size_t k = s->size; + CMatrix *result = cmatrix_alloc(k + 1, k + 1); + + /* First k rows: diagonal s values and m_vec in last column */ + for (size_t i = 0; i < k; i++) { + MAT_AT(result, i, i) = s->data[i]; + MAT_AT(result, i, k) = m_vec->data[i]; + } + + /* Last row: zeros except p_norm in last position */ + MAT_AT(result, k, k) = p_norm; + + return result; +} + +/* + * Perform one incremental SVD update using Brand's algorithm + * + * @param u Current U matrix (m x k) + * @param s Current singular 
values (k values) + * @param c New document vector (m x 1) + * @param max_rank Maximum rank to maintain + * @param epsilon Threshold for detecting new directions + * @param u_out Output: updated U matrix + * @param s_out Output: updated singular values + */ +static void incremental_update(CMatrix *u, CVector *s, CVector *c, int max_rank, + double epsilon, CMatrix **u_out, CVector **s_out) +{ + size_t m = u->rows; + size_t k = u->cols; + + /* Step 1: Project c onto column space of U */ + /* m_vec = U^T * c */ + CVector *m_vec = cvector_alloc(k); + for (size_t j = 0; j < k; j++) { + double sum = 0.0; + for (size_t i = 0; i < m; i++) { + sum += MAT_AT(u, i, j) * c->data[i]; + } + m_vec->data[j] = sum; + } + + /* Step 2: Compute residual p = c - U * m_vec */ + CVector *u_times_m = cmatrix_multiply_vector(u, m_vec); + CVector *p = cvector_subtract(c, u_times_m); + double p_norm = cvector_magnitude(p); + + cvector_free(u_times_m); + + if (p_norm > epsilon) { + /* New direction found - rank may increase */ + + /* Step 3: Normalize residual */ + CVector *p_hat = cvector_alloc(m); + double inv_p_norm = 1.0 / p_norm; + for (size_t i = 0; i < m; i++) { + p_hat->data[i] = p->data[i] * inv_p_norm; + } + + /* Step 4: Build K matrix */ + CMatrix *k_mat = build_k_matrix_with_growth(s, m_vec, p_norm); + + /* Step 5: SVD of K matrix */ + CMatrix *u_prime, *v_prime; + CVector *s_prime; + jacobi_svd(k_mat, &u_prime, &v_prime, &s_prime); + cmatrix_free(k_mat); + cmatrix_free(v_prime); + + /* Step 6: Update U = [U | p_hat] * U' */ + CMatrix *u_extended = cmatrix_extend_column(u, p_hat); + CMatrix *u_new = cmatrix_multiply(u_extended, u_prime); + cmatrix_free(u_extended); + cmatrix_free(u_prime); + cvector_free(p_hat); + + /* Truncate if needed */ + if (s_prime->size > (size_t)max_rank) { + /* Create truncated U (keep first max_rank columns) */ + CMatrix *u_trunc = cmatrix_alloc(u_new->rows, (size_t)max_rank); + for (size_t i = 0; i < u_new->rows; i++) { + memcpy(&MAT_AT(u_trunc, i, 0), 
&MAT_AT(u_new, i, 0), + (size_t)max_rank * sizeof(double)); + } + cmatrix_free(u_new); + u_new = u_trunc; + + /* Truncate singular values */ + CVector *s_trunc = cvector_alloc((size_t)max_rank); + memcpy(s_trunc->data, s_prime->data, (size_t)max_rank * sizeof(double)); + cvector_free(s_prime); + s_prime = s_trunc; + } + + *u_out = u_new; + *s_out = s_prime; + } else { + /* Vector in span - use simpler update */ + /* For now, just return unchanged (projection handles this) */ + *u_out = cmatrix_alloc(u->rows, u->cols); + memcpy((*u_out)->data, u->data, u->rows * u->cols * sizeof(double)); + *s_out = cvector_alloc(s->size); + memcpy((*s_out)->data, s->data, s->size * sizeof(double)); + } + + cvector_free(p); + cvector_free(m_vec); +} + +/* ========== Ruby Wrappers ========== */ + +/* + * Matrix.extend_column(matrix, vector) + * Returns [matrix | vector] + */ +static VALUE rb_cmatrix_extend_column(VALUE klass, VALUE rb_matrix, VALUE rb_vector) +{ + CMatrix *m; + CVector *v; + + GET_CMATRIX(rb_matrix, m); + GET_CVECTOR(rb_vector, v); + + CMatrix *result = cmatrix_extend_column(m, v); + return TypedData_Wrap_Struct(klass, &cmatrix_type, result); + + (void)klass; +} + +/* + * Matrix.vstack(top, bottom) + * Vertically stack two matrices + */ +static VALUE rb_cmatrix_vstack(VALUE klass, VALUE rb_top, VALUE rb_bottom) +{ + CMatrix *top, *bottom; + + GET_CMATRIX(rb_top, top); + GET_CMATRIX(rb_bottom, bottom); + + CMatrix *result = cmatrix_vstack(top, bottom); + return TypedData_Wrap_Struct(klass, &cmatrix_type, result); + + (void)klass; +} + +/* + * Matrix.zeros(rows, cols) + * Create a zero matrix + */ +static VALUE rb_cmatrix_zeros(VALUE klass, VALUE rb_rows, VALUE rb_cols) +{ + size_t rows = NUM2SIZET(rb_rows); + size_t cols = NUM2SIZET(rb_cols); + + CMatrix *result = cmatrix_alloc(rows, cols); + return TypedData_Wrap_Struct(klass, &cmatrix_type, result); + + (void)klass; +} + +/* + * Vector#-(other) + * Vector subtraction + */ +static VALUE rb_cvector_subtract(VALUE 
self, VALUE other) +{ + CVector *a, *b; + + GET_CVECTOR(self, a); + + if (rb_obj_is_kind_of(other, cClassifierVector)) { + GET_CVECTOR(other, b); + CVector *result = cvector_subtract(a, b); + return TypedData_Wrap_Struct(cClassifierVector, &cvector_type, result); + } + + rb_raise(rb_eTypeError, "Cannot subtract %s from Vector", + rb_obj_classname(other)); + return Qnil; +} + +/* + * Matrix#batch_project(vectors_array) + * Project multiple vectors onto this matrix (as U) + * Returns array of projected vectors + * + * This is the high-performance batch operation for re-projecting documents + */ +static VALUE rb_cmatrix_batch_project(VALUE self, VALUE rb_vectors) +{ + CMatrix *u; + GET_CMATRIX(self, u); + + Check_Type(rb_vectors, T_ARRAY); + long num_vectors = RARRAY_LEN(rb_vectors); + + if (num_vectors == 0) { + return rb_ary_new(); + } + + CVector **raw_vectors = ALLOC_N(CVector *, num_vectors); + for (long i = 0; i < num_vectors; i++) { + VALUE rb_vec = rb_ary_entry(rb_vectors, i); + if (!rb_obj_is_kind_of(rb_vec, cClassifierVector)) { + xfree(raw_vectors); + rb_raise(rb_eTypeError, "Expected array of Vectors"); + } + GET_CVECTOR(rb_vec, raw_vectors[i]); + } + + CVector **lsi_vectors = ALLOC_N(CVector *, num_vectors); + cbatch_project(u, raw_vectors, (size_t)num_vectors, lsi_vectors); + + VALUE result = rb_ary_new_capa(num_vectors); + for (long i = 0; i < num_vectors; i++) { + VALUE rb_lsi = TypedData_Wrap_Struct(cClassifierVector, &cvector_type, + lsi_vectors[i]); + rb_ary_push(result, rb_lsi); + } + + xfree(raw_vectors); + xfree(lsi_vectors); + + return result; +} + +/* + * Matrix#incremental_svd_update(singular_values, new_vector, max_rank, epsilon) + * Perform one Brand's incremental SVD update + * Returns [new_u, new_singular_values] + */ +static VALUE rb_cmatrix_incremental_update(VALUE self, VALUE rb_s, VALUE rb_c, + VALUE rb_max_rank, VALUE rb_epsilon) +{ + CMatrix *u; + CVector *s, *c; + + GET_CMATRIX(self, u); + GET_CVECTOR(rb_s, s); + GET_CVECTOR(rb_c, 
c); + + int max_rank = NUM2INT(rb_max_rank); + double epsilon = NUM2DBL(rb_epsilon); + + CMatrix *u_new; + CVector *s_new; + + incremental_update(u, s, c, max_rank, epsilon, &u_new, &s_new); + + VALUE rb_u_new = TypedData_Wrap_Struct(cClassifierMatrix, &cmatrix_type, u_new); + VALUE rb_s_new = TypedData_Wrap_Struct(cClassifierVector, &cvector_type, s_new); + + return rb_ary_new_from_args(2, rb_u_new, rb_s_new); +} + +void Init_incremental_svd(void) +{ + /* Matrix class methods for incremental SVD */ + rb_define_singleton_method(cClassifierMatrix, "extend_column", + rb_cmatrix_extend_column, 2); + rb_define_singleton_method(cClassifierMatrix, "vstack", + rb_cmatrix_vstack, 2); + rb_define_singleton_method(cClassifierMatrix, "zeros", + rb_cmatrix_zeros, 2); + + /* Instance methods */ + rb_define_method(cClassifierMatrix, "batch_project", + rb_cmatrix_batch_project, 1); + rb_define_method(cClassifierMatrix, "incremental_svd_update", + rb_cmatrix_incremental_update, 4); + + /* Vector subtraction */ + rb_define_method(cClassifierVector, "-", rb_cvector_subtract, 1); +} diff --git a/ext/classifier/linalg.h b/ext/classifier/linalg.h index 7220ee4f..71bbc30a 100644 --- a/ext/classifier/linalg.h +++ b/ext/classifier/linalg.h @@ -50,6 +50,14 @@ CMatrix *cmatrix_diagonal(CVector *v); void Init_svd(void); void jacobi_svd(CMatrix *a, CMatrix **u, CMatrix **v, CVector **s); +/* Incremental SVD functions */ +void Init_incremental_svd(void); +CMatrix *cmatrix_extend_column(CMatrix *m, CVector *col); +CMatrix *cmatrix_vstack(CMatrix *top, CMatrix *bottom); +CVector *cvector_subtract(CVector *a, CVector *b); +void cbatch_project(CMatrix *u, CVector **raw_vectors, size_t num_vectors, + CVector **lsi_vectors_out); + /* TypedData definitions */ extern const rb_data_type_t cvector_type; extern const rb_data_type_t cmatrix_type; diff --git a/lib/classifier.rb b/lib/classifier.rb index 306016e0..77cbd931 100644 --- a/lib/classifier.rb +++ b/lib/classifier.rb @@ -27,6 +27,7 @@ require 
'rubygems' require 'classifier/errors' require 'classifier/storage' +require 'classifier/streaming' require 'classifier/extensions/string' require 'classifier/extensions/vector' require 'classifier/bayes' diff --git a/lib/classifier/bayes.rb b/lib/classifier/bayes.rb index 8b82283c..750aeeaf 100644 --- a/lib/classifier/bayes.rb +++ b/lib/classifier/bayes.rb @@ -8,8 +8,9 @@ require 'mutex_m' module Classifier - class Bayes + class Bayes # rubocop:disable Metrics/ClassLength include Mutex_m + include Streaming # @rbs @categories: Hash[Symbol, Hash[Symbol, Integer]] # @rbs @total_words: Integer @@ -310,8 +311,115 @@ def remove_category(category) end end + # Trains the classifier from an IO stream. + # Each line in the stream is treated as a separate document. + # This is memory-efficient for large corpora. + # + # @example Train from a file + # classifier.train_from_stream(:spam, File.open('spam_corpus.txt')) + # + # @example With progress tracking + # classifier.train_from_stream(:spam, io, batch_size: 500) do |progress| + # puts "#{progress.completed} documents processed" + # end + # + # @rbs (String | Symbol, IO, ?batch_size: Integer) { (Streaming::Progress) -> void } -> void + def train_from_stream(category, io, batch_size: Streaming::DEFAULT_BATCH_SIZE) + category = category.prepare_category_name + raise StandardError, "No such category: #{category}" unless @categories.key?(category) + + reader = Streaming::LineReader.new(io, batch_size: batch_size) + total = reader.estimate_line_count + progress = Streaming::Progress.new(total: total) + + reader.each_batch do |batch| + train_batch_internal(category, batch) + progress.completed += batch.size + progress.current_batch += 1 + yield progress if block_given? + end + end + + # Trains the classifier with an array of documents in batches. + # Reduces lock contention by processing multiple documents per synchronize call. 
+ # + # @example Positional style + # classifier.train_batch(:spam, documents, batch_size: 100) + # + # @example Keyword style + # classifier.train_batch(spam: documents, ham: other_docs, batch_size: 100) + # + # @example With progress tracking + # classifier.train_batch(:spam, documents, batch_size: 100) do |progress| + # puts "#{progress.percent}% complete" + # end + # + # @rbs (?(String | Symbol)?, ?Array[String]?, ?batch_size: Integer, **Array[String]) { (Streaming::Progress) -> void } -> void + def train_batch(category = nil, documents = nil, batch_size: Streaming::DEFAULT_BATCH_SIZE, **categories, &block) + if category && documents + train_batch_for_category(category, documents, batch_size: batch_size, &block) + else + categories.each do |cat, docs| + train_batch_for_category(cat, Array(docs), batch_size: batch_size, &block) + end + end + end + + # Loads a classifier from a checkpoint. + # + # @rbs (storage: Storage::Base, checkpoint_id: String) -> Bayes + def self.load_checkpoint(storage:, checkpoint_id:) + raise ArgumentError, 'Storage must be File storage for checkpoints' unless storage.is_a?(Storage::File) + + dir = File.dirname(storage.path) + base = File.basename(storage.path, '.*') + ext = File.extname(storage.path) + checkpoint_path = File.join(dir, "#{base}_checkpoint_#{checkpoint_id}#{ext}") + + checkpoint_storage = Storage::File.new(path: checkpoint_path) + instance = load(storage: checkpoint_storage) + instance.storage = storage + instance + end + private + # Trains a batch of documents for a single category. 
+ # @rbs (String | Symbol, Array[String], ?batch_size: Integer) { (Streaming::Progress) -> void } -> void + def train_batch_for_category(category, documents, batch_size: Streaming::DEFAULT_BATCH_SIZE) + category = category.prepare_category_name + raise StandardError, "No such category: #{category}" unless @categories.key?(category) + + progress = Streaming::Progress.new(total: documents.size) + + documents.each_slice(batch_size) do |batch| + train_batch_internal(category, batch) + progress.completed += batch.size + progress.current_batch += 1 + yield progress if block_given? + end + end + + # Internal method to train a batch of documents. + # Uses a single synchronize block for the entire batch. + # @rbs (Symbol, Array[String]) -> void + def train_batch_internal(category, batch) + synchronize do + invalidate_caches + @dirty = true + batch.each do |text| + word_hash = text.word_hash + @category_counts[category] += 1 + word_hash.each do |word, count| + @categories[category][word] ||= 0 + @categories[category][word] += count + @total_words += count + @category_word_count[category] += count + end + end + end + end + # Core training logic for a single category and text. # @rbs (String | Symbol, String) -> void def train_single(category, text) diff --git a/lib/classifier/knn.rb b/lib/classifier/knn.rb index d473abcb..37ae8174 100644 --- a/lib/classifier/knn.rb +++ b/lib/classifier/knn.rb @@ -19,6 +19,7 @@ module Classifier # class KNN include Mutex_m + include Streaming # @rbs @k: Integer # @rbs @weighted: bool @@ -42,17 +43,25 @@ def initialize(k: 5, weighted: false) # rubocop:disable Naming/MethodParameterNa end # Adds labeled examples. Keys are categories, values are items or arrays. + # Also aliased as `train` for API consistency with Bayes and LogisticRegression. 
+ # + # knn.add(spam: "Buy now!", ham: "Meeting tomorrow") + # knn.train(spam: "Buy now!", ham: "Meeting tomorrow") # equivalent + # # @rbs (**untyped items) -> void def add(**items) synchronize { @dirty = true } @lsi.add(**items) end + alias train add + # Classifies text using k nearest neighbors with majority voting. - # @rbs (String) -> (String | Symbol)? + # Returns the category as a String for API consistency with Bayes and LogisticRegression. + # @rbs (String) -> String? def classify(text) result = classify_with_neighbors(text) - result[:category] + result[:category]&.to_s end # Classifies and returns {category:, neighbors:, votes:, confidence:}. @@ -91,10 +100,11 @@ def items @lsi.items end - # @rbs () -> Array[String | Symbol] + # Returns all unique categories as strings. + # @rbs () -> Array[String] def categories synchronize do - @lsi.items.flat_map { |item| @lsi.categories_for(item) }.uniq + @lsi.items.flat_map { |item| @lsi.categories_for(item) }.uniq.map(&:to_s) end end @@ -104,6 +114,23 @@ def k=(value) @k = value end + # Provides dynamic training methods for categories. + # For example: + # knn.train_spam "Buy now!" + # knn.train_ham "Meeting tomorrow" + def method_missing(name, *args) + category_match = name.to_s.match(/\Atrain_(\w+)\z/) + return super unless category_match + + category = category_match[1].to_sym + args.each { |text| add(category => text) } + end + + # @rbs (Symbol, ?bool) -> bool + def respond_to_missing?(name, include_private = false) + !!(name.to_s =~ /\Atrain_(\w+)\z/) || super + end + # @rbs (?untyped) -> untyped def as_json(_options = nil) { @@ -213,6 +240,59 @@ def marshal_load(data) @storage = nil end + # Loads a classifier from a checkpoint. 
+ # + # @rbs (storage: Storage::Base, checkpoint_id: String) -> KNN + def self.load_checkpoint(storage:, checkpoint_id:) + raise ArgumentError, 'Storage must be File storage for checkpoints' unless storage.is_a?(Storage::File) + + dir = File.dirname(storage.path) + base = File.basename(storage.path, '.*') + ext = File.extname(storage.path) + checkpoint_path = File.join(dir, "#{base}_checkpoint_#{checkpoint_id}#{ext}") + + checkpoint_storage = Storage::File.new(path: checkpoint_path) + instance = load(storage: checkpoint_storage) + instance.storage = storage + instance + end + + # Trains the classifier from an IO stream. + # Each line in the stream is treated as a separate document. + # + # @example Train from a file + # knn.train_from_stream(:spam, File.open('spam_corpus.txt')) + # + # @example With progress tracking + # knn.train_from_stream(:spam, io, batch_size: 500) do |progress| + # puts "#{progress.completed} documents processed" + # end + # + # @rbs (String | Symbol, IO, ?batch_size: Integer) { (Streaming::Progress) -> void } -> void + def train_from_stream(category, io, batch_size: Streaming::DEFAULT_BATCH_SIZE, &block) + @lsi.train_from_stream(category, io, batch_size: batch_size, &block) + synchronize { @dirty = true } + end + + # Adds items in batches. + # + # @example Positional style + # knn.train_batch(:spam, documents, batch_size: 100) + # + # @example Keyword style + # knn.train_batch(spam: documents, ham: other_docs) + # + # @rbs (?(String | Symbol)?, ?Array[String]?, ?batch_size: Integer, **Array[String]) { (Streaming::Progress) -> void } -> void + def train_batch(category = nil, documents = nil, batch_size: Streaming::DEFAULT_BATCH_SIZE, **categories, &block) + # @type var categories: Hash[Symbol, Array[String]] + @lsi.train_batch(category, documents, batch_size: batch_size, **categories, &block) # steep:ignore + synchronize { @dirty = true } + end + + # @rbs! 
+ # alias add_batch train_batch + alias add_batch train_batch + private # @rbs (String) -> Array[Hash[Symbol, untyped]] diff --git a/lib/classifier/logistic_regression.rb b/lib/classifier/logistic_regression.rb index a413c062..04e3f037 100644 --- a/lib/classifier/logistic_regression.rb +++ b/lib/classifier/logistic_regression.rb @@ -20,6 +20,7 @@ module Classifier # class LogisticRegression # rubocop:disable Metrics/ClassLength include Mutex_m + include Streaming # @rbs @categories: Array[Symbol] # @rbs @weights: Hash[Symbol, Hash[Symbol, Float]] @@ -329,8 +330,112 @@ def marshal_load(data) @storage = nil end + # Loads a classifier from a checkpoint. + # + # @rbs (storage: Storage::Base, checkpoint_id: String) -> LogisticRegression + def self.load_checkpoint(storage:, checkpoint_id:) + raise ArgumentError, 'Storage must be File storage for checkpoints' unless storage.is_a?(Storage::File) + + dir = File.dirname(storage.path) + base = File.basename(storage.path, '.*') + ext = File.extname(storage.path) + checkpoint_path = File.join(dir, "#{base}_checkpoint_#{checkpoint_id}#{ext}") + + checkpoint_storage = Storage::File.new(path: checkpoint_path) + instance = load(storage: checkpoint_storage) + instance.storage = storage + instance + end + + # Trains the classifier from an IO stream. + # Each line in the stream is treated as a separate document. + # Note: The model is NOT automatically fitted after streaming. + # Call #fit to train the model after adding all data. 
+ # + # @example Train from a file + # classifier.train_from_stream(:spam, File.open('spam_corpus.txt')) + # classifier.fit # Required to train the model + # + # @example With progress tracking + # classifier.train_from_stream(:spam, io, batch_size: 500) do |progress| + # puts "#{progress.completed} documents processed" + # end + # classifier.fit + # + # @rbs (String | Symbol, IO, ?batch_size: Integer) { (Streaming::Progress) -> void } -> void + def train_from_stream(category, io, batch_size: Streaming::DEFAULT_BATCH_SIZE) + category = category.to_s.prepare_category_name + raise StandardError, "No such category: #{category}" unless @categories.include?(category) + + reader = Streaming::LineReader.new(io, batch_size: batch_size) + total = reader.estimate_line_count + progress = Streaming::Progress.new(total: total) + + reader.each_batch do |batch| + synchronize do + batch.each do |text| + features = text.word_hash + features.each_key { |word| @vocabulary[word] = true } + @training_data << { category: category, features: features } + end + @fitted = false + @dirty = true + end + progress.completed += batch.size + progress.current_batch += 1 + yield progress if block_given? + end + end + + # Trains the classifier with an array of documents in batches. + # Note: The model is NOT automatically fitted after batch training. + # Call #fit to train the model after adding all data. 
+ # + # @example Positional style + # classifier.train_batch(:spam, documents, batch_size: 100) + # classifier.fit + # + # @example Keyword style + # classifier.train_batch(spam: documents, ham: other_docs) + # classifier.fit + # + # @rbs (?(String | Symbol)?, ?Array[String]?, ?batch_size: Integer, **Array[String]) { (Streaming::Progress) -> void } -> void + def train_batch(category = nil, documents = nil, batch_size: Streaming::DEFAULT_BATCH_SIZE, **categories, &block) + if category && documents + train_batch_for_category(category, documents, batch_size: batch_size, &block) + else + categories.each do |cat, docs| + train_batch_for_category(cat, Array(docs), batch_size: batch_size, &block) + end + end + end + private + # Trains a batch of documents for a single category. + # @rbs (String | Symbol, Array[String], ?batch_size: Integer) { (Streaming::Progress) -> void } -> void + def train_batch_for_category(category, documents, batch_size: Streaming::DEFAULT_BATCH_SIZE) + category = category.to_s.prepare_category_name + raise StandardError, "No such category: #{category}" unless @categories.include?(category) + + progress = Streaming::Progress.new(total: documents.size) + + documents.each_slice(batch_size) do |batch| + synchronize do + batch.each do |text| + features = text.word_hash + features.each_key { |word| @vocabulary[word] = true } + @training_data << { category: category, features: features } + end + @fitted = false + @dirty = true + end + progress.completed += batch.size + progress.current_batch += 1 + yield progress if block_given? + end + end + # Core training logic for a single category and text. 
# @rbs (String | Symbol, String) -> void def train_single(category, text) diff --git a/lib/classifier/lsi.rb b/lib/classifier/lsi.rb index 416a5cfe..6e1a6620 100644 --- a/lib/classifier/lsi.rb +++ b/lib/classifier/lsi.rb @@ -58,6 +58,7 @@ def matrix_class require 'classifier/lsi/word_list' require 'classifier/lsi/content_node' require 'classifier/lsi/summary' +require 'classifier/lsi/incremental_svd' module Classifier # This class implements a Latent Semantic Indexer, which can search, classify and cluster @@ -65,6 +66,7 @@ module Classifier # please consult Wikipedia[http://en.wikipedia.org/wiki/Latent_Semantic_Indexing]. class LSI include Mutex_m + include Streaming # @rbs @auto_rebuild: bool # @rbs @word_list: WordList @@ -74,14 +76,24 @@ class LSI # @rbs @singular_values: Array[Float]? # @rbs @dirty: bool # @rbs @storage: Storage::Base? + # @rbs @incremental_mode: bool + # @rbs @u_matrix: Matrix? + # @rbs @max_rank: Integer + # @rbs @initial_vocab_size: Integer? attr_reader :word_list, :singular_values attr_accessor :auto_rebuild, :storage + # Default maximum rank for incremental SVD + DEFAULT_MAX_RANK = 100 + # Create a fresh index. # If you want to call #build_index manually, use # Classifier::LSI.new auto_rebuild: false # + # For incremental SVD mode (adds documents without full rebuild): + # Classifier::LSI.new incremental: true, max_rank: 100 + # # @rbs (?Hash[Symbol, untyped]) -> void def initialize(options = {}) super() @@ -92,6 +104,12 @@ def initialize(options = {}) @built_at_version = -1 @dirty = false @storage = nil + + # Incremental SVD settings + @incremental_mode = options[:incremental] == true + @max_rank = options[:max_rank] || DEFAULT_MAX_RANK + @u_matrix = nil + @initial_vocab_size = nil end # Returns true if the index needs to be rebuilt. The index needs @@ -122,6 +140,40 @@ def singular_value_spectrum end end + # Returns true if incremental mode is enabled and active. + # Incremental mode becomes active after the first build_index call. 
+    #
+    # @rbs () -> bool
+    def incremental_enabled?
+      @incremental_mode && !@u_matrix.nil?
+    end
+
+    # Returns the current rank of the incremental SVD (number of singular values kept).
+    # Returns nil if the index has not been built yet.
+    #
+    # @rbs () -> Integer?
+    def current_rank
+      @singular_values&.count(&:positive?)
+    end
+
+    # Disables incremental mode. Subsequent adds will trigger full rebuilds.
+    #
+    # @rbs () -> void
+    def disable_incremental_mode!
+      @incremental_mode = false
+      @u_matrix = nil
+      @initial_vocab_size = nil
+    end
+
+    # Enables incremental mode with an optional max_rank setting.
+    # The next build_index call will store the U matrix for incremental updates.
+    #
+    # @rbs (?max_rank: Integer) -> void
+    def enable_incremental_mode!(max_rank: DEFAULT_MAX_RANK)
+      @incremental_mode = true
+      @max_rank = max_rank
+    end
+
     # Adds items to the index using hash-style syntax.
     # The hash keys are categories, and values are items (or arrays of items).
     #
@@ -165,11 +217,18 @@ def add(**items)
     # @rbs (String, *String | Symbol) ?{ (String) -> String } -> void
     def add_item(item, *categories, &block)
       clean_word_hash = block ? block.call(item).clean_word_hash : item.to_s.clean_word_hash
+      node = nil
+
       synchronize do
-        @items[item] = ContentNode.new(clean_word_hash, *categories)
+        node = ContentNode.new(clean_word_hash, *categories)
+        @items[item] = node
         @version += 1
         @dirty = true
       end
+
+      # Use incremental update if enabled and we have a U matrix
+      return perform_incremental_update(node, clean_word_hash) if @incremental_mode && @u_matrix
+
       build_index if @auto_rebuild
     end
@@ -230,12 +289,12 @@ def items
     # A value of 1 for cutoff means that no semantic analysis will take place,
     # turning the LSI class into a simple vector search engine.
     #
-    # @rbs (?Float) -> void
-    def build_index(cutoff = 0.75)
+    # @rbs (?Float, ?force: bool) -> void
+    def build_index(cutoff = 0.75, force: false)
       validate_cutoff!(cutoff)
       synchronize do
-        return unless needs_rebuild_unlocked?
+ return unless force || needs_rebuild_unlocked? make_word_list @@ -246,14 +305,20 @@ def build_index(cutoff = 0.75) # Convert vectors to arrays for matrix construction tda_arrays = tda.map { |v| v.respond_to?(:to_a) ? v.to_a : v } tdm = self.class.matrix_class.alloc(*tda_arrays).trans - ntdm = build_reduced_matrix(tdm, cutoff) + ntdm, u_mat = build_reduced_matrix_with_u(tdm, cutoff) assign_native_ext_lsi_vectors(ntdm, doc_list) else tdm = Matrix.rows(tda).trans - ntdm = build_reduced_matrix(tdm, cutoff) + ntdm, u_mat = build_reduced_matrix_with_u(tdm, cutoff) assign_ruby_lsi_vectors(ntdm, doc_list) end + # Store U matrix for incremental mode + if @incremental_mode + @u_matrix = u_mat + @initial_vocab_size = @word_list.size + end + @built_at_version = @version end end @@ -559,6 +624,100 @@ def self.load_from_file(path) from_json(File.read(path)) end + # Loads an LSI index from a checkpoint. + # + # @rbs (storage: Storage::Base, checkpoint_id: String) -> LSI + def self.load_checkpoint(storage:, checkpoint_id:) + raise ArgumentError, 'Storage must be File storage for checkpoints' unless storage.is_a?(Storage::File) + + dir = File.dirname(storage.path) + base = File.basename(storage.path, '.*') + ext = File.extname(storage.path) + checkpoint_path = File.join(dir, "#{base}_checkpoint_#{checkpoint_id}#{ext}") + + checkpoint_storage = Storage::File.new(path: checkpoint_path) + instance = load(storage: checkpoint_storage) + instance.storage = storage + instance + end + + # Trains the LSI index from an IO stream. + # Each line in the stream is treated as a separate document. + # Documents are added without rebuilding, then the index is rebuilt at the end. 
+ # + # @example Train from a file + # lsi.train_from_stream(:category, File.open('corpus.txt')) + # + # @example With progress tracking + # lsi.train_from_stream(:category, io, batch_size: 500) do |progress| + # puts "#{progress.completed} documents processed" + # end + # + # @rbs (String | Symbol, IO, ?batch_size: Integer) { (Streaming::Progress) -> void } -> void + def train_from_stream(category, io, batch_size: Streaming::DEFAULT_BATCH_SIZE) + original_auto_rebuild = @auto_rebuild + @auto_rebuild = false + + begin + reader = Streaming::LineReader.new(io, batch_size: batch_size) + total = reader.estimate_line_count + progress = Streaming::Progress.new(total: total) + + reader.each_batch do |batch| + batch.each { |text| add_item(text, category) } + progress.completed += batch.size + progress.current_batch += 1 + yield progress if block_given? + end + ensure + @auto_rebuild = original_auto_rebuild + build_index if original_auto_rebuild + end + end + + # Adds items to the index in batches from an array. + # Documents are added without rebuilding, then the index is rebuilt at the end. + # + # @example Batch add with progress + # lsi.add_batch(Dog: documents, batch_size: 100) do |progress| + # puts "#{progress.percent}% complete" + # end + # + # @rbs (?batch_size: Integer, **Array[String]) { (Streaming::Progress) -> void } -> void + def add_batch(batch_size: Streaming::DEFAULT_BATCH_SIZE, **items) + original_auto_rebuild = @auto_rebuild + @auto_rebuild = false + + begin + total_docs = items.values.sum { |v| Array(v).size } + progress = Streaming::Progress.new(total: total_docs) + + items.each do |category, documents| + Array(documents).each_slice(batch_size) do |batch| + batch.each { |doc| add_item(doc, category.to_s) } + progress.completed += batch.size + progress.current_batch += 1 + yield progress if block_given? 
+          end
+        end
+      ensure
+        @auto_rebuild = original_auto_rebuild
+        build_index if original_auto_rebuild
+      end
+    end
+
+    # Delegates train_batch to add_batch for API consistency with the other classifiers.
+    # Note: LSI handles categories differently: categories are attached to items rather
+    # than passed per training call.
+    #
+    # @rbs (?(String | Symbol)?, ?Array[String]?, ?batch_size: Integer, **Array[String]) { (Streaming::Progress) -> void } -> void
+    def train_batch(category = nil, documents = nil, batch_size: Streaming::DEFAULT_BATCH_SIZE, **categories, &block)
+      if category && documents
+        add_batch(batch_size: batch_size, **{ category.to_sym => documents }, &block)
+      else
+        add_batch(batch_size: batch_size, **categories, &block)
+      end
+    end
+
     private

     # Restores LSI state from a JSON string (used by reload)
@@ -679,7 +838,7 @@ def vote_unlocked(doc, cutoff = 0.30, &)
       votes
     end

-    # Unlocked version of node_for_content for internal use
+    # Unlocked version of node_for_content for internal use.
     # @rbs (String) ?{ (String) -> String } -> ContentNode
     def node_for_content_unlocked(item, &block)
       return @items[item] if @items[item]
@@ -687,30 +846,67 @@ def node_for_content_unlocked(item, &block)
       clean_word_hash = block ? block.call(item).clean_word_hash : item.to_s.clean_word_hash
       cn = ContentNode.new(clean_word_hash, &block)
       cn.raw_vector_with(@word_list) unless needs_rebuild_unlocked?
+      assign_lsi_vector_incremental(cn) if incremental_enabled?
       cn
     end

     # @rbs (untyped, ?Float) -> untyped
     def build_reduced_matrix(matrix, cutoff = 0.75)
-      # TODO: Check that M>=N on these dimensions! Transpose helps assure this
-      u, v, s = matrix.SV_decomp
+      result, _u = build_reduced_matrix_with_u(matrix, cutoff)
+      result
+    end

-      @singular_values = s.sort.reverse
+    # Builds the reduced matrix and returns both the result and the U matrix.
+    # The U matrix is needed for incremental SVD updates.
+ # @rbs (untyped, ?Float) -> [untyped, Matrix] + def build_reduced_matrix_with_u(matrix, cutoff = 0.75) + u, v, s = matrix.SV_decomp + all_singular_values = s.sort.reverse s_cutoff_index = [(s.size * cutoff).round - 1, 0].max - s_cutoff = @singular_values[s_cutoff_index] + s_cutoff = all_singular_values[s_cutoff_index] + + kept_indices = [] + kept_singular_values = [] s.size.times do |ord| - s[ord] = 0.0 if s[ord] < s_cutoff + if s[ord] >= s_cutoff + kept_indices << ord + kept_singular_values << s[ord] + else + s[ord] = 0.0 + end end - # Reconstruct the term document matrix, only with reduced rank - result = u * self.class.matrix_class.diag(s) * v.trans - # SVD may return transposed dimensions when row_size < column_size - # Ensure result matches input dimensions + @singular_values = kept_singular_values.sort.reverse + result = u * self.class.matrix_class.diag(s) * v.trans result = result.trans if result.row_size != matrix.row_size + u_reduced = extract_reduced_u(u, kept_indices, s) - result + [result, u_reduced] + end + + # Extracts columns from U corresponding to kept singular values. + # Columns are sorted by descending singular value to match @singular_values order. + # rubocop:disable Naming/MethodParameterName + # @rbs (untyped, Array[Integer], Array[Float]) -> Matrix + def extract_reduced_u(u, kept_indices, singular_values) + return Matrix.empty(u.row_size, 0) if kept_indices.empty? 
+ + sorted_indices = kept_indices.sort_by { |i| -singular_values[i] } + + if u.respond_to?(:to_ruby_matrix) + u = u.to_ruby_matrix + elsif !u.is_a?(::Matrix) + rows = u.row_size.times.map do |i| + sorted_indices.map { |j| u[i, j] } + end + return Matrix.rows(rows) + end + + cols = sorted_indices.map { |i| u.column(i).to_a } + Matrix.columns(cols) end + # rubocop:enable Naming/MethodParameterName # @rbs () -> void def make_word_list @@ -719,5 +915,129 @@ def make_word_list node.word_hash.each_key { |key| @word_list.add_word key } end end + + # Performs incremental SVD update for a new document. + # @rbs (ContentNode, Hash[Symbol, Integer]) -> void + def perform_incremental_update(node, word_hash) + needs_full_rebuild = false + old_rank = nil + + synchronize do + if vocabulary_growth_exceeds_threshold?(word_hash) + disable_incremental_mode! + needs_full_rebuild = true + next + end + + old_rank = @u_matrix.column_size + extend_vocabulary_for_incremental(word_hash) + raw_vec = node.raw_vector_with(@word_list) + raw_vector = Vector[*raw_vec.to_a] + + @u_matrix, @singular_values = IncrementalSVD.update( + @u_matrix, @singular_values, raw_vector, max_rank: @max_rank + ) + + new_rank = @u_matrix.column_size + if new_rank > old_rank + reproject_all_documents + else + assign_lsi_vector_incremental(node) + end + + @built_at_version = @version + end + + build_index if needs_full_rebuild + end + + # Checks if vocabulary growth would exceed threshold (20%) + # @rbs (Hash[Symbol, Integer]) -> bool + def vocabulary_growth_exceeds_threshold?(word_hash) + return false unless @initial_vocab_size&.positive? + + new_words = word_hash.keys.count { |w| @word_list[w].nil? } + growth_ratio = new_words.to_f / @initial_vocab_size + growth_ratio > 0.2 + end + + # Extends vocabulary and U matrix for new words. + # @rbs (Hash[Symbol, Integer]) -> void + def extend_vocabulary_for_incremental(word_hash) + new_words = word_hash.keys.select { |w| @word_list[w].nil? } + return if new_words.empty? 
+ + new_words.each { |word| @word_list.add_word(word) } + extend_u_matrix(new_words.size) + end + + # Extends U matrix with zero rows for new vocabulary terms. + # @rbs (Integer) -> void + def extend_u_matrix(num_new_rows) + return if num_new_rows.zero? || @u_matrix.nil? + + if self.class.native_available? && @u_matrix.is_a?(self.class.matrix_class) + new_rows = self.class.matrix_class.zeros(num_new_rows, @u_matrix.column_size) + @u_matrix = self.class.matrix_class.vstack(@u_matrix, new_rows) + else + new_rows = Matrix.zero(num_new_rows, @u_matrix.column_size) + @u_matrix = Matrix.vstack(@u_matrix, new_rows) + end + end + + # Re-projects all documents onto the current U matrix + # Called when rank grows to ensure consistent LSI vector sizes + # Uses native batch_project for performance when available + # @rbs () -> void + def reproject_all_documents + return unless @u_matrix + return reproject_all_documents_native if self.class.native_available? && @u_matrix.respond_to?(:batch_project) + + reproject_all_documents_ruby + end + + # Native batch re-projection using C extension. + # @rbs () -> void + def reproject_all_documents_native + nodes = @items.values + raw_vectors = nodes.map do |node| + raw = node.raw_vector_with(@word_list) + raw.is_a?(self.class.vector_class) ? raw : self.class.vector_class.alloc(raw.to_a) + end + + lsi_vectors = @u_matrix.batch_project(raw_vectors) + + nodes.each_with_index do |node, i| + lsi_vec = lsi_vectors[i].row + node.lsi_vector = lsi_vec + node.lsi_norm = lsi_vec.normalize + end + end + + # Pure Ruby re-projection (fallback) + # @rbs () -> void + def reproject_all_documents_ruby + @items.each_value do |node| + assign_lsi_vector_incremental(node) + end + end + + # Assigns LSI vector to a node using projection: lsi_vec = U^T * raw_vec. 
+    # @rbs (ContentNode) -> void
+    def assign_lsi_vector_incremental(node)
+      return unless @u_matrix
+
+      raw_vec = node.raw_vector_with(@word_list)
+      raw_vector = Vector[*raw_vec.to_a]
+      lsi_arr = (@u_matrix.transpose * raw_vector).to_a
+
+      lsi_vec = if self.class.native_available?
+                  self.class.vector_class.alloc(lsi_arr).row
+                else
+                  Vector[*lsi_arr]
+                end
+      node.lsi_vector = lsi_vec
+      node.lsi_norm = lsi_vec.normalize
+    end
   end
 end
diff --git a/lib/classifier/lsi/incremental_svd.rb b/lib/classifier/lsi/incremental_svd.rb
new file mode 100644
index 00000000..beabfa37
--- /dev/null
+++ b/lib/classifier/lsi/incremental_svd.rb
@@ -0,0 +1,166 @@
+# rbs_inline: enabled
+
+# rubocop:disable Naming/MethodParameterName, Metrics/ParameterLists
+
+require 'matrix'
+
+module Classifier
+  class LSI
+    # Brand's Incremental SVD Algorithm for LSI
+    #
+    # Implements the algorithm from Brand (2006) "Fast low-rank modifications
+    # of the thin singular value decomposition" for adding documents to LSI
+    # without full SVD recomputation.
+    #
+    # Given existing thin SVD: A ≈ U * S * V^T (with k components)
+    # When adding a new column c:
+    #
+    # 1. Project: m = U^T * c (project onto existing column space)
+    # 2. Residual: p = c - U * m (component orthogonal to U)
+    # 3. Orthonormalize: If ||p|| > ε: p̂ = p / ||p||
+    # 4. Form K matrix:
+    #    - If ||p|| > ε: K = [diag(s), m; 0, ||p||] (rank grows by 1)
+    #    - If ||p|| ≈ 0: K = diag(s) (c lies in the span of U; the in-span
+    #      component m is dropped as an approximation, leaving S unchanged)
+    # 5. Small SVD: Compute SVD of K (only a (k+1) × (k+1) matrix!)
+    # 6. Update:
+    #    - U_new = [U, p̂] * U'
+    #    - S_new = S'
+    #
+    module IncrementalSVD
+      EPSILON = 1e-10
+
+      class << self
+        # Updates SVD with a new document vector using Brand's algorithm.
+ # + # @param u [Matrix] current left singular vectors (m × k) + # @param s [Array] current singular values (k values) + # @param c [Vector] new document vector (m × 1) + # @param max_rank [Integer] maximum rank to maintain + # @param epsilon [Float] threshold for zero detection + # @return [Array>] updated [u, s] + # + # @rbs (Matrix, Array[Float], Vector, max_rank: Integer, ?epsilon: Float) -> [Matrix, Array[Float]] + def update(u, s, c, max_rank:, epsilon: EPSILON) + m_vec = project(u, c) + u_times_m = u * m_vec + p_vec = c - (u_times_m.is_a?(Vector) ? u_times_m : Vector[*u_times_m.to_a.flatten]) + p_norm = magnitude(p_vec) + + if p_norm > epsilon + update_with_new_direction(u, s, m_vec, p_vec, p_norm, max_rank, epsilon) + else + update_in_span(u, s, m_vec, max_rank, epsilon) + end + end + + # Projects a document vector onto the semantic space defined by U. + # Returns the LSI representation: lsi_vec = U^T * raw_vec + # + # @param u [Matrix] left singular vectors (m × k) + # @param raw_vec [Vector] document vector in term space (m × 1) + # @return [Vector] document in semantic space (k × 1) + # + # @rbs (Matrix, Vector) -> Vector + def project(u, raw_vec) + u.transpose * raw_vec + end + + private + + # Update when new document has a component orthogonal to existing U. + # @rbs (Matrix, Array[Float], Vector, Vector, Float, Integer, Float) -> [Matrix, Array[Float]] + def update_with_new_direction(u, s, m_vec, p_vec, p_norm, max_rank, epsilon) + p_hat = p_vec * (1.0 / p_norm) + k_matrix = build_k_matrix_with_growth(s, m_vec, p_norm) + u_prime, s_prime = small_svd(k_matrix, epsilon) + u_extended = extend_matrix_with_column(u, p_hat) + u_new = u_extended * u_prime + + u_new, s_prime = truncate(u_new, s_prime, max_rank) if s_prime.size > max_rank + + [u_new, s_prime] + end + + # Update when new document is entirely within span of existing U. 
+        # @rbs (Matrix, Array[Float], Vector, Integer, Float) -> [Matrix, Array[Float]]
+        def update_in_span(u, s, m_vec, max_rank, epsilon)
+          k_matrix = build_k_matrix_in_span(s, m_vec)
+          u_prime, s_prime = small_svd(k_matrix, epsilon)
+          u_new = u * u_prime
+
+          u_new, s_prime = truncate(u_new, s_prime, max_rank) if s_prime.size > max_rank
+
+          [u_new, s_prime]
+        end
+
+        # Builds the K matrix when the rank grows by 1.
+        # @rbs (Array[Float], untyped, Float) -> untyped
+        def build_k_matrix_with_growth(s, m_vec, p_norm)
+          k = s.size
+          rows = k.times.map do |i|
+            row = Array.new(k + 1, 0.0) #: Array[Float]
+            row[i] = s[i].to_f
+            row[k] = m_vec[i].to_f
+            row
+          end
+          rows << Array.new(k + 1, 0.0).tap { |r| r[k] = p_norm }
+          Matrix.rows(rows)
+        end
+
+        # Builds the K matrix when the new vector is already in the span of U
+        # (no rank growth). The in-span component m is ignored here, so the
+        # singular values are left unchanged.
+        # @rbs (Array[Float], Vector) -> Matrix
+        def build_k_matrix_in_span(s, _m_vec)
+          k = s.size
+          rows = k.times.map do |i|
+            row = Array.new(k, 0.0)
+            row[i] = s[i]
+            row
+          end
+          Matrix.rows(rows)
+        end
+
+        # Computes the SVD of a small matrix and extracts the singular values
+        # above epsilon in descending order, reordering U's columns to match.
+        # @rbs (Matrix, Float) -> [Matrix, Array[Float]]
+        def small_svd(matrix, epsilon)
+          u, _v, s_array = matrix.SV_decomp
+
+          s_sorted = s_array.select { |sv| sv.abs > epsilon }.sort.reverse
+          indices = s_array.each_with_index
+                           .select { |sv, _| sv.abs > epsilon }
+                           .sort_by { |sv, _| -sv }
+                           .map { |_, i| i }
+
+          u_cols = indices.map { |i| u.column(i).to_a }
+          u_reordered = u_cols.empty? ? Matrix.empty(matrix.row_size, 0) : Matrix.columns(u_cols)
+
+          [u_reordered, s_sorted]
+        end
+
+        # Extends a matrix with a new column.
+        # @rbs (Matrix, Vector) -> Matrix
+        def extend_matrix_with_column(matrix, col_vec)
+          rows = matrix.row_size.times.map do |i|
+            matrix.row(i).to_a + [col_vec[i]]
+          end
+          Matrix.rows(rows)
+        end
+
+        # Truncates U and the singular values to max_rank.
+        # @rbs (untyped, Array[Float], Integer) -> [untyped, Array[Float]]
+        def truncate(u, s, max_rank)
+          s_truncated = s[0...max_rank] || [] #: Array[Float]
+          cols = (0...max_rank).map { |i| u.column(i).to_a }
+          u_truncated = Matrix.columns(cols)
+          [u_truncated, s_truncated]
+        end
+
+        # Computes the Euclidean magnitude of a vector.
+        # @rbs (untyped) -> Float
+        def magnitude(vec)
+          Math.sqrt(vec.to_a.sum { |x| x.to_f * x.to_f })
+        end
+      end
+    end
+  end
+end
+# rubocop:enable Naming/MethodParameterName, Metrics/ParameterLists
diff --git a/lib/classifier/streaming.rb b/lib/classifier/streaming.rb
new file mode 100644
index 00000000..3c228b41
--- /dev/null
+++ b/lib/classifier/streaming.rb
@@ -0,0 +1,122 @@
+# rbs_inline: enabled
+
+require_relative 'streaming/progress'
+require_relative 'streaming/line_reader'
+
+module Classifier
+  # Streaming provides memory-efficient training capabilities for classifiers.
+  # Include this module in a classifier to add streaming and batch training methods.
+  #
+  # @example Including in a classifier
+  #   class MyClassifier
+  #     include Classifier::Streaming
+  #   end
+  #
+  # @example Streaming training
+  #   classifier.train_from_stream(:category, File.open('corpus.txt'))
+  #
+  # @example Batch training with progress
+  #   classifier.train_batch(:category, documents, batch_size: 100) do |progress|
+  #     puts "#{progress.percent}% complete"
+  #   end
+  module Streaming
+    # Default batch size for streaming operations
+    DEFAULT_BATCH_SIZE = 100
+
+    # Trains the classifier from an IO stream.
+    # Each line in the stream is treated as a separate document.
+ # + # @rbs (Symbol | String, IO, ?batch_size: Integer) { (Progress) -> void } -> void + def train_from_stream(category, io, batch_size: DEFAULT_BATCH_SIZE, &block) + raise NotImplementedError, "#{self.class} must implement train_from_stream" + end + + # Trains the classifier with an array of documents in batches. + # Supports both positional and keyword argument styles. + # + # @example Positional style + # classifier.train_batch(:spam, documents, batch_size: 100) + # + # @example Keyword style + # classifier.train_batch(spam: documents, ham: other_docs, batch_size: 100) + # + # @rbs (?(Symbol | String)?, ?Array[String]?, ?batch_size: Integer, **Array[String]) { (Progress) -> void } -> void + def train_batch(category = nil, documents = nil, batch_size: DEFAULT_BATCH_SIZE, **categories, &block) + raise NotImplementedError, "#{self.class} must implement train_batch" + end + + # Saves a checkpoint of the current training state. + # Requires a storage backend to be configured. + # + # @rbs (String) -> void + def save_checkpoint(checkpoint_id) + raise ArgumentError, 'No storage configured' unless respond_to?(:storage) && storage + + original_storage = storage + + begin + self.storage = checkpoint_storage_for(checkpoint_id) + save + ensure + self.storage = original_storage + end + end + + # Lists available checkpoints. + # Requires a storage backend to be configured. + # + # @rbs () -> Array[String] + def list_checkpoints + raise ArgumentError, 'No storage configured' unless respond_to?(:storage) && storage + + case storage + when Storage::File + file_storage = storage #: Storage::File + dir = File.dirname(file_storage.path) + base = File.basename(file_storage.path, '.*') + ext = File.extname(file_storage.path) + + pattern = File.join(dir, "#{base}_checkpoint_*#{ext}") + Dir.glob(pattern).map do |path| + File.basename(path, ext).sub(/^#{Regexp.escape(base)}_checkpoint_/, '') + end.sort + else + [] + end + end + + # Deletes a checkpoint. 
+ # + # @rbs (String) -> void + def delete_checkpoint(checkpoint_id) + raise ArgumentError, 'No storage configured' unless respond_to?(:storage) && storage + + checkpoint_storage = checkpoint_storage_for(checkpoint_id) + checkpoint_storage.delete if checkpoint_storage.exists? + end + + private + + # @rbs (String) -> String + def checkpoint_path_for(checkpoint_id) + raise ArgumentError, 'Storage must be File storage for checkpoints' unless storage.is_a?(Storage::File) + + file_storage = storage #: Storage::File + dir = File.dirname(file_storage.path) + base = File.basename(file_storage.path, '.*') + ext = File.extname(file_storage.path) + + File.join(dir, "#{base}_checkpoint_#{checkpoint_id}#{ext}") + end + + # @rbs (String) -> Storage::Base + def checkpoint_storage_for(checkpoint_id) + case storage + when Storage::File + Storage::File.new(path: checkpoint_path_for(checkpoint_id)) + else + raise ArgumentError, "Checkpoints not supported for #{storage.class}" + end + end + end +end diff --git a/lib/classifier/streaming/line_reader.rb b/lib/classifier/streaming/line_reader.rb new file mode 100644 index 00000000..e0b92f31 --- /dev/null +++ b/lib/classifier/streaming/line_reader.rb @@ -0,0 +1,99 @@ +# rbs_inline: enabled + +module Classifier + module Streaming + # Memory-efficient line reader for large files and IO streams. + # Reads lines one at a time and can yield in configurable batches. + # + # @example Reading line by line + # reader = LineReader.new(File.open('large_corpus.txt')) + # reader.each { |line| process(line) } + # + # @example Reading in batches + # reader = LineReader.new(io, batch_size: 100) + # reader.each_batch { |batch| process_batch(batch) } + class LineReader + include Enumerable #[String] + + # @rbs @io: IO + # @rbs @batch_size: Integer + + attr_reader :batch_size + + # Creates a new LineReader. 
+ # + # @rbs (IO, ?batch_size: Integer) -> void + def initialize(io, batch_size: 100) + @io = io + @batch_size = batch_size + end + + # Iterates over each line in the IO stream. + # Lines are chomped (trailing newlines removed). + # + # @rbs () { (String) -> void } -> void + # @rbs () -> Enumerator[String, void] + def each + return enum_for(:each) unless block_given? + + @io.each_line do |line| + yield line.chomp + end + end + + # Iterates over batches of lines. + # Each batch is an array of chomped lines. + # + # @rbs () { (Array[String]) -> void } -> void + # @rbs () -> Enumerator[Array[String], void] + def each_batch + return enum_for(:each_batch) unless block_given? + + batch = [] #: Array[String] + each do |line| + batch << line + if batch.size >= @batch_size + yield batch + batch = [] + end + end + yield batch unless batch.empty? + end + + # Estimates the total number of lines in the IO stream. + # This is a rough estimate based on file size and average line length. + # Returns nil for non-seekable streams. + # + # @rbs (?sample_size: Integer) -> Integer? + def estimate_line_count(sample_size: 100) + return nil unless @io.respond_to?(:size) && @io.respond_to?(:rewind) + + begin + original_pos = @io.pos + @io.rewind + + sample_bytes = 0 + sample_lines = 0 + + sample_size.times do + line = @io.gets + break unless line + + sample_bytes += line.bytesize + sample_lines += 1 + end + + @io.seek(original_pos) + + return nil if sample_lines.zero? 
+ + avg_line_size = sample_bytes.to_f / sample_lines + io_size = @io.__send__(:size) #: Integer + (io_size / avg_line_size).round + rescue IOError, Errno::ESPIPE + nil + end + end + end + end +end diff --git a/lib/classifier/streaming/progress.rb b/lib/classifier/streaming/progress.rb new file mode 100644 index 00000000..d747674a --- /dev/null +++ b/lib/classifier/streaming/progress.rb @@ -0,0 +1,96 @@ +# rbs_inline: enabled + +module Classifier + module Streaming + # Progress tracking object yielded to blocks during batch/stream operations. + # Provides information about training progress including completion percentage, + # elapsed time, processing rate, and estimated time remaining. + # + # @example Basic usage with train_batch + # classifier.train_batch(:spam, documents, batch_size: 100) do |progress| + # puts "#{progress.completed}/#{progress.total} (#{progress.percent}%)" + # puts "Rate: #{progress.rate.round(1)} docs/sec" + # puts "ETA: #{progress.eta&.round}s" if progress.eta + # end + class Progress + # @rbs @completed: Integer + # @rbs @total: Integer? + # @rbs @start_time: Time + # @rbs @current_batch: Integer + + attr_reader :start_time, :total + attr_accessor :completed, :current_batch + + # @rbs (?total: Integer?, ?completed: Integer) -> void + def initialize(total: nil, completed: 0) + @completed = completed + @total = total + @start_time = Time.now + @current_batch = 0 + end + + # Returns the completion percentage (0-100). + # Returns nil if total is unknown. + # + # @rbs () -> Float? + def percent + return nil unless @total&.positive? + + (@completed.to_f / @total * 100).round(2) + end + + # Returns the elapsed time in seconds since the operation started. + # + # @rbs () -> Float + def elapsed + Time.now - @start_time + end + + # Returns the processing rate in items per second. + # Returns 0 if no time has elapsed. + # + # @rbs () -> Float + def rate + e = elapsed + return 0.0 if e.zero? 
+ + @completed / e + end + + # Returns the estimated time remaining in seconds. + # Returns nil if total is unknown or rate is zero. + # + # @rbs () -> Float? + def eta + return nil unless @total + return nil if rate.zero? + return 0.0 if @completed >= @total + + (@total - @completed) / rate + end + + # Returns true if the operation is complete. + # + # @rbs () -> bool + def complete? + return false unless @total + + @completed >= @total + end + + # Returns a hash representation of the progress state. + # + # @rbs () -> Hash[Symbol, untyped] + def to_h + { + completed: @completed, + total: @total, + percent: percent, + elapsed: elapsed.round(2), + rate: rate.round(2), + eta: eta&.round(2) + } + end + end + end +end diff --git a/lib/classifier/tfidf.rb b/lib/classifier/tfidf.rb index 02520ee7..56c52b15 100644 --- a/lib/classifier/tfidf.rb +++ b/lib/classifier/tfidf.rb @@ -16,6 +16,8 @@ module Classifier # tfidf.transform("Dogs are loyal") # => {:dog=>0.7071..., :loyal=>0.7071...} # class TFIDF + include Streaming + # @rbs @min_df: Integer | Float # @rbs @max_df: Integer | Float # @rbs @ngram_range: Array[Integer] @@ -246,6 +248,70 @@ def marshal_load(data) @storage = nil end + # Loads a vectorizer from a checkpoint. + # + # @rbs (storage: Storage::Base, checkpoint_id: String) -> TFIDF + def self.load_checkpoint(storage:, checkpoint_id:) + raise ArgumentError, 'Storage must be File storage for checkpoints' unless storage.is_a?(Storage::File) + + dir = File.dirname(storage.path) + base = File.basename(storage.path, '.*') + ext = File.extname(storage.path) + checkpoint_path = File.join(dir, "#{base}_checkpoint_#{checkpoint_id}#{ext}") + + checkpoint_storage = Storage::File.new(path: checkpoint_path) + instance = load(storage: checkpoint_storage) + instance.storage = storage + instance + end + + # Fits the vectorizer from an IO stream. + # Collects all documents from the stream, then fits the model. 
+ # Note: All documents must be collected in memory for IDF calculation. + # + # @example Fit from a file + # tfidf.fit_from_stream(File.open('corpus.txt')) + # + # @example With progress tracking + # tfidf.fit_from_stream(io, batch_size: 500) do |progress| + # puts "#{progress.completed} documents loaded" + # end + # + # @rbs (IO, ?batch_size: Integer) { (Streaming::Progress) -> void } -> self + def fit_from_stream(io, batch_size: Streaming::DEFAULT_BATCH_SIZE) + reader = Streaming::LineReader.new(io, batch_size: batch_size) + total = reader.estimate_line_count + progress = Streaming::Progress.new(total: total) + + documents = [] #: Array[String] + + reader.each_batch do |batch| + documents.concat(batch) + progress.completed += batch.size + progress.current_batch += 1 + yield progress if block_given? + end + + fit(documents) unless documents.empty? + self + end + + # TFIDF doesn't support train_from_stream (use fit_from_stream instead). + # This method raises NotImplementedError with guidance. + # + # @rbs (*untyped, **untyped) -> void + def train_from_stream(*) # steep:ignore + raise NotImplementedError, 'TFIDF uses fit_from_stream instead of train_from_stream' + end + + # TFIDF doesn't support train_batch (use fit instead). + # This method raises NotImplementedError with guidance. + # + # @rbs (*untyped, **untyped) -> void + def train_batch(*) # steep:ignore + raise NotImplementedError, 'TFIDF uses fit instead of train_batch' + end + private # Restores vectorizer state from JSON string. @@ -329,9 +395,7 @@ def validate_df!(value, name) # @rbs (Array[Integer]) -> void def validate_ngram_range!(range) raise ArgumentError, 'ngram_range must be an array of two integers' unless range.is_a?(Array) && range.size == 2 - unless range.all?(Integer) && range.all?(&:positive?) - raise ArgumentError, 'ngram_range values must be positive integers' - end + raise ArgumentError, 'ngram_range values must be positive integers' unless range.all?(Integer) && range.all?(&:positive?) 
raise ArgumentError, 'ngram_range[0] must be <= ngram_range[1]' if range[0] > range[1] end diff --git a/sig/vendor/matrix.rbs b/sig/vendor/matrix.rbs index 910018e8..0ec2dab2 100644 --- a/sig/vendor/matrix.rbs +++ b/sig/vendor/matrix.rbs @@ -1,26 +1,37 @@ # Type stubs for matrix gem -class Vector[T] +# Using untyped elements since our usage is primarily with Floats/Numerics +class Vector EPSILON: Float - def self.[]: [T] (*T) -> Vector[T] + def self.[]: (*untyped) -> Vector def size: () -> Integer - def []: (Integer) -> T + def []: (Integer) -> untyped def magnitude: () -> Float - def normalize: () -> Vector[T] - def each: () { (T) -> void } -> void - def collect: [U] () { (T) -> U } -> Vector[U] - def to_a: () -> Array[T] + def normalize: () -> Vector + def each: () { (untyped) -> void } -> void + def collect: () { (untyped) -> untyped } -> Vector + def to_a: () -> Array[untyped] def *: (untyped) -> untyped + def -: (Vector) -> Vector + def is_a?: (untyped) -> bool end -class Matrix[T] - def self.rows: [T] (Array[Array[T]]) -> Matrix[T] - def self.[]: [T] (*Array[T]) -> Matrix[T] - def self.diag: (untyped) -> Matrix[untyped] - def trans: () -> Matrix[T] +class Matrix + def self.rows: (Array[Array[untyped]]) -> Matrix + def self.[]: (*Array[untyped]) -> Matrix + def self.diag: (untyped) -> Matrix + def self.columns: (Array[Array[untyped]]) -> Matrix + def self.empty: (Integer, Integer) -> Matrix + def self.zero: (Integer, Integer) -> Matrix + def self.vstack: (Matrix, Matrix) -> Matrix + def trans: () -> Matrix + def transpose: () -> Matrix def *: (untyped) -> untyped def row_size: () -> Integer def column_size: () -> Integer - def column: (Integer) -> Vector[T] - def SV_decomp: () -> [Matrix[T], Matrix[T], untyped] + def row: (Integer) -> Vector + def column: (Integer) -> Vector + def SV_decomp: () -> [Matrix, Matrix, untyped] + def is_a?: (untyped) -> bool + def respond_to?: (Symbol) -> bool end diff --git a/sig/vendor/streaming.rbs b/sig/vendor/streaming.rbs new 
file mode 100644 index 00000000..ee42c16e --- /dev/null +++ b/sig/vendor/streaming.rbs @@ -0,0 +1,14 @@ +# Type stubs for Streaming module +# Defines the interface that including classes must implement + +module Classifier + # Interface for classes that include Streaming + interface _StreamingHost + def storage: () -> Storage::Base? + def storage=: (Storage::Base?) -> void + def save: () -> void + end + + module Streaming : _StreamingHost + end +end diff --git a/test/bayes/streaming_test.rb b/test/bayes/streaming_test.rb new file mode 100644 index 00000000..fd2f311a --- /dev/null +++ b/test/bayes/streaming_test.rb @@ -0,0 +1,337 @@ +require_relative '../test_helper' +require 'stringio' +require 'tempfile' + +class BayesStreamingTest < Minitest::Test + def setup + @classifier = Classifier::Bayes.new('Spam', 'Ham') + end + + # train_from_stream tests + + def test_train_from_stream_basic + io = StringIO.new("buy now cheap\nfree money\nlimited offer\n") + @classifier.train_from_stream(:spam, io) + + # Should have trained 3 documents + assert_equal 'Spam', @classifier.classify('buy cheap free') + end + + def test_train_from_stream_empty_io + io = StringIO.new('') + @classifier.train_from_stream(:spam, io) + + # No documents trained, classifier should still work + assert_includes @classifier.categories, 'Spam' + end + + def test_train_from_stream_single_line + io = StringIO.new("this is spam content\n") + @classifier.train_from_stream(:spam, io) + + # Train some ham to make classification meaningful + @classifier.train(:ham, 'this is normal email') + + result = @classifier.classify('spam content') + + assert_equal 'Spam', result + end + + def test_train_from_stream_with_batch_size + lines = (1..50).map { |i| "document number #{i} with some content" } + io = StringIO.new(lines.join("\n")) + + batches_processed = 0 + @classifier.train_from_stream(:spam, io, batch_size: 10) do |progress| + batches_processed = progress.current_batch + end + + assert_equal 5, 
batches_processed + end + + def test_train_from_stream_progress_tracking + lines = (1..25).map { |i| "line #{i}" } + io = StringIO.new(lines.join("\n")) + + completed_values = [] + @classifier.train_from_stream(:spam, io, batch_size: 10) do |progress| + completed_values << progress.completed + end + + assert_equal [10, 20, 25], completed_values + end + + def test_train_from_stream_with_progress_block + io = StringIO.new("doc1\ndoc2\ndoc3\n") + progress_received = nil + + @classifier.train_from_stream(:spam, io, batch_size: 2) do |progress| + progress_received = progress + end + + assert_instance_of Classifier::Streaming::Progress, progress_received + assert_equal 3, progress_received.completed + end + + def test_train_from_stream_invalid_category + io = StringIO.new("some content\n") + + assert_raises(StandardError) do + @classifier.train_from_stream(:nonexistent, io) + end + end + + def test_train_from_stream_marks_dirty + io = StringIO.new("content\n") + + refute_predicate @classifier, :dirty? + + @classifier.train_from_stream(:spam, io) + + assert_predicate @classifier, :dirty? 
+ end + + def test_train_from_stream_with_file + Tempfile.create(['corpus', '.txt']) do |file| + file.puts 'spam message one' + file.puts 'spam message two' + file.puts 'spam message three' + file.flush + file.rewind + + @classifier.train_from_stream(:spam, file) + end + + @classifier.train(:ham, 'normal message here') + + assert_equal 'Spam', @classifier.classify('spam message') + end + + # train_batch tests + + def test_train_batch_positional_style + documents = ['buy now', 'free money', 'limited offer'] + @classifier.train_batch(:spam, documents) + + @classifier.train(:ham, 'hello friend') + + assert_equal 'Spam', @classifier.classify('buy free limited') + end + + def test_train_batch_keyword_style + spam_docs = ['buy now', 'free money'] + ham_docs = ['hello friend', 'meeting tomorrow'] + + @classifier.train_batch(spam: spam_docs, ham: ham_docs) + + assert_equal 'Spam', @classifier.classify('buy free') + assert_equal 'Ham', @classifier.classify('hello meeting') + end + + def test_train_batch_with_batch_size + documents = (1..100).map { |i| "document #{i}" } + + batches = 0 + @classifier.train_batch(:spam, documents, batch_size: 25) do |_progress| + batches += 1 + end + + assert_equal 4, batches + end + + def test_train_batch_progress_tracking + documents = (1..30).map { |i| "doc #{i}" } + + completed_values = [] + @classifier.train_batch(:spam, documents, batch_size: 10) do |progress| + completed_values << progress.completed + end + + assert_equal [10, 20, 30], completed_values + end + + def test_train_batch_progress_percent + documents = (1..100).map { |i| "doc #{i}" } + + percent_values = [] + @classifier.train_batch(:spam, documents, batch_size: 25) do |progress| + percent_values << progress.percent + end + + assert_equal [25.0, 50.0, 75.0, 100.0], percent_values + end + + def test_train_batch_empty_array + @classifier.train_batch(:spam, []) + # Should not raise, classifier should be unchanged + assert_includes @classifier.categories, 'Spam' + end + + def 
test_train_batch_invalid_category + assert_raises(StandardError) do + @classifier.train_batch(:nonexistent, ['doc']) + end + end + + def test_train_batch_marks_dirty + refute_predicate @classifier, :dirty? + @classifier.train_batch(:spam, ['content']) + + assert_predicate @classifier, :dirty? + end + + def test_train_batch_multiple_categories + @classifier.train_batch( + spam: ['buy now', 'free offer'], + ham: %w[hello meeting] + ) + + assert_equal 'Spam', @classifier.classify('buy free') + assert_equal 'Ham', @classifier.classify('hello meeting') + end + + # Equivalence tests + + def test_train_batch_equivalent_to_train + classifier1 = Classifier::Bayes.new('Spam', 'Ham') + classifier2 = Classifier::Bayes.new('Spam', 'Ham') + + documents = ['buy now cheap', 'free money fast', 'limited time offer'] + + # Train with regular train + documents.each { |doc| classifier1.train(:spam, doc) } + + # Train with train_batch + classifier2.train_batch(:spam, documents) + + # Both should classify the same + test_doc = 'buy cheap free limited' + + assert_equal classifier1.classify(test_doc), classifier2.classify(test_doc) + + # Classifications should be identical + assert_equal classifier1.classifications(test_doc), classifier2.classifications(test_doc) + end + + def test_train_from_stream_equivalent_to_train + classifier1 = Classifier::Bayes.new('Spam', 'Ham') + classifier2 = Classifier::Bayes.new('Spam', 'Ham') + + documents = ['buy now cheap', 'free money fast', 'limited time offer'] + + # Train with regular train + documents.each { |doc| classifier1.train(:spam, doc) } + + # Train with train_from_stream + io = StringIO.new(documents.join("\n")) + classifier2.train_from_stream(:spam, io) + + # Both should classify the same + test_doc = 'buy cheap free limited' + + assert_equal classifier1.classify(test_doc), classifier2.classify(test_doc) + end + + # Checkpoint tests + + def test_save_checkpoint + Dir.mktmpdir do |dir| + path = File.join(dir, 'classifier.json') + 
@classifier.storage = Classifier::Storage::File.new(path: path) + + @classifier.train(:spam, 'buy now cheap') + @classifier.save_checkpoint('50pct') + + checkpoint_path = File.join(dir, 'classifier_checkpoint_50pct.json') + + assert_path_exists checkpoint_path + end + end + + def test_load_checkpoint + Dir.mktmpdir do |dir| + path = File.join(dir, 'classifier.json') + storage = Classifier::Storage::File.new(path: path) + @classifier.storage = storage + + @classifier.train(:spam, 'buy now cheap') + @classifier.save_checkpoint('halfway') + + # Load from checkpoint + loaded = Classifier::Bayes.load_checkpoint(storage: storage, checkpoint_id: 'halfway') + + # Should have the same training + assert_equal @classifier.classify('buy cheap'), loaded.classify('buy cheap') + end + end + + def test_list_checkpoints + Dir.mktmpdir do |dir| + path = File.join(dir, 'classifier.json') + @classifier.storage = Classifier::Storage::File.new(path: path) + + @classifier.train(:spam, 'content') + @classifier.save_checkpoint('10pct') + @classifier.save_checkpoint('50pct') + @classifier.save_checkpoint('90pct') + + checkpoints = @classifier.list_checkpoints + + assert_equal %w[10pct 50pct 90pct], checkpoints.sort + end + end + + def test_delete_checkpoint + Dir.mktmpdir do |dir| + path = File.join(dir, 'classifier.json') + @classifier.storage = Classifier::Storage::File.new(path: path) + + @classifier.train(:spam, 'content') + @classifier.save_checkpoint('test') + + checkpoint_path = File.join(dir, 'classifier_checkpoint_test.json') + + assert_path_exists checkpoint_path + + @classifier.delete_checkpoint('test') + + refute_path_exists checkpoint_path + end + end + + def test_save_checkpoint_requires_storage + assert_raises(ArgumentError) do + @classifier.save_checkpoint('test') + end + end + + def test_checkpoint_workflow + Dir.mktmpdir do |dir| + path = File.join(dir, 'classifier.json') + storage = Classifier::Storage::File.new(path: path) + + # First training session + classifier1 = 
Classifier::Bayes.new('Spam', 'Ham') + classifier1.storage = storage + + classifier1.train(:spam, 'buy now') + classifier1.train(:spam, 'free money') + classifier1.save_checkpoint('phase1') + + # Simulate restart - load from checkpoint + classifier2 = Classifier::Bayes.load_checkpoint(storage: storage, checkpoint_id: 'phase1') + + # Continue training + classifier2.train(:ham, 'hello friend') + classifier2.train(:ham, 'meeting tomorrow') + classifier2.save_checkpoint('phase2') + + # Load final checkpoint + final = Classifier::Bayes.load_checkpoint(storage: storage, checkpoint_id: 'phase2') + + # Should have all training + assert_equal 'Spam', final.classify('buy free') + assert_equal 'Ham', final.classify('hello meeting') + end + end +end diff --git a/test/knn/knn_test.rb b/test/knn/knn_test.rb index 9eed7dbb..f9e213bd 100644 --- a/test/knn/knn_test.rb +++ b/test/knn/knn_test.rb @@ -529,4 +529,56 @@ def test_items_returns_copy assert_equal 1, knn.items.size end + + # API consistency tests (with Bayes and LogisticRegression) + + def test_train_alias_for_add + knn = Classifier::KNN.new(k: 3) + knn.train(Dog: [@str1, @str2], Cat: [@str3, @str4]) + + assert_equal 4, knn.items.size + assert_equal 'Dog', knn.classify('This is about dogs') + assert_equal 'Cat', knn.classify('This is about cats') + end + + def test_dynamic_train_methods + knn = Classifier::KNN.new(k: 3) + knn.train_dog @str1, @str2 + knn.train_cat @str3, @str4 + + assert_equal 4, knn.items.size + # Dynamic methods create lowercase category names from method name + assert_equal 'dog', knn.classify('This is about dogs') + assert_equal 'cat', knn.classify('This is about cats') + end + + def test_respond_to_train_methods + knn = Classifier::KNN.new + + assert_respond_to knn, :train + assert_respond_to knn, :train_spam + assert_respond_to knn, :train_any_category + end + + def test_classify_returns_string_not_symbol + knn = Classifier::KNN.new(k: 3) + knn.add(dog: [@str1, @str2], cat: [@str3, @str4]) # symbol 
keys + + result = knn.classify('This is about dogs') + + assert_instance_of String, result + assert_equal 'dog', result + end + + def test_categories_returns_array_of_strings + knn = Classifier::KNN.new + knn.add(dog: 'Dogs are great', cat: 'Cats are independent') + + cats = knn.categories + + assert_instance_of Array, cats + cats.each { |cat| assert_instance_of String, cat } + assert_includes cats, 'dog' + assert_includes cats, 'cat' + end end diff --git a/test/lsi/incremental_svd_test.rb b/test/lsi/incremental_svd_test.rb new file mode 100644 index 00000000..9ddfbcc8 --- /dev/null +++ b/test/lsi/incremental_svd_test.rb @@ -0,0 +1,304 @@ +require_relative '../test_helper' + +class IncrementalSVDModuleTest < Minitest::Test + def test_update_with_new_direction + # Create a simple U matrix (3 terms × 2 components) + u = Matrix[ + [0.5, 0.5], + [0.5, -0.5], + [0.707, 0.0] + ] + s = [2.0, 1.0] + + # New document vector (3 terms) + c = Vector[0.1, 0.2, 0.9] + + u_new, s_new = Classifier::LSI::IncrementalSVD.update(u, s, c, max_rank: 10) + + # Should have updated matrices + assert_instance_of Matrix, u_new + assert_instance_of Array, s_new + + # Rank may have increased by 1 (if new direction found) + assert_operator s_new.size, :>=, s.size + assert_operator s_new.size, :<=, s.size + 1 + end + + def test_update_preserves_orthogonality + # Start with orthonormal U + u = Matrix[ + [1.0, 0.0], + [0.0, 1.0], + [0.0, 0.0] + ] + s = [2.0, 1.5] + + c = Vector[0.5, 0.5, 0.707] + + u_new, _s_new = Classifier::LSI::IncrementalSVD.update(u, s, c, max_rank: 10) + + # Check columns are approximately orthonormal + u_new.column_size.times do |i| + col_i = u_new.column(i) + # Column should have unit length (approximately) + norm = Math.sqrt(col_i.to_a.sum { |x| x * x }) + + assert_in_delta 1.0, norm, 0.1, "Column #{i} should have unit length" + end + end + + def test_update_with_vector_in_span + # U spans the first two dimensions + u = Matrix[ + [1.0, 0.0], + [0.0, 1.0], + [0.0, 0.0] + ] 
+ s = [2.0, 1.0] + + # Vector entirely in span of U (no component in third dimension) + c = Vector[0.6, 0.8, 0.0] + + _u_new, s_new = Classifier::LSI::IncrementalSVD.update(u, s, c, max_rank: 10) + + # Rank should not increase when vector is in span + assert_equal s.size, s_new.size + end + + def test_update_respects_max_rank + u = Matrix[ + [1.0, 0.0], + [0.0, 1.0], + [0.0, 0.0] + ] + s = [2.0, 1.0] + + c = Vector[0.1, 0.1, 0.99] + + # With max_rank = 2, should not exceed 2 components + u_new, s_new = Classifier::LSI::IncrementalSVD.update(u, s, c, max_rank: 2) + + assert_equal 2, s_new.size + assert_equal 2, u_new.column_size + end + + def test_singular_values_sorted_descending + u = Matrix[ + [0.707, 0.707], + [0.707, -0.707], + [0.0, 0.0] + ] + s = [3.0, 1.0] + + c = Vector[0.5, 0.5, 0.707] + + _u_new, s_new = Classifier::LSI::IncrementalSVD.update(u, s, c, max_rank: 10) + + # Singular values should be sorted in descending order + (0...(s_new.size - 1)).each do |i| + assert_operator s_new[i], :>=, s_new[i + 1], 'Singular values should be descending' + end + end +end + +class LSIIncrementalModeTest < Minitest::Test + def setup + @dog_docs = [ + 'dogs are loyal pets that bark', + 'puppies are playful young dogs', + 'dogs love to play fetch' + ] + @cat_docs = [ + 'cats are independent pets', + 'kittens are curious creatures', + 'cats meow and purr softly' + ] + end + + def test_incremental_mode_initialization + lsi = Classifier::LSI.new(incremental: true) + + assert lsi.instance_variable_get(:@incremental_mode) + refute_predicate lsi, :incremental_enabled? 
# Not active until first build + end + + def test_incremental_mode_with_max_rank + lsi = Classifier::LSI.new(incremental: true, max_rank: 50) + + assert_equal 50, lsi.instance_variable_get(:@max_rank) + end + + def test_incremental_enabled_after_build + lsi = Classifier::LSI.new(incremental: true, auto_rebuild: false) + + @dog_docs.each { |doc| lsi.add_item(doc, :dog) } + @cat_docs.each { |doc| lsi.add_item(doc, :cat) } + + refute_predicate lsi, :incremental_enabled? + + lsi.build_index + + assert_predicate lsi, :incremental_enabled? + assert_instance_of Matrix, lsi.instance_variable_get(:@u_matrix) + end + + def test_incremental_add_uses_incremental_update + lsi = Classifier::LSI.new(incremental: true, auto_rebuild: false) + + # Add initial documents + @dog_docs.each { |doc| lsi.add_item(doc, :dog) } + @cat_docs.each { |doc| lsi.add_item(doc, :cat) } + lsi.build_index + + initial_version = lsi.instance_variable_get(:@version) + + # Add new document - should use incremental update + lsi.add_item('my dog loves to run and play', :dog) + + # Version should have incremented + assert_equal initial_version + 1, lsi.instance_variable_get(:@version) + + # Should still be in incremental mode + assert_predicate lsi, :incremental_enabled? 
+ end + + def test_incremental_classification_works + lsi = Classifier::LSI.new(incremental: true, auto_rebuild: false) + + @dog_docs.each { |doc| lsi.add_item(doc, :dog) } + @cat_docs.each { |doc| lsi.add_item(doc, :cat) } + lsi.build_index + + # Add more documents incrementally + lsi.add_item('dogs are wonderful companions', :dog) + lsi.add_item('cats sleep a lot during the day', :cat) + + # Classification should work + result = lsi.classify('loyal pet that barks').to_s + + assert_equal 'dog', result + + result = lsi.classify('independent creature that meows').to_s + + assert_equal 'cat', result + end + + def test_current_rank + lsi = Classifier::LSI.new(incremental: true, auto_rebuild: false) + + @dog_docs.each { |doc| lsi.add_item(doc, :dog) } + @cat_docs.each { |doc| lsi.add_item(doc, :cat) } + lsi.build_index + + rank = lsi.current_rank + + assert_instance_of Integer, rank + assert_predicate rank, :positive? + end + + def test_disable_incremental_mode + lsi = Classifier::LSI.new(incremental: true, auto_rebuild: false) + + @dog_docs.each { |doc| lsi.add_item(doc, :dog) } + lsi.build_index + + assert_predicate lsi, :incremental_enabled? + + lsi.disable_incremental_mode! + + refute_predicate lsi, :incremental_enabled? + assert_nil lsi.instance_variable_get(:@u_matrix) + end + + def test_enable_incremental_mode + lsi = Classifier::LSI.new(auto_rebuild: false) + + @dog_docs.each { |doc| lsi.add_item(doc, :dog) } + lsi.build_index + + refute_predicate lsi, :incremental_enabled? + + lsi.enable_incremental_mode!(max_rank: 75) + lsi.build_index(force: true) + + assert_predicate lsi, :incremental_enabled? + assert_equal 75, lsi.instance_variable_get(:@max_rank) + end + + def test_force_rebuild + lsi = Classifier::LSI.new(incremental: true, auto_rebuild: false) + + @dog_docs.each { |doc| lsi.add_item(doc, :dog) } + lsi.build_index + + # Force rebuild should work + lsi.build_index(force: true) + + assert_predicate lsi, :incremental_enabled? 
+ end + + def test_vocabulary_growth_triggers_full_rebuild + lsi = Classifier::LSI.new(incremental: true, auto_rebuild: false) + + # Start with a small vocabulary + lsi.add_item('dog', :animal) + lsi.add_item('cat', :animal) + lsi.build_index + + # Store initial vocab size to verify growth detection works + _initial_vocab_size = lsi.instance_variable_get(:@initial_vocab_size) + + # Add document with many new words (> 20% of initial vocabulary) + # This should trigger a full rebuild and disable incremental mode + many_new_words = (1..100).map { |i| "newword#{i}" }.join(' ') + lsi.add_item(many_new_words, :animal) + + # After vocabulary growth > 20%, incremental mode should be disabled + # and a full rebuild should have occurred + refute_predicate lsi, :incremental_enabled? + end + + def test_incremental_produces_reasonable_results + # Build with full SVD + lsi_full = Classifier::LSI.new(auto_rebuild: false) + @dog_docs.each { |doc| lsi_full.add_item(doc, :dog) } + @cat_docs.each { |doc| lsi_full.add_item(doc, :cat) } + lsi_full.add_item('my dog is a great friend', :dog) + lsi_full.build_index + + # Build with incremental mode + lsi_inc = Classifier::LSI.new(incremental: true, auto_rebuild: false) + @dog_docs.each { |doc| lsi_inc.add_item(doc, :dog) } + @cat_docs.each { |doc| lsi_inc.add_item(doc, :cat) } + lsi_inc.build_index + lsi_inc.add_item('my dog is a great friend', :dog) + + # Both should classify test documents reasonably + # Note: Results may differ due to approximation, but should be reasonable + test_doc = 'loyal pet barking' + + _full_result = lsi_full.classify(test_doc).to_s + inc_result = lsi_inc.classify(test_doc).to_s + + # At minimum, incremental should produce a valid classification + assert_includes %w[dog cat], inc_result + end + + def test_incremental_with_streaming_api + lsi = Classifier::LSI.new(incremental: true, auto_rebuild: false) + + # Add initial batch + @dog_docs.each { |doc| lsi.add_item(doc, :dog) } + @cat_docs.each { |doc| 
lsi.add_item(doc, :cat) } + lsi.build_index + + # Use streaming API to add more + io = StringIO.new("dogs bark loudly\ndogs wag their tails\n") + lsi.train_from_stream(:dog, io) + + # Should still work + result = lsi.classify('barking dog wags tail').to_s + + assert_equal 'dog', result + end +end diff --git a/test/lsi/streaming_test.rb b/test/lsi/streaming_test.rb new file mode 100644 index 00000000..94bfaafb --- /dev/null +++ b/test/lsi/streaming_test.rb @@ -0,0 +1,343 @@ +require_relative '../test_helper' +require 'stringio' +require 'tempfile' + +class LSIStreamingTest < Minitest::Test + def setup + @lsi = Classifier::LSI.new + end + + # train_from_stream tests + + def test_train_from_stream_basic + dog_docs = "dogs are loyal pets\npuppies are playful\ndogs bark at strangers\n" + cat_docs = "cats are independent\nkittens are curious\ncats meow softly\n" + + @lsi.train_from_stream(:dog, StringIO.new(dog_docs)) + @lsi.train_from_stream(:cat, StringIO.new(cat_docs)) + + # Should be able to classify + # Note: classify returns the category as originally stored (symbol in this case) + result = @lsi.classify('loyal pet that barks') + + assert_equal 'dog', result.to_s + end + + def test_train_from_stream_empty_io + @lsi.train_from_stream(:category, StringIO.new('')) + + # No items added + assert_empty @lsi.items + end + + def test_train_from_stream_single_line + @lsi.train_from_stream(:dog, StringIO.new("dogs are loyal pets\n")) + @lsi.train_from_stream(:cat, StringIO.new("cats are independent\n")) + + # Should have 2 items + assert_equal 2, @lsi.items.size + end + + def test_train_from_stream_with_batch_size + lines = (1..50).map { |i| "document number #{i} about dogs" } + io = StringIO.new(lines.join("\n")) + + batches_processed = 0 + @lsi.train_from_stream(:dog, io, batch_size: 10) do |progress| + batches_processed = progress.current_batch + end + + assert_equal 5, batches_processed + end + + def test_train_from_stream_progress_tracking + lines = (1..25).map { |i| 
"document #{i}" } + io = StringIO.new(lines.join("\n")) + + completed_values = [] + @lsi.train_from_stream(:category, io, batch_size: 10) do |progress| + completed_values << progress.completed + end + + assert_equal [10, 20, 25], completed_values + end + + def test_train_from_stream_marks_dirty + refute_predicate @lsi, :dirty? + @lsi.train_from_stream(:category, StringIO.new("content\n")) + + assert_predicate @lsi, :dirty? + end + + def test_train_from_stream_rebuilds_index_when_auto_rebuild + @lsi = Classifier::LSI.new(auto_rebuild: true) + + dog_docs = "dogs are loyal\ndogs bark\n" + cat_docs = "cats are independent\ncats meow\n" + + @lsi.train_from_stream(:dog, StringIO.new(dog_docs)) + @lsi.train_from_stream(:cat, StringIO.new(cat_docs)) + + # Index should be built + refute_predicate @lsi, :needs_rebuild? + end + + def test_train_from_stream_skips_rebuild_when_auto_rebuild_false + @lsi = Classifier::LSI.new(auto_rebuild: false) + + @lsi.train_from_stream(:category, StringIO.new("document one\ndocument two\n")) + + # Index should need rebuild + assert_predicate @lsi, :needs_rebuild? 
+ end + + def test_train_from_stream_with_file + Tempfile.create(['corpus', '.txt']) do |file| + file.puts 'dogs are loyal pets' + file.puts 'puppies are playful animals' + file.puts 'dogs bark and play' + file.flush + file.rewind + + @lsi.train_from_stream(:dog, file) + end + + @lsi.add_item('cats are independent', :cat) + @lsi.add_item('kittens are curious', :cat) + + result = @lsi.classify('loyal pet') + + assert_equal 'dog', result.to_s + end + + # add_batch tests + + def test_add_batch_basic + dog_docs = ['dogs are loyal', 'puppies play', 'dogs bark'] + cat_docs = ['cats are independent', 'kittens curious', 'cats meow'] + + @lsi.add_batch(dog: dog_docs, cat: cat_docs) + + assert_equal 6, @lsi.items.size + assert_equal 'dog', @lsi.classify('loyal dog barks').to_s + end + + def test_add_batch_with_progress + docs = (1..30).map { |i| "document #{i}" } + + completed_values = [] + @lsi.add_batch(batch_size: 10, category: docs) do |progress| + completed_values << progress.completed + end + + assert_equal [10, 20, 30], completed_values + end + + def test_add_batch_empty + @lsi.add_batch(category: []) + + assert_empty @lsi.items + end + + def test_add_batch_marks_dirty + refute_predicate @lsi, :dirty? + @lsi.add_batch(category: ['doc']) + + assert_predicate @lsi, :dirty? + end + + def test_add_batch_rebuilds_when_auto_rebuild + @lsi = Classifier::LSI.new(auto_rebuild: true) + + @lsi.add_batch( + dog: ['dogs bark', 'puppies play'], + cat: ['cats meow', 'kittens purr'] + ) + + refute_predicate @lsi, :needs_rebuild? 
+ end + + # train_batch tests (alias for add_batch) + + def test_train_batch_positional_style + docs = ['dogs are loyal', 'puppies play'] + @lsi.train_batch(:dog, docs) + + @lsi.add_item('cats are independent', :cat) + + assert_equal 3, @lsi.items.size + end + + def test_train_batch_keyword_style + @lsi.train_batch( + dog: ['dogs are loyal', 'puppies play'], + cat: ['cats are independent', 'kittens curious'] + ) + + assert_equal 4, @lsi.items.size + assert_equal 'dog', @lsi.classify('loyal dog').to_s + end + + def test_train_batch_with_progress + docs = (1..20).map { |i| "doc #{i}" } + + batches = 0 + @lsi.train_batch(:category, docs, batch_size: 5) do |_progress| + batches += 1 + end + + assert_equal 4, batches + end + + # Equivalence tests + + def test_train_from_stream_equivalent_to_add_item + lsi1 = Classifier::LSI.new + lsi2 = Classifier::LSI.new + + documents = ['dogs are loyal pets', 'puppies are playful', 'dogs bark at strangers'] + + # Add with add_item + documents.each { |doc| lsi1.add_item(doc, :dog) } + + # Add with train_from_stream + io = StringIO.new(documents.join("\n")) + lsi2.train_from_stream(:dog, io) + + # Both should have same items + assert_equal lsi1.items.size, lsi2.items.size + + # Add some cat documents to both for classification + lsi1.add_item('cats are independent', :cat) + lsi2.add_item('cats are independent', :cat) + + # Both should classify the same + test_doc = 'loyal playful pet' + + assert_equal lsi1.classify(test_doc).to_s, lsi2.classify(test_doc).to_s + end + + def test_add_batch_equivalent_to_add + lsi1 = Classifier::LSI.new + lsi2 = Classifier::LSI.new + + dog_docs = ['dogs bark', 'puppies play'] + cat_docs = ['cats meow', 'kittens purr'] + + # Add with hash-style add + lsi1.add(dog: dog_docs, cat: cat_docs) + + # Add with add_batch + lsi2.add_batch(dog: dog_docs, cat: cat_docs) + + # Both should have same items + assert_equal lsi1.items.size, lsi2.items.size + + # Both should classify the same + test_doc = 'barking dog' + + 
assert_equal lsi1.classify(test_doc).to_s, lsi2.classify(test_doc).to_s + end + + # Checkpoint tests + + def test_save_checkpoint + Dir.mktmpdir do |dir| + path = File.join(dir, 'lsi.json') + @lsi.storage = Classifier::Storage::File.new(path: path) + + @lsi.add_item('dogs are loyal', :dog) + @lsi.add_item('cats are independent', :cat) + @lsi.save_checkpoint('50pct') + + checkpoint_path = File.join(dir, 'lsi_checkpoint_50pct.json') + + assert_path_exists checkpoint_path + end + end + + def test_load_checkpoint + Dir.mktmpdir do |dir| + path = File.join(dir, 'lsi.json') + storage = Classifier::Storage::File.new(path: path) + @lsi.storage = storage + + @lsi.add_item('dogs are loyal', :dog) + @lsi.add_item('cats are independent', :cat) + @lsi.save_checkpoint('halfway') + + # Load from checkpoint + loaded = Classifier::LSI.load_checkpoint(storage: storage, checkpoint_id: 'halfway') + + # Should have the same items and same classification + assert_equal @lsi.items.size, loaded.items.size + # Compare as strings since loaded classifier may have string categories + assert_equal @lsi.classify('loyal dog').to_s, loaded.classify('loyal dog').to_s + end + end + + def test_list_checkpoints + Dir.mktmpdir do |dir| + path = File.join(dir, 'lsi.json') + @lsi.storage = Classifier::Storage::File.new(path: path) + + @lsi.add_item('content', :category) + @lsi.save_checkpoint('10pct') + @lsi.save_checkpoint('50pct') + @lsi.save_checkpoint('90pct') + + checkpoints = @lsi.list_checkpoints + + assert_equal %w[10pct 50pct 90pct], checkpoints.sort + end + end + + def test_delete_checkpoint + Dir.mktmpdir do |dir| + path = File.join(dir, 'lsi.json') + @lsi.storage = Classifier::Storage::File.new(path: path) + + @lsi.add_item('content', :category) + @lsi.save_checkpoint('test') + + checkpoint_path = File.join(dir, 'lsi_checkpoint_test.json') + + assert_path_exists checkpoint_path + + @lsi.delete_checkpoint('test') + + refute_path_exists checkpoint_path + end + end + + def 
test_checkpoint_workflow + Dir.mktmpdir do |dir| + path = File.join(dir, 'lsi.json') + storage = Classifier::Storage::File.new(path: path) + + # First training session + lsi1 = Classifier::LSI.new + lsi1.storage = storage + + lsi1.add_item('dogs are loyal', :dog) + lsi1.add_item('puppies are playful', :dog) + lsi1.save_checkpoint('phase1') + + # Simulate restart - load from checkpoint + lsi2 = Classifier::LSI.load_checkpoint(storage: storage, checkpoint_id: 'phase1') + + # Continue training + lsi2.add_item('cats are independent', :cat) + lsi2.add_item('kittens are curious', :cat) + lsi2.save_checkpoint('phase2') + + # Load final checkpoint + final = Classifier::LSI.load_checkpoint(storage: storage, checkpoint_id: 'phase2') + + # Should have all training + assert_equal 4, final.items.size + assert_equal 'dog', final.classify('loyal playful').to_s + assert_equal 'cat', final.classify('independent curious').to_s + end + end +end diff --git a/test/storage/storage_test.rb b/test/storage/storage_test.rb index ddd90b39..b72c246e 100644 --- a/test/storage/storage_test.rb +++ b/test/storage/storage_test.rb @@ -1,17 +1,21 @@ require_relative '../test_helper' class StorageAPIConsistencyTest < Minitest::Test - CLASSIFIERS = [ - Classifier::Bayes, - Classifier::LSI, - Classifier::KNN, - Classifier::LogisticRegression, - Classifier::TFIDF - ].freeze + # Dynamically discover all classifier/vectorizer classes + CLASSIFIERS = Classifier.constants.filter_map do |const| + klass = Classifier.const_get(const) + next unless klass.is_a?(Class) + + klass if klass.method_defined?(:classify) || klass.method_defined?(:transform) + end.freeze INSTANCE_METHODS = %i[save reload reload! dirty? 
storage storage=].freeze CLASS_METHODS = %i[load].freeze + def test_classifiers_discovered + assert_operator CLASSIFIERS.size, :>=, 5, "Expected at least 5 classifiers, found: #{CLASSIFIERS.map(&:name)}" + end + CLASSIFIERS.each do |klass| class_name = klass.name.split('::').last.downcase diff --git a/test/streaming/streaming_test.rb b/test/streaming/streaming_test.rb new file mode 100644 index 00000000..5e4c5c9b --- /dev/null +++ b/test/streaming/streaming_test.rb @@ -0,0 +1,374 @@ +require_relative '../test_helper' +require 'stringio' + +class ProgressTest < Minitest::Test + def test_initialization_with_defaults + progress = Classifier::Streaming::Progress.new + + assert_equal 0, progress.completed + assert_nil progress.total + assert_instance_of Time, progress.start_time + end + + def test_initialization_with_total + progress = Classifier::Streaming::Progress.new(total: 100) + + assert_equal 0, progress.completed + assert_equal 100, progress.total + end + + def test_initialization_with_completed + progress = Classifier::Streaming::Progress.new(total: 100, completed: 50) + + assert_equal 50, progress.completed + assert_equal 100, progress.total + end + + def test_percent_with_known_total + progress = Classifier::Streaming::Progress.new(total: 100, completed: 25) + + assert_in_delta(25.0, progress.percent) + end + + def test_percent_with_fractional_value + progress = Classifier::Streaming::Progress.new(total: 3, completed: 1) + + assert_in_delta(33.33, progress.percent) + end + + def test_percent_with_unknown_total + progress = Classifier::Streaming::Progress.new + progress.completed = 50 + + assert_nil progress.percent + end + + def test_percent_with_zero_total + progress = Classifier::Streaming::Progress.new(total: 0) + + assert_nil progress.percent + end + + def test_elapsed_time + progress = Classifier::Streaming::Progress.new + sleep 0.01 + + assert_operator progress.elapsed, :>=, 0.01 + assert_operator progress.elapsed, :<, 1 + end + + def 
test_rate_calculation + progress = Classifier::Streaming::Progress.new(completed: 0) + sleep 0.01 # Ensure some time passes + progress.completed = 100 + # Rate should be roughly 100 / elapsed (approximately 10000/s) + rate = progress.rate + + assert_predicate rate, :positive? + end + + def test_rate_with_zero_elapsed + # This is tricky to test since time always passes, + # but rate should handle edge cases gracefully + progress = Classifier::Streaming::Progress.new + rate = progress.rate + # Rate is 0 when completed is 0 + assert_in_delta(0.0, rate) + end + + def test_eta_with_known_total + progress = Classifier::Streaming::Progress.new(total: 100, completed: 50) + sleep 0.01 # Ensure some time passes + eta = progress.eta + + assert_instance_of Float, eta + assert_operator eta, :>=, 0 + end + + def test_eta_with_unknown_total + progress = Classifier::Streaming::Progress.new(completed: 50) + + assert_nil progress.eta + end + + def test_eta_when_complete + progress = Classifier::Streaming::Progress.new(total: 100, completed: 100) + sleep 0.01 + + assert_in_delta(0.0, progress.eta) + end + + def test_eta_with_zero_rate + progress = Classifier::Streaming::Progress.new(total: 100, completed: 0) + + assert_nil progress.eta + end + + def test_complete_when_finished + progress = Classifier::Streaming::Progress.new(total: 100, completed: 100) + + assert_predicate progress, :complete? + end + + def test_complete_when_not_finished + progress = Classifier::Streaming::Progress.new(total: 100, completed: 50) + + refute_predicate progress, :complete? + end + + def test_complete_with_unknown_total + progress = Classifier::Streaming::Progress.new(completed: 100) + + refute_predicate progress, :complete? 
+ end + + def test_to_h + progress = Classifier::Streaming::Progress.new(total: 100, completed: 50) + hash = progress.to_h + + assert_equal 50, hash[:completed] + assert_equal 100, hash[:total] + assert_in_delta(50.0, hash[:percent]) + assert_instance_of Float, hash[:elapsed] + assert_instance_of Float, hash[:rate] + end + + def test_completed_is_mutable + progress = Classifier::Streaming::Progress.new(total: 100) + + assert_equal 0, progress.completed + + progress.completed = 25 + + assert_equal 25, progress.completed + + progress.completed += 25 + + assert_equal 50, progress.completed + end + + def test_current_batch_tracking + progress = Classifier::Streaming::Progress.new(total: 100) + + assert_equal 0, progress.current_batch + + progress.current_batch = 5 + + assert_equal 5, progress.current_batch + end +end + +class LineReaderTest < Minitest::Test + def test_each_line + io = StringIO.new("line1\nline2\nline3\n") + reader = Classifier::Streaming::LineReader.new(io) + + lines = reader.each.to_a + + assert_equal %w[line1 line2 line3], lines + end + + def test_each_line_removes_trailing_newlines + io = StringIO.new("line1\nline2\r\nline3") + reader = Classifier::Streaming::LineReader.new(io) + + lines = reader.each.to_a + # chomp removes both \n and \r\n + assert_equal %w[line1 line2 line3], lines + end + + def test_each_with_block + io = StringIO.new("a\nb\nc\n") + reader = Classifier::Streaming::LineReader.new(io) + + collected = reader.map(&:upcase) + + assert_equal %w[A B C], collected + end + + def test_enumerable_methods + io = StringIO.new("1\n2\n3\n") + reader = Classifier::Streaming::LineReader.new(io) + + # LineReader includes Enumerable + assert_equal %w[1 2 3], reader.first(3) + end + + def test_each_batch_default_size + lines = (1..250).map { |i| "line#{i}" }.join("\n") + io = StringIO.new(lines) + reader = Classifier::Streaming::LineReader.new(io) + + batches = reader.each_batch.to_a + + assert_equal 3, batches.size + assert_equal 100, 
batches[0].size + assert_equal 100, batches[1].size + assert_equal 50, batches[2].size + end + + def test_each_batch_custom_size + lines = (1..25).map { |i| "line#{i}" }.join("\n") + io = StringIO.new(lines) + reader = Classifier::Streaming::LineReader.new(io, batch_size: 10) + + batches = reader.each_batch.to_a + + assert_equal 3, batches.size + assert_equal 10, batches[0].size + assert_equal 10, batches[1].size + assert_equal 5, batches[2].size + end + + def test_each_batch_with_block + io = StringIO.new("a\nb\nc\nd\ne\n") + reader = Classifier::Streaming::LineReader.new(io, batch_size: 2) + + collected = [] + reader.each_batch { |batch| collected << batch } + + assert_equal [%w[a b], %w[c d], ['e']], collected + end + + def test_each_batch_exact_multiple + lines = (1..10).map { |i| "line#{i}" }.join("\n") + io = StringIO.new(lines) + reader = Classifier::Streaming::LineReader.new(io, batch_size: 5) + + batches = reader.each_batch.to_a + + assert_equal 2, batches.size + assert_equal 5, batches[0].size + assert_equal 5, batches[1].size + end + + def test_each_batch_empty_io + io = StringIO.new('') + reader = Classifier::Streaming::LineReader.new(io) + + batches = reader.each_batch.to_a + + assert_empty batches + end + + def test_batch_size_reader + reader = Classifier::Streaming::LineReader.new(StringIO.new(''), batch_size: 50) + + assert_equal 50, reader.batch_size + end + + def test_estimate_line_count_with_seekable_io + # Create a file-like StringIO + lines = "short\nmedium line\nlonger line here\n" + io = StringIO.new(lines) + + reader = Classifier::Streaming::LineReader.new(io) + estimate = reader.estimate_line_count(sample_size: 3) + + # Should be close to 3 lines + assert_instance_of Integer, estimate + assert_predicate estimate, :positive? 
+ end + + def test_estimate_line_count_preserves_position + lines = "a\nb\nc\nd\ne\n" + io = StringIO.new(lines) + io.gets # Read one line, position is now after "a\n" + original_pos = io.pos + + reader = Classifier::Streaming::LineReader.new(io) + reader.estimate_line_count + + assert_equal original_pos, io.pos + end +end + +class StreamingModuleTest < Minitest::Test + def test_default_batch_size + assert_equal 100, Classifier::Streaming::DEFAULT_BATCH_SIZE + end + + class DummyClassifier + include Classifier::Streaming + + attr_accessor :storage + end + + def test_train_from_stream_raises_not_implemented + classifier = DummyClassifier.new + io = StringIO.new("test\n") + + assert_raises(NotImplementedError) do + classifier.train_from_stream(:category, io) + end + end + + def test_train_batch_raises_not_implemented + classifier = DummyClassifier.new + + assert_raises(NotImplementedError) do + classifier.train_batch(:category, %w[doc1 doc2]) + end + end + + def test_save_checkpoint_requires_storage + classifier = DummyClassifier.new + + assert_raises(ArgumentError) do + classifier.save_checkpoint('test') + end + end + + def test_list_checkpoints_requires_storage + classifier = DummyClassifier.new + + assert_raises(ArgumentError) do + classifier.list_checkpoints + end + end + + def test_delete_checkpoint_requires_storage + classifier = DummyClassifier.new + + assert_raises(ArgumentError) do + classifier.delete_checkpoint('test') + end + end +end + +class StreamingEnforcementTest < Minitest::Test + # Dynamically discover all classifier/vectorizer classes by looking for + # classes that define `classify` (classifiers) or `transform` (vectorizers) + CLASSIFIERS = Classifier.constants.filter_map do |const| + klass = Classifier.const_get(const) + next unless klass.is_a?(Class) + + klass if klass.method_defined?(:classify) || klass.method_defined?(:transform) + end.freeze + + STREAMING_METHODS = %i[ + train_from_stream + train_batch + save_checkpoint + list_checkpoints + 
delete_checkpoint + ].freeze + + def test_classifiers_discovered + assert_operator CLASSIFIERS.size, :>=, 5, "Expected at least 5 classifiers, found: #{CLASSIFIERS.map(&:name)}" + end + + CLASSIFIERS.each do |klass| + define_method("test_#{klass.name.split('::').last.downcase}_includes_streaming") do + assert_includes klass, Classifier::Streaming, + "#{klass} must include Classifier::Streaming" + end + + STREAMING_METHODS.each do |method| + define_method("test_#{klass.name.split('::').last.downcase}_responds_to_#{method}") do + assert klass.method_defined?(method) || klass.private_method_defined?(method), + "#{klass} must respond to #{method}" + end + end + end +end