feat: add pluggable persistence backends with storage API #93
Conversation
Add gcov/lcov-based coverage for the native C extension. Coverage builds are compiled with `-O0 --coverage` flags and generate HTML reports showing line and function coverage for `vector.c`, `matrix.c`, `svd.c`, and `classifier_ext.c`.

New rake tasks:
- `coverage:clean` - remove gcov data files
- `coverage:compile` - build with coverage instrumentation
- `coverage:report` - generate HTML report via lcov
- `coverage:c` - full cycle (clean, compile, test, report)

CI uploads coverage reports as artifacts on each run.
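The rake tasks above might be wired up roughly as follows. This is a hypothetical Rakefile sketch: the task names come from the description, but the `ext/` directory layout and the exact gcov/lcov invocations are assumptions, not this repository's actual build.

```ruby
require "rake"
require "fileutils"

# Extend the Rake DSL so this sketch runs as a plain script;
# inside a real Rakefile the DSL is already available.
self.extend Rake::DSL

namespace :coverage do
  desc "Remove gcov data files"
  task :clean do
    # *.gcda / *.gcno / *.gcov are the files gcov instrumentation produces
    FileUtils.rm_f(Dir.glob("ext/**/*.{gcda,gcno,gcov}"))
  end

  desc "Build the C extension with coverage instrumentation (assumed layout)"
  task :compile do
    sh "make -C ext clean all CFLAGS='-O0 --coverage'"
  end

  desc "Generate HTML report via lcov"
  task :report do
    sh "lcov --capture --directory ext --output-file coverage.info"
    sh "genhtml coverage.info --output-directory coverage/html"
  end

  desc "Full cycle: clean, compile, test, report"
  task c: %w[clean compile] do
    sh "rake test"
    Rake::Task["coverage:report"].invoke
  end
end
```

The `-O0` flag matters here: optimization merges and removes lines, which makes per-line coverage counts misleading.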
Implement an ActiveRecord-style storage abstraction allowing classifiers to persist to any backend. Users can configure storage at setup time and use simple `save`/`reload` methods without path arguments.

Key additions:
- `Storage::Base` abstract class defining the `write`/`read`/`delete`/`exists?` protocol
- `Storage::Memory` for testing and ephemeral use
- `Storage::File` for file-based persistence
- Dirty tracking with a `dirty?` method on Bayes and LSI
- `reload` raises `UnsavedChangesError`; `reload!` forces a discard
- `load(storage:)` class method sets storage on the returned instance

Legacy API preserved:
- `save_to_file(path)` replaces the old `save(path)`
- `load_from_file(path)` replaces the old `load(path)`

Closes #90
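As a sketch of the protocol these additions describe: the `Storage::Base` and `Storage::Memory` names and the four-method contract come from the list above, but the implementation shown here is illustrative, not the gem's actual code.

```ruby
module Storage
  # Abstract base: subclasses must implement the four-method protocol
  class Base
    def write(data)
      raise NotImplementedError, "#{self.class}#write must be implemented"
    end

    def read
      raise NotImplementedError, "#{self.class}#read must be implemented"
    end

    def delete
      raise NotImplementedError, "#{self.class}#delete must be implemented"
    end

    def exists?
      raise NotImplementedError, "#{self.class}#exists? must be implemented"
    end
  end

  # Ephemeral backend: keeps the serialized classifier in RAM,
  # useful for tests where no file should be touched
  class Memory < Base
    def initialize
      @data = nil
    end

    def write(data)
      @data = data
    end

    def read
      @data
    end

    def delete
      @data = nil
    end

    def exists?
      !@data.nil?
    end
  end
end
```

The appeal of such a narrow protocol is that any object answering these four messages (a Redis wrapper, an S3 adapter) can serve as a backend without the classifier knowing where its bytes go.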
Greptile Summary
Important Files Changed
Confidence score: 3/5
De-AI Code Simplifications

Remove from `Storage::Base`:

```ruby
# Before (lines 25-46)

# Save classifier data
# @rbs (String) -> void
def write(data)
  raise NotImplementedError, "#{self.class}#write must be implemented"
end

# Load classifier data
# @rbs () -> String?
def read
  raise NotImplementedError, "#{self.class}#read must be implemented"
end

# Delete classifier data
# @rbs () -> void
def delete
  raise NotImplementedError, "#{self.class}#delete must be implemented"
end

# Check if data exists
# @rbs () -> bool
def exists?
  raise NotImplementedError, "#{self.class}#exists? must be implemented"
end
```

```ruby
# After - remove redundant comments, keep @rbs for type checking

# @rbs (String) -> void
def write(data)
  raise NotImplementedError, "#{self.class}#write must be implemented"
end

# @rbs () -> String?
def read
  raise NotImplementedError, "#{self.class}#read must be implemented"
end

# @rbs () -> void
def delete
  raise NotImplementedError, "#{self.class}#delete must be implemented"
end

# @rbs () -> bool
def exists?
  raise NotImplementedError, "#{self.class}#exists? must be implemented"
end
```

Remove from `Storage::File`:
```ruby
# Before (lines 31-48)

# Save data to file
# @rbs (String) -> Integer
def write(data)
  ::File.write(@path, data)
end

# Load data from file
# @rbs () -> String?
def read
  exists? ? ::File.read(@path) : nil
end

# Delete file
# @rbs () -> void
def delete
  ::File.delete(@path) if exists?
end

# Check if file exists
# @rbs () -> bool
def exists?
  ::File.exist?(@path)
end
```

```ruby
# After - remove obvious comments

# @rbs (String) -> Integer
def write(data)
  ::File.write(@path, data)
end

# @rbs () -> String?
def read
  exists? ? ::File.read(@path) : nil
end

# @rbs () -> void
def delete
  ::File.delete(@path) if exists?
end

# @rbs () -> bool
def exists?
  ::File.exist?(@path)
end
```

Critical Race Condition Fix needed in `lib/classifier/lsi.rb`:
```ruby
# Before (lines 173-183) - BROKEN
def remove_item(item)
  synchronize do
    return false unless @items.include?(item)
    @items.delete(item)
    @dirty = true
  end
  build_index if auto_rebuild
  true
end
```

```ruby
# After - FIX: move build_index inside sync block
def remove_item(item)
  synchronize do
    return false unless @items.include?(item)
    @items.delete(item)
    @dirty = true
    build_index if auto_rebuild
  end
  true
end
```

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Storage as "Storage::File"
    participant Classifier as "Classifier::Bayes"
    participant FileSystem as "File System"
    User->>Storage: "new(path: 'model.json')"
    Storage-->>User: "storage instance"
    User->>Classifier: "new('Spam', 'Ham')"
    Classifier-->>User: "classifier instance"
    User->>Classifier: "storage = storage"
    User->>Classifier: "train_spam('buy now')"
    Classifier->>Classifier: "set @dirty = true"
    User->>Classifier: "train_ham('hello friend')"
    Classifier->>Classifier: "set @dirty = true"
    User->>Classifier: "save()"
    Classifier->>Classifier: "to_json()"
    Classifier->>Storage: "write(json_data)"
    Storage->>FileSystem: "File.write(path, data)"
    FileSystem-->>Storage: "bytes_written"
    Storage-->>Classifier: "bytes_written"
    Classifier->>Classifier: "set @dirty = false"
    Classifier-->>User: "void"
    User->>Classifier: "load(storage: storage)"
    Classifier->>Storage: "read()"
    Storage->>FileSystem: "File.read(path)"
    FileSystem-->>Storage: "json_data"
    Storage-->>Classifier: "json_data"
    Classifier->>Classifier: "from_json(json_data)"
    Classifier->>Classifier: "storage = storage"
    Classifier-->>User: "loaded_classifier"
```
@greptile review
Additional Comments (1)
- `lib/classifier/lsi.rb`, lines 173-181 (link) logic: `remove_item` doesn't rebuild the index when `@auto_rebuild` is true, but `add_item` does (line 146). This inconsistency means removing items won't update the index automatically, causing stale results.
15 files reviewed, 1 comment
@greptile review
```diff
 def remove_item(item)
-  synchronize do
-    return unless @items.key?(item)
+  removed = synchronize do
+    next false unless @items.key?(item)
     @items.delete(item)
     @version += 1
     @dirty = true
+    true
   end
+  build_index if removed && @auto_rebuild
 end
```
logic: Critical race condition: the `build_index` call occurs outside the `synchronize` block. If another thread modifies `@items` between lines 181 and 182, the index rebuild may use inconsistent state.
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/lsi.rb
Line: 173:183
Comment:
**logic:** Critical race condition: `build_index` call occurs outside `synchronize` block. If another thread modifies @items between line 181 and 182, the index rebuild may use inconsistent state.
How can I resolve this? If you propose a fix, please make it concise.
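The fix the review asks for can be sketched in isolation. This is not the gem's code: `ItemSet` is a stand-in using Ruby's `MonitorMixin`, showing the corrected shape where `build_index` runs while the lock is still held, so the rebuilt index always reflects a consistent snapshot of `@items`.

```ruby
require "monitor"

class ItemSet
  include MonitorMixin

  def initialize
    super() # initialize the monitor
    @items = {}
    @index = []
  end

  def add_item(item)
    synchronize do
      @items[item] = true
      build_index
    end
  end

  # Corrected shape: the rebuild happens inside the same synchronize
  # block as the mutation, so no other thread can interleave between
  # deleting the item and rebuilding the index.
  def remove_item(item)
    synchronize do
      return false unless @items.key?(item)
      @items.delete(item)
      build_index
    end
    true
  end

  def index
    synchronize { @index.dup }
  end

  private

  # Must only be called while holding the monitor
  def build_index
    @index = @items.keys.sort
  end
end
```

With the rebuild outside the lock, a second thread could mutate `@items` after the lock is released but before the rebuild reads it; moving it inside closes that window at the cost of holding the lock slightly longer.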
All review comments have been addressed.

All CI checks are passing.
Summary
- `dirty?` tracking to detect unsaved changes
- `reload`/`reload!` methods to refresh classifiers from storage
- `save`/`load` renamed to `save_to_file`/`load_from_file` (legacy API preserved)

Storage API
Classifiers can now be configured with a storage backend:
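A hedged sketch of that configuration flow, using stand-ins: the `storage=` accessor, `save`, and `load(storage:)` names follow the PR description, while `FileStorage` and `SpamClassifier` below are illustrative, not the gem's classes.

```ruby
require "json"
require "tempfile"

# Stand-in for the file-based backend described in this PR
class FileStorage
  def initialize(path:)
    @path = path
  end

  def write(data)
    ::File.write(@path, data)
  end

  def read
    exists? ? ::File.read(@path) : nil
  end

  def exists?
    ::File.exist?(@path)
  end
end

# Toy classifier: just enough state to show the round trip
class SpamClassifier
  attr_accessor :storage
  attr_reader :categories

  def initialize(*categories)
    @categories = categories
  end

  # save delegates to the configured backend; no path argument needed
  def save
    storage.write(JSON.generate(@categories))
  end

  # load(storage:) reads from the backend and wires it onto the instance
  def self.load(storage:)
    instance = new(*JSON.parse(storage.read))
    instance.storage = storage
    instance
  end
end

file = Tempfile.new("model")
storage = FileStorage.new(path: file.path)

classifier = SpamClassifier.new("Spam", "Ham")
classifier.storage = storage
classifier.save

restored = SpamClassifier.load(storage: storage)
# restored.categories == ["Spam", "Ham"]
```

Because the classifier only talks to the backend through `write`/`read`, swapping the file backend for an in-memory one changes a single line at setup time.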
Custom Backends

Implement `Classifier::Storage::Base` to create custom storage (Redis, S3, PostgreSQL, etc.):

Dirty Tracking
Reload
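The dirty-tracking and reload semantics described above can be sketched together. `UnsavedChangesError`, `dirty?`, `reload`, and `reload!` are names from this PR; the toy classifier and `MemoryStore` are illustrative stand-ins.

```ruby
require "json"

class UnsavedChangesError < StandardError; end

# Minimal in-memory backend stand-in
class MemoryStore
  def write(data)
    @data = data
  end

  def read
    @data
  end
end

class ToyClassifier
  attr_accessor :storage

  def initialize
    @words = []
    @dirty = false
  end

  def train(word)
    @words << word
    @dirty = true          # any mutation marks unsaved changes
  end

  def dirty?
    @dirty
  end

  def save
    storage.write(JSON.generate(@words))
    @dirty = false         # saved state is clean again
  end

  # reload protects unsaved work; reload! discards it explicitly
  def reload
    raise UnsavedChangesError, "unsaved changes (use reload! to discard)" if dirty?
    reload!
  end

  def reload!
    @words = JSON.parse(storage.read)
    @dirty = false
    self
  end
end
```

The `reload`/`reload!` split mirrors ActiveRecord's convention: the bang variant is the destructive one you must opt into, so training done since the last `save` can never be silently thrown away.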
Test plan
- `save_to_file`/`load_from_file` API still works