feat: add thread safety to Bayes and LSI classifiers#86
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds thread safety to the Bayes and LSI classifiers by incorporating the Mutex_m mixin and wrapping all public methods that access shared state in synchronize blocks. The implementation follows a pattern that prevents deadlock by creating unlocked private versions of methods that are called within synchronized blocks.
Key Changes:
- Added
Mutex_minclusion and initialization to both Bayes and LSI classes - Wrapped all public methods accessing shared state in
synchronizeblocks - Implemented custom
marshal_dump/marshal_loadmethods to handle serialization (Mutex objects cannot be marshaled)
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| lib/classifier/lsi.rb | Added thread safety via Mutex_m, refactored methods into public synchronized versions and private unlocked versions, implemented custom marshaling |
| lib/classifier/bayes.rb | Added thread safety via Mutex_m, wrapped state-accessing methods in synchronize blocks, implemented custom marshaling |
Comments suppressed due to low confidence (1)
lib/classifier/bayes.rb:1
- Missing closing
endfor thesynchronizeblock. The block opened on line 193 is not properly closed before the method ends on line 202.
# rbs_inline: enabled
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Both classifiers now use Mutex_m for thread-safe operations. Concurrent train() and classify() calls no longer corrupt internal state. Key changes: - Include Mutex_m in both Bayes and LSI classes - Wrap all public methods accessing shared state in synchronize blocks - Add unlocked private variants for internal calls to avoid deadlock - Implement marshal_dump/marshal_load to handle serialization (Mutex cannot be marshaled) The pattern used avoids deadlock by having public methods delegate to unlocked private methods when the lock is already held. Closes #66
The rbs-inline tool generates 'include Mutex_m' in the type signatures for Bayes and LSI classes, but RBS validation fails because it cannot find the Mutex_m type definition. Add vendor type stubs with the core mutex methods and their standard aliases (synchronize, lock, unlock, etc.) to satisfy both RBS validation and Steep type checking.
Extract GSL and native matrix vector assignment into helper methods to reduce build_index method length below the 25-line threshold. Use anonymous block forwarding (&) per Ruby 3.1+ style preferences for the unlocked proxy methods.
2c4d708 to
265dbe1
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Custom marshal deserialization to recreate mutex | ||
| # @rbs (Array[untyped]) -> void | ||
| def marshal_load(data) | ||
| mu_initialize |
There was a problem hiding this comment.
Calling mu_initialize directly may not properly initialize the mutex state. Consider using super() or the appropriate Mutex_m initialization method to ensure the mutex is correctly set up during deserialization.
| mu_initialize | |
| @mu = Mutex.new |
| # Custom marshal deserialization to recreate mutex | ||
| # @rbs (Array[untyped]) -> void | ||
| def marshal_load(data) | ||
| mu_initialize |
There was a problem hiding this comment.
Calling mu_initialize directly may not properly initialize the mutex state. Consider using super() or the appropriate Mutex_m initialization method to ensure the mutex is correctly set up during deserialization.
| mu_initialize | |
| @mutex = Mutex.new |
|
Regarding the Copilot suggestion to replace The current implementation using
Verified that serialization/deserialization works correctly: b = Classifier::Bayes.new('Spam', 'Ham')
b.train_spam('buy now')
loaded = Marshal.load(Marshal.dump(b))
loaded.train_spam('test') # Works correctly |
Summary
Mutex_minclusion to both Bayes and LSI classifierssynchronizeblocksmarshal_dump/marshal_loadfor serialization (Mutex can't be marshaled)Problem
Concurrent
train()andclassify()calls corrupt internal state:Solution
Use the
Mutex_mmixin (already a gem dependency) with a pattern that avoids deadlock:Test plan
bundle exec rake testNATIVE_VECTOR=true bundle exec rake testtest_serialize_safe)Closes #66