Skip to content

feat: add thread safety to Bayes and LSI classifiers#86

Merged
cardmagic merged 3 commits intomasterfrom
add-thread-safety
Dec 28, 2025
Merged

feat: add thread safety to Bayes and LSI classifiers#86
cardmagic merged 3 commits intomasterfrom
add-thread-safety

Conversation

@cardmagic
Copy link
Owner

Summary

  • Add Mutex_m inclusion to both Bayes and LSI classifiers
  • Wrap all public methods accessing shared state in synchronize blocks
  • Add marshal_dump/marshal_load for serialization (Mutex can't be marshaled)

Problem

Concurrent train() and classify() calls corrupt internal state:

classifier = Classifier::Bayes.new('Spam', 'Ham')

threads = 10.times.map do
  Thread.new do
    100.times do
      classifier.train_spam("buy now cheap")
      classifier.classify("special offer")
    end
  end
end

threads.each(&:join)  # Corrupted state

Solution

Use the Mutex_m mixin (already a gem dependency) with a pattern that avoids deadlock:

def public_method(...)
  synchronize { private_unlocked_method(...) }
end

private

def private_unlocked_method(...)
  # actual implementation
end

Test plan

  • All existing tests pass with bundle exec rake test
  • All tests pass with NATIVE_VECTOR=true bundle exec rake test
  • Marshal serialization still works (test_serialize_safe)

Closes #66

@cardmagic cardmagic requested a review from Copilot December 28, 2025 15:50
@cardmagic cardmagic self-assigned this Dec 28, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds thread safety to the Bayes and LSI classifiers by incorporating the Mutex_m mixin and wrapping all public methods that access shared state in synchronize blocks. The implementation follows a pattern that prevents deadlock by creating unlocked private versions of methods that are called within synchronized blocks.

Key Changes:

  • Added Mutex_m inclusion and initialization to both Bayes and LSI classes
  • Wrapped all public methods accessing shared state in synchronize blocks
  • Implemented custom marshal_dump/marshal_load methods to handle serialization (Mutex objects cannot be marshaled)

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.

File Description
lib/classifier/lsi.rb Added thread safety via Mutex_m, refactored methods into public synchronized versions and private unlocked versions, implemented custom marshaling
lib/classifier/bayes.rb Added thread safety via Mutex_m, wrapped state-accessing methods in synchronize blocks, implemented custom marshaling
Comments suppressed due to low confidence (1)

lib/classifier/bayes.rb:1

  • Missing closing end for the synchronize block. The block opened on line 193 is not properly closed before the method ends on line 202.
# rbs_inline: enabled

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Both classifiers now use Mutex_m for thread-safe operations. Concurrent
train() and classify() calls no longer corrupt internal state.

Key changes:
- Include Mutex_m in both Bayes and LSI classes
- Wrap all public methods accessing shared state in synchronize blocks
- Add unlocked private variants for internal calls to avoid deadlock
- Implement marshal_dump/marshal_load to handle serialization (Mutex
  cannot be marshaled)

The pattern used avoids deadlock by having public methods delegate to
unlocked private methods when the lock is already held.

Closes #66
The rbs-inline tool generates 'include Mutex_m' in the type
signatures for Bayes and LSI classes, but RBS validation fails
because it cannot find the Mutex_m type definition.

Add vendor type stubs with the core mutex methods and their
standard aliases (synchronize, lock, unlock, etc.) to satisfy
both RBS validation and Steep type checking.
Extract GSL and native matrix vector assignment into helper methods
to reduce build_index method length below the 25-line threshold.

Use anonymous block forwarding (&) per Ruby 3.1+ style preferences
for the unlocked proxy methods.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

# Custom marshal deserialization to recreate mutex
# @rbs (Array[untyped]) -> void
def marshal_load(data)
mu_initialize
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling mu_initialize directly may not properly initialize the mutex state. Consider using super() or the appropriate Mutex_m initialization method to ensure the mutex is correctly set up during deserialization.

Suggested change
mu_initialize
@mu = Mutex.new

Copilot uses AI. Check for mistakes.
# Custom marshal deserialization to recreate mutex
# @rbs (Array[untyped]) -> void
def marshal_load(data)
mu_initialize
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling mu_initialize directly may not properly initialize the mutex state. Consider using super() or the appropriate Mutex_m initialization method to ensure the mutex is correctly set up during deserialization.

Suggested change
mu_initialize
@mutex = Mutex.new

Copilot uses AI. Check for mistakes.
@cardmagic
Copy link
Owner Author

Regarding the Copilot suggestion to replace mu_initialize with @mu = Mutex.new:

The current implementation using mu_initialize is correct. This is the proper Mutex_m API for reinitializing the mutex state after deserialization. The suggested @mu = Mutex.new or @mutex = Mutex.new would not work because:

  1. Mutex_m uses internal instance variables (not @mu or @mutex)
  2. The synchronize method relies on state set up by mu_initialize

Verified that serialization/deserialization works correctly:

b = Classifier::Bayes.new('Spam', 'Ham')
b.train_spam('buy now')
loaded = Marshal.load(Marshal.dump(b))
loaded.train_spam('test')  # Works correctly

@cardmagic cardmagic merged commit f9cb27c into master Dec 28, 2025
5 checks passed
@cardmagic cardmagic deleted the add-thread-safety branch December 28, 2025 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add thread safety to classifiers

2 participants