GPU-native AtomSpace Phase 2-3: pair counting + MI computation #4

Open
plankatron wants to merge 1 commit into opencog:master from plankatron:feat/gpu-counting-mi

plankatron commented Feb 9, 2026

Summary

GPU kernels for the statistical learning stage — counting word co-occurrences and computing mutual information.

  • gpu-counting.cl — Sentence pair counting with sliding-window approach. Processes tokenized sentences to extract word co-occurrence pairs using atomic increments for concurrent sentence processing.
  • gpu-mi.cl — Mutual Information computation with two modes:
    • Resident mode: full recomputation over all pairs
    • Dirty mode: incremental updates for recently changed pairs only
    • Uses double precision to match AtomSpace FloatValue format
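
As a rough illustration of the sliding-window counting idea, here is a minimal host-side sketch in plain C. It is not the code in gpu-counting.cl: the dense `pair_counts` matrix, the `VOCAB` size, and the `count_sentence` function are hypothetical names chosen for this example, and C11 atomics stand in for the kernel's atomic increments. It also assumes pairs are ordered (left word, right word) and that the window is measured in token positions, which the description does not spell out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define VOCAB 8  /* hypothetical tiny vocabulary, for illustration only */

/* Dense VOCAB x VOCAB count matrix. The PR's kernels use the hash table
 * and pools from PR #3 instead; this layout is an assumption. */
static atomic_uint pair_counts[VOCAB][VOCAB];

/* Count every ordered (left, right) token pair whose positions are at
 * most `window` apart within one tokenized sentence. */
static void count_sentence(const unsigned *tokens, size_t n, size_t window)
{
    assert(window >= 1);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n && j - i <= window; j++) {
            assert(tokens[i] < VOCAB && tokens[j] < VOCAB);
            /* Atomic increment, so several sentences could be counted
             * concurrently without losing updates. */
            atomic_fetch_add(&pair_counts[tokens[i]][tokens[j]], 1u);
        }
    }
}
```

Calling `count_sentence` on the token sequence 1, 2, 3, 2 with a window of 2 counts the pairs (1,2), (1,3), (2,3), (2,2) and (3,2) once each; the second occurrence of word 2 is three positions from word 1, so it falls outside the window.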

Pipeline Position

This is Phase 2-3 of the GPU-native AtomSpace pipeline:

 Phase 0-1: Hash Table + Pools (#3, dependency)
[Phase 2-3: Counting + MI] ← THIS PR
 Phase 4-5: Sections + Cosine  (#5)
 Phase 6:   Substitution       (#6)

Depends on PR #3 — these kernels reference WordPool, PairPool, and hash table structs defined in gpu-atomspace.cl and gpu-hashtable.cl. PR #3 should be merged first. No dependencies on PRs #5 or #6.

Files

File             Lines  Description
gpu-counting.cl    296  Sliding-window pair counting kernel
gpu-mi.cl          224  MI computation (resident + dirty modes)
test-counting.c    720  Pair counting unit tests
test-mi.c          961  MI computation unit tests

Performance

  • MI throughput: 385K pairs/sec (1.24M pairs in ~3.2 seconds)
  • Dirty mode avoids recomputing unchanged pairs, enabling incremental updates

Building & Testing

cd opencog/opencl/atomspace

# Pair counting tests
gcc -o test-counting test-counting.c -lOpenCL -lm
./test-counting

# MI computation tests
gcc -o test-mi test-mi.c -lOpenCL -lm
./test-mi

Relationship to Prior Work

Builds on PR #2 (binary caching + dispatch thread build), which was merged as 2a9c544. These kernels benefit from the binary caching infrastructure for faster program load times.

🤖 Generated with Claude Code

Phase 2: Sentence pair counting with sliding-window approach.
Processes tokenized sentences to extract word co-occurrence pairs,
using atomic increments for concurrent sentence processing.

Phase 3: Mutual Information computation with two modes:
- Resident mode: full recomputation over all pairs
- Dirty mode: incremental updates for recently changed pairs
Uses double precision to match AtomSpace FloatValue format.

Includes comprehensive C test suites for both components.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>