GPU-native AtomSpace Phase 2-3: pair counting + MI computation by plankatron · Pull Request #4 · opencog/atomese-simd

plankatron · 2026-02-09T14:22:46Z

Summary

GPU kernels for the statistical learning stage — counting word co-occurrences and computing mutual information.

gpu-counting.cl: Sentence pair counting with a sliding-window approach. Processes tokenized sentences to extract word co-occurrence pairs, using atomic increments so that many sentences can be processed concurrently.
gpu-mi.cl: Mutual information (MI) computation in double precision, matching the AtomSpace FloatValue format, with two modes:
- Resident mode: full recomputation over all pairs
- Dirty mode: incremental updates for recently changed pairs only
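To make the sliding-window counting concrete, here is a minimal host-side sketch in C. It is not the code in gpu-counting.cl: the dense `VOCAB x VOCAB` count matrix, the names `count_pairs` and `pair_count`, and the choice to count ordered (left, right) pairs up to `window` positions apart are all assumptions made for illustration. The PR's kernels use hash-table-backed pools instead. A C11 atomic stands in for the atomic increment a GPU kernel would need when many sentences update the same counter at once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vocabulary size for this sketch only. */
#define VOCAB 8u

/* Count ordered (left, right) pairs whose positions are at most `window`
 * apart. `counts` is a row-major VOCAB x VOCAB matrix of atomic counters,
 * indexed as counts[left * VOCAB + right]. */
static void count_pairs(const uint32_t *tokens, size_t n, size_t window,
                        atomic_uint *counts)
{
    assert(window > 0);
    for (size_t i = 0; i < n; i++) {
        assert(tokens[i] < VOCAB);
        for (size_t j = i + 1; j < n && j - i <= window; j++) {
            assert(tokens[j] < VOCAB);
            /* Atomic increment: safe even if several sentences are
             * counted concurrently into the same matrix. */
            atomic_fetch_add_explicit(&counts[tokens[i] * VOCAB + tokens[j]],
                                      1u, memory_order_relaxed);
        }
    }
}

/* Read back one counter. */
static unsigned pair_count(atomic_uint *counts, uint32_t left, uint32_t right)
{
    return atomic_load_explicit(&counts[left * VOCAB + right],
                                memory_order_relaxed);
}
```

For the token sequence `{0, 1, 2, 1}` with a window of 2, this counts the pairs (0,1), (0,2), (1,2), (1,1) and (2,1) once each; the pair formed by positions 0 and 3 falls outside the window and is skipped.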

Pipeline Position

This is Phase 2-3 of the GPU-native AtomSpace pipeline:

 Phase 0-1: Hash Table + Pools (#3, dependency)
[Phase 2-3: Counting + MI] ← THIS PR
 Phase 4-5: Sections + Cosine  (#5)
 Phase 6:   Substitution       (#6)

Depends on PR #3 — these kernels reference WordPool, PairPool, and hash table structs defined in gpu-atomspace.cl and gpu-hashtable.cl. PR #3 should be merged first. No dependencies on PRs #5 or #6.

Files

| File | Lines | Description |
|------|-------|-------------|
| `gpu-counting.cl` | 296 | Sliding-window pair counting kernel |
| `gpu-mi.cl` | 224 | MI computation (resident + dirty modes) |
| `test-counting.c` | 720 | Pair counting unit tests |
| `test-mi.c` | 961 | MI computation unit tests |

Performance

MI throughput: 385K pairs/sec (1.24M pairs in ~3.2 seconds)
Dirty mode avoids recomputing unchanged pairs, enabling incremental updates

Building & Testing

cd opencog/opencl/atomspace

# Pair counting tests
gcc -o test-counting test-counting.c -lOpenCL -lm
./test-counting

# MI computation tests
gcc -o test-mi test-mi.c -lOpenCL -lm
./test-mi

Relationship to Prior Work

Builds on PR #2 (binary caching + dispatch thread build), which was merged as 2a9c544. These kernels benefit from the binary caching infrastructure for faster program load times.

🤖 Generated with Claude Code

Phase 2: Sentence pair counting with sliding-window approach. Processes tokenized sentences to extract word co-occurrence pairs, using atomic increments for concurrent sentence processing.

Phase 3: Mutual Information computation with two modes:
- Resident mode: full recomputation over all pairs
- Dirty mode: incremental updates for recently changed pairs

Uses double precision to match AtomSpace FloatValue format.

Includes comprehensive C test suites for both components.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This was referenced Feb 9, 2026:

- GPU-native AtomSpace Phase 0-1: hash table + pool layouts #3 (Open)
- GPU-native AtomSpace Phase 4-5: section extraction + cosine similarity #5 (Open)
- GPU-native AtomSpace Phase 6: class substitution #6 (Open)

plankatron wants to merge 1 commit into opencog:master from plankatron:feat/gpu-counting-mi
