Commit 0f9f55b

Merge pull request #241 from ruvnet/feat/ruvllm-wasm-publish
feat: ruvllm-wasm v2.0.0 — first functional WASM publish
2 parents 55b9ab3 + 377871f commit 0f9f55b

3 files changed, +102 -39 lines changed

Cargo.lock

Lines changed: 0 additions & 1 deletion

crates/ruvllm-wasm/Cargo.toml

Lines changed: 19 additions & 38 deletions
@@ -10,6 +10,9 @@ description = "WASM bindings for RuvLLM - browser-compatible LLM inference runti
 keywords = ["wasm", "llm", "inference", "browser", "webgpu"]
 categories = ["wasm", "api-bindings", "web-programming"]
 
+[package.metadata.wasm-pack.profile.release]
+wasm-opt = false
+
 [lib]
 crate-type = ["cdylib", "rlib"]
 
@@ -18,12 +21,12 @@ crate-type = ["cdylib", "rlib"]
 wasm-bindgen = "0.2"
 wasm-bindgen-futures = "0.4"
 js-sys = "0.3"
+# Core web-sys features (always needed)
 web-sys = { version = "0.3", features = [
 "console",
 "Performance",
 "Window",
 "Navigator",
-# Web Workers support (enabled with parallel feature)
 "Worker",
 "WorkerOptions",
 "WorkerType",
@@ -33,39 +36,6 @@ web-sys = { version = "0.3", features = [
 "MessageEvent",
 "ErrorEvent",
 "DedicatedWorkerGlobalScope",
-# WebGPU features (enabled with webgpu feature)
-"Gpu",
-"GpuAdapter",
-"GpuAdapterInfo",
-"GpuDevice",
-"GpuQueue",
-"GpuBuffer",
-"GpuBufferDescriptor",
-"GpuShaderModule",
-"GpuShaderModuleDescriptor",
-"GpuBindGroup",
-"GpuBindGroupDescriptor",
-"GpuBindGroupEntry",
-"GpuBindGroupLayout",
-"GpuBindGroupLayoutDescriptor",
-"GpuBindGroupLayoutEntry",
-"GpuBufferBinding",
-"GpuBufferBindingLayout",
-"GpuBufferBindingType",
-"GpuComputePipeline",
-"GpuComputePipelineDescriptor",
-"GpuPipelineLayout",
-"GpuPipelineLayoutDescriptor",
-"GpuProgrammableStage",
-"GpuCommandEncoder",
-"GpuCommandEncoderDescriptor",
-"GpuCommandBuffer",
-"GpuComputePassEncoder",
-"GpuComputePassDescriptor",
-"gpu_map_mode",
-"GpuRequestAdapterOptions",
-"GpuDeviceDescriptor",
-"GpuSupportedLimits",
 ] }
 
 # Serialization
@@ -76,16 +46,27 @@ serde_json = "1.0"
 # Error handling
 console_error_panic_hook = { version = "0.1", optional = true }
 
-# Byte casting for GPU buffers
-bytemuck = { version = "1.14", features = ["derive"] }
 
 [dev-dependencies]
 wasm-bindgen-test = "0.3"
 
 [features]
 default = ["console_error_panic_hook"]
-# WebGPU acceleration
-webgpu = []
+# WebGPU acceleration (adds GPU compute pipeline, shader compilation, buffer management)
+webgpu = ["web-sys/Gpu", "web-sys/GpuAdapter", "web-sys/GpuAdapterInfo",
+"web-sys/GpuDevice", "web-sys/GpuQueue", "web-sys/GpuBuffer",
+"web-sys/GpuBufferDescriptor", "web-sys/GpuShaderModule",
+"web-sys/GpuShaderModuleDescriptor", "web-sys/GpuBindGroup",
+"web-sys/GpuBindGroupDescriptor", "web-sys/GpuBindGroupEntry",
+"web-sys/GpuBindGroupLayout", "web-sys/GpuBindGroupLayoutDescriptor",
+"web-sys/GpuBindGroupLayoutEntry", "web-sys/GpuBufferBinding",
+"web-sys/GpuBufferBindingLayout", "web-sys/GpuBufferBindingType",
+"web-sys/GpuComputePipeline", "web-sys/GpuComputePipelineDescriptor",
+"web-sys/GpuPipelineLayout", "web-sys/GpuPipelineLayoutDescriptor",
+"web-sys/GpuProgrammableStage", "web-sys/GpuCommandEncoder",
+"web-sys/GpuCommandEncoderDescriptor", "web-sys/GpuCommandBuffer",
+"web-sys/GpuComputePassEncoder", "web-sys/GpuComputePassDescriptor",
+"web-sys/GpuRequestAdapterOptions", "web-sys/GpuDeviceDescriptor"]
 # Enable parallel inference with Web Workers
 parallel = []
 # Enable SIMD optimizations (requires wasm-simd target feature)
Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
# ADR-084: ruvllm-wasm — First Functional npm Publish

**Status**: Accepted
**Date**: 2026-03-06
**Authors**: RuVector Team
**Deciders**: ruv
**Related**: ADR-083 (Brain Training Loops), Issue #238 (placeholder deprecation)

## 1. Context

The `@ruvector/ruvllm-wasm` npm package (v0.1.0) was a placeholder, published without compiled WASM binaries, and was deprecated in PR #239. Meanwhile, the Rust crate `ruvllm-wasm` (v2.0.0) contains substantial working code:

| Subsystem | Status | Exports |
|-----------|--------|---------|
| KV Cache (two-tier FP32+u8) | Working | `KvCacheWasm`, `KvCacheConfigWasm` |
| Memory (arena + buffer pool) | Working | `InferenceArenaWasm`, `BufferPoolWasm` |
| Chat Templates (7 formats) | Working | `ChatTemplateWasm`, `ChatMessageWasm` |
| HNSW Semantic Router | Working | `HnswRouterWasm`, `PatternWasm`, `RouteResultWasm` |
| MicroLoRA (rank 1-4) | Working | `MicroLoraWasm`, `AdaptFeedbackWasm` |
| SONA Instant Learning | Working | `SonaInstantWasm`, `SonaConfigWasm` |
| Web Workers | Working | `ParallelInference`, feature detection |
| WebGPU (matmul shader) | Feature-gated | `WebGpuInference`, `WebGpuContext` |
| IntelligentLLM (combined) | Commented out | Pending API compatibility |

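To make the "two-tier FP32+u8" label in the KV cache row concrete, here is a minimal sketch. It assumes that "two-tier" means a small hot tier of recent values kept in `f32` and a cold tier of older values quantized to `u8`. `TwoTierCache`, `QuantBlock`, and the per-block min/max quantization are hypothetical illustrations, not the layout used by `KvCacheWasm`.

```rust
/// Hypothetical cold-tier block: u8 codes plus the affine parameters
/// needed to map them back to approximate f32 values.
pub struct QuantBlock {
    pub min: f32,
    pub scale: f32,
    pub data: Vec<u8>,
}

/// Hypothetical two-tier store (not the crate's real type).
pub struct TwoTierCache {
    tail_capacity: usize,
    tail: Vec<f32>,        // hot tier: recent values, full precision
    cold: Vec<QuantBlock>, // cold tier: older values, 8-bit
}

impl TwoTierCache {
    pub fn new(tail_capacity: usize) -> Self {
        assert!(tail_capacity > 0, "hot tier must hold at least one value");
        Self { tail_capacity, tail: Vec::new(), cold: Vec::new() }
    }

    /// Append one value; a full hot tier is quantized into one cold block first.
    pub fn push(&mut self, value: f32) {
        if self.tail.len() == self.tail_capacity {
            self.cold.push(quantize(&self.tail));
            self.tail.clear();
        }
        self.tail.push(value);
    }

    /// Reconstruct all values, oldest first. Cold values are approximate.
    pub fn to_vec(&self) -> Vec<f32> {
        let mut out = Vec::new();
        for block in &self.cold {
            out.extend(block.data.iter().map(|&q| block.min + q as f32 * block.scale));
        }
        out.extend_from_slice(&self.tail);
        out
    }

    pub fn cold_blocks(&self) -> usize {
        self.cold.len()
    }

    pub fn hot_len(&self) -> usize {
        self.tail.len()
    }
}

/// Affine min/max quantization of a non-empty slice to u8 codes.
pub fn quantize(values: &[f32]) -> QuantBlock {
    let min = values.iter().copied().fold(f32::INFINITY, f32::min);
    let max = values.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // A constant block has zero range; any non-zero scale works then.
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let data = values
        .iter()
        .map(|&v| ((v - min) / scale).round() as u8)
        .collect();
    QuantBlock { min, scale, data }
}
```

In this sketch the reconstruction error of a cold value is bounded by half of that block's `scale`, which is the usual trade-off of 8-bit storage: roughly a quarter of the memory of FP32 for older entries at the cost of a small, bounded error.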
## 2. Decision

### 2.1 Fix WASM Build

The Rust 1.91 compiler has a codegen bug: release-profile optimizations produce invalid WASM that fails validation during wasm-bindgen post-processing (type mismatch: `expected i32, found f64`). Debug builds validate fine.

**Workaround**: Build with `codegen-units=256` and `lto=off`. This prevents the cross-function optimization passes that trigger the bug while still producing optimized output.

```bash
CARGO_PROFILE_RELEASE_CODEGEN_UNITS=256 \
CARGO_PROFILE_RELEASE_LTO=off \
wasm-pack build crates/ruvllm-wasm --target web --scope ruvector --release
```

Added `wasm-opt = false` to `[package.metadata.wasm-pack.profile.release]` because wasm-opt's validator also rejects the binary.

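For build scripts or an `xtask`-style helper, the same workaround could be expressed in Rust. This is a sketch under assumptions: the function name `wasm_pack_release_command` is hypothetical, and it only constructs the command with `std::process::Command` rather than running it, so it does not require `wasm-pack` to be installed.

```rust
use std::process::Command;

/// Build (but do not run) the wasm-pack invocation from section 2.1,
/// with both profile overrides set as environment variables.
/// `crate_dir` is the path to the crate, e.g. "crates/ruvllm-wasm".
pub fn wasm_pack_release_command(crate_dir: &str) -> Command {
    let mut cmd = Command::new("wasm-pack");
    cmd.args(["build", crate_dir, "--target", "web", "--scope", "ruvector", "--release"])
        // Many codegen units keep optimization local to small chunks of code.
        .env("CARGO_PROFILE_RELEASE_CODEGEN_UNITS", "256")
        // Disable link-time optimization across codegen units.
        .env("CARGO_PROFILE_RELEASE_LTO", "off");
    cmd
}
```

A caller would run it with `wasm_pack_release_command("crates/ruvllm-wasm").status()` and check the returned exit status. Setting the overrides through the environment keeps the workaround out of `Cargo.toml`, so it can be dropped without another manifest change.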
### 2.2 Gate WebGPU Features

All 32 WebGPU `web-sys` features (`gpu_map_mode`, `GpuSupportedLimits`, and 30 `Gpu*` types) were enabled unconditionally, inflating binary size. The 30 `Gpu*` features now sit behind the `webgpu` Cargo feature flag.

Removed the unused `bytemuck` dependency and the `gpu_map_mode` / `GpuSupportedLimits` features, which were declared but never referenced in source.

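On the Rust side, code that touches the gated `web-sys` types has to sit behind the same flag, otherwise a default build would fail to find them. A minimal sketch of that pattern follows. `Backend` and `select_backend` are hypothetical names chosen for illustration, not items from the crate.

```rust
/// Hypothetical backend selector, used only to illustrate feature gating.
#[derive(Debug, PartialEq)]
pub enum Backend {
    Cpu,
    WebGpu,
}

/// Compiled only when the crate is built with `--features webgpu`.
/// Code in here may reference the gated `web-sys` GPU types.
#[cfg(feature = "webgpu")]
pub fn select_backend(gpu_available: bool) -> Backend {
    if gpu_available {
        Backend::WebGpu
    } else {
        Backend::Cpu
    }
}

/// Default build: no GPU bindings are compiled in, so the CPU path is
/// the only option regardless of what the browser supports.
#[cfg(not(feature = "webgpu"))]
pub fn select_backend(_gpu_available: bool) -> Backend {
    Backend::Cpu
}
```

Because both definitions share one signature, callers need no `cfg` attributes of their own. Only the gated function body would mention GPU types.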
### 2.3 Publish as v2.0.0

Published `@ruvector/ruvllm-wasm@2.0.0` to npm with:
- Compiled WASM binary (~435 KB, ~150 KB gzipped)
- TypeScript definitions (`.d.ts`)
- ES module JS glue code
- Accurate README with working API examples

### 2.4 README

Replaced placeholder README with accurate documentation covering all exported types, working code examples, and browser compatibility table.

## 3. Files Modified

| File | Changes |
|------|---------|
| `crates/ruvllm-wasm/Cargo.toml` | Gate WebGPU features, remove unused bytemuck/gpu_map_mode/GpuSupportedLimits, add wasm-opt=false |
| `crates/ruvllm-wasm/pkg/README.md` | Complete rewrite with accurate API docs |
| `crates/ruvllm-wasm/pkg/` | Generated: `.wasm`, `.js`, `.d.ts` files |

## 4. Build Artifact Details

| File | Size |
|------|------|
| `ruvllm_wasm_bg.wasm` | 435 KB |
| `ruvllm_wasm.js` | 128 KB |
| `ruvllm_wasm.d.ts` | 45 KB |

## 5. Known Limitations

| Area | Limitation | Resolution Path |
|------|-----------|-----------------|
| Rust 1.91 codegen bug | Requires `codegen-units=256` workaround | Drop the workaround once a Rust compiler release fixes the bug |
| IntelligentLLMWasm | Commented out, references non-existent `HnswRouterConfigWasm` | Create config struct or pass params directly |
| WebGPU attention | CPU fallback only (matmul has GPU path) | Implement attention WGSL shader pipeline |
| Worker pool | Uses `setTimeout` polling instead of proper task completion signals | Implement message-based completion tracking |
| GGUF model loading | Not yet wired (no `load_model_from_url`) | Requires streaming fetch + parser integration |
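
The "message-based completion tracking" named in the Worker pool row can be sketched with native Rust threads and a channel. This is only an analogue under stated assumptions: in the browser the workers would be Web Workers and the messages would travel via `postMessage`, and the function `run_tasks` is hypothetical rather than part of `ParallelInference`.

```rust
use std::sync::mpsc;
use std::thread;

/// Run each input on its own thread and gather results through completion
/// messages. The coordinator blocks on the channel instead of waking up on
/// a timer to check whether workers have finished.
pub fn run_tasks(inputs: Vec<u32>) -> Vec<(usize, u32)> {
    let (tx, rx) = mpsc::channel();
    let task_count = inputs.len();

    for (id, x) in inputs.into_iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            let result = x * x; // stand-in for real inference work
            // The result message doubles as the "task done" signal.
            tx.send((id, result)).expect("coordinator dropped the receiver");
        });
    }
    // Drop the original sender so only worker-held clones remain.
    drop(tx);

    // Each `recv` inside `iter()` sleeps until a worker reports completion.
    let mut done: Vec<(usize, u32)> = rx.iter().take(task_count).collect();
    done.sort_by_key(|&(id, _)| id); // completion order is not deterministic
    done
}
```

Tagging each message with its task `id` lets the coordinator match results to requests even when workers finish out of order, which a polling loop would otherwise have to track separately.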
