Releases · ashvardanian/NumKong

06 Apr 21:04

ashvardanian

v7.4.5

a750052

v7.4.5: Faster RMSD (Latest)

Improve: Vectorize F32 SME MaxSim finalizer (0daacf3)
Improve: Remove centering from RMSD kernels (1a83ab4)
Fix: Emulated vs native test durations (4266451)

Assets 16

numkong_android_arm32_7.4.5.zip (3.1 MB, 2026-04-06T21:23:49Z)
sha256:6c67aa355d3eb13452372def9b2bbc6a9fee8007af295284ecbcd6834fd2c2ce

numkong_android_arm64_7.4.5.zip (4.68 MB, 2026-04-06T21:14:50Z)
sha256:6d2b366646780305150906c38d031309a15562b4d4a3a7a009f6f9f228329be3

numkong_linux_amd64_7.4.5.deb (1.64 MB, 2026-04-06T21:24:06Z)
sha256:f7e254508d745ab25792be230d64856b959fb2e7643d18760c00f7bb28481cb0

numkong_linux_amd64_7.4.5.so (3.52 MB, 2026-04-06T21:24:05Z)
sha256:8856a4a67ab107366a26121d087ce09e75eafa76964a26918fb278f02b7c40d9

numkong_linux_arm64_7.4.5.deb (1.41 MB, 2026-04-06T21:13:42Z)
sha256:4183b0229f53058b37a6c37b8d7330aff4c1683ede40608196ce4ec50a996e3f

numkong_linux_arm64_7.4.5.so (1.76 MB, 2026-04-06T21:13:42Z)
sha256:13a5ae4098ccaa0d8d9f5fd1ca464d0187f1a1d77c17c42aec231098770bec5c

numkong_macos_arm64_7.4.5.dylib (1.83 MB, 2026-04-06T21:08:26Z)
sha256:18956c1798b96fdd8481da865d5425380dca2ca3b8432f86a9704aad42db6c40

numkong_macos_arm64_7.4.5.zip (2.1 MB, 2026-04-06T21:08:26Z)
sha256:af3a2e4a25af616370ba37cad29ff3e189565f15f6962d205653f5c24fecfdba

numkong_macos_x86_64_7.4.5.dylib (2.2 MB, 2026-04-06T21:09:51Z)
sha256:08b54e52f70e6d4d090b9ab410cb671f6e20c39f910cede12a33cf137e27fa77

numkong_macos_x86_64_7.4.5.zip (2.11 MB, 2026-04-06T21:09:51Z)
sha256:c3db1f93478c7a9b7abeb7b7bc5f6bc8d9d3427b6095457e802ffc6d9c9260a2

Source code (zip) (2026-04-06T21:04:11Z)

Source code (tar.gz) (2026-04-06T21:04:11Z)
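
Each binary asset above is listed with a SHA-256 digest. A minimal sketch for checking a downloaded file against the listed value follows. The helper names `sha256_of` and `matches_release_digest` are hypothetical and not part of NumKong; the sketch assumes Python 3.9+ and that the asset was downloaded manually.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large assets are never fully held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_release_digest(path: str, listed: str) -> bool:
    """Compare a file against a 'sha256:<hex>' string as printed in the asset list."""
    expected = listed.strip().removeprefix("sha256:").lower()
    return hmac.compare_digest(sha256_of(path), expected)
```

For an intact download, `matches_release_digest("numkong_linux_amd64_7.4.5.so", "sha256:8856a4a67ab107366a26121d087ce09e75eafa76964a26918fb278f02b7c40d9")` should return `True`.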

06 Apr 12:17

ashvardanian

v7.4.4

b4ed3ae

v7.4.4: CI & MSVC Hardening

Fix: ARMv7 Rust cross-compilation with CC for versioned GCC (a5e67e6)
Make: check_source_runs-probing like march=native on MSVC (7a152f3)
Fix: Drop _MM_FROUND_NO_EXC from _mm256_cvtps_ph calls (8649b0c)
Fix: Guard against old MSVC preprocessor (25d3304)
Make: Enforce newer preprocessor in MSVC (be966af)
Make: Cleaner CIBW artifact names & env forwarding (a6cf642)
Make: Forward cross-compilation flags for macOS wheels (6ed3b8c)
Make: Split ppc64le, s390x, i686 CIBW runs (c01795c)


05 Apr 16:34

ashvardanian

v7.4.3

55fc1d8

Release v7.4.3

Release: v7.4.3 [skip ci]

Patch

Fix: Require AArch64 for NEON kernels (2ba1b34)
Docs: Table order & formatting (8673a56)
Make: Avoid --all-features in Rust cross-compilation CI (8be8bff)
Improve: Arm32 compatibility (6404172)
Make: cancel-in-progress CI to shift compute resources (dfc8fa0)
Improve: Harden Swift SDK for 6.1+ toolkit (965cd52)
Make: Strip .unsafeFlags & list platforms for SPM consumption (b061b78)
Make: Expose CNumKongDispatch target to Swift users (6aa00a8)


05 Apr 09:07

ashvardanian

v7.4.2

0f2783c

Release v7.4.2

Release: v7.4.2 [skip ci]

Patch

Docs: Shrink tables in the main README (6d2ea34)
Make: Inline PowerShell cross-compilation logic in CI (974c30c)
Make: Define _ARM64_ for Arm JS builds in MSVC (f303042)
Make: Skip same-named artifacts on CI reruns (7c098e5)


05 Apr 00:12

ashvardanian

v7.4.1

c360304

Release v7.4.1

Release: v7.4.1 [skip ci]

Patch

Make: Set repository.url for NPM (385480d)
Make: Pull MSVC ARM64 Cross-Compiler (e20c93e)
Fix: Swap f16x8 for u16x8 in cast_neon (154ec5d)


04 Apr 23:26

ashvardanian

v7.4.0

ffc6e74

v7.4: Fast Tensor Contractions

Faster tensor contractions
Faster GEMM "packers" with SIMD
New SVE+SDOT kernels for i8
MSVC build stability on Arm

Minor

Add: WASM elementwise ops & spatial mini-float kernels (81b8c44)
Add: WASM type-casting kernels (e09df31)
Add: SVE+SDOT ops for 8-bit integers (913fc6b)

Patch

Fix: Misplaced NEON loads/stores in Sierra (05e3045)
Fix: Avoid unconditional np symbols (9dffb68)
Make: Resolve probe locations for NPM consumers (c602f45)
Docs: Refined "What's Inside" (28f35cd)
Docs: Mini-float kernel selection strategy (04e6598)
Improve: Accelerate PyTests, reduce Decimal use (2417248)
Make: Move .pyi for PyLance (688ec2d)
Fix: Inconsistent SME function qualifiers (5b4148a)
Improve: Smaller test inputs under QEMU (ee36bf2)
Improve: Vectorize GEMM "packers" (86127a4)
Make: Longer timeouts for QEMU in CI (a9cc732)
Fix: vec_t store helper args order (eecbcac)
Fix: Negative stride tensor reductions (3ea81be)
Improve: Recursive stride collapsing and axis-lane fast paths for N-D reductions (cf8eaf6)
Improve: Faster reductions in strided tensors (61651ed)
Improve: Wider NEON curved, mesh, & probability F16 kernels (1c17678)
Fix: Harden mini-float type-casting (1911b89)
Make: Ship win32-arm64 NPM builds (578b7ad)
Make: Auto-bump JS platform-specific versions (5617f75)
Fix: vcombine instead of initializer lists for NEON arrays in MSVC (906c178)
Fix: Avoid flaky vld1_f16 for MSVC (7a987d2)
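
The patch list above mentions stride collapsing for N-D reductions and a fix for negative strides. As an illustration of the general idea only, and not NumKong's actual implementation, the sketch below fuses adjacent dimensions whenever the outer stride equals the inner extent times the inner stride, then sums the elements. The names `collapse_strides` and `strided_sum` are hypothetical; strides are counted in elements, and all extents are assumed positive.

```python
def collapse_strides(shape, strides):
    """Fuse adjacent dimensions that address memory as one regular run.

    Dimensions i and i+1 can be merged when
    strides[i] == shape[i+1] * strides[i+1]; this also holds for negative strides.
    """
    out_shape, out_strides = [], []
    for extent, stride in zip(shape, strides):
        if extent == 1:
            continue  # unit-length dimensions never affect addressing
        if out_shape and out_strides[-1] == extent * stride:
            out_shape[-1] *= extent   # merge into the previous dimension
            out_strides[-1] = stride  # the fused run steps by the inner stride
        else:
            out_shape.append(extent)
            out_strides.append(stride)
    return out_shape, out_strides


def strided_sum(buffer, offset, shape, strides):
    """Sum a strided view over a flat buffer after collapsing its dimensions."""
    shape, strides = collapse_strides(shape, strides)

    def walk(dim, position):
        if dim == len(shape):
            return buffer[position]
        return sum(walk(dim + 1, position + i * strides[dim])
                   for i in range(shape[dim]))

    return walk(0, offset)
```

A contiguous 3×4 view with strides `(4, 1)` collapses to a single run of 12 elements, while a fully reversed view starts at the last element and walks with stride `-1`.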


02 Apr 22:48

ashvardanian

v7.3.0

9d58663

v7.3: Hardened Arm Kernels, Upgraded CI, Citations, & Docs

This release hardens Arm kernels across NEON, SVE, and SME. The most widespread fix replaces _x (don't-care) predicated intrinsics with _m (merge-with-zero) variants: inactive lanes left undefined by _x could carry stale data into reductions, producing wrong results for non-power-of-two dimensions on real SVE hardware. Partial-tail padding in BMOPA is fixed for sub-32-bit types, and strided reductions in NEON are hardened against off-by-one errors in non-contiguous layouts.
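
To make the failure mode concrete, here is a scalar model of a predicated dot product, written in Python rather than with real SVE intrinsics. It assumes a hypothetical 4-lane vector and models "don't-care" inactive lanes as holding a stale sentinel value; real hardware may leave anything there, including a value that happens to be correct. The names `predicated_fma` and `dot` are illustrative only.

```python
LANES = 4  # hypothetical vector width, in elements


def predicated_fma(acc, a, b, active, inactive_policy, stale=1e9):
    """Per-lane acc + a * b under a predicate.

    'merge':     inactive lanes keep the accumulator value (modeling _m forms).
    'dont_care': inactive lanes hold arbitrary content (modeling _x forms),
                 represented here by the `stale` sentinel.
    """
    out = []
    for lane in range(LANES):
        if active[lane]:
            out.append(acc[lane] + a[lane] * b[lane])
        elif inactive_policy == "merge":
            out.append(acc[lane])
        else:
            out.append(stale)
    return out


def dot(xs, ys, inactive_policy):
    """Dot product with a predicated tail and a full-width final reduction."""
    acc = [0.0] * LANES
    for start in range(0, len(xs), LANES):
        active = [start + lane < len(xs) for lane in range(LANES)]
        # Model predicated loads: inactive lanes are read as zero.
        a = [xs[start + lane] if active[lane] else 0.0 for lane in range(LANES)]
        b = [ys[start + lane] if active[lane] else 0.0 for lane in range(LANES)]
        acc = predicated_fma(acc, a, b, active, inactive_policy)
    return sum(acc)  # horizontal add over every lane, active or not
```

In this model the two policies agree whenever the length is a multiple of `LANES`, and diverge as soon as a partial tail leaves some lanes inactive, because the final horizontal add reads all lanes.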

Thanks to the @ClickHouse team for help hardening tail loads and @albumentations-team for strided reductions!

On the performance side, NEON gets faster in-vector finalizers, vcvt_high for cheaper F16/BF16 widening, and new SDOT fallbacks for i4 and e3m2 that previously required SME — bringing sub-byte arithmetic to the much larger NEON install base. Streaming SVE picks up Giesen's trick for E4M3 → F16 and faster mini-float norms. SME GEMMs use fewer branches in the inner loop.

Also, NumKong now ships a CITATION.cff — hit "Cite this repository" on GitHub to grab it in case you are writing a paper on a related topic 🤗

Minor

Add: NEON & SDOT fallbacks for i4 & e3m2 (0c6afa5)

Patch

Docs: M5 perf stats for Wasmtime v43 (43c2881)
Fix: Alternative MSVC-friendly cast (4744b9b)
Make: Disable LTCG due to MSVC issues (3d37684)
Make: Try PREBUILDS_ONLY=0 in CI (64c5f95)
Improve: Lower NEONHALF → NEON requirements (37f99ec)
Fix: Wire nk_cast_neon benchmarks (3793af2)
Docs: Apple M5 native stats for secondary workloads (d7c81c4)
Improve: Faster in-vector 4-way finalizers in NEON (968dcd1)
Improve: Drop nk_f16x4_to_f32x4_neon (84bb20a)
Improve: vcvt_high for faster unpacking (a5f4a19)
Docs: Refresh GEMM/SYRK measurements Apple M4 → M5 (3e010de)
Fix: Harden strided reductions in NEON & AVX2 (61ac67b)
Fix: Double-counted tail in Skylake f64 RMSD, Kabsch, and Umeyama (5391344)
Improve: Share decimal.Context.traps rules (3c28ae9)
Fix: Padding partial tail 32-bit words for BMOPA (2598487)
Fix: Missing scale type definitions of mini-floats (91862da)
Fix: Scalar buffer cast internal overwrites & aliasing (7b0e129)
Fix: Top-bottom variable names (a014134)
Improve: Giesen's E4M3 → F16 in Streaming SVE (25322b5)
Improve: Fewer branches in SME GEMMs (858263c)
Fix: Up-round dimensions count in sub-byte C++ tests (87a72d0)
Make: Focus on M4 CPUs for SME probing (5ff63eb)
Improve: PyTesting across more shapes (4bc3e44)
Improve: Cleaner type-casting & promotion rules (23c2474)
Make: Hide formatting commits for v7-7.2 (f6ce2da)
Make: Native addon resolution for Deno & Bun (0d502d5)
Docs: Citations (6220137)
Improve: Faster mini-float norms in Streaming SVE (088de57)
Make: Integrate PyRight (0fe56c0)
Fix: F16 norms in SSVE skipped odd entries (bf3bfee)
Fix: Harden SVE MaxSim upcasting logic (803eb33)
Fix: Disable FPCR.AH bit (7b2b850)
Make: Node 24 for trusted publishing (9f1a4ef)
Fix: _m to zero-out predicated SVE/SME ops (16c157b)
Fix: _m to zero-out predicated SVE lanes in spatial/ (ac27cde)
Make: Replace stale prebuildify (74c5454)

Contributors

ClickHouse and albumentations-team


28 Mar 23:38

ashvardanian

v7.2.4

facd43f

Release v7.2.4

Release: v7.2.4 [skip ci]

Patch

Make: 2h timeout budget for JS & Py builds (2e8f081)


28 Mar 23:22

ashvardanian

v7.2.3

1f56f0d

Release v7.2.3

Release: v7.2.3 [skip ci]

Patch

Fix: Harden implicit narrowing casts (319fae2)
Fix: Negating unsigned integers in MSVC (9be61e3)
Make: Retry flaky CI jobs (b622d63)
Make: Remove conflicting NEON probes (c0f3573)


28 Mar 15:12

ashvardanian

v7.2.2

d5b2868

Release v7.2.2

Release: v7.2.2 [skip ci]

Patch

Make: Trusted publishing for NPM (9578271)
Improve: VNNI spatial kernels for E2M3, E3M2, & E4M3 (02d5325)
Fix: NK_TARGET_NEON auto-detect in MSVC (4ad2124)

