Partial Optimization for IVFPQ by littleniuer · Pull Request #4893 · facebookresearch/faiss

littleniuer · 2026-03-10T09:24:53Z

Summary

This PR introduces ARM NEON optimizations for the IVFPQ index, focusing on 8-bit lookup table operations and distance computations.

Changes

Optimized 8-bit LUT (lookup table) construction and distance calculation
Implemented algorithmic improvements combined with NEON intrinsics
All core optimization code is placed under faiss/sra_krl/ for clear organization

Design Decisions

Code is organized in a separate directory (faiss/sra_krl/) to keep the file structure clean and facilitate initial code review
Guarded by the __aarch64__ macro, so x86 builds are unaffected
Functionally equivalent to the original implementation (no changes to index format or search results)

Testing

Passed the FAISS built-in unit tests on an AArch64 platform
Benchmark results show noticeable performance improvements on ARM servers

Notes

We understand that naming conventions and NEON code placement may need adjustments to align with the ongoing SIMD restructuring. We are happy to collaborate with maintainers to refine the code structure as needed.

If this contribution is well-received, we have additional optimizations for HNSW, Refine, and FastScan ready for follow-up PRs.

littleniuer · 2026-03-11T09:03:00Z

I noticed two test failures (test_precomputed_tables and test_hnsw_2level_mixed_search) on aarch64 with my changes. The root cause is a minor floating-point precision difference (~3.8e-06 max absolute error) introduced by the NEON-optimized PQ distance table computation. Specifically, the optimized kernel uses a transposed data layout and processes 16 centroids in parallel with a dimension-wise accumulation order, which differs from the original per-centroid accumulation. Since float32 addition is not associative, this leads to ~1 ULP differences in the distance values.
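To illustrate why accumulation order matters, here is a minimal NumPy sketch. It is not the NEON kernel from this PR; the three values are hypothetical and chosen to exaggerate the effect, whereas the differences reported above are only about 1 ULP. It shows the same float32 operands giving different sums depending on how the additions are grouped.

```python
import numpy as np

# Hypothetical operands, chosen so the rounding effect is easy to see.
# 1e8 is exactly representable in float32, but neighbouring float32
# values near 1e8 are 8 apart, so adding 1.0 to -1e8 is rounded away.
a = np.float32(1e8)
b = np.float32(-1e8)
c = np.float32(1.0)

# Grouping 1: the large terms cancel first, so the small term survives.
left_first = (a + b) + c

# Grouping 2: the small term is absorbed into -1e8 before the cancel.
right_first = a + (b + c)

print(left_first, right_first)
```

Both groupings are mathematically identical, so a kernel that reorders its accumulation can legitimately return slightly different float32 distances without being wrong.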

Both tests currently use bitwise-exact comparison (assert_array_equal / np.all(==)) for distance results. I've verified that relaxing the distance comparison to np.testing.assert_allclose(atol=1e-5, rtol=1e-6) resolves the issue. Note that only the distance (D) comparisons are relaxed — all index (I) comparisons remain strictly equal, confirming that the search results themselves are not affected.
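The relaxed check described above can be sketched as follows. This is an illustration only: the helper name `check_search_results` and the sample arrays are assumptions made for the example and are not taken from the FAISS test suite. Only the tolerances (atol=1e-5, rtol=1e-6) and the offset of about 3.8e-06 come from the comment.

```python
import numpy as np

# Hypothetical reference results from the original code path.
D_ref = np.array([[0.5, 1.25, 2.0]], dtype=np.float32)
I_ref = np.array([[7, 3, 9]], dtype=np.int64)

# Hypothetical results from the optimized path: same neighbours,
# distances shifted by roughly the reported maximum absolute error.
D_new = D_ref + np.float32(3.8e-6)
I_new = I_ref.copy()


def check_search_results(D_ref, I_ref, D_new, I_new):
    """Compare indices exactly and distances within a tolerance."""
    # Indices must match bit for bit: the returned neighbours are unchanged.
    np.testing.assert_array_equal(I_new, I_ref)
    # Distances may differ by rounding; assert_allclose accepts
    # |actual - desired| <= atol + rtol * |desired|.
    np.testing.assert_allclose(D_new, D_ref, atol=1e-5, rtol=1e-6)


check_search_results(D_ref, I_ref, D_new, I_new)
```

With these sample values a bitwise-exact `np.testing.assert_array_equal(D_new, D_ref)` would raise an AssertionError, while the tolerance-based check passes.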

All other 1087 tests pass without any modifications.

Partial Optimization for IVFPQ

8b35dfa

meta-cla bot added the CLA Signed label Mar 10, 2026

littleniuer mentioned this pull request Mar 10, 2026

[Performance] Improve FAISS performance on AArch64/ARM (HNSW, IVFPQ, IVFPQFS, PQFS, IVFFLAT) #4763

Open

bug fix

7d83232

littleniuer force-pushed the main branch from 427754e to 7d83232 Compare March 11, 2026 09:19

littleniuer wants to merge 2 commits into facebookresearch:main from littleniuer:main