fix(vector): Fix similarity-based HNSW search for cosine and dot product metrics #9559
+434
−11
Summary
This PR fixes a critical bug in HNSW vector search where cosine similarity and dot product metrics returned incorrect results. The search algorithm was treating all metrics as distance metrics (lower is better), causing similarity metrics (higher is better) to return the worst matches instead of the best.
Problem
The HNSW implementation had two issues with similarity-based metrics:
Search phase: The candidate heap in persistent_hnsw.go::searchPersistentLayer always used a min-heap, which pops the lowest value first. For similarity metrics where higher values are better, this caused the algorithm to explore the worst candidates first and terminate prematurely.
Edge pruning phase: The helper.go::addNeighbors function used a fixed comparison (>) when pruning edges, which is correct for distance metrics but inverted for similarity metrics. This resulted in keeping the worst edges instead of the best.
Root Cause
The original code assumed all metrics behave like distance metrics:
For Euclidean distance, lower values = better matches → min-heap is correct.
For Cosine/DotProduct similarity, higher values = better matches → need max-heap.
Solution
1. Added candidateHeap interface with metric-aware heap selection
2. Added isSimilarityMetric flag to SimilarityType
3. Fixed edge pruning comparison in addNeighbors
Files Changed
Testing
Added new tests covering:
Performance Note
This fix builds on PR #9514, which corrected the early termination condition. Together, these changes ensure HNSW search explores the correct number of candidates and returns properly ordered results.
Users experiencing slower insert/search times compared to v25.1.0 can tune performance by lowering the efConstruction and efSearch parameters when creating their vector indexes.
Lower values trade recall for speed. The default values (efConstruction=128, efSearch=64) prioritize recall.
GenAI Notice
Parts of this implementation and all of the testing were generated using Claude Opus 4.5 (thinking).
Checklist
Conventional Commits syntax, leading with fix:, feat:, chore:, ci:, etc.
Fixes #9558
Benchmarks
Our BEIR SciFact Information Retrieval Benchmarks now show recall rates close to or exceeding acceptable and excellent performance for all metrics.