Add localbm25 offline BM25 provider (no provider API keys required)
#13
Summary
Adds a new provider:
localbm25, an offline BM25 baseline memory provider for MemoryBench.
Motivation
MemoryBench currently requires a hosted memory provider (Supermemory/Mem0/Zep) to run ingestion + retrieval. This introduces friction for contributors who don’t have provider credits.
localbm25 provides a fully offline baseline provider (in-memory BM25) so anyone can run MemoryBench without provider API keys and establish a reproducible baseline.
What’s included
src/providers/localbm25/
ProviderName, provider registry, getProviderConfig
localbm25
wink-bm25-text-search, wink-nlp-utils
Usage
Run MemoryBench with the offline baseline provider:
Compare it against hosted providers:
✅ Note:
localbm25 does not require any provider API keys (only judge/answer model keys).
Implementation notes
(… awaitIndexing) to match MemoryBench’s ingest → indexing → search pipeline.
With very small inputs (--limit 1/--limit 2), BM25 consolidation/search may not be supported by wink; the provider falls back to deterministic token-overlap retrieval to keep the pipeline functional.
Validation
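For reviewers, the small-corpus fallback described in the implementation notes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `Doc` and `Hit` shapes, the function names, the `MIN_DOCS_FOR_BM25` threshold, and the `bm25Search` callback are all hypothetical stand-ins.

```typescript
// Hypothetical shapes for illustration; the provider's real types are not shown here.
interface Doc {
  id: string;
  text: string;
}

interface Hit {
  id: string;
  score: number;
}

// Lowercase the text and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Score each document by how many distinct query tokens it contains.
// Ties are broken by id so the ranking is deterministic across runs.
function tokenOverlapSearch(docs: Doc[], query: string, limit: number = 10): Hit[] {
  const queryTokens = Array.from(new Set(tokenize(query)));
  return docs
    .map((doc) => {
      const docTokens = new Set(tokenize(doc.text));
      const score = queryTokens.filter((token) => docTokens.has(token)).length;
      return { id: doc.id, score };
    })
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .slice(0, limit);
}

// Assumed threshold: below this many documents, skip BM25 and use the fallback.
const MIN_DOCS_FOR_BM25 = 3;

// bm25Search is a hypothetical callback standing in for the BM25 engine's search.
function search(
  docs: Doc[],
  query: string,
  bm25Search: (query: string) => Hit[],
  limit: number = 10
): Hit[] {
  if (docs.length < MIN_DOCS_FOR_BM25) {
    return tokenOverlapSearch(docs, query, limit);
  }
  return bm25Search(query).slice(0, limit);
}
```

Counting distinct shared tokens keeps the fallback simple and order-stable, which is enough to keep ingest → indexing → search runnable on one or two documents. It is not meant to approximate BM25 ranking quality.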
convomem run completed end-to-end with localbm25 and generated data/runs/<run-id>/report.json.
compare successfully initializes and runs ingestion/indexing/search for localbm25 (answer/eval model rate limits depend on the configured judge/answer model and org limits).
Why this is useful