Why BM25 Outperforms Vector Search for MCP Tool Discovery
Algis Dumbris • 2026/03/12
TL;DR
When your AI agent has access to hundreds of MCP tools, you need a fast way to find the right one. Vector search looks attractive on paper, but for tool discovery it introduces unnecessary infrastructure, latency, and cost. BM25 — the classic full-text search algorithm behind Elasticsearch and friends — delivers better results for this workload with zero external dependencies. MCPProxy ships with BM25-powered tool discovery today, and the numbers back up the choice.
The Problem: Context Window Explosion
If you have been building with MCP for any length of time, you have hit the wall. Connect 15-20 MCP servers, each exposing 5-30 tools, and suddenly your agent is staring at 40,000-100,000 tokens of tool definitions before the conversation even starts. That is not a theoretical concern — it is the everyday reality for teams building serious agentic workflows.
Google learned this the hard way. They removed MCP support from their Workspace CLI tooling after encountering exactly this problem: the sheer volume of tool schemas consumed so much of the context window that the model had little room left for actual reasoning. When a company with functionally unlimited engineering resources hits the same wall, it is a strong signal that “just load everything” is not a viable strategy.
The research confirms it. The RAG-MCP paper showed that when agents are given every tool upfront, tool selection accuracy drops to just 13.6%. The model gets overwhelmed by options and picks the wrong one. Retrieve only the relevant tools first, and accuracy jumps to 43.1% — more than tripling performance.
So tool retrieval is not optional. The question is: what retrieval mechanism should you use?
Why Vector Search Seems Like the Obvious Answer
If you have spent any time in the AI/ML space over the last few years, your instinct is probably “use embeddings.” And it is a reasonable instinct. Vector search excels at semantic similarity — finding things that mean the same thing even when the words differ. For document retrieval or semantic search over natural language, it is often the right call.
But tool discovery is not document retrieval. Here is what vector search actually requires for this use case:
- An embedding model: You need to run an embedding model (OpenAI, Cohere, a local model) to convert tool descriptions into vectors. That is either an API cost per tool or a local model consuming memory and GPU.
- A vector database: Storing and querying vectors requires infrastructure — Pinecone, Weaviate, Qdrant, pgvector, or similar. Each adds operational complexity.
- Cold start latency: On first launch, you need to embed every tool description. With 200+ tools, that is hundreds of API calls or a significant local compute step. In MCPProxy’s use case, where the tool set changes dynamically as servers come and go, you are re-embedding frequently.
- API dependency: If you are using a hosted embedding model, your tool discovery breaks when the API is down or rate-limited. Your agent cannot find tools because OpenAI is having an outage — that is a bad failure mode.
- Worse performance on structured queries: Here is the non-obvious part. Tool names and descriptions are short, keyword-dense, and often use technical jargon. Queries like “find the github_create_issue tool” or “list files in directory” are sparse, specific, and keyword-heavy. This is exactly the regime where BM25 outperforms dense retrieval. Embedding models tend to over-generalize short technical descriptions, matching semantically related but functionally wrong tools.
The infrastructure tax is real. For a tool that is supposed to be a lightweight local proxy, requiring a vector database and embedding model just to find the right tool is a non-starter.
Why BM25 Wins for Tool Discovery
BM25 (Okapi "Best Matching 25") is a probabilistic ranking function that has been the backbone of information retrieval for decades. It scores documents based on term frequency, inverse document frequency, and document length normalization. It is the algorithm behind Lucene, Elasticsearch, and most search engines you have ever used.
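To make the mechanics concrete, here is a toy BM25 scorer in Python over a hypothetical four-tool catalog. The tool names, descriptions, and the k1/b defaults are illustrative; MCPProxy's real implementation is Bleve's, in Go. Note how splitting on underscores makes structured names like github_create_pull_request directly matchable:

```python
import math
import re
from collections import Counter

# Hypothetical mini-catalog -- names and descriptions are illustrative.
TOOLS = {
    "github_create_pull_request": "Create a pull request on a GitHub repository",
    "github_create_issue": "Create an issue in a GitHub repository",
    "filesystem_read_file": "Read the contents of a file from the local filesystem",
    "slack_send_message": "Send a message to a Slack channel",
}

def tokenize(text):
    # Split on underscores and non-word characters so github_create_pull_request
    # yields the terms an agent actually searches for.
    return [t for t in re.split(r"[\W_]+", text.lower()) if t]

docs = {name: tokenize(name + " " + desc) for name, desc in TOOLS.items()}
N = len(docs)
avgdl = sum(len(d) for d in docs.values()) / N
df = Counter(t for d in docs.values() for t in set(d))  # document frequency

def bm25(query, k1=1.2, b=0.75):
    scores = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        score = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            # Rare terms ("pull", "request") get high IDF; common ones less.
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(terms) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(bm25("create pull request github")[0][0])  # -> github_create_pull_request
```

Every quantity in that score is a count you can print and inspect, which is what makes BM25 rankings easy to debug compared to opaque vector similarities.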
For MCP tool discovery, BM25 has a compelling set of advantages:
Zero infrastructure. BM25 runs in-process. No external database, no API keys, no network calls. MCPProxy uses Bleve, a pure Go full-text search library, to maintain an index stored locally at ~/.mcpproxy/index.bleve/. The entire search engine is embedded in the binary.
Sub-10ms queries. Searching across hundreds of tools completes in single-digit milliseconds. There is no network round-trip, no embedding computation, no vector similarity calculation. The agent gets results essentially instantly.
Works offline. No internet connection required. No API to go down. No rate limits to hit. Your agent can discover tools on an airplane.
Handles tool names and descriptions well. Tool names like github_create_pull_request or filesystem_read_file are highly structured, keyword-rich strings. BM25 was designed for exactly this kind of query. When an agent searches for “create pull request github,” BM25 matches on the exact terms that matter. An embedding model might return tools that are “semantically similar” but functionally different.
No external dependencies. MCPProxy is a single binary. Adding a vector search requirement would mean either bundling an embedding model (adding hundreds of MB to the binary) or requiring users to configure an external service. Neither aligns with the goal of a simple, portable tool.
Deterministic and debuggable. BM25 scoring is transparent. You can inspect why a tool ranked where it did by looking at term frequencies. Vector similarity scores are opaque — good luck explaining why one tool scored 0.82 and another scored 0.79.
How MCPProxy Implements It
The implementation in MCPProxy is straightforward and battle-tested:
Indexing. Every tool from every connected MCP server is indexed in Bleve. The index includes the tool name, description, server name, and input schema. When a new server connects or a tool list changes, the index updates automatically.
Background refresh. MCPProxy refreshes the tool index every 5 minutes in the background. This catches tools that were added or removed on upstream servers. The refresh is lightweight — it uses hash-based change detection to only re-index tools that actually changed, avoiding unnecessary work.
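The change-detection idea can be sketched in a few lines: hash each tool's canonical JSON and re-index only when the digest differs from the stored one. The function names and tool-dict shape here are assumptions for illustration, not MCPProxy's actual code:

```python
import hashlib
import json

def tool_hash(tool: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so key order
    # does not change the digest.
    blob = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()

def changed_tools(seen: dict, current: list) -> list:
    """Return names of tools whose definition differs from the stored hash,
    updating the stored hashes as a side effect."""
    changed = []
    for tool in current:
        h = tool_hash(tool)
        if seen.get(tool["name"]) != h:
            changed.append(tool["name"])
            seen[tool["name"]] = h
    return changed
```

On a refresh where nothing changed upstream, changed_tools returns an empty list and the indexer does no work at all.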
Real-time updates. Beyond the periodic refresh, MCPProxy listens for MCP notification events from upstream servers. When a server sends a notifications/tools/list_changed notification, the proxy immediately re-indexes that server’s tools. This means newly added tools are discoverable within seconds, not minutes.
Query interface. The agent calls retrieve_tools with a natural language query. MCPProxy runs a BM25 search and returns the top results (default limit: 15). Each result includes the tool name, description, input schema, and which server it belongs to.
Result caching. Repeated queries for the same terms return cached results, avoiding even the sub-10ms search cost on hot paths.
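The query interface and cache compose naturally: a memoizing wrapper in front of the search call means repeated queries never touch the index. Everything below is a stand-in sketch (the catalog, the scoring shortcut, and the wrapper are illustrative; MCPProxy's real search and cache live in Go):

```python
from functools import lru_cache

CALLS = {"count": 0}  # instrumentation to show cache hits skip the search

def search_index(query: str) -> list:
    # Stand-in for the real BM25 query: rank tools by how many
    # query terms overlap their underscore-delimited name parts.
    CALLS["count"] += 1
    catalog = ["github_create_issue", "github_create_pull_request",
               "slack_send_message"]
    terms = set(query.lower().split())
    return sorted(catalog, key=lambda t: -len(terms & set(t.split("_"))))

@lru_cache(maxsize=256)
def retrieve_tools(query: str, limit: int = 15) -> tuple:
    # lru_cache keys on (query, limit); identical calls return the
    # cached tuple without re-running the search.
    return tuple(search_index(query)[:limit])
```

The tuple return type matters: lru_cache requires hashable values, and returning an immutable result also prevents one caller from mutating another's cached hit.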
Cold-start reduction. Instead of loading 50,000+ tokens of tool definitions at session start, the agent begins with just the proxy’s own tools (retrieve_tools, call_tool, upstream_servers, quarantine_security). That is a few hundred tokens. When the agent needs a specific capability, it searches and gets back 3-5 relevant tools — perhaps 1,000-2,000 tokens. The difference is dramatic: from saturating the context window to barely touching it.
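The arithmetic behind "dramatic" is worth a glance. Using the rough figures from the text (the 400-token estimate for the proxy's own tools is an assumption within the "few hundred" range stated above):

```python
# Illustrative token-budget comparison using the post's rough figures.
all_tools_upfront = 50_000   # tool definitions for ~20 servers at session start
proxy_own_tools = 400        # retrieve_tools, call_tool, etc. (assumed estimate)
retrieved_on_demand = 1_500  # 3-5 relevant tools fetched per search

ratio = all_tools_upfront / (proxy_own_tools + retrieved_on_demand)
print(f"{ratio:.0f}x fewer prompt tokens")  # roughly 26x under these assumptions
```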
The Numbers Tell the Story
The RAG-MCP paper provides the clearest evidence for retrieval-first tool routing:
| Approach | Tool Selection Accuracy | Prompt Token Usage |
|---|---|---|
| All tools loaded | 13.6% | 40K-100K+ tokens |
| Retrieval-first (top-K) | 43.1% | ~2K-5K tokens |
That is a 3x improvement in accuracy and a 10-20x reduction in token usage. The retrieval mechanism in the paper used TF-IDF-based approaches similar to BM25, not vector search — further validating the choice.
In practice, MCPProxy users report that tool discovery feels instant and accurate. The most common feedback is surprise at how well simple keyword search works for finding the right tool. This makes sense: developers name their tools descriptively. A tool called slack_send_message does not need semantic understanding to be found when someone searches for “send slack message.”
When Would You Want Vector Search?
To be fair, there are scenarios where vector search would add value:
- Cross-lingual tool discovery: If tool descriptions are in English but queries come in Japanese, embeddings handle this naturally. BM25 does not.
- Highly abstract queries: “Help me with my morning routine” is hard for BM25 to match against specific tools. An embedding model might connect it to calendar, email, and weather tools.
- Very large tool catalogs (10,000+): At extreme scale, the semantic generalization of embeddings might help surface tools that keyword search would miss.
MCPProxy’s architecture does not preclude adding vector search later as a complementary signal. But for the core use case — a developer or team with 100-500 tools across 15-30 MCP servers — BM25 is the right default. It is simpler, faster, cheaper, and more accurate for this workload.
Getting Started
BM25 tool discovery is enabled by default in MCPProxy. If you are already running MCPProxy, you are using it. Here is how to configure it:
Basic Setup
In your ~/.mcpproxy/mcp_config.json:
{
  "search": {
    "limit": 15,
    "index_path": "~/.mcpproxy/index.bleve/"
  },
  "mcpServers": {
    "github": {
      "url": "https://api.githubcopilot.com/mcp/"
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
    }
  }
}
Using Tool Discovery
Once MCPProxy is running, your agent can discover tools with a simple call:
retrieve_tools(query="create github issue")
This returns the top matching tools with their full schemas, ready for the agent to invoke via call_tool.
Tuning the Results
- search.limit: Controls how many tools are returned per query (default: 15). Lower values (5-10) work well for focused agents; higher values (20-30) for exploratory workflows.
- The Bleve index is stored on disk and survives restarts. First launch indexes everything; subsequent launches load the existing index and only update what changed.
Install MCPProxy
Download the latest release from GitHub, or run:
brew install smart-mcp-proxy/tap/mcpproxy
Full documentation is available at mcpproxy.app.
Conclusion
The best infrastructure is the kind you do not have to think about. BM25 for tool discovery is boring technology that works. It requires no API keys, no vector database, no embedding model, no GPU, and no internet connection. It runs in single-digit milliseconds, handles the keyword-heavy nature of tool names and descriptions naturally, and scales comfortably to hundreds of tools.
When your AI agent needs to find the right tool among hundreds, the answer is not more AI — it is a 30-year-old ranking algorithm running in-process. Sometimes the old ways are the best ways.
This is not theoretical. It is shipping in MCPProxy today.
Originally published at mcpproxy.app/blog/.