BM25 vs Embeddings vs Lua: Comparing Approaches to the MCP Too Many Tools Problem

Algis Dumbris • 2026/03/19

TL;DR

MCP tool definitions can consume 55,000+ tokens before your agent processes a single user message. With each tool costing 550-1,400 tokens, and real-world setups easily reaching 40-100+ tools across multiple servers, the context window fills up fast. Four fundamentally different approaches have emerged to solve this: MCPProxy’s BM25-based gateway, Speakeasy’s embedding-powered Dynamic Toolsets, my-cool-proxy’s Lua scripting layer, and Claude Code’s built-in Tool Search. Each makes different trade-offs across latency, accuracy, configuration burden, and portability. This post breaks down how they work, when they shine, and when they fall short.

The Problem: Your Tools Are Eating Your Context

The Model Context Protocol promised a universal interface between AI agents and tools. It delivered on that promise — and created a new problem in the process. Every MCP tool definition gets injected into the LLM’s context window as a JSON schema: the tool name, its description, every parameter with types, enums, and constraints. A single tool costs between 550 and 1,400 tokens depending on its complexity.

Connect three MCP servers (say GitHub, Slack, and a database tool) and you are looking at 40+ tools. At the upper end of the per-tool range, that is 55,000+ tokens consumed before the agent even sees the user’s question. One documented case hit 143,000 of 200,000 available tokens just from tool definitions: roughly 72% of the context window gone.
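To make the arithmetic concrete, here is a small sketch that turns the per-tool range quoted above into a total overhead. The function names are made up for illustration; only the 550 and 1,400 token bounds and the 143,000 of 200,000 case come from the text.

```python
# Per-tool cost range quoted in the post (tokens per tool definition).
MIN_TOKENS_PER_TOOL = 550
MAX_TOKENS_PER_TOOL = 1_400


def tool_overhead(tool_count: int) -> tuple[int, int]:
    """Return the (low, high) token cost of sending `tool_count` definitions."""
    return tool_count * MIN_TOKENS_PER_TOOL, tool_count * MAX_TOKENS_PER_TOOL


def share_of_window(tokens: int, window: int) -> float:
    """Fraction of the context window consumed by `tokens`."""
    return tokens / window


low, high = tool_overhead(40)
print(f"40 tools: {low:,} to {high:,} tokens")
print(f"143,000 of 200,000 tokens: {share_of_window(143_000, 200_000):.1%}")
```

For 40 tools the range works out to 22,000 to 56,000 tokens, so the 55,000 figure sits at the expensive end, where most tools carry large schemas.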

The problem compounds at the platform level. Cursor caps tool count at 40. GitHub Copilot stops at 128. Even platforms without hard limits see degraded LLM performance as the tool count climbs: the model struggles to select the right tool from a wall of JSON schemas, and accuracy drops off a cliff.

This is not a theoretical concern. It is the central scaling bottleneck of MCP today.

[Figure: Four approaches to MCP tool discovery architecture]

Approach 1: MCPProxy BM25 Gateway

MCPProxy takes the position that tool discovery should be automatic, zero-configuration, and invisible to both the client and the upstream servers. It sits between the MCP client (Claude, Cursor, Copilot, or any other) and any number of upstream MCP servers, acting as a transparent gateway.

How it works

When MCPProxy starts, it connects to all configured upstream servers and indexes every tool’s name, description, and parameter schemas into an in-process BM25 index (backed by Bleve). Instead of forwarding all tools to the client, it exposes a single meta-tool: search_tools. The client sends a natural-language query describing what it needs, MCPProxy runs a BM25 search, and returns only the top-matching tools — typically 5-10 instead of hundreds.

The key insight is that BM25’s term-frequency mechanics align well with how tool names and descriptions are written. A query like “create a GitHub pull request” contains the exact keywords that appear in github_create_pull_request’s definition. No embedding model, no vector database, no external service required.
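The sketch below shows why keyword-rich queries line up so well with tool definitions. It is a minimal, from-scratch Okapi BM25 in Python, not MCPProxy's Bleve-backed implementation; the tool catalog, the tokenizer, and the `k1` and `b` defaults are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical tool catalog; names and descriptions are illustrative only.
TOOLS = {
    "github_create_pull_request": "Create a new pull request in a GitHub repository",
    "github_list_issues": "List issues in a GitHub repository",
    "slack_send_message": "Send a message to a Slack channel",
    "db_run_query": "Run a SQL query against the database",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics, so snake_case names split too."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def bm25_rank(query: str, docs: dict[str, str], k1: float = 1.2, b: float = 0.75):
    """Rank docs against the query with the standard Okapi BM25 formula."""
    tokenized = {name: tokenize(f"{name} {desc}") for name, desc in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = bm25_rank("create a GitHub pull request", TOOLS)
print(ranking[0][0])
```

Rare terms such as "pull" and "request" appear in only one definition, so they carry a high inverse document frequency and push `github_create_pull_request` to the top, while a common word like "a" contributes almost nothing.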

Setup

go install github.com/smart-mcp-proxy/mcpproxy-go/cmd/mcpproxy@latest

# Add an upstream MCP server
mcpproxy upstream add --name github --url https://github-mcp-server.example.com

# Start the gateway
mcpproxy serve

That is it. No model downloads, no API keys for embedding services, no per-tool configuration. MCPProxy indexes tools automatically as upstream servers connect.

Trade-offs

Strengths: Sub-millisecond search latency. Zero configuration. Works with any MCP client. No external dependencies beyond the Go binary. The BM25 index rebuilds automatically when upstream tools change.

Weaknesses: BM25 is purely lexical — it cannot bridge the semantic gap between “notify the team” and slack_send_message. For small-to-medium tool sets (under 200-300 tools), this rarely matters because tool names are sufficiently descriptive. At larger scale, top-1 accuracy drops. Our earlier analysis showed BM25 hitting 14% top-1 accuracy at 270+ tools, though top-5 remains strong at 87%.
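The semantic gap is easy to see by checking which terms a query and a tool definition actually share. This is a toy term-overlap check rather than a full BM25 scorer, and the tool text is an invented example.

```python
import re


def terms(text: str) -> set[str]:
    """Lowercased alphanumeric terms, splitting snake_case names as well."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def shared_terms(query: str, document: str) -> set[str]:
    """Terms a purely lexical ranker could match on."""
    return terms(query) & terms(document)


# Hypothetical tool name plus description.
TOOL_TEXT = "slack_send_message Send a message to a Slack channel"

print(shared_terms("notify the team", TOOL_TEXT))
print(shared_terms("send a Slack message", TOOL_TEXT))
```

The first query shares no terms with the definition, so any lexical ranker scores it at zero no matter how the formula is tuned. The second query matches on four terms.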

Approach 2: Speakeasy Dynamic Toolsets (Embeddings)

Speakeasy takes a different path: instead of keyword matching, it uses embedding-based semantic search to find relevant tools. Their Dynamic Toolsets system replaces the entire static tool list with three meta-tools: search_tools, describe_tools, and execute_tool.

How it works

When a toolset is configured, Speakeasy generates embeddings for every tool’s name, description, and categorical metadata. At query time, the user’s intent is embedded with the same model, and cosine similarity finds the closest matches. The system also supports tag-based categorical browsing (e.g., source:hubspot) to complement semantic search.
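A rough sketch of the similarity step follows. The three-dimensional vectors are hand-written stand-ins for real embeddings, which would come from an embedding model and have hundreds of dimensions; nothing here reflects Speakeasy's actual model or storage.

```python
import math

# Hand-written toy "embeddings" for illustration only. The axes loosely mean
# (messaging, code hosting, data access); a real model learns its own axes.
TOOL_VECTORS = {
    "slack_send_message": [0.9, 0.1, 0.0],
    "github_create_pull_request": [0.1, 0.9, 0.1],
    "db_run_query": [0.0, 0.1, 0.9],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_tools(query_vector, tool_vectors, top_k=2):
    """Return the top_k (name, similarity) pairs, most similar first."""
    scored = [(name, cosine(query_vector, vec)) for name, vec in tool_vectors.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Assumed embedding of "notify the team": mostly on the messaging axis.
query = [0.8, 0.2, 0.1]
print(nearest_tools(query, TOOL_VECTORS))
```

Because the query vector points in nearly the same direction as `slack_send_message`, that tool ranks first even though the query and the tool name share no words.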

The three-function architecture implements progressive disclosure: search_tools returns tool names and brief descriptions, describe_tools fetches full schemas only for the tools the agent actually wants to use, and execute_tool runs them. This means a 400-tool deployment never sends more than a handful of full schemas to the LLM.
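The search, describe, execute flow can be sketched with an in-memory registry. Only the three meta-tool names and the `source:hubspot` tag come from the description above; the registry layout, the tool entries, and the handlers are hypothetical, and plain keyword matching stands in for the embedding search.

```python
# Hypothetical in-memory registry. Tool names, schemas, and handlers are
# invented for illustration and are not Speakeasy's actual data model.
REGISTRY = {
    "hubspot_create_contact": {
        "summary": "Create a contact in HubSpot",
        "tags": {"source:hubspot"},
        "schema": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
        },
        "handler": lambda args: {"created": args["email"]},
    },
    "hubspot_list_deals": {
        "summary": "List open deals in HubSpot",
        "tags": {"source:hubspot"},
        "schema": {"type": "object", "properties": {}},
        "handler": lambda args: {"deals": []},
    },
    "github_create_issue": {
        "summary": "Create an issue in a GitHub repository",
        "tags": {"source:github"},
        "schema": {
            "type": "object",
            "properties": {"title": {"type": "string"}},
            "required": ["title"],
        },
        "handler": lambda args: {"issue": args["title"]},
    },
}


def search_tools(keyword="", tag=None):
    """Step 1: return only names and one-line summaries, never full schemas.

    Keyword matching is a stand-in for the embedding search described above.
    """
    matches = []
    for name, tool in REGISTRY.items():
        if tag is not None and tag not in tool["tags"]:
            continue
        if keyword.lower() in f"{name} {tool['summary']}".lower():
            matches.append({"name": name, "summary": tool["summary"]})
    return matches


def describe_tools(names):
    """Step 2: fetch full schemas only for the tools the agent picked."""
    return {name: REGISTRY[name]["schema"] for name in names}


def execute_tool(name, arguments):
    """Step 3: run the chosen tool with the supplied arguments."""
    return REGISTRY[name]["handler"](arguments)


candidates = search_tools(keyword="contact", tag="source:hubspot")
schemas = describe_tools([c["name"] for c in candidates])
outcome = execute_tool("hubspot_create_contact", {"email": "ada@example.com"})
print(candidates, schemas, outcome, sep="\n")
```

The full JSON schema enters the conversation only at step 2, and only for the tools that survived step 1. That is where the token savings come from, and also where the extra round trips come from.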

Results

Speakeasy reports input tokens reduced by an average of 96.7% for simple tasks and 91.2% for complex tasks, with overall token consumption dropping by up to 160x compared to static toolsets. They also report a 100% success rate across toolset sizes ranging from 40 to 400 tools.
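Percent reductions and "Nx" factors describe the same quantity on different scales, which makes them easy to misread side by side. The conversion below is plain arithmetic; the helper names are made up for this sketch.

```python
def reduction_to_factor(reduction_pct: float) -> float:
    """A 96.7% reduction leaves 3.3% of the tokens, i.e. 100 / 3.3 times fewer."""
    return 100.0 / (100.0 - reduction_pct)


def factor_to_reduction(factor: float) -> float:
    """A 160x drop leaves 1/160 of the tokens, i.e. a 99.375% reduction."""
    return 100.0 * (1.0 - 1.0 / factor)


for pct in (96.7, 91.2):
    print(f"{pct}% reduction is about {reduction_to_factor(pct):.1f}x fewer tokens")
print(f"160x fewer tokens is a {factor_to_reduction(160):.3f}% reduction")
```

The averages correspond to roughly 30x and 11x, so the 160x figure is a best case rather than a typical one. The two sets of numbers also cover different quantities (input tokens versus overall consumption), so they are not directly comparable.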

Trade-offs

Strengths: Semantic understanding bridges vocabulary gaps that BM25 cannot. The 96%+ token reduction is dramatic. Scales gracefully to hundreds of tools without accuracy degradation. Tag-based filtering adds a structured discovery dimension.

Weaknesses: Requires an embedding model — either a hosted API call (adding latency) or a local model (adding deployment complexity). Speakeasy reports 2-3x more tool calls than static approaches due to the search-then-describe-then-execute flow, and roughly 50% slower execution time. The system is tightly coupled to Speakeasy’s platform rather than being a standalone tool.

Approach 3: my-cool-proxy (Lua Scripting)

my-cool-proxy takes the most flexible approach: rather than automating tool discovery, it gives users a full programming language to control exactly what happens. The proxy consolidates multiple MCP servers behind a single gateway and uses Lua as its scripting runtime for tool composition and filtering.

How it works

My-cool-proxy implements progressive disclosure through a different mechanism than search. Instead of exposing all tools upfront, it provides list-servers, list-server-tools, and tool-details meta-tools. The agent discovers available servers, browses their tool catalogs, and retrieves full schemas only for what it needs.

The Lua scripting layer takes this further. Users write Lua scripts that compose multi-step tool workflows into single execute() calls:

local raw_data = api_server.fetch({ id = 123 }):await()
local processed = processor.transform({ input = raw_data }):await()
result(processed)

This collapses what would be multiple agent-tool round trips — each consuming context window space for intermediate results — into a single scripted pipeline. The Lua runtime provides discovered servers as globals, with tools callable as async functions that support conditional logic and loops.
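To show what "collapsing round trips" buys you, here is the same two-step pipeline sketched in Python with asyncio. This is an analogy to the Lua example, not my-cool-proxy's runtime; `fetch` and `transform` are hypothetical stand-ins for upstream tools.

```python
import asyncio


# Hypothetical stand-ins for two upstream MCP tools.
async def fetch(record_id: int) -> dict:
    """Pretend upstream call that returns a large intermediate payload."""
    return {"id": record_id, "rows": list(range(1000))}


async def transform(raw: dict) -> dict:
    """Pretend upstream call that reduces the payload to a small summary."""
    return {"id": raw["id"], "row_count": len(raw["rows"])}


async def run_pipeline(record_id: int) -> dict:
    """Chain both tools inside the proxy; only the final value is returned."""
    raw = await fetch(record_id)
    return await transform(raw)


result = asyncio.run(run_pipeline(123))
print(result)
```

If the agent made these calls one at a time, the 1,000-row intermediate payload would pass through the context window. In the scripted version the model only ever sees the small summary.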

Trade-offs

Strengths: Maximum flexibility. Lua scripts can implement arbitrary filtering logic, multi-step workflows, conditional tool routing, and custom access control. The centralized configuration eliminates the need to duplicate MCP server configs across multiple clients. Tool access can be scoped per-agent for security.

Weaknesses: Someone has to write and maintain the Lua scripts. This is a power-user tool — it trades zero-configuration simplicity for programmable control. There is no automated discovery; the intelligence lives in the scripts, not in a search algorithm. The overhead scales with the number of custom workflows you define.

Approach 4: Claude Code Tool Search (Built-in)

Anthropic’s Claude Code includes a built-in Tool Search mechanism that handles tool overflow natively within the client. When the number of configured tools exceeds a threshold, Claude Code automatically hides some tools and exposes a search function that the agent uses to find what it needs.

How it works

Tool Search operates at the client level. Claude Code detects when the tool count is high enough to degrade performance, partitions tools into “always available” and “searchable” sets, and injects a search tool into the agent’s toolset. The search mechanism — likely combining keyword and semantic signals — runs against Claude’s infrastructure.
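The partitioning idea can be sketched in a few lines. Everything here is hypothetical: the threshold, the notion of a "pinned" set, and the function name are assumptions for illustration, since Claude Code's actual rules are not documented in this post.

```python
def partition_tools(tool_names, pinned, threshold):
    """Split tools into (always_available, searchable).

    Hypothetical policy: at or below the threshold every tool stays visible;
    above it, only pinned tools stay visible and the rest go behind search.
    """
    if len(tool_names) <= threshold:
        return list(tool_names), []
    always_available = [name for name in tool_names if name in pinned]
    searchable = [name for name in tool_names if name not in pinned]
    return always_available, searchable


tools = ["read_file", "write_file", "github_create_issue", "slack_send_message"]
visible, hidden = partition_tools(tools, pinned={"read_file", "write_file"}, threshold=3)
print(visible, hidden)
```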

The agent uses natural language to describe what it needs, gets back matching tool definitions, and proceeds as normal. From the user’s perspective, it is invisible — the agent handles the search automatically.

Trade-offs

Strengths: Zero configuration, zero setup. Works out of the box for Claude Code users. Deeply integrated with the client, so the search-select-execute flow is optimized for Claude’s behavior.

Weaknesses: Client-specific. This solution only works in Claude Code; it does not help if you are using Cursor, Copilot, Windsurf, or any other MCP client. It is also opaque: you cannot tune the search behavior, set priorities, or control which tools are always available versus searchable. Stacklok’s benchmarks reported 94% accuracy for their hybrid approach, ahead of Anthropic’s tool search, along with faster response times (5.75s versus 12-13.5s).

Head-to-Head Comparison

Comparison of four tool discovery approaches

| Dimension | MCPProxy BM25 | Speakeasy Embeddings | my-cool-proxy Lua | Claude Tool Search |
|---|---|---|---|---|
| Search latency | Sub-millisecond | Tens of milliseconds | N/A (manual discovery) | Varies (remote service) |
| Configuration | Zero-config | Dashboard toggle + tags | Lua scripts per workflow | Built-in, no config |
| Portability | Any MCP client | Any MCP client | Any MCP client | Claude Code only |
| Token reduction | 80-95% (top-K tools) | 96%+ (progressive disclosure) | High (scripted pipelines) | Unknown (opaque) |
| Semantic understanding | None (lexical only) | Full (embedding model) | None (manual routing) | Likely yes |
| External dependencies | None (in-process) | Embedding model/API | Lua runtime (bundled) | Anthropic infrastructure |
| Multi-step composition | No | No | Yes (Lua scripts) | No |
| Access control | Quarantine system | Tag-based filtering | Script-level scoping | None |
| Best at 50 tools | Excellent | Overkill | Overkill | Good |
| Best at 500 tools | Needs hybrid | Excellent | Depends on scripts | Good |

When to Use Which

MCPProxy BM25: The Desktop Developer Default

If you are a developer running Claude, Cursor, or Copilot with 5-15 MCP servers and you want the problem to disappear without thinking about it, MCPProxy is the right choice. Install a single binary, point it at your servers, and forget about it. The BM25 search handles the common case (keyword-rich queries against descriptively named tools) with sub-millisecond latency and no external dependencies.

go install github.com/smart-mcp-proxy/mcpproxy-go/cmd/mcpproxy@latest
mcpproxy upstream add --name github --url https://github-mcp.example.com
mcpproxy upstream add --name slack --url https://slack-mcp.example.com
mcpproxy serve

MCPProxy also provides a quarantine system for security: new tools from upstream servers are quarantined by default and require explicit approval via mcpproxy upstream approve. For deployments approaching 300+ tools where BM25’s lexical limitations start to bite, MCPProxy is evolving toward hybrid BM25+semantic search.
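One common way to combine a lexical ranking with a semantic one is reciprocal rank fusion (RRF). The sketch below is a generic illustration of that technique, not MCPProxy's planned design, which the post does not describe; the two input rankings and the `teams_post_message` tool are invented, and `k=60` is just the value conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each item scores sum(1 / (k + rank))."""
    scores = {}
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):
            scores[name] = scores.get(name, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical results for the query "notify the team".
bm25_ranking = ["db_run_query", "slack_send_message", "github_list_issues"]
semantic_ranking = ["slack_send_message", "teams_post_message", "db_run_query"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
print(fused)
```

RRF uses only rank positions, so it avoids having to put BM25 scores and cosine similarities on a common scale. A tool that ranks well in both lists, like `slack_send_message` here, beats one that tops a single list.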

Speakeasy Dynamic Toolsets: The API Platform Play

If you are building a product that exposes hundreds of tools through MCP — a CRM platform, a developer tool suite, an integration layer — Speakeasy’s embedding-based approach makes more sense. The 96% token reduction holds even at 400 tools, and semantic search handles the vocabulary diversity that comes with large API surfaces. The trade-off is platform coupling and the added latency of embedding lookups, but for API-first companies, that is worth it.

my-cool-proxy Lua: The Power User’s Toolkit

If you have complex multi-step workflows that you want to collapse into single operations, or if you need fine-grained access control over which agents can use which tools, my-cool-proxy’s Lua scripting gives you the control that automated approaches cannot. You are trading simplicity for power: every workflow needs a script, but those scripts can encode domain-specific logic that no search algorithm would infer. This is the right tool when your problem is not “finding the right tool” but “orchestrating tools in a specific sequence.”

Claude Tool Search: The Path of Least Resistance

If you exclusively use Claude Code and your tool count is moderate, the built-in Tool Search works without any additional setup. The limitation is portability — the moment you need the same tool set in Cursor or Copilot, you need a different solution. It also does not give you control over tool prioritization or access control.

The Bigger Picture

These four approaches are not really competing — they are solving different facets of the same problem from different positions in the stack.

MCPProxy and my-cool-proxy operate at the gateway layer, sitting between clients and servers. They are protocol-level solutions that work with any MCP client and any MCP server. The difference is automation versus control: MCPProxy automates discovery, my-cool-proxy automates execution.

Speakeasy operates at the platform layer, tightly integrated with how tools are defined and deployed. It is the right approach when you control the tool definitions and can optimize the entire pipeline from definition through discovery to execution.

Claude Tool Search operates at the client layer, solving the problem for one specific client. It is the most convenient when it is available and the most limiting when it is not.

The MCP ecosystem is still young. Today you might pick one of these approaches; in six months, you might be combining them. MCPProxy is already moving toward hybrid BM25+semantic search. Speakeasy is expanding beyond their platform. My-cool-proxy’s Lua layer could sit in front of any of the others. The tool discovery problem is not going away — if anything, it is accelerating as MCP server counts grow. The question is not which approach wins, but which combination gives you the best trade-off for your specific deployment.

Try It

The fastest way to see the difference is to try MCPProxy against your current setup:

go install github.com/smart-mcp-proxy/mcpproxy-go/cmd/mcpproxy@latest
mcpproxy serve

Add your existing MCP servers as upstreams, approve their tools, and watch the token consumption drop. If you hit the BM25 accuracy ceiling with a very large tool set, that is a signal to look at hybrid or embedding-based approaches — but for most developer workstations, BM25’s simplicity and speed are hard to beat.
