Finding Information by Meaning: How AI Agents Search When Keywords Don't Work
11.05.2026

There is a class of search query that breaks every keyword index ever built. Not dramatically, not with an error message. The index just returns something plausible that misses the point, and you never know.

The query looks like this: "essays criticizing modern AI benchmarks." Or: "engineering blogs that explain ML intuitively." Or: "case studies of companies that migrated from monoliths to microservices." None of these contain the words you'd actually find in the content they're pointing at. The essays don't say "I am criticizing AI benchmarks." The engineering blogs don't announce that they explain ML intuitively. The case studies don't open with "here is a story about migrating from a monolith."

This is the core problem for AI agents doing conceptual research. It is not a rare edge case. It is the normal condition for any search that starts from an idea rather than from specific words.

This guide covers one job: finding information by meaning on the open web. What breaks when you try to do it with keyword tools, how semantic search actually works, who built the only genuine semantic index of the open web, how to write queries that get results, and what fails quietly even when the tool is the right one.

This piece is part of Garden's complete guide to agentic AI search, which covers all eight jobs an agent does on the web and the infrastructure behind each.

Table of Contents

  1. The Job, Precisely Stated
  2. Why Keyword Indexes Break Here
  3. How Semantic Search Actually Works
  4. The Infrastructure Gap: Who Built Semantic Search for the Open Web
  5. How to Write Queries for Semantic Search
  6. What Fails Quietly
  7. When Not to Use Semantic Search
  8. The Practical Decision Guide
  9. A Note on What This Infrastructure Does Not Solve

1. The Job, Precisely Stated

When an agent receives a conceptual query — one that describes an idea, a type of content, a theme, or a pattern — it needs to find relevant pages on the open web even when the exact words of the query appear nowhere in those pages.

The functional goal is a list of URLs whose content is semantically close to the query. Not "where these words appear." Where this idea is discussed.

This sounds like the normal goal of any search. The difference is what happens when the query vocabulary and the document vocabulary diverge — which is always, for conceptual queries. A researcher looking for "papers that question whether scale alone drives capability improvements in LLMs" is not going to type those words into a search box and get the papers she wants from a keyword index. Those papers exist. They use phrases like "emergent capabilities," "scaling laws reconsidered," "beyond the bitter lesson." The query and the papers share almost no words. A keyword index returns nothing useful.

The agent running that same search needs infrastructure that can bridge the vocabulary gap. That infrastructure exists. It is not Google.

2. Why Keyword Indexes Break Here

Most search indexes, including Google, Bing, Brave, and every variant, are built on some form of BM25. The algorithm is over three decades old and still dominates because it is fast, predictable, and genuinely excellent for what it was designed to do.

BM25 scores documents based on how often your query terms appear in them, weighted by how rare those terms are across the entire index. A document where "benchmark" appears frequently and "criticism" appears occasionally gets a high score for "criticism of benchmarks." It does not matter what the document is actually about. What matters is term frequency and rarity. The algorithm is literally counting word matches.
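To see how literal that counting is, here is a minimal BM25 scoring sketch (the k1 and b values are the conventional defaults; this illustrates the formula, not a production index):

```python
# Minimal BM25 sketch over a pre-tokenized corpus. Illustrative only.
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Score one document against a query the way BM25 does:
    term frequency in the document, weighted by term rarity (IDF)."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)              # docs containing the term
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)  # rarer terms weigh more
        freq = tf[term]
        # No semantic knowledge anywhere: if `term` is absent, freq is 0
        # and the term contributes nothing, no matter what the doc means.
        score += idf * (freq * (k1 + 1)) / (
            freq + k1 * (1 - b + b * len(doc_terms) / avg_len)
        )
    return score

corpus = [
    ["benchmark", "criticism", "metrics"],
    ["database", "migration", "postgres"],
]
print(bm25_score(["criticism", "benchmarks"], corpus[0], corpus))
# "benchmarks" (plural) contributes zero: BM25 has no idea it relates to "benchmark".
```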

This works beautifully for: "Anthropic Claude API pricing 2026." The page with current pricing contains those words. BM25 finds it.

This fails for: "thoughtful arguments that scale-based evaluation metrics miss what matters in AI." There are hundreds of pages with exactly those arguments. None of them use those exact words. BM25 scores all of them near zero.

The failure is silent. The index does not return an error. It returns the pages that scored highest on your query terms, which are usually tangentially related pages that happen to use some of your words in other contexts. The agent processes these confidently. Nobody knows the actual sources were missed.

One empirical comparison makes the gap concrete. Researchers testing semantic versus keyword retrieval found that semantic search outperforms BM25 by roughly 14 percentage points on conceptual queries (measured by NDCG@10). That is a large gap for an infrastructure choice that most people make by default. They reach for Tavily or Brave or Claude's built-in web search — all keyword-backed — because those tools are easy to set up. For exact queries, that is fine. For conceptual queries, you are leaving substantial retrieval quality on the table.

3. How Semantic Search Actually Works

The alternative is a vector index. It is a fundamentally different type of storage, not a layer on top of a keyword index.

Every page that gets indexed is run through an embedding model: a neural network trained to convert text into a list of numbers (a vector). The specific numbers encode meaning. The key property: pages that discuss the same ideas end up with vectors that are mathematically close to each other, even if they use completely different words. "Revenue growth" and "sales increase" produce vectors that are geometrically nearby. "Revenue growth" and "database migration" produce vectors that are far apart.

When a query comes in, it is converted to a vector using the same model. The index then finds the stored vectors nearest to the query vector. That is the search result.

The geometry is the retrieval mechanism. Proximity in vector space equals proximity in meaning. This is what lets a query like "case studies of technology decisions that backfired" find pages titled "The Architecture Mistake That Cost Us Six Months" — no shared words, correct result.
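A toy sketch makes the mechanism concrete. The vectors below are invented three-dimensional stand-ins; real embedding models produce hundreds or thousands of dimensions, but the retrieval step is the same nearest-neighbor ranking:

```python
# Toy illustration of retrieval-by-proximity. The numbers are hand-made
# purely to show the mechanism; a real index stores model-produced embeddings.
import numpy as np

index = {
    "The Architecture Mistake That Cost Us Six Months": np.array([0.9, 0.1, 0.8]),
    "Quarterly revenue growth tips":                     np.array([0.1, 0.9, 0.2]),
}
# Pretend this is the embedding of "case studies of technology decisions that backfired"
query = np.array([0.85, 0.15, 0.75])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank stored pages by vector proximity to the query; no shared words required.
ranked = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
for title, vec in ranked:
    print(f"{cosine(query, vec):.3f}  {title}")
```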

A useful analogy: keyword search asks "does this document contain these words?" Vector search asks "does this document mean something similar to this query?" They answer different questions. For conceptual research, the second is the question you need answered.

The tradeoff is that vector indexes are computationally heavier to build and maintain, update less frequently than keyword indexes, and cannot do exact string matching reliably. If you search for an invoice number like INV-2024-00847, the vector model treats it as noise and returns pages about invoices in general — probabilistically close, but wrong. Keyword indexes find the exact string or return nothing. For identifiers and precise terms, keyword search is the correct tool.

4. The Infrastructure Gap: Who Built Semantic Search for the Open Web

Here is where the conversation gets specific.

Building a vector index of the open web is not the same as building a vector index of your company's documents. The open web has hundreds of billions of pages. Crawling them, embedding them, storing the embeddings, and serving queries against them at scale requires infrastructure that is genuinely difficult and expensive to build. Very few organizations have done it.

Exa is the only tool in the current agentic search ecosystem with genuine semantic search of the open web. Other tools either claim semantic search while using keyword indexes under the hood with ML overlays, or use Google's results with a semantic-looking interface on top. The distinction matters because shallow semantics fail in exactly the cases where you need real semantic search — the cases where vocabulary mismatch is large.

Exa's index is organized around embeddings. When you send a query, it is embedded and matched against the stored embeddings of web pages. The query type can be explicitly set to neural (semantic) or keyword (exact). That explicit toggle is itself informative: Exa is the only consumer-facing tool that makes you choose, because it is the only one where both modes are genuinely different and genuinely available.
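As a sketch, a request against Exa's search endpoint with the type set explicitly might look like the following. The endpoint and field names reflect Exa's public API documentation at the time of writing; verify them against the current reference before building on this:

```python
# Sketch of an Exa search request with the query type set explicitly.
# Endpoint and field names follow Exa's public API docs; verify before use.
import os
import requests

response = requests.post(
    "https://api.exa.ai/search",
    headers={"x-api-key": os.environ["EXA_API_KEY"]},
    json={
        "query": "engineering blog posts that explain distributed systems "
                 "concepts using concrete analogies rather than formal definitions",
        "type": "neural",   # or "keyword" for exact matching
        "numResults": 10,
    },
    timeout=30,
)
for result in response.json()["results"]:
    print(result["url"])
```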

The rest of the ecosystem for this job looks like this:
| Tool | Semantic search type | What it actually does |
|---|---|---|
| Exa | Genuine neural vector index | Embeddings-based retrieval of the open web |
| Tavily | Surface-level | Adds ML processing to Google results |
| Brave Search API | None | BM25 keyword index, independent of Google |
| Claude built-in web search | None | Runs on Brave, keyword only |
| Perplexity Sonar | Hybrid (proprietary) | BM25 + vector, closed system |
| Parallel | n/a | Overkill for single queries |

The gap is significant. If you send a conceptual query through Claude's built-in web search, or through Tavily, or through Brave, you are getting keyword results dressed up with some ML post-processing. For queries where vocabulary matches, you will not notice the difference. For conceptual queries where vocabulary diverges, you are getting the wrong results, silently.

This is not a criticism of those tools. They are excellent for what they do. It is a description of what they do not do.

5. How to Write Queries for Semantic Search

Semantic search requires a different query grammar than keyword search. This is not obvious, and getting it wrong is the most common mistake people make when switching to Exa.

The mental model for keyword search is: extract the important words from what you want and enter them. "Python machine learning tutorials 2025." The words are the query.

The mental model for semantic search is: describe the ideal page you are looking for. Write a sentence or two about what it would contain, what kind of writing it would be, what perspective it would take. "Practical tutorials aimed at working software engineers who want to apply ML without deep theoretical background, written in 2024 or 2025."

These feel similar but produce different results. The keyword query matches pages with those words. The semantic query matches pages with that character.

From Exa's own documentation: "Write natural language queries, not keywords. Exa is semantic/neural, not keyword-based." Their examples make the contrast explicit:

Bad: TSLA stock price (keyword style)
Good: Tesla current stock performance and recent price movement

Bad: python AND machine learning OR deep learning 2024
Good: recent tutorials on building ML models with Python

The Boolean operators (AND, OR) in the bad example degrade results because Exa does not parse them as operators; it embeds the whole string as one natural language phrase, and the operators become noise. The keyword query, meanwhile, strips out the context the embedding model needs to determine meaning.

A few patterns that work well for conceptual queries through Exa:

Describe content type and character. "Engineering blog posts that explain distributed systems concepts using concrete analogies rather than formal definitions." This specifies what the pages are (blog posts), what they cover (distributed systems), and how (analogies, not formal definitions).

State the perspective or argument. "Papers arguing that current LLM evaluation benchmarks overestimate generalization." You are not searching for the words "overestimate" or "generalization" — you are describing the argument structure you want.

Include context about the audience or publication venue. "Technical writeups published on Substack by ML researchers, covering model interpretability." Venue and audience signal a specific register of writing that the embedding picks up.

Use the category parameter for content type filtering. Exa supports category:research paper, category:github, category:news, category:tweet. These help scope the search to specific content types without narrowing the semantic match. Note: use them only when the content type matters. Most queries should not use a category.
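As a sketch, the category parameter slots into the same request body shown earlier (the category values come from the list above; exact field placement is per Exa's public docs at the time of writing, so verify against the current reference):

```python
# Scoping a semantic query to research papers via the category parameter.
payload = {
    "query": "papers arguing that current LLM evaluation benchmarks "
             "overestimate generalization",
    "type": "neural",
    "category": "research paper",   # only when content type actually matters
    "numResults": 10,
}
# Send with the same POST request shown in the earlier example.
```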

One thing that does not work: phrasing the query as a question you want answered. "What are the best frameworks for building AI agents?" is a question, not a description of content. Exa is not an answer engine. Rephrase: "comparison posts reviewing agent frameworks like LangChain, CrewAI, and LlamaIndex, written for developers who have built production systems."

6. What Fails Quietly

Even with the right tool, semantic search has specific failure modes that do not announce themselves.

Shallow semantics from non-semantic backends. Most tools that claim "semantic" capabilities are running ML post-processing on keyword results, not genuine vector retrieval. The failure is invisible: results look plausible, but the conceptually closest sources are missing. If you care about which kind of semantics a tool is using, you need to read its technical documentation, not its marketing.

Index update lag. Vector indexes update more slowly than keyword indexes. Exa's index is not real-time. Content published in the last few days may not be indexed yet. For queries about recent events or newly published work, vector search misses fresh content that a keyword index with livecrawl would catch. The solution: use Exa with livecrawl: "always" for fresh content, or pair with a keyword tool for recency. Note that livecrawl is only available through Exa's direct API, not through the MCP connector.

Query phrasing sensitivity. The quality of semantic search results depends significantly on how the query is phrased. This is a new skill that most users do not have, because keyword search does not require it. Two queries that describe the same ideal content can return substantially different results depending on word choice and framing. There is no universal fix for this. Build the skill, run variants, compare.

The vocabulary of rare domains. Semantic search works because embedding models learn from large corpora. Highly specialized domains with unusual vocabulary — very niche academic fields, technical subdisciplines, proprietary terminology — may not be well-represented in the embedding space. The model may not reliably cluster documents about those domains near each other. For specialized domains, hybrid approaches that combine vector with keyword are more robust.

Paywalled content. The open web boundary does not move because you switched to embeddings. A paywalled article is invisible to keyword search and to vector search. The difference: with keyword search, you get back results that do not include the paywalled article. With vector search, you also get back results that do not include it. The framing around semantic search sometimes implies that it finds the "real" sources — but it finds the real sources that are crawlable. The paywalled sources that might be more authoritative are structurally absent from any open web index, regardless of search type.

Confident results, wrong content. Unlike keyword search, which returns nothing or near-zero scores when the vocabulary diverges, vector search always returns something. There are always nearest vectors in the index, even when none of them are particularly close to the query, and no tool surfaces a minimum similarity threshold by default. An agent processing the top results may not realize how large the semantic distance is between what it found and what it was looking for. This is a case where building evaluation into the workflow — sampling results against known ground truth, checking whether sources actually address the query — is not optional.
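One defensive pattern, sketched below under the assumption that the search tool returns a per-result relevance score (Exa's API does at the time of writing; the field name and any usable threshold value are things to verify and calibrate yourself, since raw similarity scores are not comparable across tools or index versions):

```python
# Defensive post-filter: vector search always returns *something*, so check
# how close that something actually is before processing it.
MIN_SCORE = 0.45  # hypothetical value; tune against known-good queries

def filter_confident(results, min_score=MIN_SCORE):
    """Keep only results above a calibrated similarity floor."""
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    if not kept:
        # Better to surface "nothing close enough" than to treat the
        # nearest-but-still-distant vectors as relevant sources.
        raise LookupError("no results above similarity threshold")
    return kept
```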

7. When Not to Use Semantic Search

This is worth being direct about, because the current framing in the industry sometimes implies that semantic search is simply better than keyword search. It is not better. It is better for a specific job.

Use keyword search when:

  • The query contains exact identifiers: invoice numbers, product codes, version numbers, error codes, contract clause references. Semantic search treats these as noise. Keyword search either finds them or does not.
  • You need deterministic, reproducible results. Vector search is probabilistic. The same query can return different results as the index updates. Keyword search on the same index is stable.
  • The query is a known entity name, brand, or proper noun. "Anthropic Claude API" is not a conceptual query. It is a name. BM25 finds the official documentation directly.
  • Very short queries with high specificity. Two words that are highly specific ("HIPAA compliance") are better served by keyword search than by semantic search, which may over-generalize to related concepts you did not ask about.

Use semantic search when:

  • The query describes a concept, theme, approach, or argument. The idea can be expressed in many ways and you do not know which words the target content uses.
  • You are doing discovery — finding relevant pages before you know what to read.
  • The query is natural language and you do not know the document's exact wording.
  • You are searching for content that uses synonyms, paraphrases, or different framings of the same underlying idea.

The production-ready pattern, for agents that encounter both types of queries, is routing: recognize which problem you have before choosing which tool to use (a minimal sketch follows below). A contract review agent needs to find "indemnification clause" by meaning AND retrieve "EXHIBIT-A" by exact match. These are different jobs requiring different tools. Running both in parallel and merging results (hybrid search) gives the best of both, at the cost of more infrastructure. Whether that cost is worth it depends on the workload.
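A minimal routing sketch, with deliberately crude heuristics (production routers are usually learned classifiers or rule sets tuned to the actual workload):

```python
# Route identifier-like queries to a keyword backend, descriptive queries
# to a semantic one. The regex and word-count cutoff are illustrative.
import re

IDENTIFIER = re.compile(r"\b[A-Z]{2,}-?\d{2,}|\bv?\d+\.\d+(\.\d+)?\b")

def route(query: str) -> str:
    if IDENTIFIER.search(query):
        return "keyword"   # invoice numbers, versions, error codes
    if len(query.split()) <= 2:
        return "keyword"   # short, highly specific terms
    return "neural"        # descriptive, conceptual queries

assert route("INV-2024-00847 status") == "keyword"
assert route("HIPAA compliance") == "keyword"
assert route("case studies of technology decisions that backfired") == "neural"
```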

8. The Practical Decision Guide

You have a conceptual query and want the best available tool. Use Exa with type: "neural" via the API, or web_search_exa via the MCP connector. Write the query as a description of the ideal page, not as keywords.

You need semantic search but also need fresh content (last few days). Use Exa API with livecrawl: "always". This bypasses the index and fetches pages directly, combining real-time access with Exa's semantic processing. Not available through MCP.
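A sketch of that request (this guide describes livecrawl as API-only; exactly where the option sits in the request body is an assumption here, so confirm the nesting against Exa's current API reference):

```python
# Fresh-content request sketch. The "contents.livecrawl" nesting is an
# assumption; verify against Exa's current API documentation.
import os
import requests

response = requests.post(
    "https://api.exa.ai/search",
    headers={"x-api-key": os.environ["EXA_API_KEY"]},
    json={
        "query": "analyses of this week's model releases",
        "type": "neural",
        "contents": {"livecrawl": "always"},  # fetch pages directly, bypassing index lag
    },
    timeout=60,
)
```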

You have a mix of conceptual and exact queries in the same workflow. Route by query type. Conceptual queries to Exa neural. Exact queries to Brave, Tavily, or Claude's built-in search. If you cannot route, run hybrid and merge with Reciprocal Rank Fusion.
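Reciprocal Rank Fusion itself is small enough to sketch in full. It merges ranked lists by summing reciprocal ranks; k=60 is the constant from the original RRF paper and a common default:

```python
# Reciprocal Rank Fusion over two ranked result lists (URLs in rank order).
def rrf_merge(keyword_results, neural_results, k=60):
    scores = {}
    for ranking in (keyword_results, neural_results):
        for rank, url in enumerate(ranking, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(
    ["https://a.example", "https://b.example"],
    ["https://b.example", "https://c.example"],
)
# b.example appears in both lists, so it rises to the top of the merged ranking.
```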

You want deep research on a complex topic that requires many searches. Exa is not the right tool for this. Parallel's deep research mode handles multi-step orchestration. Use Exa for the individual conceptual lookups within a workflow, not as the orchestrator of a 100-source synthesis.

You are building a production system and need to control costs. MCP is simpler but exposes a narrower feature set. Exa's MCP connector has no domain filters, no date filters, no livecrawl, no explicit neural/keyword toggle. For workflows that need those controls, the API is required. The convenience of MCP costs you auditability and control — sometimes the right trade, sometimes not.

9. A Note on What This Infrastructure Does Not Solve

Semantic search solves the vocabulary gap between queries and documents. It does not solve the access gap.

The most authoritative sources on most topics are behind paywalls or in proprietary databases. Research papers on academic platforms, clinical trial data, regulatory filings, financial databases, primary legal documents. Vector indexes of the open web cannot reach them. The best semantic match on the public internet may still be a secondary or tertiary source commenting on work that lives behind a paywall.

This is not a criticism specific to semantic search. It is the structural condition of the open web. The Garden guide to deep research for agents covers the tools (Valyu, primarily) that are beginning to bridge this gap by giving agents unified access to both open web and proprietary databases in a single API call. That is a different job from finding by meaning on the open web, but it is worth knowing the boundary exists before you rely on open web semantic search for research where the primary sources are closed.

Semantic search on the open web is genuinely powerful for the job it does. The job has real limits. Know them before you decide how much weight to put on what it finds.

This guide is part of Garden Research's investigation into how AI agents navigate the open web. The complete agentic AI search guide covers all eight jobs and eleven tools in the current infrastructure layer.