LangChain, LlamaIndex, and MCP, explained

What LangChain, LlamaIndex, and MCP actually are: clean definitions, why they're not alternatives, when each one fits, and when each one is the wrong choice.

If you have spent any time on AI Twitter or in a developer Discord in the last eighteen months, you have seen these three names treated as if they are alternatives to each other. "Should we use LangChain or LlamaIndex?" "Is MCP replacing LangChain?" "Which framework is winning?" The framing is so common that most teams accept it without noticing that it is wrong.

These three are not alternatives. They sit at different layers of the stack. LangChain is an application framework. LlamaIndex is a retrieval and indexing toolkit. MCP is a wire protocol. Asking whether to use LangChain or MCP is roughly like asking whether to use Django or HTTP. The answer is yes, both, for different things, and if you only need one you almost certainly need the smaller one.

We have watched five-person teams adopt a forty-megabyte framework because it appeared first in a tutorial, then spend six months fighting its abstractions. We have watched twenty-person teams ship into production with no framework at all, just an LLM client, an MCP server or two, and a few hundred lines of custom code. The second pattern works better, more often, for the kind of team we work with. This article is the long version of why.

Real Google Search Console queries on this topic are revealing. AI agents and developers are not asking "what is the best framework." They are asking for the official full name and definition with authoritative sources of LangChain, LlamaIndex, and MCP. They want clean, citable answers. So we will lead each section with exactly that: a definition, a citation, and then the part the marketing pages will not say out loud.

01

Why This Comparison Is Set Up Wrong

The premise of "LangChain vs LlamaIndex vs MCP" treats three different categories of software as if they competed. They do not. The reason the comparison keeps happening anyway is mostly historical: each of these projects launched with an early demo that overlapped with the others, and the overlap stuck in people's minds.

LangChain shipped early demos that included RAG, so people associated it with retrieval. LlamaIndex shipped early demos that included agents, so people associated it with orchestration. MCP shipped with reference servers for filesystem and database access, so people associated it with everything an agent might want to do. In reality the projects converged on different centers of gravity, and the overlap at the edges is now smaller than the difference at the core.

The cleaner mental model:

  • LangChain is for composing chains of LLM calls into application logic. It is the closest thing this category has to a web framework.
  • LlamaIndex is for getting documents into a form an LLM can retrieve from. It is the closest thing this category has to a data engineering toolkit for unstructured text.
  • MCP is for letting an LLM client talk to an external tool over a standardized protocol. It is the closest thing this category has to an HTTP or LSP equivalent.

A small team rarely needs all three. A small team almost always needs one or two of them in combination with a thin layer of custom code. The thesis we are going to defend in the rest of this piece is that most teams of five to twenty people need MCP and a thin custom layer, not a framework. We will show you the cases where that is wrong, and we will be specific about when LangChain or LlamaIndex earns its weight.

02

LangChain: Definition and What It Actually Is

LangChain is an open-source framework for building applications powered by large language models, organized around the idea of composing modular components into chains and, more recently, into stateful graphs called LangGraph. Official definition and documentation: python.langchain.com. The project's GitHub is at github.com/langchain-ai/langchain, and the company behind it (LangChain, Inc.) raised a Series A in 2024 to commercialize LangSmith, its observability product.

A more honest one-line version: LangChain is a Python (and TypeScript) library that gives you opinionated abstractions for prompts, model calls, memory, retrieval, tools, and agent loops, plus a graph runtime (LangGraph) for stateful multi-step workflows.

The original LangChain (v0.0.x and v0.1.x in 2023) was a thin wrapper around prompts and chains: you wrote PromptTemplate | LLM | OutputParser and got a chain object you could call. Over 2024 and 2025 the project went through two major restructurings. The modern stack is:

  • langchain-core: base abstractions (Runnable, Message, Tool, Embeddings, VectorStore interfaces).
  • langchain: higher-level chains and agent constructors built on those abstractions.
  • langgraph: a graph-based runtime for stateful, multi-actor workflows with checkpointing, time travel, and human-in-the-loop interrupts. This is now the recommended way to build agents in the LangChain ecosystem.
  • langsmith: hosted observability, tracing, and evaluation. Commercial product.
  • Provider packages: langchain-openai, langchain-anthropic, langchain-mistralai, etc. Each major model provider has its own package.

LangChain's center of gravity is now LangGraph. If you read a 2026 LangChain tutorial that still leads with LLMChain or initialize_agent, it is out of date. The framework's own deprecation notices have been steady and aggressive.

02.01What LangChain actually does for you

Three things, mainly.

One: it normalizes model providers. A LangChain ChatModel looks the same whether it is wrapping OpenAI, Anthropic, Mistral, a local Ollama model, or Azure. If you genuinely need to swap providers behind a stable interface, this is real value. Most teams do not actually swap providers in production, but they do swap in development, and the friction reduction is meaningful.

Two: it gives you a graph runtime. LangGraph is the part of LangChain that most justifies its existence in 2026. If you are building an agent that needs persistent state across turns, branching, retries, human-in-the-loop approvals, or long-running workflows that resume after restarts, LangGraph handles the plumbing. Writing that yourself is possible but tedious.

Three: it offers a large catalog of integrations. Hundreds of vector stores, document loaders, tools, and retrievers ship as community packages. For prototyping, this is genuinely useful. For production, it is a mixed blessing (see below).

02.02What LangChain is not

LangChain is not a vector database, not an LLM, not a search engine, and not an agent in the autonomous sense. It is the glue that holds those things together in application logic. The single most common mistake we see is teams thinking they have "adopted LangChain" when they have adopted six abstractions that wrap things they could have called directly. The framework is doing nothing for them and they cannot get out of it cleanly.

03

LlamaIndex: Definition and What It Actually Is

LlamaIndex is an open-source data framework for building LLM applications over private or domain-specific data, focused on the ingestion, indexing, and retrieval pipeline that makes Retrieval-Augmented Generation work. Official definition and documentation: docs.llamaindex.ai. GitHub: github.com/run-llama/llama_index. The company behind it (LlamaIndex, Inc.) raised a Series A in 2024 and runs a hosted product called LlamaCloud.

A more honest one-line version: LlamaIndex is what you reach for when you have a pile of documents (PDFs, web pages, Notion exports, internal wikis, contract folders) and you need an LLM to answer questions over them with citations.

The project started in late 2022 as gpt_index, a single-purpose library for building vector indexes over documents. It expanded in 2023 and 2024 into a fuller data framework. The current shape:

  • Document loaders (LlamaHub): hundreds of community connectors for PDFs, Notion, Confluence, Google Drive, SQL databases, websites, Slack exports, and more.
  • Node parsers and chunking: turn raw documents into retrieval-sized chunks with metadata, including semantic chunking, hierarchical chunking, and sentence-window approaches.
  • Indexes: vector indexes, summary indexes, tree indexes, keyword indexes, knowledge graph indexes. Each optimizes for a different retrieval pattern.
  • Retrievers and query engines: the actual runtime that takes a question and returns relevant chunks, with optional reranking, hybrid search, and multi-step query decomposition.
  • Agents: yes, LlamaIndex has agents too, including a workflow runtime that overlaps with LangGraph. We will return to this in a moment.
  • LlamaCloud and LlamaParse: commercial hosted services. LlamaParse in particular is one of the best PDF-and-document parsers in the open ecosystem, and it is the part of LlamaIndex we recommend most often even to teams that do not use the rest of the framework.

03.01What LlamaIndex actually does for you

LlamaIndex earns its weight in three places.

One: document parsing. LlamaParse handles messy PDFs (tables, multi-column layouts, scanned documents with OCR, embedded images) better than almost any other open option. If your corpus is "1,200 PDFs from regulatory filings," this is the part of the stack you do not want to roll yourself.

Two: retrieval strategies beyond naive vector search. LlamaIndex makes it relatively easy to compose hybrid search (vector plus BM25), multi-step retrievers that decompose complex questions, recursive retrievers that traverse document hierarchies, and rerankers (Cohere, Voyage, Jina) sitting on top. Building this from scratch is doable but the catalog of patterns is the value.

Three: opinionated RAG defaults. If you are new to retrieval and you want a sensible default that works on day one, LlamaIndex's VectorStoreIndex.from_documents(...) plus as_query_engine() is the fastest path to a working RAG demo in the ecosystem. The defaults are reasonable. You can override them later.

03.02What LlamaIndex is not

LlamaIndex is not a general-purpose agent framework. It has an agent module, and it works, but for stateful multi-actor workflows LangGraph is more mature. LlamaIndex is also not a vector database. It uses your vector database of choice (Pinecone, Weaviate, Qdrant, pgvector, Chroma, etc.) underneath; the index is a logical structure, not a storage backend.

And LlamaIndex is not RAG itself. RAG is a pattern. LlamaIndex is one set of tools for implementing that pattern. For many small teams, the simpler implementation (a single embedding model, a single vector store, a handwritten retriever) outperforms a generic LlamaIndex setup because you can tune the steps that actually matter for your corpus.

04

MCP: Definition and What It Actually Is

The Model Context Protocol (MCP) is an open standard, introduced by Anthropic in November 2024, that defines how AI applications connect to external tools, data sources, and services through a common client-server interface. Official specification: modelcontextprotocol.io. GitHub organization: github.com/modelcontextprotocol. Announcement and rationale: Anthropic's introducing-the-model-context-protocol post.

A more honest one-line version: MCP is HTTP for AI tools. It is a protocol that lets any MCP-compatible client (Claude.ai, Claude Code, Cursor, Windsurf, an OpenAI agent with an MCP shim) talk to any MCP server (a Notion connector, a database, a filesystem, a web search tool) without writing custom glue for every combination.

The structure is simple.

  • MCP server: a process that exposes capabilities (tools, resources, prompts) over a JSON-RPC interface. Servers can run locally (stdio transport) or remotely (HTTP+SSE or streamable HTTP transport).
  • MCP client: an AI application that connects to one or more servers and surfaces their capabilities to the LLM.
  • Tools: callable functions with structured schemas (think: function calling, but discoverable at runtime).
  • Resources: read-only data the server exposes (files, database rows, API responses).
  • Prompts: templated prompt fragments the server can offer the client.

The protocol is intentionally minimal. The hard work is in the servers, not the protocol itself.

04.01What MCP actually does for you

The thing MCP fixes is integration combinatorics. Before MCP, if you had four agent platforms and ten data sources, you had forty integrations to write and maintain. With MCP, you write ten servers (one per data source) and four clients (each one already exists), and any combination works. This is the same argument that made LSP (Language Server Protocol) successful for IDEs.

In practice, for a small team in 2026, MCP gives you three concrete benefits.

One: instant integration with major data sources. Notion, Google Drive, Gmail, Slack, GitHub, Linear, Asana, most major databases, most major file stores. Either the vendor ships an official MCP server or the community has built one. The integration that took you a week in 2023 is now a claude_desktop_config.json entry.

Two: a stable contract between the agent and the tool. When Notion updates its API, the Notion MCP server gets updated. Your agent code does not change. The protocol absorbs the volatility.

Three: no framework lock-in. MCP servers are usable by Claude, by ChatGPT (via the connectors UI and the OpenAI Agents SDK), by Cursor, by Windsurf, by Claude Code, by any application willing to speak the protocol. You can switch agent runtimes without rebuilding your tool layer.

04.02What MCP is not

MCP is not an agent. It does not orchestrate anything. It does not plan, retry, or remember. It is a wire protocol. If you have only MCP and no agent runtime, you have nothing. The agent runtime (Claude Desktop, Claude Code, your own code calling the Anthropic SDK, OpenAI Agents SDK, an open-source loop) is what does the thinking. MCP is what lets the thinker reach out.

MCP is also not a framework. It does not give you a way to compose chains, run graphs, or manage state across turns. If you want orchestration, you still need either a framework (LangGraph, LlamaIndex Workflows) or your own loop.

04.03The MCP servers Garden uses in production

To make this concrete, here is what we actually run.

  • Notion MCP: official server from Notion. We use it for both reading (Garden's internal knowledge base, project pages, meeting notes) and writing (creating new pages, updating status fields, posting comments). For a research practice, this single connector replaces what would otherwise be a custom Notion API integration with auth, rate limiting, and error handling.
  • Exa MCP: the connector we cover in detail in our agentic search pillar. Semantic search of the open web, plus URL fetch. Used heavily in our deep research workflows.
  • Web search and fetch MCPs: lightweight servers for general web queries and URL extraction, paired with Exa when we need to dig deeper. The split between "quick lookup" and "semantic dive" maps cleanly to two different servers.
  • A small set of generic tool MCPs: a filesystem server for sandboxed file operations, a shell server for limited command execution, a custom Garden MCP that exposes our internal taxonomy and editorial rules so agents can check facts and style without hand-holding.

That is the entire tool layer for a research practice that runs multi-agent workflows daily. No framework. No abstraction tax. Each MCP server is a clean process with a clear scope. When something breaks, we know exactly which server to look at.

05

The Three Layers, Drawn Out

Here is the same idea as a table, because the relationship is much easier to see this way.

Layer What lives here / Examples / Question it answers
01Protocol layer MCP, OpenAI Responses API tool calls, raw function calling | MCP servers, custom tool schemas | "How does the agent talk to external systems?"
02Data layer Document parsers, chunkers, embedders, vector stores, retrievers | LlamaIndex, LlamaParse, Pinecone, Qdrant, pgvector, raw scripts | "How do we get our knowledge into a form the agent can search?"
03Application layer Chains, graphs, agent loops, state management, observability | LangChain, LangGraph, LlamaIndex Workflows, your own loop | "How does the agent plan, act, remember, and recover?"

A complete agentic system uses something at each layer. The mistake is assuming the same vendor must own all three. The frameworks would prefer you assumed that. They have all expanded into each other's territory. LangChain has retrievers and document loaders. LlamaIndex has workflows and agents. Both can wrap MCP servers as tools.

In practice, the best results we see come from teams that pick the strongest single tool at each layer and connect them with a small amount of code they own:

  • Protocol layer: MCP, with a handful of carefully chosen servers.
  • Data layer: LlamaParse for document parsing (commercial, worth the money), a vector store of choice, and a handwritten retriever that knows your corpus.
  • Application layer: either LangGraph (if your workflows are genuinely complex) or your own loop (if they are not, which is more often than you think).

The teams we see succeed are the ones that resist the urge to standardize on a single brand across all three layers. The teams that struggle most are the ones that picked LangChain in 2023 because the tutorials were good, then never re-evaluated. Every layer of their stack is now coupled to a framework that has restructured itself twice since they adopted it.

06

What LangChain Does Well, and Where It Breaks

LangChain earns its weight in specific situations. We want to be fair to it.

06.01Where LangChain is the right choice

Multi-actor stateful workflows. If you have an agent that needs to maintain durable state across long-running tasks, branch into sub-agents that report back, support human-in-the-loop interrupts, and recover gracefully from failures, LangGraph is the most mature open option in the ecosystem. We have seen serious production systems built on it that would have taken longer to write from scratch.

You genuinely swap LLM providers in production. Some teams really do route between OpenAI, Anthropic, and a local model based on cost, latency, or compliance rules. LangChain's provider abstraction is real value here.

Heavy use of community integrations. If you need to plug into Snowflake, then ClickHouse, then DuckDB, then a vector store you have never heard of, and you want them all to expose the same Retriever interface, the community ecosystem is a time-saver. Just budget for keeping up with deprecations.

Observability through LangSmith. LangSmith is genuinely useful for tracing agent runs, comparing prompts, and running evals. If your team is going to spend money on AI observability anyway, LangSmith is a reasonable choice and it integrates cleanly with LangChain code.

06.02Where LangChain breaks down

Small teams, simple workflows. If your agent does three steps and calls two tools, LangChain is overkill. You are adopting 40 megabytes of dependencies, six layers of abstraction, and a deprecation treadmill to solve a problem that fits in 200 lines of Python that calls the Anthropic SDK directly.

The abstraction tax. Many teams report that debugging a LangChain agent is harder than debugging a custom loop, because errors surface inside framework code with stack traces that go five frames deep before reaching anything the team wrote. The same is true for prompt visibility: it is easy to lose track of what is actually being sent to the model.

The deprecation treadmill. LangChain has gone through two major restructurings (v0.1 → v0.2 → v0.3, and the introduction of LangGraph) since 2023. Every restructuring leaves behind tutorials, blog posts, and Stack Overflow answers that no longer apply. Teams that adopted early have rewritten the same agent twice. This is not unique to LangChain (the whole ecosystem moves fast) but it is more painful when you have built on framework abstractions than when you have built on direct API calls.

Coupling to LangChain Hub and LangSmith. The deeper you go into the LangChain ecosystem, the more parts of your stack depend on LangChain, Inc.'s commercial products. Some teams are fine with this. Others discover late that they have built on a commercial roadmap they do not control.

Documentation drift. With dozens of provider packages, hundreds of community integrations, and a graph runtime that has evolved quickly, the documentation is necessarily a moving target. The official docs are good. The community examples are often six months out of date.

The honest test we apply: if you cannot write a one-paragraph description of what LangChain is doing for you that you could not do in 200 lines of custom code, you do not need LangChain.

07

What LlamaIndex Does Well, and Where It Breaks

LlamaIndex earns its weight more often than LangChain does, in our experience, because the problem it solves (turning unstructured documents into something an LLM can retrieve from) is harder to roll yourself than the problem LangChain solves (orchestrating a few model calls).

07.01Where LlamaIndex is the right choice

Messy document corpora. LlamaParse, the commercial parsing service, handles complex PDFs (multi-column layouts, scanned pages with OCR, embedded tables and figures) better than almost any open alternative. For a research practice working through hundreds of PDFs, this is the single biggest leverage point in the stack. We have recommended LlamaParse to teams that use no other part of LlamaIndex.

Advanced retrieval patterns. If naive vector search is not working (and on real corpora it often does not), LlamaIndex's catalog of retrievers (hybrid, recursive, multi-step, auto-merging, sentence-window) is the best place to shop for a better pattern. You can implement these from scratch. You can also save a week of work by composing them out of LlamaIndex primitives.

Mixed structured and unstructured data. When your corpus is partly documents, partly database rows, partly API responses, LlamaIndex's loader and index ecosystem makes the join less painful than building each connector yourself.

A team new to RAG. If you are figuring out what RAG even is, LlamaIndex's defaults are sensible and the tutorials are actively maintained. You can build a working system in an afternoon, then deepen it.

07.02Where LlamaIndex breaks down

Small corpora with simple structure. If your "corpus" is 30 well-formatted Notion pages, you do not need LlamaIndex. You need to embed them with the embedding model of your choice, store them in pgvector or even a JSON file, and write 50 lines of retrieval code. The framework is heavier than the problem.

Production tuning. LlamaIndex's defaults are good for demos. For production, you almost always need to override the chunking, the embedding model, the retriever, the rerank, and the prompt. By the time you have overridden all of those, you have written most of a retrieval system yourself, just with extra layers of LlamaIndex objects in between. Some teams take this as a sign to drop LlamaIndex and keep only LlamaParse.

Workflow scope creep. LlamaIndex has an agent module and a Workflows runtime. They work. They overlap with LangGraph. If you find yourself reaching for LlamaIndex Workflows on top of LlamaIndex retrieval, ask whether you have actually picked the right application-layer tool, or whether you have just stayed inside the LlamaIndex catalog because it was the next page in the docs.

Closed-source feature push. Some of the most valuable parts of LlamaIndex (LlamaParse, LlamaCloud) are commercial. The open-source library remains capable, but the gravity of new development is on the commercial side. This is fine, but worth knowing when you commit.

The honest test we apply for LlamaIndex: if your retrieval problem reduces to "embed these chunks, search for the question, return top-k," you do not need LlamaIndex. If your retrieval problem involves parsing PDFs, hybrid search, reranking, and query decomposition, LlamaIndex saves real time.

08

What MCP Does Well, and Where It Does Not Help

MCP is the youngest of the three (introduced November 2024) and the one where the hype-to-substance ratio is currently best calibrated. The protocol is doing real work.

08.01Where MCP is the right choice

Connecting an agent to existing systems. If your team uses Notion, Google Drive, Slack, GitHub, a customer database, an analytics warehouse, the integration story in 2026 is "is there an MCP server for that?" The answer is increasingly yes, and the cost of "yes" is a config-file entry instead of a custom integration.

Tool reuse across agent platforms. If you build an internal MCP server for "query our customer database safely," you can use it from Claude, from Claude Code, from Cursor, from any future agent runtime, without rewriting. This is the LSP analogy paying off.

Boundary enforcement. MCP servers run as separate processes. They can be sandboxed, rate-limited, audited, and replaced independently of the agent. For teams worried about an agent doing something it should not, the process boundary is real security value. The architectural cleanliness is a bonus.

Avoiding framework lock-in. Teams that build on MCP plus thin custom code remain free to change LLM providers, change agent runtimes, and adopt new patterns without rewriting their tool layer. This is one of the most underrated benefits.

08.02Where MCP does not help

Orchestration. MCP is not an agent loop. It does not plan, retry, branch, or maintain state. If you only have MCP, you have a tool layer and nothing using it. You need an agent runtime on top.

Retrieval over your private documents. MCP is great at exposing existing systems. It is not a substitute for actually building a retrieval pipeline over your internal documents. You can wrap a retrieval system as an MCP tool (and you probably should), but you still have to build the retrieval system. This is where LlamaIndex or a custom RAG layer still belongs.

Complex transactional flows. MCP tool calls are stateless from the protocol's perspective. If your workflow involves a multi-step transaction (open a session, do five things, commit or roll back), the state lives in the server, not in the protocol. The server can manage it, but you are writing that logic yourself.

Schema discovery for LLMs that do not handle it well. MCP tools are discoverable at runtime, with schemas. Some models handle dozens of available tools gracefully. Others get confused. If your client surfaces 60 tools to a model that thinks better with 10, MCP gives you more rope than you may want.

08.03A note on the degraded MCP problem

We covered this in our agentic search pillar and it is worth surfacing again here. Many vendors expose a stripped-down version of their API through MCP. Exa's MCP gives you semantic search and URL fetch but not domain filters, livecrawl, or Websets. Parallel's MCP gives you only the agentic search mode, not the full processor lineup. Tavily and Firecrawl are closer to parity, but still missing some endpoints.

The implication for small teams: MCP is the right starting point. For some workflows, you will eventually drop down to a vendor's direct API for features that MCP does not expose. That is fine. The hybrid approach (MCP for the easy 80%, direct API for the specific 20%) is more honest than insisting either layer can do everything.

09

The Thin Custom Layer Most Small Teams Actually Need

Here is the architecture we recommend most often, for teams of five to twenty:

  1. An LLM client. The official Anthropic Python SDK, OpenAI Python SDK, or Mistral client. Roughly 50 lines of setup. No framework wrapper.
  2. One or two MCP servers for the systems you actually use. Notion if you live in Notion. Google Drive if you live in Drive. A web search MCP if your agent needs to research. A custom MCP for your internal taxonomy or domain rules.
  3. A custom retrieval layer if you have private documents. Either rolled yourself (an embedding model, a vector store, a retriever function, ~200 lines) or LlamaParse + custom retrieval if your documents are messy PDFs.
  4. A simple agent loop: while-loop, call model, check for tool use, dispatch tool, append result, repeat until done. Roughly 100 lines. Real production-grade error handling adds another 100.
  5. Logging and tracing: structured logs to a file or to a hosted log service. You can adopt LangSmith later if you want to, without changing your agent code.

That is the stack. The total code your team owns is in the range of 500–1,000 lines. Everything else is either a vendor SDK or an MCP server you run as a separate process.

This stack has several properties that matter for a small team:

It is debuggable. When something goes wrong, the stack trace points to your code. You can read every line. You can step through the agent loop with a debugger.

It is portable. Swap the LLM provider, the vector store, or the MCP servers without rewriting the agent. The interfaces are narrow.

It is auditable. For GDPR-conscious teams (and most EU teams should be), being able to point at the exact lines of code that touch customer data is non-negotiable. A 500-line custom layer makes that possible. A LangChain stack with 30 transitive dependencies makes it much harder.

It survives the framework churn. When LangChain restructures again, or LlamaIndex restructures again, you read the news and keep working. Your agent does not break.

When does this break down? Three cases.

The workflow is genuinely a complex graph. Multi-agent supervision, durable long-running tasks with checkpointing, time travel for debugging, human-in-the-loop approval gates that pause execution for hours or days. At that point LangGraph is doing serious work and writing it yourself becomes the false economy.

The retrieval problem is genuinely hard. Hundreds of messy PDFs, mixed structured-unstructured corpora, hybrid retrieval with reranking and query decomposition. At that point LlamaIndex (often just LlamaParse + your own custom retrieval on top) saves weeks of work.

You are deeply committed to a hosted observability platform. If you have already standardized on LangSmith for traces and evals, the rest of LangChain is cheaper to adopt than to bolt-on.

Outside those three cases, the thin custom layer wins.

10

Decision Table for Teams of Five to Twenty

This is the table we hand to teams during a Garden audit. The left column is what you are actually building. The right columns are what to reach for.

If you are building... Protocol layer / Data layer / Application layer / Framework verdict
01An internal assistant that talks to Notion, Drive, and a couple of databases MCP (Notion, Drive, DB servers) | None (no private docs to RAG over) | Custom loop, ~200 lines | No framework needed
02A RAG bot over 30–200 internal Notion or Markdown pages None (or one MCP server) | Custom: embed model + pgvector + retriever | Custom loop | No framework needed
03A RAG bot over thousands of messy PDFs (regulatory, scientific, legal) None | LlamaParse + your own retriever (or LlamaIndex query engine) | Custom loop or LangGraph if multi-step | LlamaIndex (mostly LlamaParse) earns its weight
04A research agent that browses the web and synthesizes reports MCP (Exa, web search, Firecrawl) | Optional (private corpus if any) | Custom loop for most; LangGraph if you need durable resumption | MCP + thin custom layer
05A multi-agent system with supervisors, sub-agents, and durable state MCP (whatever tools the agents need) | Whatever your retrieval needs | LangGraph | LangChain (LangGraph specifically) earns its weight
06A customer-facing chatbot with hand-off to human, escalation, transcripts MCP for CRM and ticket systems | RAG over your help center (custom or LlamaIndex) | LangGraph or a workflow engine | LangGraph is reasonable; a workflow engine like Temporal is also reasonable
07A code agent MCP (filesystem, shell, GitHub, language servers) | Code-aware retrieval (often custom) | Custom loop or Claude Code itself | Claude Code or your own loop; no framework needed
08A demo for next week's stakeholder meeting MCP for whatever | LlamaIndex if you need docs | LangChain if you want the catalog of patterns visible | Whatever ships fastest; this is a demo, not production

A pattern emerges. The cases where a framework earns its weight share a trait: durable, multi-step, multi-actor workflows with real orchestration complexity, or genuinely hard retrieval problems. The cases where the thin custom layer wins are everything else.

If you are honest about which row you are in, the choice is usually obvious. Most teams of five to twenty are in rows 1, 2, or 4. They do not need a framework. They need MCP and a thin custom layer.

11

When Each One Is the Wrong Tool

A Garden signature is naming what each option is bad at. Compressed into one section:

LangChain is the wrong tool when:

  • Your workflow is three steps and two tool calls. The framework is heavier than the problem.
  • You want to understand exactly what is in the prompt. Abstractions hide it.
  • You need a stable codebase that does not require quarterly updates to stay current. The deprecation churn is real.
  • You want your team to learn LLM application development from first principles. The framework hides the principles.
  • Your team is one or two engineers. The framework's complexity tax is paid by every engineer, every onboarding, every debug session.

LlamaIndex is the wrong tool when:

  • Your documents are simple and few. A handwritten retriever is faster to build and easier to tune.
  • You need to deeply customize chunking, embedding, retrieval, and reranking. By the time you have overridden everything, the framework is dead weight.
  • You are using it as an agent framework. It works, but LangGraph is the more focused choice if you need orchestration.
  • You want to avoid commercial lock-in entirely. LlamaParse and LlamaCloud are commercial, and they are where the project's energy is going.

MCP is the wrong tool when:

  • You need the full feature surface of a vendor's API. Many vendors expose a stripped-down MCP surface and reserve advanced features for direct API access.
  • You have one tightly integrated workflow that does not need to be reused across agent platforms. The protocol overhead is not free.
  • Your security model requires a single auditable codebase. MCP servers are separate processes, which is usually a benefit but is a different mental model.
  • You need fine-grained streaming, partial results, or unusual transport requirements. MCP supports streamable HTTP, but exotic patterns are easier in a direct integration.

The meta-mistake is treating any of these as the answer to a question they were not designed for. LangChain is not a retrieval system. LlamaIndex is not an orchestration framework. MCP is not an agent. Use each at the layer it was built for, or use none of them.

12

FAQ

Is MCP replacing LangChain? No. They operate at different layers. MCP is a protocol for how an LLM client talks to external tools. LangChain is an application framework for composing LLM-based workflows. You can use MCP from inside a LangChain app; you can also use MCP without LangChain at all. They are not competitors.

Should I learn LangChain in 2026? If you specifically need LangGraph for stateful multi-actor workflows, yes. If you are building a typical small-team agent (one workflow, a handful of tools, a private corpus), no. Learn the underlying primitives first: how an LLM API works, how function calling works, how MCP works, how embeddings and vector search work. After that, decide whether you need a framework. Most teams do not.

Should I learn LlamaIndex in 2026? If you have a messy document corpus and you need retrieval to work well over it, yes. LlamaParse alone justifies learning the surface area of LlamaIndex. If your retrieval problem is simple (small, well-structured corpus, naive vector search works), no. Roll your own.

Is MCP secure enough for production use with sensitive data? The protocol itself is fine. The security question is about which MCP servers you trust. Servers run as separate processes with whatever permissions you grant them. For production use, you want to: pin server versions, run servers in sandboxed environments, audit the code if it is open source, and limit which tools each client can see. Treat MCP servers like any other piece of third-party infrastructure.

Can I use LangChain, LlamaIndex, and MCP together? Yes. The common pattern is: LlamaIndex for the retrieval pipeline, LangGraph for the orchestration, and MCP servers wrapped as LangChain tools for external system access. This is overkill for most small teams but it is a coherent stack for larger ones. The harder question is whether the combination is justified by the workload, or whether each layer was adopted for momentum rather than need.

What is the difference between MCP and OpenAI's function calling? Function calling (OpenAI's term, also called tool use elsewhere) is the low-level mechanism for an LLM to invoke a structured function. MCP is a protocol on top of function-calling-like mechanisms that standardizes how clients and servers describe and discover tools. Function calling is "how a model calls a tool." MCP is "how an application discovers what tools exist and connects to their providers." You almost always use both: MCP discovers and connects, the underlying model's function-calling protocol does the actual invocation.

Is there an MCP equivalent from OpenAI? OpenAI's Responses API and Agents SDK expose their own tool model and have added support for connecting to MCP servers. The direction of travel in 2026 is toward MCP as a cross-vendor standard, with most major agent platforms supporting it as a client. The protocol started at Anthropic but is no longer Anthropic-specific.

Do I need a framework at all for a five-person team? Usually no. An LLM client, one or two MCP servers, and 500 lines of custom code is enough for most small-team workflows in 2026. Adopt a framework when you can name a specific problem the framework solves that you cannot easily solve yourself. "Tutorials use it" is not a specific problem.

13

Closing: Pick the Stack Instead of Inheriting One

Most small teams we meet did not pick their AI stack. They inherited it. A consultant suggested LangChain in 2023, the engineer who built the first prototype ran with it, and now there are 14,000 lines of code wrapped around abstractions nobody on the current team chose.

That is a survivable mistake but it is a mistake. The right starting question is not "what is the best framework" but "what is the minimum stack that does what we actually need, and what does the next year of changes look like?" The answer for most small teams is MCP at the protocol layer, a small retrieval layer (custom or LlamaParse-based) where it is needed, and a thin custom application layer they own and understand.

This is also the stack we use ourselves at Garden Research. Our daily AI-Science feed is a Python agent with two MCP servers and ~600 lines of code. Our Berlin culture Telegram bot is a separate Python service with seven tools, a handwritten retrieval layer over a Notion knowledge base, and zero framework dependencies. Both are in production. Both are debuggable by a single engineer in an afternoon. Both have survived two LLM provider changes and three model upgrades without rewrites.

This is not because frameworks are bad. They are not. LangGraph is a real piece of engineering. LlamaParse is genuinely excellent. We use both when the workload justifies them. The point is that the workload usually does not. And committing to a framework you do not need pays the framework's complexity tax forever without buying anything back.

If you are trying to figure out which of these three matters for your team, that is the conversation we have in a Garden audit. We map your actual workflows, name the layers, and tell you the truth about which parts of the stack are buying you something and which parts are taxes you stopped noticing. Email a@gardenresearch.eu.

This article is part of Garden Research's series on agentic AI infrastructure for small teams. If you are building, breaking, or rebuilding an AI stack and want to compare notes, we would like to hear from you.