A deep dive into MCPs, Part 1: What is MCP, anyway?

If you follow AI, you’ve almost certainly heard about Model Context Protocol (MCP). Since Anthropic announced this new standard in November 2024, its popularity has exploded. There are MCPs for sending emails, MCPs for searching the web, MCPs that run other MCPs. MCPs are letting humble LLMs make pull requests and plane reservations. In May, Microsoft even announced an MCP Registry for Windows that lets apps expose functionality to AI agents.

Amazing! But how? What is MCP? How do you use it? Why has it become so much more popular than earlier efforts? How does it let chatbots do things in the real world? And how can you build one of your very own?

When we decided to build our own MCP server for the DeepL API, the task seemed intimidating. But once you truly understand how MCP works, how it lets agents discover and use tools, and how you can build a basic MCP server with 10 lines of code, it’s not so scary. In this series of posts, we will endeavor to demystify this topic. We will explore:

  • what MCP is
  • how to use MCP in an AI client
  • how to build an MCP server

Let’s go!

What does MCP mean?

MCP stands for Model Context Protocol.

The “model” is an AI model. This can be any sort of AI surface, but typically it’s built on an LLM. Here we’ll refer to this as an AI agent or an AI assistant.

“Context” refers to the context in which the AI agent operates. In AI, we often talk about a “context window”, which refers to all the context an LLM incorporates when it generates its response to a prompt - the conversation history, the system prompt, and more. For MCP to work, an AI surface gives the underlying LLM an additional prompt specifying what tools it can use and how to use them.
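
To make that concrete, the extra context might boil down to something like this (a deliberately simplified, hypothetical example - real AI clients use more structured formats behind the scenes):

You can use the following tool:
- translate_text: translates text into a target language. Arguments: "text" (a string), "target_lang" (a language code).
When you want to use this tool, output a JSON object with the tool name and its arguments instead of answering directly.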

“Protocol” refers to a set of rules that describes how two entities communicate over a network. One protocol you use every day is HTTP - HyperText Transfer Protocol - which governs how a web server and web client communicate.

Putting this together: “Model Context Protocol” is a protocol that provides the context that lets AI models use outside tools and resources.

MCP is an open standard invented by two Anthropic engineers. It uses JSON-RPC, a lightweight protocol for encoding remote procedure calls as JSON. And it’s inspired by the Language Server Protocol, which Microsoft created to decouple the logic behind handling coding languages from IDEs.
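
To give a flavor of what that looks like on the wire, here is roughly the JSON-RPC request an AI client sends to an MCP server when it calls one of the server’s tools (the tool name and arguments here are invented for illustration):

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "translate_text",
    "arguments": { "text": "Hello, world!", "target_lang": "DE" }
  }
}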

MCP is not a piece of software, an app, or an API. But it gives AI agents access to all of these!

What does MCP do?

LLMs can give intelligent and useful responses to user input. But they have no access to outside services, no way to talk to other software, no means to retrieve up-to-date information. MCP gives any AI surface the ability to send requests to any software service and get a response. It gives an AI agent a standard means to access outside tools, resources, and data.

Without MCP, an AI client is just a brain with no body, no way to access the outside world. With MCP, an AI client can be an agent!

Who can use MCP?

Since MCP is an open standard, anyone can implement an MCP client or an MCP server.

The client can be any app based around an LLM. You can use MCPs in apps like Claude Desktop, VSCode, Windsurf, and Cursor. Or you can use MCPs with an AI agent you’ve built with a framework like LangChain or CrewAI.

Why has MCP become so popular?

One reason is that it’s an open standard. Any AI agent can implement MCP, and so can any tool. This gives it wider appeal than solutions for a specific company’s product, like OpenAI’s Function calling.

In creating this universal solution, MCP’s creators were inspired by the Language Server Protocol (LSP). Before LSP, when an IDE wanted to implement functionality like autocompletion or syntax highlighting for a given coding language, it needed to roll its own integration. The result was what the creators recently referred to as an m × n problem, in which people needed to implement language services separately for each possible IDE/language combination. With LSP, each coding language needs only one integration library, and an IDE which supports LSP can support any such language. Similarly, once a service implements an MCP server, any AI client which supports MCP can use it - the client needs to support only MCP, not each individual tool.

MCP is flexible in other ways: it can run locally or over a network. And it comes from Anthropic, a well-respected company that chose to share the benefits of MCP by making it an open standard.

Why do AI agents need a standard protocol?

Since LLMs are so good at understanding and applying information, can’t they already use any software service whose description appears in their training data? After all, LLMs already understand how to use many popular APIs, as you’ll find if you ask a coding assistant like Copilot or Sourcegraph for help.

But if you gave an LLM access to every API in its training data, chaos would ensue. How would it know which APIs to call and when? How would it get your API key and credentials, and how would it use services that need other sorts of access? What if you wanted to know the driving distance from Kansas City to Des Moines - or the current temperature in Bangkok? There are many ways to find this information. As a user, you want control over the actions your LLM chooses. This is especially true when it isn’t retrieving information, but taking actions that affect the world, like sending an email or ordering you a pizza.

True, an agent could read and understand a low-level OpenAPI spec, or the entirety of an API’s documentation, then make API calls accordingly. But MCP lets toolmakers define higher-level services that are especially useful for an AI agent, pointing that agent in a useful direction. So far, designing those higher-level abstractions is an area where humans still excel.

Thus, it turns out that the best way for now is to explicitly define tools and resources the LLM can use and how it can use them. MCP does this the old-fashioned way, deterministically - not by having the LLM do its magic, but through code and JSON.
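
For example, an MCP server advertises each of its tools with a name, a natural-language description, and a JSON Schema for its inputs. A hypothetical tool definition might look roughly like this (the tool itself is made up for illustration):

{
  "name": "translate_text",
  "description": "Translate text into a target language.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "text": { "type": "string", "description": "The text to translate" },
      "target_lang": { "type": "string", "description": "Target language code, e.g. DE" }
    },
    "required": ["text", "target_lang"]
  }
}

That description and schema become part of the context the AI client hands to the model, which is how the model learns that the tool exists and what arguments it takes.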

But how can an LLM use tools?

Fundamentally, an LLM is a machine which generates tokens - tokens that humans encounter as text, images, audio, and so on. So how can such a machine have access to tools and resources? How does it know it has such access, and how does it know which one to use?

The key is that, as Maximilian Schwarzmüller points out, LLMs can only communicate with the outside world through tokens. They take tokens as input and produce tokens as output. And those tokens can include sequences that mean “I want to use a tool.” So, to use a tool or resource, an LLM generates tokens that the software surrounding it will recognize as a tool call and intercept. That software then calls the service and returns its output to the LLM.

The LLM could use specialized tokens here. But usually it just uses specific delimiters and language - that way you can retrofit this ability into an existing LLM by fine-tuning it. For example, you could use start and end tags in square brackets that enclose JSON, like this sample agent output:

Sure! I can create a new spreadsheet for you. Hang on…

[tool-start] { "tool": "spreadsheets", "action": "create" } [tool-end]
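
On the other side of the conversation, the AI client just needs to watch for those delimiters in the model’s output. Here’s a minimal, hypothetical sketch in Python - the delimiters come from the example above, but the parsing and dispatch are stand-ins, not any real client’s code:

import json
import re

# Matches the [tool-start] ... [tool-end] block from the sample output above.
TOOL_CALL = re.compile(r"\[tool-start\](.*?)\[tool-end\]", re.DOTALL)

def intercept_tool_call(llm_output):
    """Return the parsed tool call if the model asked for one, otherwise None."""
    match = TOOL_CALL.search(llm_output)
    if match is None:
        return None  # ordinary text: just show it to the user
    return json.loads(match.group(1))

call = intercept_tool_call(
    'Sure! I can create a new spreadsheet for you. Hang on… '
    '[tool-start] { "tool": "spreadsheets", "action": "create" } [tool-end]'
)
print(call)  # {'tool': 'spreadsheets', 'action': 'create'}
# The client would now route this call to its spreadsheet tool
# and feed the tool's result back to the model.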

This textual approach is just what researchers used in their 2023 Toolformer paper, which showed, as its title says, that “Language Models Can Teach Themselves to Use Tools”. In this study, the authors generated training data that included plain-text representations of API calls and their results.

They used the simple format:

[function_name(data) → result]

For example, knowing that LLMs struggle with math, they provided a Calculator tool. They then included in the training data strings like this:

Out of 1,400 participants, 400 (or [Calculator(400 / 1400) → 0.29] 29%) passed the test.

and

The name derives from “la tortuga”, the Spanish word for [MT(“tortuga”) → turtle] turtle.

By fine-tuning an LLM with such data, they created an LLM which knew how to call APIs to generate the most promising response to a given prompt.
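
At inference time, the program running the model watches for that syntax: when the model has produced a call up to the “→”, generation pauses, the tool runs, and its result is spliced back in before generation resumes. Here’s a rough, hypothetical sketch of that splicing step (not the paper’s actual code):

import re

def run_calculator(expression):
    """Evaluate a simple arithmetic expression for the Calculator tool."""
    # A real implementation would use a safe expression parser rather than eval().
    return str(round(eval(expression), 2))

def splice_in_result(text_so_far):
    """If the generated text ends with a pending Calculator call, append its result."""
    match = re.search(r"\[Calculator\(([^)]*)\) →$", text_so_far)
    if match is None:
        return text_so_far
    return text_so_far + " " + run_calculator(match.group(1)) + "]"

print(splice_in_result("Out of 1,400 participants, 400 (or [Calculator(400 / 1400) →"))
# Out of 1,400 participants, 400 (or [Calculator(400 / 1400) → 0.29]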

But you don’t have to fine-tune an LLM to let it use tools. It turns out that all you need is a prompt. This article demonstrates how to teach an LLM to use tools using JSON delimited by [[qwen-tool-start]] and [[qwen-tool-end]], solely by giving it an appropriate prompt. You can try this yourself! Just visit your favorite chatbot and tell it:

Congratulations! You now have the ability to use a tool to do multiplication. When you need to do a calculation, all you need to do is output the following syntax: 

[Math] { "arg1": {value}, "arg2": {value}    } [/Math] 

where, of course, each {value} represents a value you want to multiply. Got it? 

Then give the LLM a math problem and watch it faithfully output the syntax you gave it for your math tool.
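
To close the loop, you can then play the part of the AI client yourself: compute the product, reply with something like the line below, and watch the model weave the result into its answer. (This reply format is improvised - the prompt above never defined one, and any clear phrasing will do.)

[Math-result] { "result": {product} } [/Math-result]

where {product} is the number you worked out.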

How does an LLM handle output from a tool?

How does an LLM receive tool output? Once again, it’s all about tokens. When the AI client receives output from a tool, it feeds that output back into the LLM as input, formatted in a way that lets the LLM know it is tool output, not end-user input.
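
For instance, the conversation the client maintains might grow roughly like this (a made-up format; real clients use whatever roles and fields their LLM’s API defines for tool results):

user: What is 400 divided by 1,400, as a percentage?
assistant: [tool-start] { "tool": "calculator", "expression": "400 / 1400" } [tool-end]
tool: 0.2857
assistant: 400 out of 1,400 is about 29%.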

Putting it all together

To summarize, here’s what happens when an LLM uses a tool (a rough code sketch of the whole loop follows the list):

  • User types in input
  • LLM processes this input
  • While generating output, the LLM “decides” that calling a tool is the best way to complete its response. That is, it outputs the set of tokens it has been trained (or prompted) to use for tool calls.
  • The AI client recognizes those tokens as a tool call, intercepts them, parses the tool name and parameters out of the JSON, and sends the request off to the appropriate tool
  • The tool generates output, which may or may not involve using an API. It returns this to the AI client, which sends it to the LLM
  • LLM processes tool output
  • LLM uses this output to continue to create a response for the user
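
Put into code, that loop looks roughly like the hypothetical sketch below: call_llm and run_tool stand in for whatever your AI client and tools actually provide, and the delimiters and message roles are borrowed from the earlier examples rather than from any real client.

import json
import re

# The same made-up delimiters as in the spreadsheet example earlier.
TOOL_CALL = re.compile(r"\[tool-start\](.*?)\[tool-end\]", re.DOTALL)

def run_agent_turn(user_input, call_llm, run_tool):
    """Handle one user turn, letting the model call tools until it gives a final answer."""
    messages = [{"role": "user", "content": user_input}]
    while True:
        output = call_llm(messages)            # the LLM processes its input and generates output
        match = TOOL_CALL.search(output)
        if match is None:
            return output                      # no tool call: this is the answer for the user
        call = json.loads(match.group(1))      # the client intercepts and parses the tool call
        result = run_tool(call)                # the tool does its work
        messages.append({"role": "assistant", "content": output})
        messages.append({"role": "tool", "content": str(result)})  # tool output goes back in as input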

And that’s my whirlwind tour of what MCP is and how LLMs are able to use tools. In my next post, I’ll show you how to use MCPs in your very own AI client!


About the Author

As DeepL’s Developer Evangelist, Ben Morss works to help anyone use DeepL’s API to access its world-class AI translations. Previously, at Google, he was a Product Manager on Chrome and a Developer Advocate for a better web. Before that he was a software engineer at the New York Times and AOL, and once he was a full-time musician. He earned a BA in Computer Science at Harvard and a PhD in Music at the University of California at Davis. You might still find him making music with the band Ancient Babies, analyzing pop songs at Rock Theory, and writing a musical that’s not really about Steve Jobs.
