How to Build a Model Context Protocol (MCP) Server in Python
Learn how to build an MCP server in Python to standardize and reuse AI tools, resources, and prompts across applications. This hands-on guide walks you through server setup and client testing, and points the way toward GPT-4 chatbot integration for production-ready systems.
Model Context Protocol (MCP) lets you define tools, resources, and prompts just once, then expose them to any MCP-capable client out there. We're talking CLIs, agents, chatbots, you name it. Instead of constantly rewriting the same tool logic for every single application (which, let's be honest, gets old fast), you build one server that clients can discover and invoke automatically.
I'm going to walk you through building a minimal Python MCP server over stdio. We'll test it with a client and make sure everything actually works. By the time we're done, you'll have a working server that exposes math tools, static documentation, and a prompt template. You can run this in a notebook or just fire it up in your local terminal, whatever works for you.

If you want the full picture of how MCP standardizes tool and data access, check out Model Context Protocol (MCP) Explained.
Why Use MCP for This Problem
When I first started thinking about sharing tools across multiple applications, I quickly realized there were several approaches, each with their own headaches.
You could go with a shared Python package. But then every app needs to import and maintain the exact same library version. And forget about runtime discovery or schema negotiation, that's just not happening.
Maybe a bespoke HTTP microservice? Sure, but now you're dealing with network overhead and custom API design. Plus you still don't have standardized tool metadata.
What about OpenAI-native tools defined per app? Well, now you're duplicating tool definitions everywhere. You completely lose that single source of truth.
Or agent-framework-specific tools? Great until you realize you're locked into one framework API. Want to port to another framework? Time to rewrite everything.
Here's where MCP actually solves these problems:
Automatic discovery: Clients can list available tools, resources, and prompts right at runtime. No guesswork.
Standardized schemas: JSON Schema definitions mean you get consistent validation and documentation everywhere.
Transport abstraction: Whether it's stdio, SSE, or WebSocket, clients and servers negotiate capabilities without you having to build custom protocols.
Centralizing these capabilities in one server just makes sense. It removes all that repetition. Clients discover and invoke standardized functionality automatically. And honestly, if you're worried about those subtle bugs that tokenization quirks can cause, you should really read Tokenization Pitfalls: Invisible Characters That Break Prompts and RAG. You might also want to consider implementing reliable vector store retrieval for RAG.
The beauty of a single MCP server is that you have one place to update logic and metadata. One schema for validation. Consistent behavior across all your apps. But remember, LLM context isn't infinite memory. If you're planning to scale prompt sizes or chain calls, definitely check out Context Rot - Why LLMs "Forget" as Their Memory Grows.
Actually, if you're curious about comparing MCP with a workflow-first approach, take a look at building robust LLM workflows with LangChain. It's a nice complement to this pattern, showing you how to orchestrate tools and prompts in a structured pipeline.
Core Concepts for This Use Case
Before we start building anything, let me break down these MCP primitives.
Tools are basically functions that the client can invoke with typed arguments. The server executes them and sends back results. Pretty straightforward.
Resources are static or dynamic data identified by URI. Clients read them on demand as text, JSON, or binary, whatever you need.
Prompts are these reusable templates that generate messages for LLM conversations. Clients fetch them and render them with arguments.
The stdio transport is what we'll use here. Server and client communicate over standard input and output. Honestly, it's the simplest option for local development and subprocess integration.
And then there's schema negotiation. The server advertises tool input schemas with JSON Schema and resource MIME types. Clients validate input and adapt automatically, which is really nice.
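To make that concrete, here is roughly the shape of a single entry a client sees when it lists tools: a name, a description, and a JSON Schema for the inputs. The values below are illustrative for the add tool we build later, not an exact wire capture.
# Approximate shape of one tool entry from a tools/list response (illustrative values)
add_tool_listing = {
    "name": "add",
    "description": "Add two integers and return the sum.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "a": {"type": "integer"},
            "b": {"type": "integer"},
        },
        "required": ["a", "b"],
    },
}
Because that schema travels with the tool, any client can validate arguments before it ever calls your server.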
If you're still getting the hang of crafting effective templates and roles, you might want to see best practices for prompt engineering with LLM APIs. It'll help you get consistent results from your MCP prompts.
Setup
First things first, make sure you have Python 3.10 or later, since the current mcp SDK requires it. I always create a clean virtual environment to avoid those annoying dependency conflicts. You'll need to decide whether you're running this in a notebook or in a terminal. If you're using a notebook, just make sure you can write files to the working directory.
Install the required packages in a notebook cell or terminal:
!pip install "mcp[cli]>=1.2.0" "anyio>=4.0.0" "openai>=1.40.0"
This gets you the MCP SDK, an async runtime, and an OpenAI client. Quote each specifier so the shell doesn't treat >= as a redirect, and note that the decorator-based FastMCP API we use below ships with the 1.x releases of the mcp SDK, so don't pin anything older. I'd recommend pinning versions for reproducibility. If the installation fails, check your Python version and virtual environment first, then give it another shot. Oh, and if you're thinking about operating your stack outside managed services, you might want to consider deploying a self-hosted LLM server to pair with your MCP server.
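A quick sanity check right after installation can save debugging later. This one-liner just confirms the three packages import cleanly; run it in a notebook cell, or drop the leading ! in a terminal:
!python -c "import mcp, anyio, openai; print('mcp, anyio, and openai import OK')"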
Using the Tool in Practice
Build the MCP Server
Let's create a server that exposes two math tools, a static markdown resource, and a prompt template. I'm using stdio transport here because, well, it's simple.
Pick a filename for the server. I usually go with something like mcp_server.py. If you're working in a notebook, use whatever file writing utility your environment provides. Terminal users, just fire up your preferred editor.
Write the server to a file using a notebook cell:
%%writefile mcp_server.py
# Purpose: Minimal MCP server exposing math tools, a static resource, and a prompt template over stdio.
# Uses the decorator-based FastMCP API from the official mcp SDK.
from mcp.server.fastmcp import FastMCP
from mcp.server.fastmcp.prompts import base

# Instantiate the MCP server with a unique name
server = FastMCP("calc-server")


@server.tool()
async def add(a: int, b: int) -> int:
    """
    Add two integers and return the sum.

    Args:
        a (int): First integer.
        b (int): Second integer.

    Returns:
        int: The sum of a and b.
    """
    # Simple addition; no edge cases for int
    return a + b


@server.tool()
async def subtract(a: int, b: int) -> int:
    """
    Subtract b from a and return the difference.

    Args:
        a (int): Minuend.
        b (int): Subtrahend.

    Returns:
        int: The result of a - b.
    """
    # Simple subtraction; no edge cases for int
    return a - b


# Resource: static markdown documentation (resource URIs need a scheme, e.g. docs://)
DOCS_URI = "docs://usage"
DOCS_CONTENT = """# Calc Server Usage
Tools:
- add(a: int, b: int): returns a + b
- subtract(a: int, b: int): returns a - b
Prompt:
- math_helper(expression: string): step-by-step computation
"""


@server.resource(DOCS_URI, mime_type="text/markdown")
async def read_docs() -> str:
    """
    Return static markdown documentation for the server.

    Returns:
        str: Markdown-formatted usage documentation.
    """
    # FastMCP wraps the returned string in a text resource with the declared MIME type
    return DOCS_CONTENT


@server.prompt()
async def math_helper(expression: str) -> list[base.Message]:
    """
    Generate a prompt for step-by-step math computation.

    Args:
        expression (str): Math expression to compute.

    Returns:
        list[base.Message]: Prompt messages (MCP prompts use "user" and "assistant" roles).
    """
    # First message sets the assistant's behavior; second carries the expression to compute
    return [
        base.UserMessage("You are a careful math assistant. Show your work."),
        base.UserMessage(f"Compute the following expression step by step: {expression}"),
    ]


if __name__ == "__main__":
    # stdio transport: read requests from stdin and write responses to stdout until EOF
    server.run(transport="stdio")
So what's happening here? The @server.tool() decorator registers async functions as tools. The client gets JSON Schema for each tool, which means it can validate arguments before calling. The @server.resource() decorator exposes static or dynamic data by URI. Clients read the content using the resource URI, and the server returns the body with a MIME type. The @server.prompt() decorator defines those reusable prompt templates I mentioned. A prompt can define named input variables, message roles, and a message sequence. The whole thing runs over stdio, reading requests from stdin and writing responses to stdout.
As you're building, keep these things in mind:
Each tool needs a clear JSON Schema for inputs. This is what helps clients validate before execution.
Each resource should return a stable URI and the correct content type. Otherwise clients won't render content correctly.
Each prompt should declare required arguments and message roles. Clients need to know what variables to supply.
Test the Server with a Client
Now let's create a separate file for the client. Call it mcp_client.py or just reuse a notebook cell. The client will start the server as a subprocess and connect over stdio.
Write a client that connects to the server, lists capabilities, calls a tool, reads a resource, and fetches a prompt:
%%writefile mcp_client.py
# Purpose: Async MCP client to test server tools, resources, and prompts over stdio.
import anyio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    """
    Connects to the MCP server, lists tools/resources/prompts, calls a tool, and fetches a prompt.

    Raises:
        Exception: If server connection or calls fail.
    """
    # Launch the server as a subprocess and connect over stdio
    params = StdioServerParameters(command="python", args=["mcp_server.py"])
    async with stdio_client(params) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            # Perform the MCP initialize handshake before making any requests
            await session.initialize()
            # List tools and print their names
            tools = await session.list_tools()
            print("Tools:", [t.name for t in tools.tools])
            # List resources and print their URIs
            resources = await session.list_resources()
            print("Resources:", [str(r.uri) for r in resources.resources])
            # Read and print the documentation resource if present
            for r in resources.resources:
                if "usage" in str(r.uri):
                    content = await session.read_resource(r.uri)
                    # content.contents is a list of typed chunks (e.g., text, blob)
                    for c in content.contents:
                        if hasattr(c, "text"):
                            print("Docs:\n", c.text)
            # Call 'add' tool with arguments and print the result
            result = await session.call_tool("add", {"a": 5, "b": 7})
            # result.content is a list of output blocks; pick the first text block
            out = result.content[0].text if result.content else None
            print("add(5,7) =", out)
            # Fetch the prompt template and print its rendered messages
            prompts = await session.list_prompts()
            print("Prompts:", [p.name for p in prompts.prompts])
            prompt = await session.get_prompt("math_helper", {"expression": "3*(4+2)"})
            print("Prompt messages:", [(m.role, m.content.text) for m in prompt.messages])


if __name__ == "__main__":
    anyio.run(main)
What this client does is launch the server as a subprocess, establish a session over stdio, run the initialize handshake, and exercise all three primitives. It lists tools, resources, and prompts. Calls a math tool to confirm schema validation and execution work. Reads a markdown resource to confirm URI resolution and content types are right. And fetches a prompt and renders it with arguments. The stdio_client context manager handles all the process lifecycle and stream wiring stuff. If the server crashes, you'll see an error during capability discovery or the first call. Just check the server logs or print statements to figure out what went wrong.
Run and Evaluate
Time to run the client and verify the server works end to end. If you're using a terminal, run the client from the same virtual environment. Notebook users, just run the cell that executes the client code.
!python mcp_client.py
Expected output (yours should match, give or take minor formatting):
Tools: ['add', 'subtract']
Resources: ['docs://usage']
Docs:
# Calc Server Usage
Tools:
- add(a: int, b: int): returns a + b
- subtract(a: int, b: int): returns a - b
Prompt:
- math_helper(expression: string): step-by-step computation
add(5,7) = 12
Prompts: ['math_helper']
Prompt messages: [('user', 'You are a careful math assistant. Show your work.'), ('user', 'Compute the following expression step by step: 3*(4+2)')]
This output tells us that:
The client successfully discovered tools, resources, and prompts during session setup.
The tool call returned a valid result that matched the expected schema.
The resource read returned the expected content with the correct MIME type.
The prompt rendered correctly with your input arguments.
If something fails, check your Python path, virtual environment, and file locations. Make sure your server file is actually executable in your environment. Windows users, confirm that your Python launcher is running the correct interpreter. macOS or Linux folks, check permissions and the working directory.
Optional: Add Resilience to Tool Calls
For production use, you really want to add a timeout and error handler around tool calls. This prevents a slow or failing tool from blocking the client. Plus you'll capture and surface meaningful error messages.
Create a helper function:
%%writefile safe_call.py
# Purpose: Utility for safe, timeout-guarded MCP tool calls.
import logging

from anyio import fail_after


async def safe_call_tool(mcp, name, args, seconds=10):
    """
    Call an MCP tool with a timeout and error handling.

    Args:
        mcp: MCP client session.
        name (str): Tool name.
        args (dict): Tool arguments.
        seconds (int): Timeout in seconds.

    Returns:
        Tool call result, or None if the call failed or timed out.

    Raises:
        None: All exceptions are caught and logged.
    """
    try:
        # fail_after raises TimeoutError if the block doesn't finish in time
        with fail_after(seconds):
            return await mcp.call_tool(name, args)
    except TimeoutError:
        # Timeout occurred
        logging.warning(f"Tool call to {name} timed out after {seconds}s.")
        return None
    except Exception as e:
        # Log the error and return None so the caller can degrade gracefully
        logging.error(f"Tool call {name} failed: {e}")
        return None
Use safe_call_tool instead of calling session.call_tool directly. I'd start with a conservative timeout. Only expand it if you confirm that a tool actually needs more time. Always log errors and return structured error information to whoever's calling.
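For example, in mcp_client.py you could swap the direct call for the guarded one. This is a minimal sketch that assumes safe_call.py sits in the same directory as the client:
# Sketch: guarded tool call; this fragment replaces the direct session.call_tool line inside main()
from safe_call import safe_call_tool

result = await safe_call_tool(session, "add", {"a": 5, "b": 7}, seconds=5)
if result is None:
    print("add call failed or timed out")
else:
    print("add(5,7) =", result.content[0].text)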
Troubleshooting
Let me save you some time with common issues I've run into:
The client cannot discover tools: Make sure the server process actually starts correctly. Check that the stdio transport isn't buffered or redirected somewhere weird. Confirm the working directory contains the server file.
JSON Schema validation errors: Double-check that your tool input matches the schema. Look at required fields, types, and enum values. If your arguments are valid but the schema is too strict, update the tool schema.
Resource not found: The resource URI needs to match exactly. If the server builds resource URIs dynamically, print the final URI and compare it to what the client is requesting.
Prompt rendering errors: You need to pass all required variables. If your template variables change, update both the server prompt declaration and the client call.
Hanging calls: Add timeouts using that resilience pattern I showed you. If a tool depends on external services, handle network retries and backoff inside the tool implementation itself (see the backoff sketch after this list).
Version conflicts: Pin your dependencies and reinstall in a fresh virtual environment. Make sure the client and server are using compatible MCP SDK versions.
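Here's the backoff sketch mentioned above for the hanging-calls case: a small, generic wrapper that retries a flaky async call with exponential backoff before giving up. Wrap whatever external call your tool makes with it; the wrapper assumes nothing about the MCP SDK.
# Sketch: retry a flaky async call with exponential backoff inside a tool implementation
from collections.abc import Awaitable, Callable

import anyio


async def with_retry(call: Callable[[], Awaitable[str]], attempts: int = 3) -> str:
    """Run an async callable, retrying with exponential backoff; re-raise after the last attempt."""
    delay = 1.0
    for attempt in range(attempts):
        try:
            return await call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the tool caller
            await anyio.sleep(delay)  # back off before the next attempt
            delay *= 2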
Conclusion
So there you have it. You've built a minimal MCP server that exposes tools, resources, and prompts over stdio, and you've verified it works with a test client. This pattern lets you define capabilities once and reuse them across any MCP-compatible application. You now have a repeatable way to standardize tool access without constantly rewriting code for each app.
Next Steps
Here's what I'd tackle next:
Integrate with OpenAI function calling: Convert MCP tools to OpenAI tool schemas and route calls from GPT-4 to your server. There's a separate guide for this, and the sketch after this list shows the basic schema mapping. For deeper customization, check out the step-by-step guide to fine-tuning large language models.
Add dynamic resources: Instead of just static markdown, serve real-time data like database queries or API responses.
Explore other transports: Try SSE or WebSocket for remote or browser-based clients.
Package and deploy: Containerize your server and expose it through a network transport for multi-user access.
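As a preview of the OpenAI integration mentioned above, here's a minimal sketch of the schema mapping. It assumes you already hold the ListToolsResult from session.list_tools(); wiring the converted schemas into an actual chat completion call is covered in the separate guide.
# Sketch: map MCP tool listings to OpenAI function-calling tool schemas
# Assumes `tools_result` is the ListToolsResult returned by session.list_tools()
def mcp_tools_to_openai(tools_result) -> list[dict]:
    return [
        {
            "type": "function",
            "function": {
                "name": t.name,
                "description": t.description or "",
                "parameters": t.inputSchema,  # MCP already publishes JSON Schema for inputs
            },
        }
        for t in tools_result.tools
    ]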