A data-driven argument for principled language selection in agent tool ecosystems
About eight months ago, our internal AI agent ecosystem had a problem that wasn’t on anyone’s radar yet. We had accumulated fifteen MCP servers across three programming languages (Python, Rust, and TypeScript), and not one of us could explain why we’d made those choices. The Rust servers existed because “Rust is fast.” The TypeScript server existed because a required library demanded it. The Python servers existed because that’s where we actually got work done.
Six of those fifteen servers were health-check stubs with no domain logic. We were maintaining a Rust toolchain (4 GB of Visual Studio Build Tools, a 700 MB Rust installation, 3–8 minutes of compile time on every fresh clone) for servers that did nothing but return {"status": "ok"}.
This is the story of how we fixed that, and why the pattern is repeating itself across every team building on MCP.
The Problem No One Names Until It’s Too Late
When you build your first MCP server, you don’t think about language policy. You pick the language you know, or the language where the required library lives, or the language your colleague is familiar with. This works fine for server number one.
By server five, you have a problem.
Each language runtime you add to your ecosystem imposes a fixed cost on every developer: install the Rust toolchain, install Node.js, install Python, learn each SDK’s idioms, debug each language’s quirks. Those costs don’t scale linearly; they compound. And unlike a monolithic application where you make one language choice once, an MCP ecosystem makes language choices continuously, every time you need a new tool.
The specific failure modes we observed:
Stub accumulation. We scaffolded Rust servers for medium, discord, linkedin-rpa, and graph-unified because we planned to need them eventually. Months later, they were still health-check stubs. They appeared in every configuration profile, added compile time to every build, and created entries in every policy file, all before delivering a single working tool. It’s the sunk-cost fallacy at the infrastructure level.
Toolchain sprawl. The Rust workspace required Visual Studio Build Tools 2022 with the C++ workload. On Windows, that means a 4 GB installer just to get link.exe. One contributor’s onboarding took 45 minutes before they could run cargo check. For a single server, that’s not a trade-off; it’s a tax.
Configuration drift. As servers accumulated, the configuration file diverged from reality. Servers were listed but disabled. Policy entries existed for servers that predated the capability model. Nobody could confidently answer “which servers are actually active right now?” without reading three files.
The Decision Framework
We didn’t want to make an arbitrary “pick Python” proclamation. We wanted a decision we could justify, revisit, and explain to a future contributor who disagreed with it. So we encoded it in an Architectural Decision Record (ADR): a structured document that captures context, alternatives considered, scoring rationale, and consequences.
The framework scores each language candidate on four axes:
Weighted score = (velocity × 0.40) + (reproducibility × 0.30)
               + (ecosystem × 0.20) + (performance × 0.10)
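The weights and per-axis scores make the arithmetic easy to check. Here is a minimal sketch of the scoring (the function name is mine; the weights and the example scores are the ones used in this post):

```python
# Axis weights from the ADR described in this post.
WEIGHTS = {
    "velocity": 0.40,
    "reproducibility": 0.30,
    "ecosystem": 0.20,
    "performance": 0.10,
}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-axis scores (1-5 scale) into one weighted score."""
    return round(sum(WEIGHTS[axis] * s for axis, s in scores.items()), 2)

# Python's per-axis scores as reported below: 5 / 5 / 5 / 3.
print(weighted_score({"velocity": 5, "reproducibility": 5,
                      "ecosystem": 5, "performance": 3}))  # 4.8
```

Making the weights an explicit table means that recalibrating for a different operational context is a one-line edit, which is the point of writing them down at all.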
Here’s why those weights make sense for a personal-productivity or small-team MCP context:
Implementation velocity (40% weight) is highest because MCP servers in this workload class are predominantly thin API proxies. You’re wrapping arXiv’s search API, proxying Semantic Scholar, orchestrating a publishing pipeline. The time from “I need this tool” to “the tool works” is the primary bottleneck. A language that burdens you with boilerplate or weak SDK documentation directly costs you throughput.
Reproducibility (30% weight) is second because the cost of a bad first-run experience isn’t just time; it’s trust. When a new contributor (or your own future self, six weeks after a vacation) clones the repository and can’t get a server running without a 45-minute install sequence, they’ll reach for the path of least resistance rather than the correct one. Reproducibility problems compound into ecosystem entropy.
Ecosystem fit (20% weight) captures whether the libraries you need exist and are maintained in this language. A language that requires you to write your own HTTP client for every API call is not a good fit for API-proxy workloads.
Performance (10% weight) is lowest because of a specific property of MCP servers: they are launched on-demand by the host (VS Code, in our case) and handle serial JSON-RPC requests over a stdio pipe. There is no concurrent load. The performance difference between a 5 ms Rust startup and a 300 ms Python startup is below human perception in an interactive agent workflow.
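The startup-cost claim is easy to sanity-check. This sketch (stdlib only; the numbers vary by machine) times a no-op interpreter launch end to end, which approximates the one-time process-spawn cost an MCP host pays when it starts a stdio server:

```python
import subprocess
import sys
import time

# Time a no-op Python process end to end: roughly the fixed cost an
# MCP host pays once, when it launches a stdio server. On typical
# hardware this lands in the tens to a few hundred milliseconds.
start = time.perf_counter()
subprocess.run([sys.executable, "-c", "pass"], check=True)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"interpreter cold start: {elapsed_ms:.0f} ms")
```

Whatever the number is on your machine, it is paid once per session, not once per request, which is why it sits at the bottom of the weighting.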
Scoring Each Language
Let’s run through the numbers. Each axis is scored 1–5, with 5 being best.
The decision rule will fall out directly: Python primary, TypeScript only for Node-native dependencies, Rust only for CPU-bound compute. Anything that doesn’t meet those conditions goes to Python by default.
Python: 4.8 / 5
# The entire hello-world MCP server in Python:
from mcp.server import Server
from mcp.types import TextContent, Tool
from mcp.server.stdio import stdio_server
import asyncio

server = Server("my-server")

@server.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="hello",
            description="Say hello.",
            inputSchema={
                "type": "object",
                "properties": {"name": {"type": "string"}},
                "required": ["name"],
            },
        )
    ]

@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name == "hello":
        return [TextContent(type="text", text=f"Hello, {arguments['name']}!")]
    raise ValueError(f"Unknown tool: {name}")

async def main():
    async with stdio_server() as (r, w):
        await server.run(r, w, server.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())
That’s a working MCP server. No Cargo.toml, no trait implementations, no derive macros. Add a tool: add a branch to call_tool.
The mcp pip package is first-party from Anthropic, meaning it’s kept in sync with the protocol specification. The ecosystem covers every API domain that matters for productivity workflows: httpx for HTTP, scholarly for academic APIs, pywin32 for Windows COM automation (Outlook, Excel), notion-client for Notion, discord.py for messaging.
The onboarding sequence: pip install -r requirements.txt. On a machine with Python 3.11+ installed, that’s under 2 minutes.
Velocity: 5.
Reproducibility: 5.
Ecosystem: 5.
Performance: 3.
Weighted: 4.8.
Rust: 2.3 / 5
Rust is a genuinely excellent language. For systems programming, latency-critical paths, and CPU-bound computation, it earns its complexity. But for MCP server development in a personal-productivity context, it fails nearly every criterion:
# Just to add one tool to a Rust MCP server, you need:
[dependencies]
rmcp = { version = "1.2", features = ["server", "transport-io"] }
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
anyhow = "1"
Then you implement a struct for your server, add #[tool_router] and #[tool] attribute macros, write the ServerHandler trait implementation, configure the async runtime, and handle the stdio transport setup. About 50 lines of boilerplate before your first tool works.
The SDK situation makes this worse. rmcp (the Rust MCP SDK) is community-maintained, at version 1.2, with documentation that assumes familiarity with both Rust async patterns and MCP’s internal model. When it works, it works well. When you hit an edge case, you’re reading source code.
The fatal issue for our context was the prerequisite chain on Windows. Rust’s Windows target requires the MSVC linker, which requires Visual Studio Build Tools 2022 with the C++ workload. That’s a 4 GB download and a 10–20 minute install sequence. Every new environment, every CI run, every colleague’s first day pays this toll.
Velocity: 2.
Reproducibility: 2.
Ecosystem: 2.
Performance: 5.
Weighted: 2.3.
TypeScript: 4.1 / 5
TypeScript is the right choice when the required library is Node-native with no Python equivalent. In our ecosystem, that means exactly one server: whatsapp-ui, which depends on whatsapp-web.js, a library that automates WhatsApp Web via Puppeteer and is available only in the npm ecosystem.
// TypeScript MCP server with Zod schemas
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

const server = new McpServer({ name: "my-server", version: "1.0.0" });

server.tool(
  "hello",
  "Say hello.",
  { name: z.string().describe("Name to greet") },
  async ({ name }) => ({
    content: [{ type: "text", text: `Hello, ${name}!` }]
  })
);

const transport = new StdioServerTransport();
await server.connect(transport);
The @modelcontextprotocol/sdk is also first-party from Anthropic, with excellent TypeScript types and Zod integration for schema validation. The node_modules problem is real but manageable. The onboarding sequence is npm install && npm run build, which takes under 3 minutes.
TypeScript scores almost as well as Python across the board, with one extra friction point: Node.js must be installed separately, and node_modules trees require periodic cleanup. Neither is a dealbreaker, but the friction adds up across many servers.
Velocity: 4.
Reproducibility: 4.
Ecosystem: 5.
Performance: 3.
Weighted: 4.1.
From the scoring, the policy is clear:
1. Python is the primary language for all new MCP servers. Use it unless a required dependency is genuinely Node-native (no viable Python port at comparable maturity).
2. TypeScript is the secondary language. Use it only when you have a Node-native dependency. Document the dependency explicitly in the server’s README so future contributors understand it’s a deliberate exception, not a pattern to extend.
3. Do not write new Rust MCP servers unless you have a CPU-bound, compute-intensive workload (parallel file I/O, SHA-256 hashing, binary distribution). Even then, justify it in an ADR. Our code-intel-rs server is retained because it does parallel directory walking with rayon and Merkle-tree hashing, workloads that genuinely benefit from Rust’s data-parallelism model. Everything else was retired.
The decision flow looks like this:
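In code form, the flow is two guard clauses and a default (the function and parameter names here are mine, not from the ADR):

```python
def choose_language(node_native_dependency: bool,
                    cpu_bound_compute: bool) -> str:
    """The language decision flow from the policy above, as a function."""
    if node_native_dependency:
        # e.g. whatsapp-web.js, which exists only in the npm ecosystem
        return "typescript"
    if cpu_bound_compute:
        # e.g. parallel hashing with rayon; still requires an ADR
        return "rust"
    # Everything else: API proxies, orchestration, automation
    return "python"

print(choose_language(node_native_dependency=False,
                      cpu_bound_compute=False))  # python
```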

The Rationalization Outcomes
Before applying the framework: 15 servers, 6 stubs, 3 language runtimes, and a 4 GB prerequisite toolchain. After: the one Rust server we kept, code-intel-rs, is distributed as a pre-built binary, so new contributors don’t need the Rust toolchain at all.
The two Rust API proxies (arXiv and Semantic Scholar) were ported to a Python research-server in about 3 hours. The four stubs were deleted. The Python research-server ended up at 172 lines of code exposing 5 tools, covering both APIs that previously required two separate Rust servers.
What makes this pattern repeatable
The decision framework isn’t just about Python vs. Rust vs. TypeScript. It’s a template for explicit language governance in any multi-language tool ecosystem, MCP or otherwise.
The key properties:
Weights should reflect your operational context. A team with a dedicated DevOps pipeline and reproducibility infrastructure can weight performance higher and reproducibility lower. A solo developer on Windows should do the opposite.
Decisions should have explicit rationale. An ADR that says “Python wins because we like Python” is better than no ADR. An ADR that traces the scoring to context-specific factors is better still, because when the context changes, you know which weights to revisit.
Stubs are technical debt with a toolchain cost. A health-check stub in Rust costs you the entire Rust toolchain. Delete them or implement them; don’t let them exist in a middle state.
The primary language absorbs new servers by default. Requiring explicit justification for exceptions (TypeScript: whatsapp-web.js; Rust: CPU-bound compute) creates a strong prior against fragmentation without prohibiting legitimate specialization.
Next Steps
If you’re building an MCP ecosystem and facing the same language sprawl, a comparison of the three SDKs is worth making before you’ve committed to all of them:
- Anthropic Python SDK
- Anthropic TypeScript SDK
- rmcp (Rust community SDK)
The next post in this series covers a different but related problem: once you’ve decided on your languages, how do you manage the fact that different workflows need different servers active, when turning everything on all the time is a governance problem, not just a performance one?
This post is part of a three-part series on production MCP ecosystem engineering.
The full technical analysis is available as a research paper: “Language-Appropriate Specialization in Agent Tool Ecosystems,” targeting arXiv cs.SE.
Why We Chose Python for Our MCP Servers (And Why That Decision Matters) was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.