Safe, Strict, & Scalable: Why I Rewrote the Model Context Protocol (MCP) with gRPC
From "Guesswork" to "Guarantees": Hardening AI infrastructure with gRPC and Protobufs. [Debdipta Halder | AI Researcher, FireCompass]
The Model Context Protocol (MCP) is having its moment. It promises to be the “USB-C for LLMs”—a standard way to connect AI models to our local files, databases, and tools.
But as I moved from building toy prototypes to an enterprise-grade Context Server in Python, I hit a wall.
The standard implementation of MCP relies on JSON-RPC over Stdio (standard input/output) or SSE (Server-Sent Events). While this is fantastic for rapid plugin development, it left me nervous about two critical things: Security and Type Safety.
The Great Protocol Swap: Why I Ditched JSON-RPC for gRPC in My Model Context Server
If you are building infrastructure for Large Language Models, you already know the golden rule: context is everything. But how you deliver that context is just as critical.
Recently, while architecting a custom Model Context Protocol (MCP) server as part of the FireCompass Research Initiative, I hit a wall with standard practices. I realized that if I wanted to build a robust, secure context server, I didn’t just need a data pipe; I needed a strict contract.
So, I ripped out the JSON-RPC layer and replaced it with gRPC.
Here is why I did it, and why—if you are building for production—you might want to consider it too.
The Tale of the Tape: JSON-RPC vs. gRPC
Before we dive further, it is important to understand exactly how these two protocols handle data, because that is where the security gap lies.
In a nutshell: JSON-RPC trusts the developer to validate the data. gRPC trusts the schema to validate the data.
The Devil’s Advocate: Why Does Everyone Still Use JSON-RPC?
If gRPC is safer, why do tools like VS Code, Claude Desktop, and Zed default to JSON-RPC over Stdio?
It isn’t because they don’t value security. It’s because they value ubiquity and simplicity.
Lowest Common Denominator: To use JSON-RPC, you don't need a protocol compiler or generated code. You just print a string to stdout. Every language, from Bash to Python, can do this.
The LSP Heritage: MCP is the spiritual successor to the Language Server Protocol (LSP), which powers your IDE's "Go to Definition" features. LSP was built on JSON-RPC to lower the barrier for plugin developers.
For desktop tools such as IDEs or chatbots that run locally, JSON-RPC is fine. But I am building a server that needs to stand up to complex, potentially malicious inputs. In that environment, the "easiness" of printing text wasn't worth the risk.
The Bottleneck: The “Type Confusion” Vulnerability
The breaking point for me was realizing how susceptible standard Python JSON handlers are to Injection Attacks via type confusion.
In a standard implementation, the server receives a JSON payload. Python’s json.loads() converts this into a dictionary. Because JSON is dynamically typed, an attacker (or a hallucinating LLM) can send a dictionary where an integer is expected. Consider the following scenario.
Scenario: Imagine an MCP tool that looks up user details. You expect an Integer ID.
# VULNERABLE PYTHON HANDLER
def get_user_data(params):
    # DANGER: We assume user_id is an int, but JSON allows it to be anything.
    # If we pass this directly to a NoSQL DB (like Mongo):
    return db.users.find_one({"id": params["user_id"]})
The Attack: An attacker sends this valid JSON-RPC payload:
{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "get_user_data",
    "params": { "user_id": { "$ne": null } }
}
The Result: Instead of looking for User 105, the database receives {"id": {"$ne": null}}. This translates to “Find the first user whose ID is not null.” The attacker gets back whichever record the database returns first, often the earliest-created account and potentially the admin’s, bypassing the ID check entirely.
To fix this in JSON-RPC, you have to write defensive code (Pydantic models, manual type checks) for every single field.
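As a rough illustration of that defensive code, here is a minimal sketch of a manual type check for the handler above. The FakeUsers class is a hypothetical in-memory stand-in for the database, not part of any real driver, and the handler takes it as an extra argument for the sake of the demo. The point is only that the check has to be written by hand, field by field.

```python
class FakeUsers:
    """Hypothetical in-memory stand-in for db.users (an assumption, not a real driver)."""

    def __init__(self, rows):
        self.rows = rows

    def find_one(self, query):
        # Only exact-match lookups are modelled here.
        for row in self.rows:
            if row["id"] == query["id"]:
                return row
        return None


def get_user_data(params, users):
    user_id = params.get("user_id") if isinstance(params, dict) else None
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        raise TypeError("user_id must be an integer")
    return users.find_one({"id": user_id})


users = FakeUsers([{"id": 1, "name": "admin"}, {"id": 105, "name": "alice"}])
print(get_user_data({"user_id": 105}, users))

try:
    get_user_data({"user_id": {"$ne": None}}, users)
except TypeError as exc:
    print("rejected:", exc)
```

Multiply this by every field of every tool and the maintenance cost becomes clear.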
The Solution: Schema-First Security with gRPC
To solve these issues, I realized I needed a fundamental shift in approach.
With JSON-RPC, you are constantly building barricades and guardrails after you imagine what could go wrong. You are perpetually reacting to endless possibilities of bad data.
I wanted to be proactive. Instead of reacting to issues, it is far better to implement a strict contract before communicating. This ensures the communication is unambiguous from byte one, and both parties—the AI client and the Python MCP server—are crystal clear on exactly what data format is being exchanged.
To achieve this, I turned to gRPC, the open-source RPC framework originally developed at Google. By defining my MCP service in a .proto file, I instantly gained that strict contract.
The Protobuf Definition:
syntax = "proto3";
service MCPService {
  rpc CallTool (ToolRequest) returns (ToolResponse);
}
message ToolRequest {
  string tool_name = 1;
  // We define strictly typed arguments
  int32 user_id = 2;
}
// ToolResponse is omitted here for brevity
Why This is Secure: If an attacker tries to send that malicious JSON object ({ "$ne": null }) as the user_id:
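Rejection at the Gate: A JSON object has no valid encoding as an int32. Generated client code refuses to build such a message, and on the wire the field would not carry the varint wire type that int32 requires.
Hard Failure: The mismatched field is rejected or discarded during deserialization (the exact behavior depends on the Protobuf runtime), before my Python function ever sees it.
Safety: My handler only ever receives an integer user_id, so the injected query operator never reaches the database.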
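To make the wire-type idea concrete, here is a stdlib-only sketch of a decoder for the ToolRequest message above. It is an illustration under assumptions, not the official Protobuf runtime: read_varint and decode_tool_request are hypothetical helpers, only wire types 0 (varint) and 2 (length-delimited) are handled, and this sketch chooses to raise on a mismatch, whereas a real runtime may skip the field instead.

```python
VARINT, LEN = 0, 2  # the two Protobuf wire types this message uses


def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf) or shift >= 70:
            raise ValueError("malformed varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_tool_request(buf):
    """Hypothetical decoder for ToolRequest {string tool_name = 1; int32 user_id = 2;}."""
    fields = {"tool_name": "", "user_id": 0}  # proto3 defaults
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if field_number == 2 and wire_type != VARINT:
            # A nested object would arrive as wire type 2, never as a varint.
            raise TypeError("user_id must be encoded as a varint")
        if wire_type == VARINT:
            value, pos = read_varint(buf, pos)
            if field_number == 2:
                value &= 0xFFFFFFFF  # keep the low 32 bits, then apply the sign
                fields["user_id"] = value - (1 << 32) if value >= (1 << 31) else value
        elif wire_type == LEN:
            length, pos = read_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("truncated field")
            if field_number == 1:
                fields["tool_name"] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError("wire type not handled in this sketch")
    return fields


good = b"\x0a\x04user\x10\x69"   # tool_name="user", user_id=105
bad = b"\x0a\x04user\x12\x02{}"  # field 2 sent as length-delimited bytes
print(decode_tool_request(good))
```

Calling decode_tool_request(bad) raises TypeError, because the tag byte for field 2 announces a length-delimited value where a varint is required.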
In this design:
The LLM can request actions
It never executes them directly
All tools are accessed through a controlled service layer
Tool execution is isolated from the model, reducing the risk of prompt injection, unsafe commands, or accidental privilege escalation.
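One way to picture that controlled service layer is a dispatcher that only runs functions it has explicitly registered. The sketch below is an assumption about the shape of such a layer, with hypothetical names (ALLOWED_TOOLS, dispatch, echo); it is not code from the project.

```python
def echo(args):
    """Hypothetical tool: returns its input unchanged."""
    return {"echo": args}


# Only functions listed here can ever be executed on the model's behalf.
ALLOWED_TOOLS = {"echo": echo}


def dispatch(tool_name, args):
    """Run a registered tool; the model supplies only a name and arguments."""
    handler = ALLOWED_TOOLS.get(tool_name)
    if handler is None:
        return {"is_error": True, "error_message": f"unknown tool: {tool_name}"}
    try:
        return {"is_error": False, "content": handler(args)}
    except Exception as exc:  # tool failures are reported, not propagated
        return {"is_error": True, "error_message": str(exc)}


print(dispatch("echo", {"x": 1}))
print(dispatch("os.system", {"cmd": "whoami"}))
```

The model can name anything it likes, but a name that is not in the registry yields an error response rather than an execution.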
Why Not REST or GraphQL?
It wasn’t just a coin toss between JSON-RPC and gRPC. I evaluated the usual suspects, and here is why they didn’t make the cut.
Why not REST? REST is resource-oriented (“Give me the User resource”). MCP is action-oriented (“Run this tool”). Trying to shoehorn command execution into REST verbs feels clunky. Furthermore, LLM interactions are increasingly streaming-heavy, and gRPC supports bidirectional streaming natively, which plain request/response REST does not.
Why not GraphQL? GraphQL is excellent at preventing over-fetching, but it brings the overhead of a query engine. I didn’t need a query language to traverse a graph; I needed a fast, secure, simple pipe to execute functions.
The Implementation
I created a custom MCP server built on gRPC that supports Hot-Reloading. You can drop a new Python script into a folder, and the server hot-swaps it into the running process instantly.
The Architecture
The system consists of three main components:
The Protocol: A strict Protobuf definition ensuring the AI client knows exactly what inputs/outputs to expect.
The Watchdog: A background thread monitoring the file system for changes.
The Registry: A dynamic module loader that bypasses Python’s standard import cache to reload code on the fly.
The Data Flow:
Developer creates my_tool.py.
watchdog detects the file_created event.
Server uses importlib to load the module into memory.
Server pushes a notification stream to connected clients.
Clients refresh their tool definitions immediately.
The Protocol (the .proto file)
We define a service that supports both standard calls and a streaming endpoint for updates.
syntax = "proto3";
package switchblade;

service SwitchbladeService {
  // Discover available tools
  rpc ListTools (Empty) returns (ListToolsResponse);
  // Execute a specific tool
  rpc CallTool (CallToolRequest) returns (CallToolResponse);
  // Stream updates (Server pushes a message when files change)
  rpc WatchTools (Empty) returns (stream ToolsNotification);
}

message Empty {}

message Tool {
  string name = 1;
  string description = 2;
  string input_schema_json = 3;
  string output_schema_json = 4;
}

message ListToolsResponse {
  repeated Tool tools = 1;
}

message CallToolRequest {
  string tool_name = 1;
  string arguments_json = 2;
}

message CallToolResponse {
  string content_json = 1;
  bool is_error = 2;
  string error_message = 3;
}

message ToolsNotification {
  string event_type = 1;
  string message = 2;
}
The “Secret Sauce”: Hot-Reloading Logic
The core challenge is reloading Python code without restarting the interpreter. We achieve this using importlib.util and the watchdog library.
The Logic: Instead of import tool, we manually create a module specification from the file path. This allows us to overwrite the existing module in sys.modules whenever a file modification event occurs.
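The following self-contained sketch shows the same idea in isolation, using nothing beyond the standard library. The load_from_path helper, the demo_tool module, and the greet function are hypothetical names used for illustration: a file is written to a temporary directory, loaded, rewritten, and loaded again.

```python
import importlib.util
import os
import sys
import tempfile


def load_from_path(module_name, filepath):
    """Build a fresh module object from a file path and register it in sys.modules."""
    spec = importlib.util.spec_from_file_location(module_name, filepath)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {filepath}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # replaces any previously loaded version
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tools_dir:
    path = os.path.join(tools_dir, "demo_tool.py")

    with open(path, "w") as f:
        f.write("def greet():\n    return 'version 1'\n")
    first = load_from_path("demo_tool", path).greet()

    # The two versions differ in length, so a cached .pyc from the first load
    # is not reused even if both writes land within the same timestamp.
    with open(path, "w") as f:
        f.write("def greet():\n    return 'version 2 (edited)'\n")
    second = load_from_path("demo_tool", path).greet()

print(first, "->", second)
```

Each call executes the file from scratch into a new module object, which is why the second call sees the edited function without restarting the interpreter.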
The Server Implementation (server.py)
import os
import grpc
import json
import importlib.util
import sys
import queue
import threading
import inspect
from concurrent import futures
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
import switchblade_pb2
import switchblade_pb2_grpc

TOOLS_DIR = "./tools"


class ToolRegistry:
    def __init__(self):
        self.tools = {}
        self.subscribers = []
        self.lock = threading.Lock()

    def load_tool_file(self, filepath):
        """Dynamically loads a python module and scans for @tool decorated functions."""
        # splitext strips only the trailing extension, unlike str.replace
        module_name = os.path.splitext(os.path.basename(filepath))[0]
        # Create a spec from the file location directly
        spec = importlib.util.spec_from_file_location(module_name, filepath)
        if spec and spec.loader:
            try:
                module = importlib.util.module_from_spec(spec)
                sys.modules[module_name] = module  # Overwrite system cache
                spec.loader.exec_module(module)
                # Scan for functions decorated with our SDK
                for name, obj in inspect.getmembers(module):
                    if inspect.isfunction(obj) and getattr(obj, "_is_switchblade_tool", False):
                        meta = obj._tool_metadata
                        with self.lock:
                            self.tools[meta["name"]] = obj
                        print(f"✅ Hot-Loaded: {meta['name']}")
                        self.notify_subscribers(f"Tool {meta['name']} updated")
            except Exception as e:
                print(f"❌ Load Error: {e}")

    def notify_subscribers(self, message):
        """Notify all connected clients via gRPC stream"""
        # Snapshot under the lock so clients can subscribe while we iterate
        with self.lock:
            subscribers = list(self.subscribers)
        for q in subscribers:
            try:
                q.put(switchblade_pb2.ToolsNotification(event_type="UPDATED", message=message))
            except Exception:
                # One failing subscriber queue must not block the others
                pass


class ToolFileHandler(FileSystemEventHandler):
    """Watches the /tools directory for changes"""

    def __init__(self, registry):
        self.registry = registry

    def on_modified(self, event):
        # Ignore directory events; only Python source files are reloaded
        if not event.is_directory and event.src_path.endswith(".py"):
            self.registry.load_tool_file(event.src_path)

    def on_created(self, event):
        if not event.is_directory and event.src_path.endswith(".py"):
            self.registry.load_tool_file(event.src_path)
# ... (Standard gRPC Service Implementation Omitted for Brevity) ...
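The service implementation itself is not shown in this post. As a purely hypothetical sketch of how a WatchTools-style handler could drain a per-client queue, the generator below uses only the standard library. The subscribe, watch, and notify_all helpers and the None sentinel are assumptions of this example; a real handler would live inside the gRPC servicer and yield ToolsNotification messages.

```python
import queue
import threading

subscribers = []
subscribers_lock = threading.Lock()


def subscribe():
    """Register a new per-client queue and return it."""
    q = queue.Queue()
    with subscribers_lock:
        subscribers.append(q)
    return q


def watch(q):
    """Yield notifications until a None sentinel arrives, then unsubscribe."""
    try:
        while True:
            item = q.get()
            if item is None:
                return
            yield item
    finally:
        with subscribers_lock:
            subscribers.remove(q)


def notify_all(message):
    """Fan a message out to every subscribed queue."""
    with subscribers_lock:
        targets = list(subscribers)
    for q in targets:
        q.put(message)


client_queue = subscribe()
notify_all("Tool get_system_stats updated")
notify_all(None)  # sentinel: ends the stream in this demo
received = list(watch(client_queue))
print(received)
```

The finally clause matters in a long-running server: without it, queues belonging to disconnected clients would accumulate in the subscriber list.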
A Simple Decorator
To make writing tools easy, we create a small SDK that marks functions for the registry to pick up.
def tool(name, description, input_schema, output_schema=None):
    def decorator(func):
        func._is_switchblade_tool = True
        func._tool_metadata = {
            "name": name,
            "description": description,
            "input_schema": input_schema,
            "output_schema": output_schema or {}
        }
        return func
    return decorator
Putting it Together: An Example Tool
Because we decoupled the server from the logic, adding a tool is as simple as dropping this file into the tools/ folder.
import platform
import psutil
from switchblade import tool

@tool(
    name="get_system_stats",
    description="Returns CPU and RAM usage of the host server.",
    input_schema={"type": "object", "properties": {}},  # No input needed
    output_schema={
        "type": "object",
        "properties": {
            "cpu_percent": {"type": "number"},
            "ram_percent": {"type": "number"},
            "os": {"type": "string"}
        }
    }
)
def get_system_stats(args):
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "ram_percent": psutil.virtual_memory().percent,
        "os": platform.system()
    }
The project is named Switchblade, a custom MCP server built on gRPC that supports Hot-Reloading.
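To show how a request might reach such a function, here is a hedged sketch of the glue between the CallToolRequest fields and a registered tool. The call_tool helper and the add_numbers tool are hypothetical, the registry is a plain dict, and the return value is a dict that mirrors the CallToolResponse fields rather than a real Protobuf message.

```python
import json


def add_numbers(args):
    """Hypothetical tool used only for this example."""
    return {"sum": args["a"] + args["b"]}


def call_tool(registry, tool_name, arguments_json):
    """Decode arguments_json, run the tool, and mirror CallToolResponse as a dict."""

    def error(message):
        return {"content_json": "", "is_error": True, "error_message": message}

    func = registry.get(tool_name)
    if func is None:
        return error(f"unknown tool: {tool_name}")
    try:
        args = json.loads(arguments_json or "{}")
    except json.JSONDecodeError as exc:
        return error(f"invalid JSON: {exc}")
    if not isinstance(args, dict):
        # arguments_json is an opaque string to Protobuf, so its shape is checked here.
        return error("arguments must be a JSON object")
    try:
        result = func(args)
    except Exception as exc:
        return error(f"tool failed: {exc}")
    return {"content_json": json.dumps(result), "is_error": False, "error_message": ""}


registry = {"add_numbers": add_numbers}
print(call_tool(registry, "add_numbers", '{"a": 2, "b": 3}'))
print(call_tool(registry, "add_numbers", "[1, 2]"))
```

Because arguments_json travels as a string, Protobuf cannot type-check what is inside it, so per-tool checks against the declared input_schema would still belong in a layer like this one.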
Conclusion
Standard MCP is a brilliant innovation for getting the ecosystem started. But as we move from “chatting with files” to “agents executing business logic,” our infrastructure needs to mature.
The original Model Context Protocol (MCP) introduced a useful idea: let LLMs interact with external tools in a structured way.
But in practice, it often blurred boundaries between reasoning, orchestration, and execution — creating safety, scalability, and maintainability issues.
This redesign adheres to one simple principle:
Treat the LLM as a planner, not an executor.
Reasoning, orchestration, and tool execution should be clearly separated and connected only through strict, typed interfaces.
This architecture is LLM-agnostic by design. The client can use whichever LLM suits its requirements. In fact, the client in my SwitchBlade project uses a locally hosted LLM.
I traded the ease of print() for the rigor of Protobufs, and in exchange, I got a system that is harder to break, easier to maintain, and significantly more secure.
If you are building an MCP server that touches sensitive data, it might be time to stop parsing strings and start defining schemas.
Acknowledgments
A massive thank you to Arnab Chattopadhayay for the invaluable support and rigorous inputs provided during the testing phases of this MCP server.
I would love to hear how others are handling context delivery and agent architecture in production, and what you think of the approach above. Have you hit the limitations of standard JSON-RPC? Please share your thoughts in the comments below!
Check out the full implementation on GitHub here: SwitchBlade


The post is very clearly articulated. You have identified a subtle but very important area of MCP that the industry tends to ignore: security. MCP has moved past its original objective and is being used in all kinds of critical functions. So at this point, any serious organization must demand security from its MCP environment. The work described here leads the way towards that. Excellent work.