---
title: 'MCP server monitoring: why an HTTP 200 isn''t enough'
description: 'Your MCP server can return a perfectly healthy 200 and still be broken for every agent that depends on it. Here''s why that happens, and how to actually keep an eye on the thing.'
date: '2026-06-15'
author: 'Drew Post'
tags: ['mcp', 'synthetic-monitoring', 'opentelemetry', 'ai-infrastructure']
canonical_url: 'https://yorkermonitoring.com/blog/mcp-server-monitoring'
---

Quick question. If you're running an MCP server in production right now, can you tell me it's actually working? Not that it's up. That it's *working*. That the handshake still completes, that the tools your agents rely on are still there, that nobody quietly changed a schema underneath you last night.

If you can't answer that with much confidence, you're in good company. Most teams running MCP servers aren't really monitoring them yet, and honestly a lot of the people I talk to hadn't clocked that you even can monitor them properly. So that's what I want to get into here: why an MCP server needs more than an uptime check, and what watching it properly actually looks like.

Cards on the table first. I build Yorker, a synthetic monitoring tool, and MCP checks are one of the things it does, so I'm not a neutral party here. But the problem is real whether you ever use us for it or not, so let me walk through it and you can make up your own mind.

![An HTTP 200 is not an MCP session. An uptime monitor sees one request return 200 OK, while the real session runs four phases that can each break on their own.](/blog/mcp-server-monitoring/01-http-200-vs-mcp-session.svg)

## A 30-second refresher on MCP

[Model Context Protocol](https://modelcontextprotocol.io/) is the open standard Anthropic introduced back in late 2024 for how AI agents talk to tools and data. An MCP server hands out a set of tools (functions an agent can call with structured arguments) over JSON-RPC. An agent asks what's available by calling `tools/list`, then runs the ones it wants with `tools/call`.
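Here's roughly what an agent gets back from `tools/list`, and how it turns that into a map of tool names to argument schemas. The field names (`tools`, `name`, `description`, `inputSchema`) follow the MCP spec. The `get_customer` tool and the `discovered_tools` helper are made up for illustration, so treat this as a sketch, not a client implementation.

```python
import json

# An illustrative tools/list response. The field names follow the MCP spec;
# the tool itself is hypothetical.
raw = """
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "tools": [
      {
        "name": "get_customer",
        "description": "Fetch a customer record by id",
        "inputSchema": {
          "type": "object",
          "properties": {"customer_id": {"type": "string"}},
          "required": ["customer_id"]
        }
      }
    ]
  }
}
"""

def discovered_tools(response: dict) -> dict:
    """Map each advertised tool name to the JSON Schema for its arguments."""
    return {t["name"]: t.get("inputSchema", {}) for t in response["result"]["tools"]}

tools = discovered_tools(json.loads(raw))
print(sorted(tools))                      # ['get_customer']
print(tools["get_customer"]["required"])  # ['customer_id']
```

A real client should also follow `nextCursor` if the server paginates its tool list. This sketch assumes everything arrives in one page.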

The bit that matters for monitoring is that it's a session, not a one-shot request. Before any of the tool stuff happens, the client sends an `initialize` request, the server answers with its protocol version, capabilities, and `serverInfo`, and the client confirms with a `notifications/initialized` notification. So your server can happily answer 200 on every HTTP request and still be completely incapable of getting through a real MCP session. That gap is the whole story.
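In order, the client side of that session looks something like this. It's a sketch of the JSON-RPC messages only, with no transport. The protocol revision string, the `clientInfo` values, and the helper name are assumptions, so swap in whatever your server actually negotiates.

```python
import json

# One published MCP revision; use whichever revision your server supports.
PROTOCOL_VERSION = "2025-06-18"

def mcp_session_messages(tool_name: str, arguments: dict) -> list:
    """Return the client-side messages of one MCP session, in send order."""
    return [
        # 1. Handshake request: propose a protocol version and describe the client.
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {},
                "clientInfo": {"name": "example-probe", "version": "0.1.0"},
            },
        },
        # 2. Notification, sent after the initialize response arrives. It has
        #    no "id", so the server does not reply to it.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        # 3. Discovery: what tools exist right now?
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
        # 4. Execution: call one tool with structured arguments.
        {
            "jsonrpc": "2.0",
            "id": 3,
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        },
    ]

for msg in mcp_session_messages("get_customer", {"customer_id": "test-probe-001"}):
    print(json.dumps(msg))
```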

## Why "200 OK" quietly lies to you

Picture a normal setup. Containerised MCP server, load balancer in front, auth proxy in front of that. You point an uptime check at it, it returns 200, you get a green dot. Here's what that green dot is not telling you:

1. **The handshake is broken.** The server comes back with a `protocolVersion` the client doesn't support, or it drops `serverInfo`. Any agent that's strict about version negotiation bails before it ever lists a tool. Your check still sees 200.

2. **A tool went missing.** Someone shipped a deploy that rolled back a feature flag, and now `tools/list` is one tool shorter. The exact tool your pipeline depends on just isn't there anymore. Still 200.

3. **A schema drifted.** A tool's input schema changed. An argument got renamed, a required field got added, a description got reworded. Every agent that cached the old shape is now sending requests the server rejects. Still 200.

4. **A tool got slow.** It still works, but it takes 12 seconds instead of 800ms because something upstream is having a bad day. Every agent call now blocks for 12 seconds and your throughput quietly falls off a cliff. Very much still 200.

5. **A tool returns garbage.** The response is shaped correctly but the content is wrong. Your database tool starts returning nothing because a migration dropped a table. Valid JSON, valid 200, no actual data.

Not one of those shows up in uptime monitoring. And they won't show up as an error spike in your app traces either, unless your agent code goes out of its way to catch and record them. They just sit there, breaking things, looking healthy.
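Take the first failure as an example. Catching it means reading the `initialize` result, not the status code. Here's a minimal sketch, assuming the probe accepts a fixed set of protocol revisions. The `SUPPORTED_VERSIONS` set and the function name are hypothetical.

```python
# Assumption: the protocol revisions this probe is willing to speak.
SUPPORTED_VERSIONS = {"2025-03-26", "2025-06-18"}

def check_initialize_result(result: dict) -> list:
    """Return problems found in an initialize result (empty list means OK)."""
    problems = []
    version = result.get("protocolVersion")
    if version not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported protocolVersion: {version!r}")
    info = result.get("serverInfo")
    if not isinstance(info, dict) or "name" not in info:
        problems.append("missing or incomplete serverInfo")
    if not isinstance(result.get("capabilities"), dict):
        problems.append("missing capabilities")
    return problems

good = {
    "protocolVersion": "2025-06-18",
    "capabilities": {"tools": {}},
    "serverInfo": {"name": "customer-data", "version": "1.4.2"},
}
bad = {"protocolVersion": "1999-01-01", "capabilities": {}}  # wrong version, no serverInfo
print(check_initialize_result(good))  # []
print(check_initialize_result(bad))
```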

## So what does monitoring one actually look like?

This is the part people don't always realise is even an option. You don't have to settle for knocking on the front door and leaving. You can run the actual MCP session on a schedule and check it at every step, the same way a real client would.

In Yorker that's a check type, and it reads like this:

```yaml
# yorker.config.yaml
monitors:
  - name: "customer-data MCP server"
    type: mcp
    endpoint: https://mcp.internal.example.com/mcp
    frequency: 5m
    locations: [us-east, eu-west]
    timeoutMs: 30000
    detectSchemaDrift: true
    auth:
      type: bearer
      token: "{{secrets.MCP_PROBE_TOKEN}}"
    expectedTools:
      - get_customer
      - list_orders
      - update_address
    testCalls:
      - toolName: get_customer
        arguments:
          customer_id: "test-probe-001"
        expectedOutputContains: "probe"
```

That check goes red if any of this happens:

- the `initialize` handshake doesn't finish inside the timeout
- a tool in your `expectedTools` list isn't in the `tools/list` response
- a tool's input schema changed since the last good run
- the `get_customer` call comes back without "probe" in it
- any phase blows past the timeout
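The tool-presence and output assertions from that list boil down to something like the sketch below. It isn't Yorker's implementation, and the function and argument names are made up. It assumes the `tools/call` result carries its output as `text` content items, which is how MCP returns plain-text tool output.

```python
def evaluate_run(expected_tools, discovered, call_result, expected_substring):
    """Return human-readable failures for one probe run (empty list = pass).

    expected_tools:     names that must exist (like expectedTools)
    discovered:         tool name -> inputSchema, parsed from tools/list
    call_result:        the "result" object of a tools/call response
    expected_substring: text the output must contain (like expectedOutputContains)
    """
    failures = []

    # A tool we depend on has vanished from discovery.
    missing = sorted(set(expected_tools) - set(discovered))
    if missing:
        failures.append("missing tools: " + ", ".join(missing))

    # MCP reports tool-level errors in-band via isError, not as HTTP errors.
    if call_result.get("isError"):
        failures.append("tool call reported isError")

    # Join the text content items and look for the expected marker.
    text = "".join(
        item.get("text", "")
        for item in call_result.get("content", [])
        if item.get("type") == "text"
    )
    if expected_substring not in text:
        failures.append(f"output does not contain {expected_substring!r}")

    return failures

discovered = {"get_customer": {}, "list_orders": {}}
call_result = {
    "content": [{"type": "text", "text": "customer test-probe-001 (probe account)"}],
    "isError": False,
}
print(evaluate_run(["get_customer", "list_orders", "update_address"], discovered, call_result, "probe"))
# ['missing tools: update_address']
```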

So instead of "the server answered," you're checking "the server did the thing my agents need it to do." That's a much bigger difference than it sounds.

![The MCP session lifecycle: initialize, notifications/initialized, tools/list, and tools/call, each one validated, each with its own timing.](/blog/mcp-server-monitoring/02-mcp-session-lifecycle.svg)

## The sneaky one is schema drift

Of all of those, schema drift is the one I'd lose sleep over, because it's the most invisible. The server's up, the status is 200, the tool is right there in the list. The only thing that moved is the contract your agents were written against, and nothing in normal monitoring is looking at that.

Here's how we catch it. Every run, we hash each tool's input schema using a key-sorted representation (so the same schema always hashes the same, regardless of key order), and compare it to the previous run's hash. Anything that moved gets flagged as added, removed, or modified, with the tool name attached. Because it runs every cycle, you hear about a schema change within minutes of it happening, not two days later when an agent starts doing something weird.
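Here's a minimal Python sketch of that idea. It normalises each schema with sorted keys, hashes it, and diffs the per-tool hashes between runs. SHA-256 and compact separators are choices made for this example, because the post doesn't say which hash Yorker uses. The function names are hypothetical.

```python
import hashlib
import json

def schema_hash(schema: dict) -> str:
    """Hash a JSON Schema so that object key order doesn't affect the result."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def tool_hashes(tools: dict) -> dict:
    """Map tool name -> hash of its inputSchema."""
    return {name: schema_hash(schema) for name, schema in tools.items()}

def diff_hashes(previous: dict, current: dict) -> dict:
    """Classify tools as added, removed, or modified between two runs."""
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "modified": sorted(
            name for name in current.keys() & previous.keys()
            if current[name] != previous[name]
        ),
    }

last_run = tool_hashes({
    "search_docs": {"type": "object", "properties": {"query": {"type": "string"}}},
    "legacy_lookup": {"type": "object"},
})
this_run = tool_hashes({
    "search_docs": {"type": "object", "properties": {"q": {"type": "string"}}},  # argument renamed
    "summarize": {"type": "object"},
})
print(diff_hashes(last_run, this_run))
# {'added': ['summarize'], 'removed': ['legacy_lookup'], 'modified': ['search_docs']}
```

`sort_keys` only normalises object keys. Reordering an array such as `required` still changes the hash even though it doesn't change what the schema accepts, so a stricter canonical form might sort those arrays too.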

![Schema-drift detection. This run's per-tool hashes are diffed against the previous run's: summarize shows up as added, search_docs as modified, and legacy_lookup as removed.](/blog/mcp-server-monitoring/03-schema-drift-detection.svg)

You also get per-phase timing on `initialize`, `tools/list`, and `tools/call`. That last number is the one to keep an eye on. Tool execution time is what your agents actually feel, and a slow tool is a slow agent even when nothing is technically failing.
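Per-phase timing is just a stopwatch around each step. In this sketch the phases are stand-in `time.sleep` calls, not real network requests. The per-phase budget and the function names are hypothetical.

```python
import time
from typing import Callable, Dict

def time_phases(phases: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each phase in order and record its wall-clock duration in ms."""
    timings = {}
    for name, run in phases.items():
        start = time.perf_counter()
        run()
        timings[name] = (time.perf_counter() - start) * 1000.0
    return timings

def over_budget(timings: Dict[str, float], budgets: Dict[str, float]) -> list:
    """Phases slower than their budget; phases without a budget never fail."""
    return [name for name, ms in timings.items() if ms > budgets.get(name, float("inf"))]

# Stand-ins for the real initialize, tools/list, and tools/call requests.
timings = time_phases({
    "initialize": lambda: time.sleep(0.01),
    "tools/list": lambda: time.sleep(0.01),
    "tools/call": lambda: time.sleep(0.05),
})
print(over_budget(timings, {"tools/call": 20.0}))  # ['tools/call']
```

A real probe would also abort a phase that runs past the overall `timeoutMs`, instead of only measuring it after the fact.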

## How is this different from normal API monitoring?

Fair question, since on the surface it's "POST some JSON, check the response." The difference is the discovery layer. With a normal API you assert against a known URL and a known response shape. With MCP, the interesting stuff isn't fixed. The server declares its tools at runtime and your agents use whatever `tools/list` hands back. They're not calling a URL you can pin down, they're calling whatever the server says exists right now.

So if you're not watching `tools/list` itself, you're watching the plumbing but not the contract. Monitoring MCP properly means being protocol-aware: checking the handshake, the discovery response, and the tool behaviour together, because together is how your agents actually use it.

## It lands in the observability stack you already have

I really didn't want this to be one more dashboard you have to remember exists. So MCP checks emit standard OTLP into your own backend, exactly like the HTTP and browser checks do. Status, response time, the tools we found, the drift we computed, per-phase timing, all of it flows into ClickStack, Grafana, Honeycomb, or whatever you're running.

There's also a W3C `traceparent` going out with the `initialize` request. If your MCP server passes trace context through to its own downstream calls (the database, the vector store, whatever it leans on), then the check and the spans it sets off all land in one distributed trace. A slow `tools/call` that traces straight back to a slow query becomes something you can see in a single view, instead of two separate investigations at 2am.
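For reference, a `traceparent` header is just four dash-separated fields defined by W3C Trace Context: version, trace ID, parent ID, and flags. The sketch below builds one by hand to show the format. In a real probe you'd normally let an OpenTelemetry SDK's propagator inject it, and nothing here describes how Yorker generates its IDs.

```python
import re
import secrets

TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")

def _random_nonzero_hex(nbytes: int) -> str:
    """Random lowercase hex ID; all-zero IDs are invalid in Trace Context."""
    while True:
        value = secrets.token_hex(nbytes)
        if int(value, 16) != 0:
            return value

def new_traceparent(sampled: bool = True) -> str:
    """Build a version-00 traceparent: version-traceid-parentid-flags."""
    trace_id = _random_nonzero_hex(16)  # 16 bytes -> 32 hex chars
    parent_id = _random_nonzero_hex(8)  # 8 bytes -> 16 hex chars
    flags = "01" if sampled else "00"   # bit 0 is the "sampled" flag
    return f"00-{trace_id}-{parent_id}-{flags}"

# Attached to the initialize POST alongside the usual headers.
headers = {"Content-Type": "application/json", "traceparent": new_traceparent()}
print(headers["traceparent"])
```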

![MCP checks emit standard OTLP (status, per-phase timing, the discovered tools, schema drift, and a W3C traceparent) into your own OTel backend.](/blog/mcp-server-monitoring/04-mcp-otlp-into-your-backend.svg)

## What it looks like in Yorker

MCP is a first-class, generally available check type. You set one up the same way you'd set up an HTTP or browser check (in the UI, in plain English, or in `yorker.config.yaml`), and it runs the whole session every cycle: handshake, discovery, schema-drift detection, tool-call assertions. Same ephemeral, tenant-isolated runners as everything else, same 14 regions, same OTLP going out.

Here is one of our own MCP monitors in the dashboard. The Tools tab lists every tool the server is currently exposing, the hash of each tool's input schema, and whether anything has drifted since the last run.

![The Tools tab of an MCP monitor in Yorker, showing four discovered tools with their input-schema hashes and a schema-drift panel reporting no changes this run.](/blog/mcp-server-monitoring/05-mcp-monitor-tools.png)

As far as I can tell, nobody else in synthetic monitoring is checking MCP servers at the protocol level yet. Which is a little mad to me, given how quickly these things turned into real infrastructure.

## Free to start, unlimited when you're ready

MCP checks are on the free tier: 10,000 HTTP and MCP checks a month, one location, no card required. That's plenty to put a real MCP server under proper watch and find out what you've been missing.

If you want to go all in, the paid plan is $29.99 a month and MCP monitors are unlimited on it. Run as many as you have servers, check them as often as every minute, from all 14 locations or a private one sitting inside your own network. No monthly check cap. Schema-drift detection, tool-call assertions, and OTLP are all just on, not locked away behind some enterprise tier.

## A few things people ask me

**Can't I just write an HTTP check that POSTs a `tools/list`?**

You can, and it beats nothing. The catch is what you're able to assert. Checking the response is valid JSON isn't the same as checking a specific tool is present and its schema hasn't moved. Drift detection especially needs state: you have to compare hashes between runs, and a one-off HTTP check doesn't remember anything from last time. That inter-run comparison is the bit a proper MCP check handles for you.
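To see why state matters, here's what a hand-rolled check would need: somewhere to keep last run's per-tool hashes. This sketch uses a local JSON file. The file location, function name, and placeholder hash values are all hypothetical, and the sketch treats the very first run as the baseline, not as drift.

```python
import json
import tempfile
from pathlib import Path

def changed_since_last_run(state_file: Path, current: dict) -> list:
    """Compare per-tool hashes with the previous run, then store this run's."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else None
    state_file.write_text(json.dumps(current, sort_keys=True))
    if previous is None:
        return []  # first run: nothing to compare against yet
    names = previous.keys() | current.keys()
    return sorted(n for n in names if previous.get(n) != current.get(n))

with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "mcp-probe-state.json"
    print(changed_since_last_run(state, {"search_docs": "h1", "summarize": "h2"}))  # []
    print(changed_since_last_run(state, {"search_docs": "h1", "summarize": "h2"}))  # []
    print(changed_since_last_run(state, {"search_docs": "h9", "summarize": "h2"}))  # ['search_docs']
```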

**Do I need an agent running for this?**

Nope. The check plays the part of an agent itself. It does the handshake, the discovery, and the tool calls with direct JSON-RPC, so the server sees something that looks like a real agent session. There's no agent framework sitting on the runner.
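Under the hood, "direct JSON-RPC" over MCP's Streamable HTTP transport is an ordinary POST. This sketch only builds the request with the standard library and doesn't send it. The endpoint is the example one from the config above, and the function name is hypothetical. A real client also has to handle replies that arrive as a server-sent-event stream, which is skipped here.

```python
import json
import urllib.request
from typing import Optional

def build_mcp_post(endpoint: str, message: dict,
                   session_id: Optional[str] = None,
                   protocol_version: Optional[str] = None) -> urllib.request.Request:
    """Build (but don't send) one JSON-RPC POST for MCP's Streamable HTTP transport."""
    headers = {
        "Content-Type": "application/json",
        # The server may reply with a single JSON body or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        # Echo the Mcp-Session-Id the server returned on initialize, if it sent one.
        headers["Mcp-Session-Id"] = session_id
    if protocol_version:
        # Newer spec revisions ask clients to send the negotiated version after initialize.
        headers["MCP-Protocol-Version"] = protocol_version
    body = json.dumps(message).encode("utf-8")
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")

req = build_mcp_post(
    "https://mcp.internal.example.com/mcp",
    {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    session_id="example-session-id",
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req, timeout=30)
```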

**What about auth?**

As far as auth goes, a remote MCP server is just an HTTP endpoint, so the check takes Basic, Bearer, or API key, same as our HTTP checks. The credential is pulled from a secrets store by name, never written into your config file.
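The three schemes come down to different HTTP headers. This sketch assumes the secret is injected as an environment variable at run time, as a stand-in for a secrets store. The `X-API-Key` header name is only a common convention, so check what your server expects. The function and variable names are hypothetical.

```python
import base64
import os

def auth_headers(kind: str, secret: str, username: str = "",
                 api_key_header: str = "X-API-Key") -> dict:
    """Build the auth header for one of three common HTTP schemes."""
    if kind == "bearer":
        return {"Authorization": f"Bearer {secret}"}
    if kind == "basic":
        # Basic auth is base64("username:password").
        token = base64.b64encode(f"{username}:{secret}".encode("utf-8")).decode("ascii")
        return {"Authorization": f"Basic {token}"}
    if kind == "api_key":
        return {api_key_header: secret}
    raise ValueError(f"unknown auth type: {kind!r}")

# Read the credential at run time instead of hard-coding it (env var name is hypothetical).
token = os.environ.get("MCP_PROBE_TOKEN", "example-token")
print(auth_headers("bearer", token))
```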

**What if I change a schema on purpose?**

Drift alerts fire on any change, intended or not. When you ship a real update, you acknowledge the drift in the UI (or the API), that new hash becomes the baseline, and you go back to only hearing about changes you didn't expect. The acknowledgement is logged against the run, so there's a trail of who knew what and when.
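The acknowledge-then-rebaseline flow can be modelled as a small state object. This is a hypothetical sketch of the idea, not Yorker's data model. Drift is measured against an acknowledged baseline, and each acknowledgement records who accepted which tools, and when.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DriftBaseline:
    """Per-tool schema hashes that later runs are compared against."""
    hashes: dict
    audit_log: list = field(default_factory=list)

    def drifted(self, current: dict) -> list:
        """Tools whose hash differs from the baseline (added, removed, or modified)."""
        names = self.hashes.keys() | current.keys()
        return sorted(n for n in names if self.hashes.get(n) != current.get(n))

    def acknowledge(self, current: dict, who: str) -> list:
        """Accept the current hashes as the new baseline and log the decision."""
        accepted = self.drifted(current)
        self.hashes = dict(current)
        self.audit_log.append({
            "by": who,
            "at": datetime.now(timezone.utc).isoformat(),
            "tools": accepted,
        })
        return accepted

baseline = DriftBaseline({"search_docs": "h1", "summarize": "h2"})
after_deploy = {"search_docs": "h7", "summarize": "h2"}
print(baseline.drifted(after_deploy))  # ['search_docs']
baseline.acknowledge(after_deploy, who="oncall@example.com")
print(baseline.drifted(after_deploy))  # []
```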

---

MCP servers quietly became production infrastructure, so it's worth watching them like the rest of it. If you want to see what that actually feels like, the free tier is right there, no card needed.

[Start free →](/sign-up) and [here's how the MCP checks work →](/features/mcp-monitoring)