
I spend most of my time thinking like a criminal.
Not because I’m edgy, but because that’s literally the job. And lately, everywhere I look, I see the same thing:
People are exposing MCP (Model Context Protocol) endpoints like they’re REST APIs, and forgetting they’re actually money execution engines.
So let’s talk about Token Torching. Yes, I invented another name.
This isn’t data theft. It’s not taking your service down.
It’s quietly, methodically, and legitimately making your AI system cost so much that someone disables it.
This is the kind of attack no one models because:
- nothing crashes,
- nothing looks “malicious,”
- and the requests are all technically valid.
Which is exactly why it works.
MCP changed the threat model (and most teams missed it)
Traditional abuse assumes I want your data, your uptime, or your credentials.
With MCP, I don’t need any of that.
I just need you to:
- accept an external request,
- do “helpful AI things,”
- and pay for them.
That’s it.
Every MCP-enabled system is now:
- externally triggerable,
- internally paid,
- and often cost-amplifying by design.
If you don’t believe me, keep reading.
Two ways Token Torching shows up in the real world
I’ve seen both patterns. Neither is hypothetical.
Pattern A: “We pay for the model.”
This one’s obvious.
An external request flows through your MCP server, then your LLM, then your bill.
If I can trigger:
- long reasoning,
- multi-step planning,
- retries,
- retrieval,
- tool calls,
I don’t need volume. I need complexity.
Pattern B: “Bring your own key (but we still pay).”
This is where teams get smug and wrong.
Sure, the caller brings their own model key.
But you still pay for:
- embeddings,
- vector search,
- reranking,
- orchestration,
- downstream SaaS APIs,
- retries,
- workflow execution.
Congrats. You outsourced 20% of the bill and kept the other 80%.
How an attacker thinks about this (at a high level)
No step-by-step exploitation. Just mindset.
When I look at a public or semi-public MCP surface, I ask:
“Where does cost amplify?”
- One request that leads to many agent steps
- One request that triggers many tool calls
- One request that leads to a large retrieval scope
“What retries automatically?”
- Tool failures
- Schema mismatches
- Partial successes
- Timeouts
Retries are just polite token burners.
“What looks reasonable but is worst-case?”
- Broad semantic queries
- High top-k retrieval
- Large structured outputs
- Inputs that sit right on validation boundaries
Nothing illegal. Nothing malformed. Just expensive.
“What keeps running if I walk away?”
Streaming responses.
Background tasks.
Async workflows.
If generation continues after disconnect, that’s not resilience — that’s a billing leak.
How to test your own system like an adult
If you run MCP in production, your security team should explicitly test the following.
1. Cost-per-request testing
Pick a single identity and ask:
- What’s the maximum cost of one valid request?
- How many tokens?
- How many tool calls?
- How many retries?
If you don’t know, that’s already a finding.
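If you want to make that concrete, here's a minimal sketch of a per-request cost ledger in Python. The class, the fields, and the prices are placeholders, not anyone's real pricing sheet; the point is that every expensive step increments a counter you can actually look at.

```python
# Minimal per-request cost ledger (illustrative; the rates are placeholders).
from dataclasses import dataclass

@dataclass
class RequestLedger:
    price_per_1k_tokens: float = 0.01   # placeholder rate, not a real price sheet
    price_per_tool_call: float = 0.002  # placeholder for downstream API cost
    tokens: int = 0
    tool_calls: int = 0
    retries: int = 0

    def record_llm(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.tokens += prompt_tokens + completion_tokens

    def record_tool(self, retried: bool = False) -> None:
        self.tool_calls += 1
        self.retries += int(retried)

    def cost(self) -> float:
        return (self.tokens / 1000) * self.price_per_1k_tokens \
               + self.tool_calls * self.price_per_tool_call

# Attach one ledger per request ID and alert when cost() crosses your budget.
ledger = RequestLedger()
ledger.record_llm(prompt_tokens=3_500, completion_tokens=1_200)
ledger.record_tool()
ledger.record_tool(retried=True)
print(f"request cost: ${ledger.cost():.4f}, retries={ledger.retries}")
```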
2. Complexity skew testing
Compare a “normal” user request vs a valid but pathological one.
If the cost delta is 10x, 50x, or “uhhh wow” — congratulations, you found your torch.
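A rough sketch of the math, with made-up numbers standing in for what you'd actually measure in staging:

```python
# Complexity-skew check: same endpoint, benign input vs. worst-case-but-valid input.
# The observed costs below are illustrative; collect the real ones from staging.

def skew_ratio(baseline_cost: float, pathological_cost: float) -> float:
    """Ratio of worst-case cost to typical cost for one valid request."""
    return pathological_cost / max(baseline_cost, 1e-9)

observed = {
    "typical question":        0.004,  # dollars, made up
    "max top-k + long output": 0.31,   # dollars, made up
}
ratio = skew_ratio(observed["typical question"], observed["max top-k + long output"])
print(f"cost skew: {ratio:.0f}x")  # anything in the 10x-50x range is a finding
```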
3. Retry abuse testing
Intentionally induce near-miss schema failures, slow tools, and partial tool errors.
Then watch how many retries fire, how much they cost, and whether there’s a hard stop.
Hope is not a control.
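Here's the shape of the control, sketched in Python: a retry wrapper with a hard cap on attempts and wall-clock time. The names and the backoff numbers are illustrative, not from any particular framework.

```python
# Retry wrapper with a hard stop. The point is the ceiling, not the backoff math.
import time

class RetryBudgetExceeded(RuntimeError):
    pass

def call_tool_with_budget(tool_fn, *, max_attempts: int = 3, max_seconds: float = 10.0):
    """Run a tool call with a hard cap on attempts and total wall-clock time."""
    start = time.monotonic()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        if time.monotonic() - start > max_seconds:
            break
        try:
            return tool_fn()
        except Exception as exc:  # in practice, catch your tool's specific error types
            last_error = exc
            time.sleep(min(2 ** attempt * 0.1, 1.0))  # capped backoff
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts") from last_error
```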
4. Retrieval blast radius testing
Test:
- max top-k
- cross-namespace queries
- ambiguous semantic searches
If one request fans out across half your vector store, that’s not “powerful AI.” That’s an unbounded cost surface.
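A minimal sketch of clamping that scope before the query ever reaches the store. The `vector_search` stub, the namespace names, and the limits are hypothetical stand-ins for your own stack.

```python
# Clamp retrieval scope before the query hits the vector store.

MAX_TOP_K = 8
ALLOWED_NAMESPACES = {"public_docs"}  # what untrusted MCP traffic may touch

def vector_search(query: str, top_k: int, namespace: str) -> list[str]:
    """Stub for your real vector store client."""
    return []

def bounded_retrieve(query: str, top_k: int, namespace: str) -> list[str]:
    if namespace not in ALLOWED_NAMESPACES:
        raise PermissionError(f"namespace {namespace!r} not allowed for this caller")
    top_k = min(top_k, MAX_TOP_K)      # caller can ask for 10_000; they get 8
    if len(query) > 2_000:             # absurdly broad semantic queries get cut off
        raise ValueError("query too long")
    return vector_search(query=query, top_k=top_k, namespace=namespace)
```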
5. Disconnect behavior
Start a request.
Disconnect early.
Watch billing.
If the system keeps thinking after you leave, you’re paying for ghosts.
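A toy asyncio sketch of the fix: tie the generation loop to the connection and stop the moment the client is gone. The generator and the disconnect event are stand-ins, not a specific framework's API.

```python
import asyncio

async def generate_tokens():
    """Stand-in for streaming model output; every step here costs money."""
    for i in range(1_000):
        await asyncio.sleep(0.05)
        yield f"token-{i}"

async def stream_response(disconnected: asyncio.Event) -> int:
    sent = 0
    async for token in generate_tokens():
        if disconnected.is_set():
            break  # stop generating the moment nobody is listening
        sent += 1
    return sent

async def main() -> None:
    disconnected = asyncio.Event()
    task = asyncio.create_task(stream_response(disconnected))
    await asyncio.sleep(0.5)   # client hangs around briefly...
    disconnected.set()         # ...then walks away
    print("tokens billed:", await task)

asyncio.run(main())
```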
What defenders should be logging (and probably aren’t)
If I were attacking this quietly, these are the signals I’d try to stay just under.
Which means these are exactly what you should alert on.
- cost per request (not just RPS)
- tool calls per request
- agent step count
- retries per request
- retrieval scope metrics
- spend per identity / key / IP
- endpoints ranked by cost, not traffic
If Finance sees this before Security does, you’ve already lost the argument.
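One way to make those signals exist at all is to emit a single structured cost event per request. A sketch below; the field names are suggestions, not a product schema.

```python
# Emit one structured cost event per request so you can alert on cost, not just traffic.
import json, logging, sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
log = logging.getLogger("mcp.cost")

def emit_cost_event(*, request_id: str, identity: str, endpoint: str,
                    tokens: int, tool_calls: int, agent_steps: int,
                    retries: int, retrieval_docs: int, cost_usd: float) -> None:
    log.info(json.dumps({
        "request_id": request_id,
        "identity": identity,
        "endpoint": endpoint,
        "tokens": tokens,
        "tool_calls": tool_calls,
        "agent_steps": agent_steps,
        "retries": retries,
        "retrieval_docs": retrieval_docs,
        "cost_usd": round(cost_usd, 4),
    }))

emit_cost_event(request_id="req-42", identity="key-abc", endpoint="/mcp/query",
                tokens=18_500, tool_calls=7, agent_steps=12,
                retries=3, retrieval_docs=400, cost_usd=0.83)
```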
Controls that actually stop Token Torching
Not vibes. Not “we’ll watch it.”
Hard budgets
- per request
- per identity
- per tool
- per tenant
No budget? No execution.
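A minimal sketch of what "no budget, no execution" looks like in code. The limits and the in-memory store are illustrative; in production the spend counters live somewhere shared.

```python
# Budget check that runs before every expensive step. No budget left, no execution.
from collections import defaultdict

LIMITS_USD = {"request": 0.50, "identity_daily": 20.00, "tenant_daily": 200.00}

class BudgetExceeded(RuntimeError):
    pass

spend = defaultdict(float)  # in production: a shared store, not process memory

def charge(*, request_id: str, identity: str, tenant: str, amount_usd: float) -> None:
    checks = [
        (f"req:{request_id}", LIMITS_USD["request"]),
        (f"id:{identity}",    LIMITS_USD["identity_daily"]),
        (f"tenant:{tenant}",  LIMITS_USD["tenant_daily"]),
    ]
    for key, limit in checks:
        if spend[key] + amount_usd > limit:
            raise BudgetExceeded(f"{key} would exceed ${limit:.2f}")
    for key, _ in checks:
        spend[key] += amount_usd
```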
Cheap gates before expensive brains
Auth, validation, size limits, retrieval caps — before the LLM ever wakes up.
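Roughly, the ordering looks like this. The limits, the `caller` object, and the `llm.answer` call are assumptions standing in for your own stack.

```python
# Cheap checks run first; the model only wakes up if all of them pass.

MAX_INPUT_CHARS = 4_000
MAX_TOP_K = 8

def gate_then_answer(request: dict, caller, llm):
    if not caller.is_authenticated():                 # auth: costs microseconds
        raise PermissionError("unauthenticated")
    prompt = request.get("prompt", "")
    if not prompt or len(prompt) > MAX_INPUT_CHARS:   # size limit: costs nothing
        raise ValueError("prompt missing or too large")
    top_k = min(int(request.get("top_k", 4)), MAX_TOP_K)  # retrieval cap
    # Only now do we spend real money.
    return llm.answer(prompt=prompt, top_k=top_k)
```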
Progressive trust
Public MCPs should start weak.
Power is earned, not exposed.
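Sketched as a tier table in Python; the tier names and numbers are made up, but the shape matters: unknown callers fall back to the weakest profile, never the strongest.

```python
# Trust tiers: new or anonymous callers start with the smallest limits
# and earn more over time.

TIERS = {
    "anonymous": {"max_tokens": 1_000,  "max_tool_calls": 0,  "tools": set()},
    "verified":  {"max_tokens": 8_000,  "max_tool_calls": 3,  "tools": {"search"}},
    "trusted":   {"max_tokens": 32_000, "max_tool_calls": 10,
                  "tools": {"search", "summarize", "export"}},
}

def limits_for(caller_tier: str) -> dict:
    # Unknown tiers fall back to the weakest profile, never the strongest.
    return TIERS.get(caller_tier, TIERS["anonymous"])
```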
Per-tool quotas
Some tools should never be callable from untrusted MCP traffic.
That’s not restrictive, that’s sane.
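A sketch of that policy: a deny list for tools that untrusted traffic never touches, plus a daily quota for everything else. Tool names and numbers are illustrative.

```python
# Per-tool policy for untrusted MCP traffic: some tools are never callable,
# the rest get a daily quota per identity.
from collections import Counter

UNTRUSTED_TOOL_QUOTA = {"search": 50, "summarize": 20}   # calls per identity per day
NEVER_FROM_UNTRUSTED = {"send_email", "run_workflow", "bulk_export"}

calls_today: Counter = Counter()

def authorize_tool_call(identity: str, tool: str) -> None:
    if tool in NEVER_FROM_UNTRUSTED:
        raise PermissionError(f"{tool} is not exposed to untrusted MCP traffic")
    quota = UNTRUSTED_TOOL_QUOTA.get(tool)
    if quota is None:
        raise PermissionError(f"{tool} has no quota defined, so it is denied by default")
    key = (identity, tool)
    if calls_today[key] >= quota:
        raise PermissionError(f"{tool} quota exhausted for {identity}")
    calls_today[key] += 1
```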
Kill switches
If you can’t shut off an expensive tool in seconds, you don’t control your system.
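The simplest version is a flag checked on every invocation. Here it's an in-process set to keep the sketch self-contained; in production the flag lives in something shared (feature flags, Redis, your control plane).

```python
# Kill switch checked on every invocation. Flipping it takes one call, not a deploy.

DISABLED_TOOLS: set[str] = set()

def kill(tool: str) -> None:
    DISABLED_TOOLS.add(tool)

def invoke_tool(tool: str, fn, *args, **kwargs):
    if tool in DISABLED_TOOLS:
        raise RuntimeError(f"{tool} is disabled by kill switch")
    return fn(*args, **kwargs)

# Example: someone notices a hypothetical "deep_research" tool burning money.
kill("deep_research")
```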
Control Planes
I wrote a blog post that goes deeper on this: MCP vs MoCoP.
Final thought
MCP didn’t just make AI more capable. It made “cost” an attack surface. Talk about security as a business enablement tool.
Token Torching isn’t hypothetical.
If you expose MCP publicly and don’t test for this, you’ve built a very polite way for someone else to light your money on fire.
Curious to see how Sumo Logic protects your AI systems? Sign up for our 30-day free trial.



