
I spend most of my time thinking like a criminal.
Not because I’m edgy, but because that’s literally the job. And lately, everywhere I look, I see the same thing:
People are exposing MCP (Model Context Protocol) endpoints like they’re REST APIs, and forgetting they’re actually money execution engines.
So let’s talk about Token Torching. Yes, I invented another name.
This isn’t data theft. It’s not taking your service down.
It’s quietly, methodically, and legitimately making your AI system cost so much that someone disables it.
This is the kind of attack no one models because:
- nothing crashes,
- nothing looks “malicious,”
- and the requests are all technically valid.
Which is exactly why it works.
MCP changed the threat model (and most teams missed it)
Traditional abuse assumes I want your data, your uptime, or your credentials.
With MCP, I don’t need any of that.
I just need you to:
- accept an external request,
- do “helpful AI things,”
- and pay for them.
That’s it.
Every MCP-enabled system is now:
- externally triggerable,
- internally paid,
- and often cost-amplifying by design.
If you don’t believe me, keep reading.
Two ways Token Torching shows up in the real world
I’ve seen both patterns. Neither is hypothetical.
Pattern A: “We pay for the model.”
This one’s obvious.
An external request flows through your MCP server, then your LLM, then your bill.
If I can trigger:
- long reasoning,
- multi-step planning,
- retries,
- retrieval,
- tool calls,
I don’t need volume. I need complexity.
Pattern B: “Bring your own key (but we still pay).”
This is where teams get smug and wrong.
Sure, the caller brings their own model key.
But you still pay for:
- embeddings,
- vector search,
- reranking,
- orchestration,
- downstream SaaS APIs,
- retries,
- workflow execution.
Congrats. You outsourced 20% of the bill and kept the other 80%.
How an attacker thinks about this (at a high level)
No step-by-step exploitation. Just mindset.
When I look at a public or semi-public MCP surface, I ask:
“Where does cost amplify?”
- One request that leads to many agent steps
- One request that triggers many tool calls
- One request that leads to a large retrieval scope
“What retries automatically?”
- Tool failures
- Schema mismatches
- Partial successes
- Timeouts
Retries are just polite token burners.
“What looks reasonable but is worst-case?”
- Broad semantic queries
- High top-k retrieval
- Large structured outputs
- Inputs that sit right on validation boundaries
Nothing illegal. Nothing malformed. Just expensive.
“What keeps running if I walk away?”
Streaming responses.
Background tasks.
Async workflows.
If generation continues after disconnect, that’s not resilience — that’s a billing leak.
How to test your own system like an adult
If you run MCP in production, your security team should explicitly test the following.
1. Cost-per-request testing
Pick a single identity and ask:
- What’s the maximum cost of one valid request?
- How many tokens?
- How many tool calls?
- How many retries?
If you don’t know, that’s already a finding.
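If you want to make that concrete, here's a minimal sketch of a per-request cost ledger in Python. The class, the fields, and the prices are placeholders, not anyone's real pricing sheet; the point is that every expensive step increments a counter you can actually look at.

```python
# Minimal per-request cost ledger (illustrative; the rates are placeholders).
from dataclasses import dataclass

@dataclass
class RequestLedger:
    price_per_1k_tokens: float = 0.01   # placeholder rate, not a real price sheet
    price_per_tool_call: float = 0.002  # placeholder for downstream API cost
    tokens: int = 0
    tool_calls: int = 0
    retries: int = 0

    def record_llm(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.tokens += prompt_tokens + completion_tokens

    def record_tool(self, retried: bool = False) -> None:
        self.tool_calls += 1
        self.retries += int(retried)

    def cost(self) -> float:
        return (self.tokens / 1000) * self.price_per_1k_tokens \
               + self.tool_calls * self.price_per_tool_call

# Attach one ledger per request ID and alert when cost() crosses your budget.
ledger = RequestLedger()
ledger.record_llm(prompt_tokens=3_500, completion_tokens=1_200)
ledger.record_tool()
ledger.record_tool(retried=True)
print(f"request cost: ${ledger.cost():.4f}, retries={ledger.retries}")
```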
2. Complexity skew testing
Compare a “normal” user request vs a valid but pathological one.
If the cost delta is 10x, 50x, or “uhhh wow” — congratulations, you found your torch.
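A rough sketch of the math, with made-up numbers standing in for what you'd actually measure in staging:

```python
# Complexity-skew check: same endpoint, benign input vs. worst-case-but-valid input.
# The observed costs below are illustrative; collect the real ones from staging.

def skew_ratio(baseline_cost: float, pathological_cost: float) -> float:
    """Ratio of worst-case cost to typical cost for one valid request."""
    return pathological_cost / max(baseline_cost, 1e-9)

observed = {
    "typical question":        0.004,  # dollars, made up
    "max top-k + long output": 0.31,   # dollars, made up
}
ratio = skew_ratio(observed["typical question"], observed["max top-k + long output"])
print(f"cost skew: {ratio:.0f}x")  # anything in the 10x-50x range is a finding
```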
3. Retry abuse testing
Intentionally induce near-miss schema failures, slow tools, and partial tool errors.
Then watch how many retries fire, how much they cost, and whether there’s a hard stop.
Hope is not a control.
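Here's the shape of the control, sketched in Python: a retry wrapper with a hard cap on attempts and wall-clock time. The names and the backoff numbers are illustrative, not from any particular framework.

```python
# Retry wrapper with a hard stop. The point is the ceiling, not the backoff math.
import time

class RetryBudgetExceeded(RuntimeError):
    pass

def call_tool_with_budget(tool_fn, *, max_attempts: int = 3, max_seconds: float = 10.0):
    """Run a tool call with a hard cap on attempts and total wall-clock time."""
    start = time.monotonic()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        if time.monotonic() - start > max_seconds:
            break
        try:
            return tool_fn()
        except Exception as exc:  # in practice, catch your tool's specific error types
            last_error = exc
            time.sleep(min(2 ** attempt * 0.1, 1.0))  # capped backoff
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts") from last_error
```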
4. Retrieval blast radius testing
Test:
- max top-k
- cross-namespace queries
- ambiguous semantic searches
If one request fans out across half your vector store, that’s not “powerful AI.” That’s an unbounded cost surface.
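A minimal sketch of clamping that scope before the query ever reaches the store. The `vector_search` stub, the namespace names, and the limits are hypothetical stand-ins for your own stack.

```python
# Clamp retrieval scope before the query hits the vector store.

MAX_TOP_K = 8
ALLOWED_NAMESPACES = {"public_docs"}  # what untrusted MCP traffic may touch

def vector_search(query: str, top_k: int, namespace: str) -> list[str]:
    """Stub for your real vector store client."""
    return []

def bounded_retrieve(query: str, top_k: int, namespace: str) -> list[str]:
    if namespace not in ALLOWED_NAMESPACES:
        raise PermissionError(f"namespace {namespace!r} not allowed for this caller")
    top_k = min(top_k, MAX_TOP_K)      # caller can ask for 10_000; they get 8
    if len(query) > 2_000:             # absurdly broad semantic queries get cut off
        raise ValueError("query too long")
    return vector_search(query=query, top_k=top_k, namespace=namespace)
```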
5. Disconnect behavior
Start a request.
Disconnect early.
Watch billing.
If the system keeps thinking after you leave, you’re paying for ghosts.
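A toy asyncio sketch of the fix: tie the generation loop to the connection and stop the moment the client is gone. The generator and the disconnect event are stand-ins, not a specific framework's API.

```python
import asyncio

async def generate_tokens():
    """Stand-in for streaming model output; every step here costs money."""
    for i in range(1_000):
        await asyncio.sleep(0.05)
        yield f"token-{i}"

async def stream_response(disconnected: asyncio.Event) -> int:
    sent = 0
    async for token in generate_tokens():
        if disconnected.is_set():
            break  # stop generating the moment nobody is listening
        sent += 1
    return sent

async def main() -> None:
    disconnected = asyncio.Event()
    task = asyncio.create_task(stream_response(disconnected))
    await asyncio.sleep(0.5)   # client hangs around briefly...
    disconnected.set()         # ...then walks away
    print("tokens billed:", await task)

asyncio.run(main())
```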
What defenders should be logging (and probably aren’t)
If I were attacking this quietly, these are the signals I’d try to stay just under.
Which means these are exactly what you should alert on.
- cost per request (not just RPS)
- tool calls per request
- agent step count
- retries per request
- retrieval scope metrics
- spend per identity / key / IP
- endpoints ranked by cost, not traffic
If Finance sees this before Security does, you’ve already lost the argument.
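One way to make those signals exist at all is to emit a single structured cost event per request. A sketch below; the field names are suggestions, not a product schema.

```python
# Emit one structured cost event per request so you can alert on cost, not just traffic.
import json, logging, sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
log = logging.getLogger("mcp.cost")

def emit_cost_event(*, request_id: str, identity: str, endpoint: str,
                    tokens: int, tool_calls: int, agent_steps: int,
                    retries: int, retrieval_docs: int, cost_usd: float) -> None:
    log.info(json.dumps({
        "request_id": request_id,
        "identity": identity,
        "endpoint": endpoint,
        "tokens": tokens,
        "tool_calls": tool_calls,
        "agent_steps": agent_steps,
        "retries": retries,
        "retrieval_docs": retrieval_docs,
        "cost_usd": round(cost_usd, 4),
    }))

emit_cost_event(request_id="req-42", identity="key-abc", endpoint="/mcp/query",
                tokens=18_500, tool_calls=7, agent_steps=12,
                retries=3, retrieval_docs=400, cost_usd=0.83)
```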
Controls that actually stop Token Torching
Not vibes. Not “we’ll watch it.”
Hard budgets
- per request
- per identity
- per tool
- per tenant
No budget? No execution.
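A minimal sketch of what "no budget, no execution" looks like in code. The limits and the in-memory store are illustrative; in production the spend counters live somewhere shared.

```python
# Budget check that runs before every expensive step. No budget left, no execution.
from collections import defaultdict

LIMITS_USD = {"request": 0.50, "identity_daily": 20.00, "tenant_daily": 200.00}

class BudgetExceeded(RuntimeError):
    pass

spend = defaultdict(float)  # in production: a shared store, not process memory

def charge(*, request_id: str, identity: str, tenant: str, amount_usd: float) -> None:
    checks = [
        (f"req:{request_id}", LIMITS_USD["request"]),
        (f"id:{identity}",    LIMITS_USD["identity_daily"]),
        (f"tenant:{tenant}",  LIMITS_USD["tenant_daily"]),
    ]
    for key, limit in checks:
        if spend[key] + amount_usd > limit:
            raise BudgetExceeded(f"{key} would exceed ${limit:.2f}")
    for key, _ in checks:
        spend[key] += amount_usd
```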
Cheap gates before expensive brains
Auth, validation, size limits, retrieval caps — before the LLM ever wakes up.
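Roughly, the ordering looks like this. The limits, the `caller` object, and the `llm.answer` call are assumptions standing in for your own stack.

```python
# Cheap checks run first; the model only wakes up if all of them pass.

MAX_INPUT_CHARS = 4_000
MAX_TOP_K = 8

def gate_then_answer(request: dict, caller, llm):
    if not caller.is_authenticated():                 # auth: costs microseconds
        raise PermissionError("unauthenticated")
    prompt = request.get("prompt", "")
    if not prompt or len(prompt) > MAX_INPUT_CHARS:   # size limit: costs nothing
        raise ValueError("prompt missing or too large")
    top_k = min(int(request.get("top_k", 4)), MAX_TOP_K)  # retrieval cap
    # Only now do we spend real money.
    return llm.answer(prompt=prompt, top_k=top_k)
```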
Progressive trust
Public MCPs should start weak.
Power is earned, not exposed.
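Sketched as a tier table in Python; the tier names and numbers are made up, but the shape matters: unknown callers fall back to the weakest profile, never the strongest.

```python
# Trust tiers: new or anonymous callers start with the smallest limits
# and earn more over time.

TIERS = {
    "anonymous": {"max_tokens": 1_000,  "max_tool_calls": 0,  "tools": set()},
    "verified":  {"max_tokens": 8_000,  "max_tool_calls": 3,  "tools": {"search"}},
    "trusted":   {"max_tokens": 32_000, "max_tool_calls": 10,
                  "tools": {"search", "summarize", "export"}},
}

def limits_for(caller_tier: str) -> dict:
    # Unknown tiers fall back to the weakest profile, never the strongest.
    return TIERS.get(caller_tier, TIERS["anonymous"])
```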
Per-tool quotas
Some tools should never be callable from untrusted MCP traffic.
That’s not restrictive, that’s sane.
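A sketch of that policy: a deny list for tools that untrusted traffic never touches, plus a daily quota for everything else. Tool names and numbers are illustrative.

```python
# Per-tool policy for untrusted MCP traffic: some tools are never callable,
# the rest get a daily quota per identity.
from collections import Counter

UNTRUSTED_TOOL_QUOTA = {"search": 50, "summarize": 20}   # calls per identity per day
NEVER_FROM_UNTRUSTED = {"send_email", "run_workflow", "bulk_export"}

calls_today: Counter = Counter()

def authorize_tool_call(identity: str, tool: str) -> None:
    if tool in NEVER_FROM_UNTRUSTED:
        raise PermissionError(f"{tool} is not exposed to untrusted MCP traffic")
    quota = UNTRUSTED_TOOL_QUOTA.get(tool)
    if quota is None:
        raise PermissionError(f"{tool} has no quota defined, so it is denied by default")
    key = (identity, tool)
    if calls_today[key] >= quota:
        raise PermissionError(f"{tool} quota exhausted for {identity}")
    calls_today[key] += 1
```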
Kill switches
If you can’t shut off an expensive tool in seconds, you don’t control your system.
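The simplest version is a flag checked on every invocation. Here it's an in-process set to keep the sketch self-contained; in production the flag lives in something shared (feature flags, Redis, your control plane).

```python
# Kill switch checked on every invocation. Flipping it takes one call, not a deploy.

DISABLED_TOOLS: set[str] = set()

def kill(tool: str) -> None:
    DISABLED_TOOLS.add(tool)

def invoke_tool(tool: str, fn, *args, **kwargs):
    if tool in DISABLED_TOOLS:
        raise RuntimeError(f"{tool} is disabled by kill switch")
    return fn(*args, **kwargs)

# Example: someone notices a hypothetical "deep_research" tool burning money.
kill("deep_research")
```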
Control Planes
I wrote a blog post that goes deeper on this: MCP vs MoCoP.
Final thought
MCP didn’t just make AI more capable. It made “cost” an attack surface. Talk about security as a business enablement tool.
Token Torching isn’t hypothetical.
If you expose MCP publicly and don’t test for this, you’ve built a very polite way for someone else to light your money on fire.
Curious to see how Sumo Logic protects your AI systems? Sign up for our 30-day free trial.



