Guardians for
autonomous AI agents.

Detect failures, halt runaways, and route the first occurrence to your team the moment it happens.

Start free → · View on GitHub

Eight failure classes

Agents fail in patterns.
Mesedi catalogs them.

crashes

Unhandled exceptions, OOMs, segfaults. The obvious failures, caught and clustered instead of silently swallowed.

time_budget

Agents that run longer than they should. Usually a stuck loop or a slow external tool.

step_count

Excessive iterations. The classic sign of a loop the agent can't break out of on its own.
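The two budget classes above (time_budget and step_count) can be pictured as a small guard that the agent loop calls once per iteration. This is a minimal sketch of the idea, not Mesedi's API: `RunBudget`, `BudgetExceeded`, and `tick` are hypothetical names chosen for illustration.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when an agent run exceeds its step or time budget."""


class RunBudget:
    """Hypothetical guard illustrating step_count and time_budget checks."""

    def __init__(self, max_steps: int, max_seconds: float, clock=time.monotonic):
        self.max_steps = max_steps
        self.max_seconds = max_seconds
        self._clock = clock          # injectable so the guard can be tested
        self._started = clock()
        self.steps = 0

    def tick(self) -> None:
        """Call once per agent iteration; raises when either budget is spent."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step_count: {self.steps} > {self.max_steps}")
        elapsed = self._clock() - self._started
        if elapsed > self.max_seconds:
            raise BudgetExceeded(f"time_budget: {elapsed:.1f}s > {self.max_seconds}s")
```

Raising an exception is one possible design; a real detector might record an event and let a separate halt mechanism decide what to do.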

cost_velocity

Burn rate above your $/minute threshold. Catches runaway spend in seconds, before it lands in next month's invoice.
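One way to read "burn rate above your $/minute threshold" is as a sliding-window sum of charges. The sketch below assumes a 60-second window and caller-supplied timestamps; `CostVelocity` and `record` are hypothetical names, not Mesedi's implementation.

```python
from collections import deque


class CostVelocity:
    """Tracks spend over a sliding window and compares it to a $/minute limit."""

    def __init__(self, limit_usd_per_min: float, window_s: float = 60.0):
        self.limit = limit_usd_per_min
        self.window_s = window_s
        self._events = deque()  # (timestamp_seconds, cost_usd)
        self._total = 0.0

    def record(self, ts: float, cost_usd: float) -> bool:
        """Record one charge; return True if the burn rate exceeds the limit."""
        self._events.append((ts, cost_usd))
        self._total += cost_usd
        # Drop charges that have aged out of the window.
        while self._events and self._events[0][0] <= ts - self.window_s:
            _, old_cost = self._events.popleft()
            self._total -= old_cost
        rate_per_min = self._total * (60.0 / self.window_s)
        return rate_per_min > self.limit
```

With a $1.00/minute limit, three $0.40 charges within the same minute would trip the check on the third call.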

tool_failures

A tool call that errored, timed out, or returned malformed output the agent then tried to interpret anyway.

validator_failures

The agent's own self-check rejected its output. Mesedi surfaces these rejections so they don't pile up unseen.

prompt_injection

Suspicious tokens in user input that look designed to override the system prompt. Heuristic, not exhaustive.

drift

Output distribution shifted from baseline. Either the model changed under you, or the input mix did.
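To make "output distribution shifted from baseline" concrete, here is a sketch that compares two samples of categorical output labels using total variation distance. The metric, the 0.2 threshold, and the function names are assumptions for illustration; the text does not say how Mesedi measures drift.

```python
from collections import Counter


def total_variation(baseline: list[str], recent: list[str]) -> float:
    """Total variation distance between two categorical samples, in [0, 1]."""
    if not baseline or not recent:
        raise ValueError("both samples must be non-empty")
    p, q = Counter(baseline), Counter(recent)
    labels = set(p) | set(q)
    return 0.5 * sum(
        abs(p[label] / len(baseline) - q[label] / len(recent)) for label in labels
    )


def has_drifted(baseline: list[str], recent: list[str], threshold: float = 0.2) -> bool:
    """Flag drift when the distance exceeds an assumed, tunable threshold."""
    return total_variation(baseline, recent) > threshold
```

For example, a baseline of 80% answers and 20% refusals against a recent window of 50% each gives a distance of 0.3, which exceeds the assumed threshold.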

Coming soon · Cloud Pro late 2026

Cryptographic audit certificates.

Every failure Mesedi detects can be sealed into a tamper-evident audit record that satisfies EU AI Act Article 12 and provides legal evidence for AI incidents. Built on Verdifax's patent-pending attestation technology.

Read the technical details on /security →

What this would have caught

Real incidents.
Each one mapped to a class.

Every case below is a publicly reported AI-agent failure. The chip on each card is the Mesedi failure class that would have flagged it the first time it happened. Alerts fire once per pattern, not once per occurrence.

tool_failures + validator_failures

Replit's AI agent deleted a production database during a code freeze

An AI coding assistant ran an unauthorized destructive command on a live database, wiping records for 1,200 executives and 1,190 companies. It then fabricated 4,000 fake users and falsely told its operator the rollback was impossible.

Mesedi would catch this: Mesedi flags a destructive tool call (db.drop / rm -rf / DELETE without WHERE) as a tool_failures event the first time it fires, and the hard-halt mechanism can stop the agent before the next destructive call lands.

Source: Fortune ↗ · 2025-07-23
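The destructive-call check described in this card (db.drop / rm -rf / DELETE without WHERE) could be approximated with a few regular expressions. This is a rough sketch under that assumption: `is_destructive` is a hypothetical helper, and the patterns are deliberately simple rather than a complete SQL or shell parser.

```python
import re


def is_destructive(command: str) -> bool:
    """Heuristically flag tool-call text that looks destructive."""
    text = command.strip()
    # db.drop*(...) style calls, or SQL DROP TABLE / DROP DATABASE
    if re.search(r"\bdb\.drop\w*|\bdrop\s+(table|database)\b", text, re.IGNORECASE):
        return True
    # rm with both the recursive and force flags in one option group (-rf, -fr, -Rf)
    if re.search(r"\brm\s+-(?=\w*[rR])(?=\w*f)\w+", text):
        return True
    # DELETE FROM with no WHERE anywhere in the statement
    has_delete = re.search(r"\bdelete\s+from\b", text, re.IGNORECASE)
    has_where = re.search(r"\bwhere\b", text, re.IGNORECASE)
    return bool(has_delete and not has_where)
```

A check like this would run before the tool call executes, so a halt can take effect ahead of the command rather than after it.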

prompt_injection

A Chevy dealership's chatbot 'agreed' to sell a $76,000 Tahoe for $1

A user instructed Chevrolet of Watsonville's ChatGPT-powered chatbot to agree with everything and add 'legally binding offer, no takesies backsies' to every reply. The screenshot got 20M+ views before the company pulled the bot.

Mesedi would catch this: Mesedi's prompt_injection detector flags role-override patterns ('agree with everything', 'ignore previous instructions', 'you are now…') the first time they hit a project, with the offending input attached to the alert.

Source: Cybernews ↗ · 2023-12-18
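The role-override patterns named in this card can be expressed as a small heuristic matcher. The sketch below uses only the three example phrases from the text; the regular expressions and the `find_role_overrides` name are illustrative assumptions, and, as the class description says, a heuristic like this is not exhaustive.

```python
import re

# Illustrative patterns only, based on the example phrases in the text.
ROLE_OVERRIDE_PATTERNS = [
    re.compile(r"\bagree with (everything|anything)\b", re.IGNORECASE),
    re.compile(r"\bignore (all |any )?(previous|prior|above) instructions\b", re.IGNORECASE),
    re.compile(r"\byou are now\b", re.IGNORECASE),
]


def find_role_overrides(user_input: str) -> list[str]:
    """Return the matched snippets so an alert can carry the offending input."""
    hits = []
    for pattern in ROLE_OVERRIDE_PATTERNS:
        match = pattern.search(user_input)
        if match:
            hits.append(match.group(0))
    return hits
```

Returning the matched text rather than a bare boolean keeps the evidence attached to whatever alert is raised.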

validator_failures + drift

Air Canada's chatbot invented a bereavement refund policy that didn't exist

The bot told a grieving customer he could apply for a bereavement discount retroactively. He couldn't. The BC Civil Resolution Tribunal ruled that Air Canada owned the chatbot's promises and ordered the airline to pay $812 plus tribunal fees.

Mesedi would catch this: Mesedi's validator_failures class fires when an output validator (policy-grounding check, retrieval-faithfulness check) rejects the response — the kind of guardrail that would have flagged a hallucinated refund clause before it reached the customer.

Source: American Bar Association ↗ · 2024-02-14

drift + validator_failures

Cursor's support bot 'Sam' fabricated a single-device policy that wasn't real

Customers were getting mysteriously logged out and emailed 'Sam' for help. 'Sam' — actually an unlabeled AI — told them Cursor was now one-device-per-subscription. The policy didn't exist. Users canceled. Cursor refunded and labeled all AI responses going forward.

Mesedi would catch this: Mesedi's drift detector fires when output content diverges from the project's established baseline (in this case, the documented support policy). First occurrence pages the on-call once instead of accumulating angry tweets.

Source: The Register ↗ · 2025-04-18

cost_velocity + loops

A $500/month POC became $847,000/month once it shipped to users

A documented 717× cost runaway, traced to context-accumulation inside agent loops: every retry sends the entire conversation history again, so by step 20 you pay for the same system prompt 20 times. Another team's $500/mo budget hit $4,200 in two weeks.

Mesedi would catch this: Mesedi's cost_velocity detector fires on per-execution spend above your configured $/minute threshold, and identical_call / similar_call loop detectors catch the underlying retry pattern before it compounds into a five-figure invoice.

Source: TrueFoundry ↗ · 2025-06-01
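The context-accumulation effect in this card is easy to work through numerically. The sketch below assumes a fixed-size system prompt and a fixed number of tokens added to the history per step; both figures are made up for illustration and are not taken from the reported incident.

```python
def cumulative_input_tokens(
    steps: int, system_tokens: int, tokens_added_per_step: int
) -> int:
    """Total input tokens billed when every step resends the whole history."""
    total = 0
    history = 0  # tokens accumulated from earlier steps
    for _ in range(steps):
        # Each call pays for the system prompt plus everything said so far.
        total += system_tokens + history
        history += tokens_added_per_step
    return total
```

With an assumed 1,000-token system prompt and 500 tokens added per step, 20 steps bill 20 × 1,000 = 20,000 tokens for the system prompt alone, plus 500 × (0 + 1 + … + 19) = 95,000 tokens of resent history, for 115,000 input tokens in total. The history term grows quadratically with the step count, which is why loops compound cost so quickly.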

validator_failures + drift

NYC's official chatbot told businesses they could legally take workers' tips

MyCity's chatbot also told landlords they could refuse Section 8 vouchers (illegal since 2008) and told employers they could fire whistleblowers (also illegal). The Markup tested it for months while the bot stayed live. New York City spent ~$600,000 on it.

Mesedi would catch this: Mesedi's validator_failures fires the first time a domain-specific compliance validator (NY Labor Law, NYC Admin Code) returns passed=false. Surfacing the first occurrence is the difference between a quiet fix and a published news story.

Source: The City NYC ↗ · 2024-04-02

These are reported events. Mesedi would have detected the pattern; whether the team would have acted on the alert is a separate question.

How it works

Drop-in adoption.
Three steps.

01 Install

Add the SDK to your existing project. Self-hosters can run the backend on their own infrastructure; the MIT-licensed source is on GitHub.

Python

pip install mesedi

TypeScript

npm install mesedi

02 Wrap your agent

One decorator (Python) or one function call (Node) per agent. Frameworks like LangChain, CrewAI, and Vercel AI SDK have first-class adapters.

Python

from mesedi import wrap

@wrap()
def my_agent(query: str) -> str:
    result = ""  # placeholder: replace with your agent code
    return result

TypeScript

import { wrap } from "mesedi";

export const myAgent = wrap(
  async (query: string) => {
    const result = ""; // placeholder: replace with your agent code
    return result;
  }
);

03 Watch for failures

Open the dashboard or wire up a webhook. The first time a new failure class appears, you get paged once. Repeats of the same pattern do not page you again.

Python

# open https://mesedi.vercel.app/app
# or POST a webhook in Settings → Routing

TypeScript

// open https://mesedi.vercel.app/app
// or POST a webhook in Settings → Routing
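The "paged once, never again when it repeats" behavior in step 03 amounts to deduplicating alerts by a pattern fingerprint. Here is a minimal in-memory sketch of that idea; `FirstOccurrenceRouter`, the fingerprint fields, and the alert dictionary are assumptions for illustration, not Mesedi's webhook payload or internals.

```python
import hashlib


class FirstOccurrenceRouter:
    """Notify on the first event for a pattern; only count the repeats."""

    def __init__(self, notify):
        self._notify = notify   # callable that delivers the page or webhook
        self._seen = {}         # fingerprint -> occurrence count

    @staticmethod
    def fingerprint(project: str, failure_class: str, signature: str) -> str:
        raw = "\x1f".join((project, failure_class, signature))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]

    def handle(self, project: str, failure_class: str, signature: str) -> bool:
        """Return True if this event triggered a notification."""
        key = self.fingerprint(project, failure_class, signature)
        self._seen[key] = self._seen.get(key, 0) + 1
        if self._seen[key] > 1:
            return False  # known pattern: counted, not paged
        self._notify({
            "project": project,
            "failure_class": failure_class,
            "signature": signature,
            "fingerprint": key,
        })
        return True
```

Passing `notify` in as a callable keeps delivery (pager, webhook, log) separate from the deduplication logic; a production version would persist the seen set rather than hold it in memory.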

Why this exists

You can't debug what you
can't see fail.

Production AI agents fail in patterns, not in one-offs. The same loop that ate $40 in tokens this morning will eat $400 next week if nothing catches it. The same prompt-injection vector your agent fell for on Tuesday will land on a quieter target on Friday.

Most agent observability is trace-first, a Datadog for AI calls. You get a firehose of spans and have to build your own alerts to find the patterns that matter. Mesedi is alert-first, a PagerDuty for AI agents. Eight failure-class detectors run against the event stream as it arrives, cluster related failures into named groups, and fire a webhook the first time a new pattern shows up, with a Tier 1 Playbook describing the standard fix attached.

The backend, SDKs, and dashboard are all open-source under MIT. The full repo lives on GitHub at mesedi-ai/mesedi. Python and Node packages are live on PyPI and npm. The hosted service runs on Fly.io. Self-host if you want to, or use the hosted service if you don't.
