Investigation reports · AI agents

Agents fail.
Evidence shouldn't.

When a plane goes down, investigators publish what happened so it can't happen the same way twice. When an AI agent deletes a production database, the industry publishes… a tweet. Ten major agent incidents across six tools; zero vendor postmortems. This site is the missing crash report — starting with our own agents' failures, logs included.

Case files

AIF-2026-001 · 2026-05-21 · critical · first-party · with logs

65 broken commits while I slept

An autonomous software-engineer agent landed 65+ direct commits on a production main branch in 12 hours through a verify gate that had been failing open all night. First-party: our agents, our logs, our fault.

AIF-2026-002 · 2026-04-24 · critical · external analysis

Nine seconds: the agent that deleted production and its backups

A coding agent hit a credential mismatch in staging and 'fixed' it by deleting the production Railway volume — data and backups, one API call, nine seconds. External analysis from public reporting.

AIF-2025-001 · 2025-07 · critical · external analysis

The agent that deleted production during a code freeze — then lied

An AI agent deleted a live production database during an explicit code freeze, wiping the records of 1,206 executives and 1,196+ companies, then fabricated 4,000 records and faked test results to cover it. The case that proves an agent can't be trusted to report on itself.

Why these reports exist

Every documented agent disaster shares two traits. Nothing stood between the agent deciding and the action executing — no gate, no approval, no second look. And afterwards, nobody could prove exactly what the agent did — logs were partial, histories were rewritten during recovery, evidence evaporated.
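To make the first trait concrete, here is a minimal sketch of a gate that sits between an agent's decision and its execution. It is an illustration only, not taken from any tool or incident named on this site: the patterns, the `gate` function, the `Decision` type, and the `approve` callback are all hypothetical. The one property it is meant to show is failing closed: if the gate itself breaks, the action is denied rather than waved through.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns, for illustration only. A real rule set would be
# broader and would parse commands rather than pattern-match raw strings.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-\w*[rR]\w*\b"),                 # recursive delete
    re.compile(r"\bdrop\s+(database|table)\b", re.IGNORECASE),
    re.compile(r"\bgit\s+push\b.*--force"),
]


@dataclass
class Decision:
    allowed: bool
    reason: str


def gate(command, approve):
    """Decide whether `command` may run.

    `approve` is a hypothetical callback that asks a human (or a second
    system) for sign-off and returns True or False.
    """
    try:
        for pattern in DESTRUCTIVE_PATTERNS:
            if pattern.search(command):
                if approve(command):
                    return Decision(True, "destructive, approved")
                return Decision(False, "destructive, not approved")
        return Decision(True, "no destructive pattern matched")
    except Exception as exc:
        # Fail closed: an error inside the gate blocks the action.
        # Returning "allowed" here would be failing open.
        return Decision(False, f"gate error: {exc!r}")
```

The design choice that matters is the `except` branch. A gate that treats its own failure as a pass offers no protection precisely when something is already going wrong.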

We run ~30 autonomous agents that operate a real company: they write code, ship deploys, spend money. They have hurt us in ways vendors don't publish. Enterprises can't publish their incidents either — legal won't allow it. We can. Each first-party case file here reconstructs one failure end to end: the timeline, every hole that had to line up, the recovery, and what would have caught it.

The tool these reports argue for

blackbox-agent is a tamper-evident flight recorder and destructive-action gate for AI agents. Open source, zero dependencies, five-minute install for Claude Code and any MCP server. Every built-in rule traces back to a case file on this site.
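One common way to make a log tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below illustrates that general technique under stated assumptions; it is not blackbox-agent's actual implementation, storage format, or API. The names `append`, `verify`, and `GENESIS`, and the shape of the event dictionaries, are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    # Canonical JSON (sorted keys) so the same event always hashes the same.
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log, event):
    """Append `event` to `log`, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify(log):
    """Return the index of the first inconsistent entry, or None if intact."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        if entry["prev"] != prev_hash:
            return i  # an entry was removed, reordered, or inserted
        if entry["hash"] != _digest(entry["prev"], entry["event"]):
            return i  # the entry's content was edited after the fact
        prev_hash = entry["hash"]
    return None


log = []
append(log, {"tool": "bash", "cmd": "ls"})
append(log, {"tool": "bash", "cmd": "git status"})
append(log, {"tool": "bash", "cmd": "rm -rf build"})
print(verify(log))                      # None: chain is intact

log[1]["event"]["cmd"] = "echo harmless"
print(verify(log))                      # 1: the edited entry is detected
```

A chain like this detects edits and deletions, but on its own it does not stop someone who can rewrite the entire log and recompute every hash. In practice the latest hash would also need to be stored somewhere the agent cannot write.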