The dangerous agent failures aren’t hallucinations or broken logic. They’re the moments the agent does exactly what it was told — and nothing checks whether that should have happened at all.
“Update the vendor’s wire instructions to the account in this email.”
The agent updates the payee bank details.
→ The next payment routes to an attacker. The money is gone before anyone looks.
“Deploy this script to production to fix the issue.”
The agent runs the deploy.
→ A production outage no one approved, at a time no one chose.
“Override the eligibility flag on this benefits claim.”
The agent flips the flag.
→ An improper payment is issued with no human who owns the decision.
“Export the customer records and send them to this address.”
The agent exports and sends.
→ A data breach — quiet, complete, and irreversible.
Illustrative scenarios — each maps to a real, documented class of incident, not a specific event.
None of these are intelligence failures. The model did its job. What’s missing is the step in between — the one that asks “should this happen?” and gets a named human’s answer before the irreversible part.
Before money moves, records change, code deploys, or data leaves, EMILIA requires a named human’s verified sign-off — and mints a receipt anyone can verify offline. Not because it’s smarter than the agent. Because it checks trust before action, deterministically, every time.
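The gate described above can be sketched in a few lines. This is a minimal illustration using Node's built-in Ed25519 signing, not the EMILIA implementation or the API of @emilia-protocol/openai-guard; the names Action, Receipt, signOff, verifyReceipt and guarded, and the approver address, are all hypothetical. It assumes the named human holds a private key and that verifiers already have the matching public key, which is what makes the receipt checkable offline.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical shapes for illustration only; not the EMILIA wire format.
type Action = {
  kind: "wire_update" | "deploy" | "flag_override" | "data_export";
  target: string;
};

interface Receipt {
  action: Action;
  approver: string;  // the named human who owns the decision
  signature: string; // base64 Ed25519 signature over the payload below
}

// Deterministic byte encoding of "who approved exactly what".
function payload(action: Action, approver: string): Buffer {
  return Buffer.from(JSON.stringify([action.kind, action.target, approver]), "utf8");
}

// The approver signs the exact action, producing a receipt.
function signOff(action: Action, approver: string, privateKey: KeyObject): Receipt {
  const signature = sign(null, payload(action, approver), privateKey).toString("base64");
  return { action, approver, signature };
}

// Offline check: needs only the receipt and the approver's public key.
function verifyReceipt(receipt: Receipt, publicKey: KeyObject): boolean {
  return verify(
    null,
    payload(receipt.action, receipt.approver),
    publicKey,
    Buffer.from(receipt.signature, "base64"),
  );
}

// The gate: run the irreversible step only if a valid receipt covers this exact action.
function guarded<T>(
  action: Action,
  receipt: Receipt | undefined,
  publicKey: KeyObject,
  run: () => T,
): T {
  const approved =
    receipt !== undefined &&
    payload(receipt.action, receipt.approver).equals(payload(action, receipt.approver)) &&
    verifyReceipt(receipt, publicKey);
  if (!approved) {
    throw new Error("blocked: no verified sign-off for this action");
  }
  return run();
}

// Usage: the action runs only after the approver has signed it.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const action: Action = { kind: "wire_update", target: "vendor-123" };
const receipt = signOff(action, "jane.doe@example.com", privateKey);
const result = guarded(action, receipt, publicKey, () => "executed");
```

The check is a pure function of the action, the receipt and the public key, so the same inputs always give the same answer. If the receipt is missing, or if the action was changed after signing, the call throws before the irreversible step runs.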
We crash-tested it: four frontier models acting as autonomous treasury agents executed 50–83% of high-stakes actions unguarded. With EMILIA in front: 0%, every model, with zero false friction.
npm install @emilia-protocol/openai-guard