Why fail-closed wins

“Fail-closed” means that when a system is unsure, it refuses instead of letting things through. For autonomous AI agents, that can be the difference between a thwarted attack and a successful one.

Fail-open vs. fail-closed

Every protection system has to decide what to do when it's not sure. There are two stances: fail-open, which lets things through when in doubt, and fail-closed, which refuses.

For a door lock the choice is obvious: a locking system that simply springs open on a power outage is no safeguard. With AI agents, the same logic is often forgotten.

An agent acts autonomously and repeatedly. A single wrongly waved-through “memory” takes effect not once, but on every future decision.
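The two stances can be sketched in a few lines of Python. This is only an illustration, not PoisonZero code: the names `Verdict`, `fail_open`, `fail_closed`, and `flaky_check` are hypothetical, and "being unsure" is modeled here as the check raising an exception.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"
    REFUSE = "refuse"


def fail_open(check: Callable[[str], bool], item: str) -> Verdict:
    """When the check itself breaks, let the item through."""
    try:
        return Verdict.ALLOW if check(item) else Verdict.REFUSE
    except Exception:
        return Verdict.ALLOW  # doubt resolves to "allow"


def fail_closed(check: Callable[[str], bool], item: str) -> Verdict:
    """When the check itself breaks, refuse the item."""
    try:
        return Verdict.ALLOW if check(item) else Verdict.REFUSE
    except Exception:
        return Verdict.REFUSE  # doubt resolves to "refuse"


def flaky_check(item: str) -> bool:
    # Hypothetical checker that cannot reach a verdict.
    raise TimeoutError("scorer unavailable")


print(fail_open(flaky_check, "new memory entry"))    # Verdict.ALLOW
print(fail_closed(flaky_check, "new memory entry"))  # Verdict.REFUSE
```

The two functions differ in a single line, the one that runs when no verdict is available. That line is the whole design decision.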

Why agents in particular need fail-closed

How PoisonZero implements fail-closed

PoisonZero scores every change with two numbers: a danger level (0–1) and a confidence (0–1, how reliable that judgment is). A clear, conservative decision logic follows from them:

# Decision logic (simplified)
danger ≥ block threshold          → revert
danger ≤ safe threshold
   AND confidence ≥ minimum       → allow
else (anything unclear in between) → ask_user

The decisive point is the last line: whatever is not judged clearly harmless with sufficient confidence does not get through. It is either rolled back or brought to you for a decision. You set the thresholds per agent yourself, with sliders and plain explanations instead of config files.
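For readers who prefer code, the simplified logic above might look roughly like this in Python. It is a sketch under assumptions, not PoisonZero's implementation: the `Thresholds` class, the `decide` function, and the default values are invented for illustration, and treating out-of-range scores as "unclear" is an extra assumption in the same fail-closed spirit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical example values; the article does not state defaults.
    block: float = 0.8           # danger at or above this is reverted
    safe: float = 0.2            # danger at or below this may be allowed
    min_confidence: float = 0.7  # required before anything is allowed


def decide(danger: float, confidence: float, t: Thresholds = Thresholds()) -> str:
    """Return 'revert', 'allow', or 'ask_user' for one scored change."""
    # Assumption: out-of-range or NaN scores count as doubt,
    # so they can never reach 'allow'.
    if not (0.0 <= danger <= 1.0 and 0.0 <= confidence <= 1.0):
        return "ask_user"
    if danger >= t.block:
        return "revert"
    if danger <= t.safe and confidence >= t.min_confidence:
        return "allow"
    return "ask_user"  # everything unclear falls through to here


print(decide(0.95, 0.90))  # revert
print(decide(0.05, 0.90))  # allow
print(decide(0.05, 0.40))  # ask_user: looks harmless, but the judgment is shaky
print(decide(0.50, 0.99))  # ask_user: confident, but not clearly harmless
```

Note that `allow` is the only outcome that needs two conditions to hold at once. Every other path ends in a rollback or a question to the user.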

The price, and why it's worth it

Fail-closed occasionally means being asked about a change that, in hindsight, was harmless. That's the price. Against it stands a poisoned entry that quietly skews every future decision. This asymmetry, a small, visible inconvenience versus large, invisible damage, is the whole reason fail-closed wins.

Safe by default.

PoisonZero picks the safe side when in doubt — and lets you draw the line yourself.
