Why fail-closed wins
“Fail-closed” means: when a system is unsure, it refuses — instead of letting things through when in doubt. For autonomous AI agents, that's the difference between a thwarted and a successful attack.
Fail-open vs. fail-closed
Every protection system has to decide what to do when it's not sure. There are two stances:
- Fail-open: let it through when in doubt. Convenient, never in the way — but every gap in judgment becomes an open door.
- Fail-closed: block or ask when in doubt. Occasionally inconvenient — but an unclear case never quietly turns into damage.
For a door lock the choice is obvious: a locking system that simply springs open on a power outage is no safeguard. With AI agents, the same logic is often forgotten.
Why agents in particular need fail-closed
- Persistence: whatever gets into the Memory Files stays there. A fail-open mistake isn’t fleeting — it settles in.
- Autonomy: nobody is watching every action. The safe default has to be built in, not dependent on human vigilance.
- Ambiguity: malicious entries look like legitimate ones. A model will often be “rather unsure” — and that's exactly when it must not wave things through.
How PoisonZero implements fail-closed
PoisonZero scores every change with two numbers: a danger level (0–1) and a confidence (0–1, how reliable the judgment is). From that follows a clear, conservative logic:
# Decision logic (simplified) danger ≥ block threshold → revert danger ≤ safe threshold AND confidence ≥ minimum → allow else (everything unclear between) → ask_user
The decisive point is the last line: whatever isn't judged unambiguously harmless and sufficiently confident does not get through. It gets rolled back or brought to you. You set the thresholds per agent yourself — clearly explained, with sliders instead of config files.
The price, and why it's worth it
Fail-closed occasionally means a query that, in hindsight, turns out to be unnecessary. That's the price. Against it stands a poisoned entry that quietly skews every future decision. This asymmetry — small, visible inconvenience versus large, invisible damage — is the whole reason fail-closed wins.
Safe by default.
PoisonZero picks the safe side when in doubt — and lets you draw the line yourself.
Try 14 days freeRead next: What is memory poisoning? · AI agents in the CI/CD pipeline · Poisoned Pipeline Execution