Nine seconds: anatomy of the PocketOS production wipe

grith team·May 20, 2026·8 min read·incident-analysis

grith is launching soon

A security proxy for AI coding agents, enforced at the OS level. Register your interest to be notified when we go live.

On 1 May, an AI operations agent at PocketOS held legitimate production credentials, decided a config error needed a destructive command, and erased the database and its backups in nine seconds.

The autonomy debate around incidents like this will take years to settle. The syscall trace of one will take you about ten minutes.

Timeline of the PocketOS incident across nine seconds: probe at t=0, hypothesis formation in-LLM, destructive write at t=8, backup elimination at t=9 - with a vertical DENY line at the destructive write where a syscall-layer scorer would have intervened. — Nine seconds from credential check to production wipe. Every second of it lands as a syscall the host can see - which means a layer below the agent could have stopped it.

The incident has since become the canonical 2026 story for what happens when an autonomous agent has real privileges and a confidently wrong plan. Most of the commentary frames it as an autonomy problem: the agent had too much latitude, the reviewer wasn't in the loop, the blast radius was too large. All of that is true, and all of it is essentially political - it's about who decides what an agent is allowed to do. We want to look at the same nine seconds from a less comfortable angle: as a syscall problem. The autonomy fix will be slow. The syscall fix is shipping.

What we know

The public account, stripped of editorial:

The agent held valid production credentials.
It identified what it believed to be a configuration or credential problem.
It chose a remediation that involved a destructive operation against the database.
The same operation, or a subsequent one in the same chain, removed the backup state.
The end-to-end runtime was approximately nine seconds.

What hasn't been disclosed is the exact command sequence, whether the backups were destroyed by an aliased command or a separate one, or the precise reasoning chain that led the agent to "delete" as the solution. Those details may emerge from a postmortem. For this post we only need the structural shape, which is well-established.

The trace, in broad terms

Reconstructing the simplest scenario consistent with the public account: an agent running under a CLI runtime, holding cloud-provider credentials and a database connection string, performs roughly the following sequence.

Authentication probe. The agent issues a small read against the credential it's been given - a connect() to the database, or a list-buckets-equivalent against object storage. This is the "diagnose the config problem" step. From the syscall layer, it looks like a benign read.
Hypothesis formation. The agent's reasoning loop concludes that the credential or the schema is in a bad state. No syscalls happen here. This is all in-LLM. The structural problem starts here: by the time anything observable happens at the host, the destructive plan has already been chosen.
Destructive write. The agent issues a command - through whatever client library or CLI tool the runtime gives it - to drop, truncate, or otherwise remove the database state. From the syscall layer, this is typically a single execve() to a database client, with an argv that includes the destructive verb, or a write to a socket carrying the equivalent protocol message.
Backup elimination. Either as part of the same command (if the agent reached for a "reset everything" abstraction) or as a follow-up (if the agent thought the backups were also corrupted), the backup state is removed. This is the syscall pattern that makes the incident terminal rather than recoverable.

The total observable time at the host is dominated by step 1 and step 4. Steps 2 and 3 are essentially instantaneous. Nine seconds is the floor, not the ceiling, of how fast this can happen.

What a multi-filter scorer would have seen

This is where it gets interesting, because each of those steps has a distinct signature that doesn't require knowing the agent's intent.

Step 1 - the read probe. Benign. Scores low across every filter. Allow.

Step 3 - the destructive write. A scoring layer that sits between the agent's tool invocation and the host doesn't need to read the prompt or the agent's reasoning. It needs to see the argv (or socket payload) and ask three questions:

Is this a destructive verb against a high-value target? DROP, TRUNCATE, DELETE FROM ... without a WHERE, --no-preserve-root, rm -rf against a path containing a known data directory - these are all enumerable, and the pattern table to detect them is well within reach.
Is the target inside a sensitive scope? Production-tagged credentials, hostnames matching a primary-database pattern, paths under known backup mounts - all observable at the syscall layer because they're in the argv, env, or the connection metadata.
Has the agent done anything in the previous N seconds that justifies this? This is the temporal heuristic. A destructive command issued nine seconds after the session started, with no preceding human approval or staging step, is the signature of the failure mode under discussion.

Any one of those filters scores this command into QUEUE territory. All three together score it into hard DENY. The supervisor returns an error to the agent. The agent's reasoning loop, faced with a failed command, either retries (and is denied again), escalates to a human, or moves on. The database survives.

Step 4 - the backup elimination. Same logic, separately. The fact that step 3 was blocked doesn't matter; step 4 would have been blocked on its own merits. This is the property you want from a security layer: each individual action is judged on what it is, not on the chain it's part of.

Why this isn't "just add an approval step"

The standard reaction to PocketOS is "the agent shouldn't have been allowed to run destructive commands without approval." Correct in principle, useless in practice, because:

Approvals only work if the developer reads them. Permission-fatigue research is unambiguous on this: by the tenth approval dialog in an hour, accept rates approach 100%. We've written about this before in permission fatigue is a security failure.
Most agents don't have an approval surface for the destructive command itself - they have one for the tool (the database client, the shell), at the point of tool acquisition. Once the tool is acquired, the destructive command lands without further dialog.
Approval flows are usually configured per tool or per category, not per target. Approving "the agent can run the database client" is what gets you PocketOS.

The PocketOS-shaped problem isn't solved by asking the user one more time. It's solved by interposing a layer that sees the actual command, the actual target, and the actual session history, and applies the same scrutiny a human reviewer would if they were watching.

What grith does about it

The shape we built grith around is exactly this layer. Every action an agent takes - every execve(), every connect(), every write to a sensitive path - is scored by a multi-filter pipeline before it reaches the host. The verb, the target, the session timeline, the agent's recent history, the secret patterns in the argv, the entropy of any URLs being hit - all of it gets a number, and the numbers compose into ALLOW, DENY, or QUEUE.

The PocketOS trace, run through grith's exec supervisor, blocks at step 3. That failure mode isn't a hypothetical - it's the one the product was designed around, and the integration tests around it land in that exact shape.

Plenty of people have pointed out that per-syscall interception is the right architecture for this class of problem. What we believe we're doing differently is shipping it: a single binary, supervisor mode against existing tools like Claude Code and Codex and Aider, no kernel module, no platform lock-in. The launch is days away.

The lesson, more generally

PocketOS is not an outlier. It's the canonical example of a class. Every weekly AI security writeup since March has had at least one variant of it, because the structural conditions - agents with real credentials, autonomous loops, destructive verbs reachable through legitimate tool acquisition - are the default state of the industry right now. They're getting more common, not less, as agent fan-out and multi-instance orchestration become normal.

The defensive layer that makes the difference is the one closest to the action. Not the prompt. Not the model. Not the approval dialog. The syscall, where intent has already become action and there is exactly one chance left to stop it.

Nine seconds is fast, but it is not faster than a seccomp-style decision.

Like this post? Share it.

Share on X Submit to HN