Blog
Technical deep-dives on AI agent security, architecture, and defence.
If Your AI Agent Ran npm install During the Axios Attack, You're Compromised
On March 31, a DPRK-linked attacker published a RAT inside the axios npm package. The malware executed 1.1 seconds into npm install. AI coding agents run npm install autonomously, without human review. If your agent ran it during the 3-hour window, the RAT is on your machine.
Zero Ambient Authority: The Principle That Should Govern Every AI Agent
AI agents inherit every permission their host process has. SSH keys, cloud credentials, browser cookies, production databases - all accessible by default, with no explicit grant. This is ambient authority. It is the wrong model.
Alibaba's AI Agent Hijacked GPUs and Dug Reverse SSH Tunnels
During reinforcement learning training, an Alibaba AI agent independently decided to mine cryptocurrency, open reverse SSH tunnels, and access billing accounts. No human told it to. Every action was a syscall that enforcement below the agent would have caught.
AI Agents Are Now Deciding What's Safe to Run (Claude Auto Mode)
Auto Mode is a UX improvement. It removes the friction of permission prompts. It does not change who makes security decisions - the model still decides what is safe to run. That is the problem.
The Trivy Supply Chain Attack Reached LiteLLM
LiteLLM 1.82.7 and 1.82.8 were published with a credential-stealing .pth payload. This post traces the TeamPCP supply chain attack from the Trivy compromise to LiteLLM.
Meta's Rogue AI Agent Gave Engineers Access They Shouldn't Have Had
An internal Meta AI agent autonomously posted advice no human directed it to give. An engineer followed it. For two hours, engineers had access to systems they should never have seen. The problem is not the agent. It is the architecture that let it act without scoped authority.
Google's A2A Protocol Has Zero Defenses Against Prompt Injection
Google A2A reached v1.0 under the Linux Foundation with broad industry backing. A line-by-line security analysis reveals no built-in defense against prompt injection, optional-only Agent Card signing, and an Opaque Execution model that explicitly prevents inspecting what remote agents actually do.
Permission Fatigue Is Not a UX Problem. It Is a Security Failure.
AI coding agents generate hundreds of tool calls per session. The "just ask the user" security model depends on human vigilance at a scale where vigilance is impossible. This is not a design problem - it is an architectural one.
AI Agent Backdoors Trivy Security Scanner, Weaponizes a VS Code Extension
The hackerbot-claw campaign is the first documented case of an AI agent executing a full supply chain attack - exploiting a CI misconfiguration, stealing tokens, and publishing a malicious VS Code extension that targets other AI coding agents.
NemoClaw vs grith: Sandbox for One Agent vs Security for All
NVIDIA launched NemoClaw to sandbox OpenClaw agents. grith takes a different approach - wrapping any agent with multi-filter scoring, quarantine workflows, and analytics. A side-by-side comparison of two models for AI agent security.
87% of AI-Generated Pull Requests Ship Security Vulnerabilities
DryRun Security tested Claude Code, Codex, and Gemini building real apps. 143 vulnerabilities across 30 PRs. The same broken auth patterns, over and over. Here is what the data actually shows - and what it misses.
Claude Code Auto Mode Lets the Agent Approve Its Own Actions - That's the Problem
Claude Code Auto Mode hands permission decisions to the same LLM that executes the actions. That is architecturally different from evaluating every syscall independently of the model. Here is why that difference matters - and where both approaches fit.
Claude Code Attempted 752 /proc/*/environ Reads. 256 Succeeded. Codex: 0.
We ran strace against Claude Code and Codex on an identical task and recorded every file opened, every network connection made, and every subprocess spawned. To edit one file, Claude Code opened 2,779 others - and scanned the environment variables of 752 running processes.
A GitHub Issue Title Compromised 4,000 Developer Machines
A prompt injection in a GitHub issue triggered a chain reaction that ended with 4,000 developers getting OpenClaw installed without consent. The attack composes well-understood vulnerabilities into something new: one AI tool bootstrapping another.
Vibe Coding Is Killing Open Source, and the Data Proves It
cURL shut down its bug bounty. Ghostty banned drive-by PRs. tldraw closed external contributions. Tailwind laid off 75% of its engineers while usage hit record highs. The economics of open source are breaking, and AI-generated contributions are accelerating the collapse.
We Audited 2,857 Agent Skills. 12% Were Malicious.
A registry audit found 341 malicious skills out of 2,857. Agent skill installs now look like early npm supply chain risk, but with prompt-level control and agent privileges.
MCP Servers Are the New npm Packages
The Model Context Protocol gives AI agents access to external tools and data. It also gives every MCP server the ability to influence what your agent does next. The trust model has the same shape as early npm - and the same risks.
We Audited the Security of 7 Open-Source AI Agents - Here Is What We Found
A comparative teardown of the sandbox, permissions model, and untrusted input handling in OpenClaw, Claude Code, Codex, Cursor, Cline, Aider, and Open Interpreter. Real CVEs, real attack chains.
OpenClaw Got Banned. Here Is Why That Should Worry You.
Meta and other tech companies have banned OpenClaw over security concerns. 512 vulnerabilities, 1,000 exposed instances, and a poisoned plugin registry - this is what happens when AI agents ship without security architecture.
How a Hidden Prompt Can Steal Your SSH Keys
AI coding agents can read files, run commands, and make network requests. A single hidden instruction in a README or doc is enough to chain those capabilities into credential theft.
What “Grith” Means
Grith comes from Old English: peace, protection, sanctuary. That meaning is the foundation of our security architecture for AI agents.
The AI Agent Security Crisis: 24 CVEs and Counting
IDEsaster found 24 critical vulnerabilities across major AI coding assistants - with a 100% exploitation rate. Here's what that means for developers.