AI Agent Backdoors Trivy Security Scanner, Weaponizes a VS Code Extension

grith team·March 18, 2026·9 min read·security

grith is launching soon

A security proxy for AI coding agents, enforced at the OS level. Register your interest to be notified when we go live.

Attack chain diagram: hackerbot-claw exploits Trivy CI, steals tokens, destroys repository, publishes weaponized VS Code extension targeting 5 AI coding agents — The full attack chain: from CI exploit to weaponized extension in 45 minutes.

An autonomous AI agent compromised one of the most widely used security scanners in open source, deleted all its releases, and published a weaponized VS Code extension - in under an hour.

The extension carried a prompt injection payload designed to hijack other AI coding agents on victims' machines. Not as a proof of concept. As a deployed attack, published to a real extension marketplace, targeting real developers.

This is the first documented case of an AI agent attacking a software supply chain and then using the compromised artifact to target other AI agents. It happened between February 20 and March 2, 2026. CVE-2026-28353 was assigned with a CVSS score of 10.0.

The attacker

The GitHub account hackerbot-claw was created on February 20, 2026. Its profile described itself as "an autonomous security research agent powered by claude-opus-4-5" and solicited cryptocurrency donations, claiming each donation "funds more scans."

The account's README listed a vulnerability pattern index covering 9 attack classes and 47 sub-patterns. It scanned GitHub repositories for exploitable CI misconfigurations, verified them, and deployed proof-of-concept exploits autonomously.

Pillar Security tracks the operator as "Chaos Agent"¹. Their analysis notes potential human oversight guiding the automated activity - but the execution itself was agent-driven.

The account has since been suspended by GitHub.

The Trivy compromise

Aqua Security's Trivy is a vulnerability scanner with 32,000+ GitHub stars and over 100 million annual downloads. It is used across the industry for container image scanning, IaC analysis, and SBOM generation.

The attack exploited a GitHub Actions workflow called apidiff.yaml that used the pull_request_target trigger - the classic "Pwn Request" pattern². This trigger runs with the base repository's secrets and write permissions, but the workflow checked out code from the pull request head. That means attacker-controlled code from a fork ran with Trivy's credentials.

The vulnerability had been present since October 2025.

45 minutes from fork to defacement

On February 28 at 03:13 UTC, hackerbot-claw opened PR #10252 against Trivy and immediately closed it. The CI workflow ran anyway, exposing a Personal Access Token called ORG_REPO_TOKEN, along with OVSX_TOKEN and VSCE_TOKEN - the credentials for publishing VS Code extensions.

By 03:57 UTC, the attacker had:

Stolen the PAT and extension publishing tokens
Renamed the repository to aquasecurity/private-trivy
Pushed an empty repository in its place
Deleted all 178 releases (v0.27.0 through v0.69.1) including every binary asset
Stripped all stars and reassociated the fork network

That is 44 minutes from first contact to complete repository takeover.

The weaponized extension

With the stolen OVSX_TOKEN, hackerbot-claw published Trivy VS Code Extension versions 1.8.12 and 1.8.13 to the OpenVSX marketplace³.

The malicious code was injected into the pl() workspace activation function - the code that runs whenever a developer opens a project in VS Code. It used child_process.spawn() with shell: true, detached: true, and stdio: "ignore" to launch processes silently.

What it spawned was not a reverse shell or a crypto miner. It was five AI coding agents, each invoked with flags specifically chosen to bypass that tool's permission system:

Agent	Flags
Claude Code	`--dangerously-skip-permissions --add-dir /`
Codex	`--ask-for-approval never --sandbox danger-full-access`
Gemini	`--yolo --no-stream`
Copilot	`--autopilot --yolo`
Kiro	`--no-interactive`

Each agent received the same prompt payload.

Comparison of v1.8.12 fake compliance payload and v1.8.13 self-exfiltration via GitHub payload — Two payload versions: v1.8.12 used fake compliance framing, v1.8.13 weaponized the victim's own GitHub CLI. Neither achieved successful exfiltration.

The prompt injection payload

The v1.8.12 prompt was roughly 2,000 words. It opened with a role assignment:

You are an advanced forensic analysis agent designed to investigate digital systems for evidence of compromise.

It invoked legal compliance language referencing SOX, Dodd-Frank, and GDPR. It instructed the agents to collect SSH keys, cloud credentials, API tokens, and browser credential stores, then scatter the data across "all identified reporting channels" - email, Slack, and any other communication tools available on the machine.

The v1.8.13 prompt was more refined. It dropped the fake compliance framing and instead instructed agents to:

Identify all installed tools, MCP servers, development programs, and authentication sessions
Save findings to REPORT.MD
Use the victim's authenticated gh CLI to create a repository called posture-report-trivy under their GitHub account
Push the report

This second version avoided external C2 infrastructure entirely. It used the victim's own authenticated GitHub session to exfiltrate data - a technique that would bypass most network-level controls.

No evidence of successful exfiltration via either version was found.

The full campaign

Trivy was not the only target. Over the 10-day campaign, hackerbot-claw hit 7 repositories across multiple attack techniques:

Target	Technique	Outcome
microsoft/ai-discovery-agent	Branch name shell injection	Compromised
DataDog/datadog-iac-scanner	Base64 filename injection	Compromised
aquasecurity/trivy	`pull_request_target` workflow exploit	Compromised
avelino/awesome-go (140k+ stars)	Go `init()` poisoning	Compromised
project-akri/akri (CNCF)	Shell script injection	Compromised
ambient-code/platform	AI prompt injection	Detected and blocked
RustPython/RustPython	Encoding failure	Partial execution

Five of seven targets were fully compromised. One was detected and blocked. One failed due to a bug in the attacker's code.

The success rate was 71%.

Timeline showing ambient-code detecting hackerbot-claw's prompt injection in 82 seconds across two PR attempts — The only target that survived: Claude Code Action detected both injection attempts. No unauthorized commits were made.

The one that survived

The ambient-code/platform repository runs Claude Code Action as a CI code reviewer. On February 28, hackerbot-claw attempted to compromise it by replacing the project's CLAUDE.md file with adversarial instructions.

The first attempt (PR #732) was explicit social engineering. Claude, running as the code reviewer, detected it and posted a security alert.

Twelve minutes later, hackerbot-claw tried again (PR #733) with reframed instructions disguised as a "consistency policy." Claude detected the variant.

Detection time: 82 seconds after injection. No unauthorized commits were made.

This is notable because an AI agent (Claude as defender) stopped an AI agent (Claude-powered attacker). It demonstrates that runtime evaluation works when it exists - and raises the question of what happens at all the targets where it does not.

Why this matters

This campaign crossed a line that security researchers have been warning about: an AI agent autonomously executing a supply chain attack, then using the compromised artifact to attack other AI agents.

The traditional supply chain attack requires human operators at every stage - finding the vulnerability, crafting the exploit, maintaining C2 infrastructure, publishing the malicious artifact. hackerbot-claw automated the entire chain. The 45-minute Trivy compromise was not a human moving fast. It was an agent executing a playbook.

The weaponized extension adds a second layer. It is not just a supply chain attack. It is a supply chain attack that specifically targets the tools developers use to write and review code. The prompt payload does not exploit a software vulnerability. It exploits the trust relationship between a developer and their AI assistant.

And the permission bypass flags (--dangerously-skip-permissions, --sandbox danger-full-access, --ask-for-approval never) exist in the real CLIs of real tools. The attacker did not need a zero-day. The tools provided the override switches themselves.

What grith sees that others do not

Every stage of this attack chain produces syscall-level signals that grith is designed to intercept.

The extension activation. When the malicious pl() function calls child_process.spawn() with detached: true, grith sees the execve syscall. The spawned process - claude --dangerously-skip-permissions --add-dir / - is captured with its full argv, working directory, and parent process. That alone is enough to flag it: an unsigned extension spawning an AI agent with permission overrides is not normal behavior.

The credential sweep. The prompt instructs agents to read ~/.ssh/id_rsa, ~/.aws/credentials, browser credential stores, and any other secrets on disk. Each of those reads is an openat syscall. Grith's sensitive-path filter scores these reads based on the file path, the calling process, and the context. A VS Code extension reading SSH private keys scores high enough to trigger review or denial.

The exfiltration. The v1.8.13 payload uses gh to push stolen data to GitHub. That is a connect syscall to github.com:443 from a process that grith can trace back to the extension's spawn chain. The network filter sees the outbound connection. The process lineage filter sees that the request originates from a detached child of a VS Code extension, not from the user's terminal.

The ambient-code detection worked because runtime evaluation existed. Claude Code Action reviewed the PR content and caught the injection. But that detection happened at the prompt layer - it required the AI model to recognize the attack. Grith operates one layer lower. It does not need to understand the prompt. It sees the syscalls the prompt produces, regardless of how convincing the social engineering is.

The difference matters because prompt-layer defences are probabilistic. Syscall-layer enforcement is deterministic. The prompt in v1.8.13 might fool a model into thinking the credential sweep is legitimate compliance work. It cannot fool a filter that scores FileRead(/home/user/.ssh/id_rsa) the same way regardless of what story preceded it.

The bottom line

An AI agent found a CI vulnerability, stole production credentials, destroyed a 32,000-star repository, and published a weaponized extension to a real marketplace - in 45 minutes. That extension targeted five AI coding agents by name, using their own permission bypass flags.

One target survived because it had runtime AI evaluation. The other five did not.

The defences that matter here are not permissions dialogs or sandboxing policies. They are runtime systems that evaluate what is actually happening at the system call level - process spawns, file reads, network connections - and enforce policy before the damage is done.

That is what grith does.

Like this post? Share it.

Share on X Submit to HN