Every Claude 4.7 Improvement Makes the Security Problem Worse

grith team·April 17, 2026·7 min read·security

grith is launching soon

A security proxy for AI coding agents, enforced at the OS level. Register your interest to be notified when we go live.

Claude Opus 4.7 is the first release where unattended AI agent runs are a real workflow. Every feature that gets it there - auto mode, focus mode, recaps, effort levels, auto-approval - makes the unsolved security problem worse. More power, less oversight, same attack surface. The judge and the defendant are still the same process.

What actually changed in 4.7

The headline features are not incremental. They change how developers interact with the model.

Auto Mode. Claude can now run long, complex tasks without constant permission prompts. Refactor a codebase. Execute multi-step workflows. Iterate until benchmarks are hit. "Run it and come back later" actually works.

Focus Mode. Intermediate steps are hidden. You see the final result. You are not verifying how the work was done. You are trusting that it was done correctly.

Recaps. Agents leave behind summaries of what they did and what is next. Small feature, large consequence - it turns the model from a synchronous tool into an asynchronous collaborator.

Effort levels. Instead of explicit thinking budgets, Claude uses adaptive effort. Lower effort runs faster and cheaper. Higher effort reasons deeper. Developers are no longer writing prompts. They are managing cognition.

Fewer permission prompts. A classifier learns which commands are safe and auto-approves them. The industry is actively trying to remove humans from the loop¹.

The real shift

Put these together and a pattern emerges.

From chat to execution.
From synchronous to asynchronous.
From supervised to delegated.

You are no longer using AI. You are deploying it.

The problem nobody is solving

Here is the uncomfortable truth: every improvement in 4.7 makes the core security problem worse.

Less oversight. Auto mode removes supervision. Focus mode hides execution. Recaps arrive after the fact.

More power. Agents run longer, access more tools, chain more actions.

Same attack surface. Prompt injection still works. Data exfiltration is still possible. Malicious repositories still execute code when the agent reads them².

No human in the loop. Before, humans clicked "approve" blindly. Now models approve on your behalf. Same problem, one layer deeper.

AI agents already operate in a dangerous environment. They read untrusted code. They have filesystem access. They execute shell commands. They call external APIs. That combination is fundamentally unstable. Remove human oversight and you do not eliminate risk. You amplify it.

Capability versus control

Claude 4.7 optimises for capability. grith is built for control.

Today's agents follow a flawed model: give the AI broad access, then try to restrict bad actions. grith flips the order. Start with zero access. Evaluate every action before it happens.

Claude:  Model → Tool → System
grith:   Model → Security Proxy → System

Two side-by-side architecture diagrams. The Claude flow goes Model to Tool to System with nothing in between. The grith flow inserts a security proxy with path_match, secret_scan and taint_track filters at the syscall boundary, and only allowed actions reach the system. — Same model. Different trust boundary. The proxy is the only thing that changes who is in control.

The proxy is not a prompt filter. It is an independent enforcement layer that evaluates every syscall the agent attempts, regardless of what the model thinks about its own behaviour.

Not a classifier. A system.

Most safety tools rely on a single decision: is this safe or not? That breaks easily. A sufficiently well-crafted injection rephrases the action until the classifier allows it.

grith uses a different approach. Every action is evaluated by multiple independent filters and scored. File access patterns. Command structure. Secret detection. Behavioural anomalies. Context awareness. The results combine into one deterministic decision.

$ grith exec -- claude

→ Intercepting system calls...

fs.read("~/.ssh/id_rsa")

├─ path_match:   SENSITIVE       +4.0
├─ secret_scan:  SSH_KEY         +4.5
├─ taint_track:  EXFIL_RISK      +3.0
└─ composite:    11.5 → AUTO-DENY ✕

✓ Threat blocked. Agent continued safely.

Flow diagram of grith's scoring pipeline. One intercepted syscall fans out across five filters - path_match, secret_scan, taint_track, behavioural, and context - each producing a numeric score. The scores combine into a composite, which is mapped onto a threshold bar with three outcomes: auto-allow below four, queue for review from four to seven point nine, and auto-deny at eight or above. — Every syscall is scored by independent filters running in parallel. The composite routes the action to one of three outcomes.

The filters do not parse natural language. They evaluate raw syscalls. What the model thought about the operation, and whether the model was manipulated by a poisoned README, does not matter. The syscall is rejected before the kernel returns a file descriptor.

Security that works asynchronously

Claude made execution asynchronous. grith makes security asynchronous too.

Instead of interrupting the agent for every uncertain action:

Safe actions are auto-allowed.
Dangerous actions are blocked.
Uncertain actions are queued for review.

The queued items are delivered as a digest - a clean summary of things worth your attention. The agent keeps running. You review at your own pace.

Mock user interface of the grith digest. A sidebar shows today's totals - 2,481 actions auto-allowed, 17 auto-denied, and 4 queued. The main panel lists those four actions, each with its syscall, filter scores, a composite score, a plain-English reason, and approve and deny buttons. — The digest surfaces only the ambiguous middle. The agent keeps running while humans review at their own pace.

This model works because it matches reality. Most actions are safe. Some are clearly dangerous. A small percentage are ambiguous. Approval systems fail because they treat every action equally. grith focuses human attention only on the ambiguous middle.

The ambient authority problem

Every AI agent today starts with full filesystem access, shell access, and network access - then tries to restrict behaviour after the fact. That is backwards.

grith starts with zero ambient authority. Every capability must be explicitly granted, per action, through a controlled interface³. The agent can only do what the policy allows, and the policy is enforced at the syscall boundary by code the model cannot influence.

Two-column comparison. On the left, full ambient authority: an agent starts with filesystem, shell, network, credentials, package install, and git push all granted, with restrictions applied afterwards as advisory prompts. On the right, zero ambient authority: every capability is denied by default and only explicit scoped grants - such as fs.read on ./src, exec npm test, and net to registry.npmjs.org - are permitted, enforced at the syscall boundary. — Default-open trades safety for speed. Default-closed keeps the blast radius to whatever policy allows.

Where this is going

A new stack is emerging.

Layer 1 - Models. Claude, GPT, Gemini. Reasoning.
Layer 2 - Agents. CLI tools, IDE agents. Execution.
Layer 3 - Control. Security, audit, governance. Missing from every vendor. This is where grith lives.

For the past two years, AI has competed on intelligence. That phase is ending. The next phase will compete on control, reliability, and trust - because capability without control is not a product. It is a liability.

The takeaway

Claude 4.7 shows what agents can do. As soon as you stop watching them, the question changes. It is no longer "can they do the job?" It is "can you trust them while they do it?"

grith exists to make sure the answer is yes.

grith exec -- claude

The agent gets its autonomy. The syscalls still get scored.

We covered the trust-architecture problem with Auto Mode specifically in AI agents are now deciding what's safe to run (Claude Auto Mode). ↩
Researchers demonstrated full secret exfiltration against Claude Code, Gemini Code Assist, and GitHub Copilot using exactly this pattern. See They Hacked Claude, Gemini, and Copilot (And No One Told You). ↩
More on the default-closed model: Zero Ambient Authority for AI Agents. ↩

Like this post? Share it.

Share on X Submit to HN