AI Makes Adding Features Faster - So Why Not Add Just One More?
A security proxy for AI coding agents, enforced at the OS level. Register your interest to be notified when we go live.
I've been shipping software professionally for somewhere north of thirty years. Field Logic (fieldlogic.uk) is the current company, grith is the current product, and at this point I've lost count of how many side projects I've started before deciding the scope was wrong and either cutting it down or quietly shelving the thing.
There's a pattern to side projects that anyone who's done a few of them will recognise. You write down the MVP. The MVP is small, defensible, and you can describe it in two sentences without using the word "platform". You build it on evenings and weekends. Six weeks in, you've built something three times that size, the launch keeps slipping, and the original two-sentence pitch has acquired four sub-clauses and a footnote.
The thing that used to fight back against that, in my experience, was time. Every new feature came attached to a roughly honest gut estimate: "a weekend, probably two, fine, three if I'm being realistic". You'd weigh the feature against that estimate and most of the time you'd say "later". The friction was the budget. Time was the budget. The budget was always tight, and that tightness is what kept the MVP an MVP.
That negotiation is largely over.
The unit economics of a feature have changed
I'm not going to pretend this is a controversial observation. Anyone using current AI coding tools seriously has felt it. The thing that used to take a weekend takes an afternoon. The thing that used to take an afternoon takes an hour. Some things, particularly the ones that are 80% scaffolding, take long enough to write the prompt and not much longer to read the result.
The honest version of how I work now, for a fresh feature in a system I already understand:
- I write a short plan. Maybe a page. Sometimes a conversation with Claude where I describe the constraint, list the options I can see, and ask it to pick them apart. The output is a plan I'd actually be willing to defend to a colleague.
- I hand the plan to Codex or to a Claude Code session running under grith's own supervisor and let it grind through the implementation while I do something else.
- I review the diff, push back on the bits that are wrong, and ship.
It is not that this produces flawless code. It produces ordinary working code, with the same bugs and rough edges and second-pass refactors any human would generate, but it produces it at a rate that makes a mockery of the scope estimates I was using two years ago. The cost of "let me just try this and see" has collapsed.
Which sounds, on the face of it, like nothing but good news.
The problem nobody warns you about
Here's the thing nobody put on the marketing page. When the cost of building a feature collapses, the natural friction that kept your scope honest collapses with it. The implicit "is this worth a weekend" filter that most experienced developers run, mostly without noticing, stops doing its job. Because it's not a weekend any more. It's a couple of hours. And you've got a couple of hours.
So the question shifts. It used to be "can I afford to build this?" and the answer was usually no, and you'd move on. Now the question is "should I build this?" and that's a much, much harder question to answer in the moment. Because the answer almost always involves the future: people you don't have yet, scale you don't have yet, edge cases you've imagined but not actually hit. And the hypothetical version of your product, the one with all the features the current users haven't asked for, is always more compelling than the boring real version sitting on your laptop.
Time used to do the deciding for you. With time gone, you have to do it yourself, on every feature, and your judgement on the matter is biased in exactly the wrong direction, because each individual feature genuinely is a good idea on its own merits.
This is the trap I've been working through on grith over the past few months.
What grith was supposed to be - and what it is
The original plan was two-sentence MVP territory. Intercept the syscalls a CLI agent makes - file, network, exec - run them through a multi-stage filter pipeline, return ALLOW, DENY, or QUEUE for review. Linux only. Ship it.
The product that exists today does all of that, and does it in a meaningfully stronger form than the two-sentence pitch implied. The Linux supervisor works, the filter pipeline runs comfortably under 15ms P95, the audit chain is tamper-evident, the dashboard does what it should. The launch is later than it would have been if I'd held the line on the original scope. On balance, I am not sorry about that. What shipped is materially better than what was planned. The trick is being honest about which parts of the extra work made the product stronger and which parts were just drift.
AI didn't just accelerate the features
The thing that gets less airtime in the productivity-bump narrative is that AI didn't only speed up the feature work. It sped up everything around it. Literature reviews of the syscall-filtering prior art. Threat-model walkthroughs where I'd describe a hypothetical attacker and have Claude pick holes in the design, then write the design up tighter. Integration-test scaffolding I'd never have had patience to write by hand. Fuzzing harnesses for the filter pipeline. Regex coverage matrices for the secret scanner. A red-team pass against the IPC layer that found two real issues I'd missed.
The research and test base behind the current filter pipeline is several times deeper than I'd have produced on a pure-human budget. A lot of the features in the "first pile" below exist because that research surfaced edge cases I then had to handle - quarantine integrity, exfiltration heuristics, sensitive-path coverage. That part of the time spent, I'd spend again tomorrow. The other part - the one where I'm rebuilding the dashboard for the third time because the new design is nicer - that part I would not.
The drift, in concrete
It's worth being honest about which kind of drift is which. Looking at what's in grith today that wasn't on the original two-sentence plan, there are two distinct piles, and conflating them is how scope arguments go wrong.
The first pile is things the threat model demanded once I sat with it for a while. If you're going to intercept syscalls and quarantine suspicious ones, you need somewhere to put quarantined records that an attacker can't quietly edit, so hash-chain integrity isn't really optional. If you're going to store LLM provider keys on behalf of users, plaintext is not a reasonable answer, so AES-256-GCM at-rest encryption is table stakes. Secret-scanning patterns, sensitive-path heuristics for /etc/shadow and friends, egress filters with entropy checks on URLs, canary tokens - these aren't features so much as basic competence for a tool whose entire job is noticing when an agent does something it shouldn't. None of it was on the original plan because the original plan was a two-sentence pitch, not a security design.
The second pile is the genuine scope expansion. A dashboard with live session tracking, scatter plots, donuts, and a digest review flow, where the MVP said "logs in a SQLite file you can query". An adaptive reputation system with Bayesian updates from digest feedback - genuinely cool, genuinely a v2 feature built into v1. A long-running daemon with bearer-token IPC, idle auto-shutdown, and a thin-client grith exec, because the cold-start latency on the supervisor was 400ms and I couldn't leave that alone. Notification channels with tier gating. A profile editor. Remote profile overlay distribution.
The first pile I'd defend in any room. It's why the product that lands at launch is stronger than the one I sketched a year ago - more thoroughly threat-modelled, more thoroughly tested, with security depth in places the original plan didn't even know to ask about. The second pile is where the discipline failed, and where the launch slipped. The features in it aren't bad - each one is doing real work, each one was an obvious yes in the moment - but they weren't on the launch path. The lesson isn't "build less". It's "build less of the second pile, and don't apologise for the first".
"Just one more" is the dangerous shape
The pattern I keep catching myself in goes something like this. I'm working on the supervisor. I notice that the audit log has a noisy edge case. I think "I'll just clean that up while I'm in here". An hour later I've designed a structured audit-flush mechanism with batching. Three days later I've got an audit_sync config option, a last-synced timestamp on sessions, and a small UI panel showing sync state.
None of that was on the path to "supervised Claude Code session, real filters, real digest". All of it was useful.
The shape that gets you is "just one more". Not a roadmap-level decision to scope up, which I'd notice and push back on. A series of small, in-the-moment yeses, each of which is the right call locally and the wrong call globally. AI doesn't cause this. Developers have been doing it forever. But AI accelerates it, because the cost that used to make you say no out loud has been quietly removed from the equation.
There's a related shape where you go to add the small obvious thing and the AI helpfully suggests three larger less obvious things on the way. You wanted a logging tweak; you got a logging architecture. The model isn't wrong, exactly. It's just optimising for "complete, correct, robust solution" when what you needed was "minimum thing that unblocks the next test". You have to know which one you're asking for and push back when you're handed the other one.
The discipline I'm trying to keep
I don't have this solved. If I had it solved, grith would be on its third post-launch release by now. What I'm trying to do, with mixed success:
Write the plan first, in prose, before any code. Not "we should do X". Why we should do X, what the constraints are, what the alternatives are, why this one wins. If I can't write a defensible page on it, the feature isn't ready to build, no matter how easy it would be to build. The AI is very good at executing inside a frame. It is not especially good at choosing the frame. So I spend my time on the frame.
Define the launch path explicitly. For grith, the launch path is "supervised Claude Code session, real filter pipeline, real digest, on Linux". If a feature isn't on that path, it goes in the icebox. The icebox is not a loss. The icebox is where features go to be reconsidered after real users tell me what they actually need.
Make the AI argue with the AI. I plan in one session, critique the plan in a second, implement in a third. A fresh session will catch things a single write-and-review pass won't, because by review time the model has committed to its own design. You want the reviewer coming in cold. The same trick works for "is this feature in scope?". Ask it cold. Don't feed it the context that makes the answer obviously yes.
Cap the active list. A small number of in-flight features at a time, hard. Not "in the backlog" - in active development. The icebox is unbounded; in-flight is bounded. AI lets you start ten things at once. It does not, in my experience, let you finish ten things at once.
Watch for the second-order plans. The tell that I've drifted is when I start writing plans that are about plans. A roadmap document for the analytics feature. A design doc for the design doc. When I notice that, I close the laptop.
None of this is novel advice. Most of it is the same scope discipline that experienced developers have always had to apply. What's new is that you have to apply it more aggressively, more often, and more deliberately, because the old budget-based check has stopped doing it for you.
Where it stands
grith is complete. v1 will ship Linux-only - x86_64 and aarch64 - with macOS and Windows to follow. What's left now isn't features. It's finalising the docs, the dev-ops pipeline, and one last pass of testing. Expect to be able to try it out in the next week or so.
If you're working on a side project with current AI tooling, my honest advice is the one the clock used to give me automatically. Write down the launch criteria, in plain language, on the day you start. Stick them somewhere you'll see them. When you find yourself building something that isn't on that list, stop and ask the question time used to ask for you. Is this on the path? Does it make the product materially stronger, or does it just make it nicer?
If the answer's "later", that's the same answer it always was. The new skill is being honest about which one you're hearing.
Like this post? Share it.