AI Writes Your Code. Nobody Verifies the Intent.
AI sped up code generation. It did not solve trust.
I live in two different worlds now.
In one, AI made me more productive than I have ever been.
I have written more software in the last two years than across the rest of my career. I have barely written any code manually in the last year.
That part is real.
The speed boost is real.
The weird part is what came with it.
AI helps me ship more.
But it also asks me to trust more.
That is the uncomfortable part.
I am not just delegating typing.
I am delegating thinking, validation, and judgment too.
And I am still not sure where the safe line is.
In the other world, I lead engineering for software used by banks, governments, and other regulated environments, where mistakes are expensive and confidence matters more than speed.
And if you ask whether AI made us ship features 2x faster there, the honest answer is no.
Not even close.
That does not mean AI was useless.
It helped somewhere else.
It reduced noise.
A lot of engineering time in a big system does not go into writing the feature. It goes into interruption-based work: support engineers trying to understand how a feature behaves, PMs trying to figure out whether something is a bug or intended behavior, solution architects pulling in senior engineers just to inspect a corner of the system.
Tools that let people talk to the codebase, inspect it safely, and even generate tests or benchmarks to validate a hypothesis helped a lot with that.
People were less interrupted.
Context switching got better.
Engineers were happier.
But the main bottleneck did not move.
Implementation got dramatically faster. Trust did not.
That is the wall I keep hitting in both worlds.
The part people keep smoothing over
The industry keeps talking as if faster code generation automatically means faster engineering.
It does not.
In a lot of teams, it just means mistakes can scale faster than judgment.
As an individual engineer, I can create software much faster than before. Good software too. Clean structure. Tests. Refactors. Nice terminal output.
And still I trust it less than I want to.
Maybe less than before, because I know how much invisible reasoning I no longer fully own.
As a Head of Engineering, I can see the same problem from the other side.
We can accelerate some parts of the flow.
But we still have to verify whether the thing we built is actually the right thing, and whether it behaves correctly in the bigger system.
In a complex product, implementation is a relatively small slice of the work.
Validation and verification are the bigger slice.
That is why I keep coming back to the same phrase:
verification gap
The verification gap is the distance between what I mean and what I can actually prove.
Between intended behavior and demonstrated behavior.
That gap always existed.
AI did not invent it.
It just made it wider, faster, and easier to ignore until production forces the issue.
Why this got worse with AI
When humans wrote the code, the same brain often held the intent, the implementation, and the validation loop together.
Not perfectly.
People still shipped bugs. Specs were incomplete. Tests missed things.
But there was at least one place where the system could be understood as a whole: the person writing it.
That is no longer the default.
Now the human writes the prompt.
The model writes the code.
The model writes the tests.
The human skims the diff.
The model writes the cleanup.
The CI passes.
The feature ships.
And if the original intent was slightly wrong, incomplete, or misunderstood, that mistake does not stay in one place anymore.
It gets propagated through the whole stack.
The plan is based on the wrong assumption.
The implementation is based on the wrong assumption.
The tests are based on the wrong assumption.
The “manual validation” is often you asking the same model to sanity-check itself.
And then you look at the whole thing and it feels solid.
But it is solid on top of the wrong assumption.
So what exactly are we proving at that point?
That the system is internally consistent with the assumption it invented for itself.
Not that it matches your intent.
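A toy sketch of that failure mode, with an invented domain and invented numbers. Suppose the intent was "free shipping on orders of $50 or more" (inclusive), the prompt said "orders over $50", and the model took that literally everywhere:

```python
# Hypothetical example: "shipping_fee" and the $50 boundary are invented
# for illustration; nothing here comes from a real codebase.

def shipping_fee(order_total: float) -> float:
    # The model's implementation: strict comparison, faithful to its own
    # reading of the prompt ("over $50"), not to the human's intent.
    return 0.0 if order_total > 50 else 5.99

def test_shipping():
    # Model-generated tests, derived from the same assumption.
    # They pass, and they never probe the $50 boundary the human meant.
    assert shipping_fee(50.01) == 0.0
    assert shipping_fee(49.99) == 5.99

test_shipping()  # green: the system is consistent with its own assumption

# The case the intent actually cared about was never exercised:
assert shipping_fee(50.00) == 5.99  # a $50 order is charged; intent said free
```

Everything above passes. The code, the tests, and the green run all agree with each other, and all of them agree with the wrong assumption.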
That is why so much AI productivity discourse feels fake to me.
A lot of teams did not automate engineering.
They automated typing.
That difference matters more than most people want to admit.
Bug free is not the same as intent-correct
People keep saying: just write better tests.
I do write tests.
AI writes tests for me too.
That is not the point.
Tests verify behavior for cases somebody thought of.
That somebody used to be a human.
Now it is often a human plus a model.
That is still not the same thing as verifying intent.
You can have 100% line coverage and still completely miss the thing that matters.
You can have a green CI run and still not know whether the software behaves the way you intended.
You can even have bug-free code in a narrow sense and still have software that is wrong.
A green pipeline can still be a polished misunderstanding.
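A minimal sketch of that trap, again with invented numbers. One test executes every line, so coverage reports 100% and CI is green, while the intent (round half up, the way a finance team would) is silently violated:

```python
# Hypothetical example: "apply_discount" and the rounding requirement are
# invented for illustration.

def apply_discount(price_cents: int, percent: int) -> int:
    # int() truncates toward zero; the intent was round-half-up.
    return int(price_cents * (100 - percent) / 100)

def test_apply_discount():
    # This single test executes every line: 100% line coverage, green CI.
    assert apply_discount(1000, 10) == 900

test_apply_discount()

# The case that mattered was never thought of:
assert apply_discount(101, 15) == 85  # truncated; round-half-up intent says 86
```

Coverage measured that the lines ran. It never measured whether the rounding matched what anyone meant.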
That is one of the biggest traps in the current AI coding wave.
We are getting very good at generating artifacts.
Code.
Tests.
Docs.
Migration scripts.
Benchmarks.
RFC drafts.
None of that answers the deeper question:
does the system actually do what we mean?
Software is not flat. It is layers.
The problem gets worse as the software gets bigger.
Software is not flat.
It is layers.
It is wide, deep, and full of interacting components, hidden assumptions, backwards compatibility constraints, old decisions nobody remembers, and behavior that only makes sense if you know four other subsystems.
Any project that lives long enough eventually reaches a point where one brain is no longer enough.
That was true before AI.
It is still true now.
AI does not remove that limit.
In some cases it makes you hit it faster, because you can generate change faster than you can understand its consequences.
That is why the industry created all the layers around engineering in the first place:
CI/CD
QA
RFCs
Architecture reviews
Team ownership boundaries
Support escalation paths
Approval workflows
These are not random rituals.
They are patches over the same underlying problem:
software complexity grows beyond what one brain can safely manage.
Where does intent live now?
I think mainstream software engineering is still missing something fundamental.
We do not maintain a real source of truth for intent.
If I ask where the intended behavior of a system lives right now, the honest answer in most teams is:
all of it combined badly.
Some of it is in source code.
Some of it is in tests.
Some of it is in RFCs.
Some of it is in Jira tickets.
Some of it is in Confluence.
Some of it is in the heads of senior engineers.
None of those is the place where I can go and see, clearly, how the system is supposed to behave right now.
That is not a source of truth.
That is archaeology.
And that feels like a drastic difference from fields like aerospace or automotive.
They have their own fragmentation problems too. Different groups write requirements, validate them, implement them, monitor them. Those worlds often barely talk to each other.
But at least intended behavior is treated as a first-class artifact.
There is an SRS.
There are explicit requirements.
There is a recognized place where intent is supposed to live.
In mainstream software, especially for something complex like an API gateway, that still feels almost unimaginable.
We mostly reconstruct intent after the fact from scattered artifacts.
And then we act surprised when regressions keep happening.
Why enterprise teams do not get the full AI payoff
This is also why the conversation about AI productivity is often too shallow.
Yes, implementation is faster.
Sometimes dramatically faster.
But if speed of implementation is no longer the hard part, then what is?
That is the real question.
If a feature can be implemented in hours instead of weeks, why have so many teams not seen the full payoff?
Because implementation was never the only bottleneck.
The harder part is deciding what should be built, making that intent explicit enough, and then verifying that the resulting system still matches it after the code, tests, and surrounding context have all changed.
That is where the time goes.
That is also where a lot of current AI hype becomes unserious.
People showcase how fast a model can produce code.
Fine.
Show me how fast your team can decide what is correct, verify that the behavior matches the intent, and avoid turning six months of hyperproductivity into twelve months of regression cleanup.
At work, we effectively built a zero-trust environment.
We do not blindly trust humans.
We do not blindly trust AI.
We review the code.
We validate the assumptions.
We check the tests.
That posture protected quality when AI adoption accelerated.
But it also meant we did not suddenly become 10x faster.
We became less noisy.
More focused.
Better at answering questions.
Faster in implementation.
Still constrained by verification.
Not everyone needs safety. Everyone needs trust.
As an individual engineer, the same tension shows up in a different shape.
I can move incredibly fast.
But I know that if I let trust slide too far, I eventually stop building and start doing bug fixing and regression management full-time.
The software turns into glue and patches.
You can feel your taste slipping if you are not careful.
It all kind of works, but you are no longer fully sure why.
The safety bar differs. Obviously.
A bank flow is not the same thing as a weekend prototype.
One component inside a product may deserve a much stricter baseline than another.
But trust? Everyone needs that.
If I built a website, a product, a service, an internal tool, whatever it is, I need to trust that it actually follows my intent closely enough for the context it lives in.
That is the standard I care about.
Not some abstract perfection.
Not a fantasy of zero bugs.
Not a productivity screenshot.
Trust.
Can I tell how my software behaves right now?
Do my docs, specs, tests, and code align with each other?
Do I know which parts are intentional, which parts are accidental, and which parts are cargo-cult leftovers from earlier decisions?
When I change something, am I making the system better, or just shifting uncertainty around?
So what is engineering now, exactly?
Where is the place of the human?
Where is the place of judgment?
And which part should I never offload, even if AI is very good at pretending it can carry it for me?
Those were already hard questions before AI.
AI did not create them.
It amplified them.
It exposed how incomplete our current software practices already were.
Why I am writing this
That is why I do not think a smarter model or a shinier coding assistant will solve this by itself.
The missing layer is verification.
Not just whether the code runs.
Not just whether the tests pass.
Not just whether the reviewer approved.
I mean verification of intent.
That is what I have been thinking about for a long time now, and why I am starting this newsletter.
I want to write about the gap itself, what causes it, why it compounds, why mainstream software and regulated engineering barely learn from each other, and what it would take to close it.
Not with slogans.
With examples, systems, failures, tools, and uncomfortable questions.
AI did not remove the hard part of engineering.
It moved it from writing to verification.
If this problem feels familiar, subscribe.
This is what I am writing about now.

