
The Vibe Coding Hangover

At some point in the last eighteen months, a new pattern became normal. A developer — or a founder, or a designer, or a product manager — opens a chat interface, describes what they want to build, and watches the code appear. They run it. It works. They iterate a few more times. They ship it.

This is vibe coding. The term was coined half-jokingly and immediately stuck because it captured something real: a mode of development where you’re steering by feel rather than by explicit understanding. You describe the destination. The model navigates. You approve or redirect. Repeat until done.

For certain categories of work, this is genuinely transformative. A prototype that would have taken a week takes an afternoon. An idea that would have stayed in a Figma file because nobody had time to build it gets shipped. The barrier between “I can imagine this” and “this exists” has collapsed in a way that is real and good and not going away.

The problem is what happens next.

The prototype that became production

Here is a pattern I have now seen enough times to consider it a trend rather than an anecdote.

Someone builds something with AI assistance. It works well enough to show people. People are impressed. It gets used by real users. The usage grows. Now it’s handling real data, real transactions, real workflows that people depend on.

And the person who built it — or the team that inherited it — does not fully understand how it works.

Not in the usual way that engineers don’t fully understand legacy systems, where the code is comprehensible but the history is opaque. In a more fundamental way: the code was generated by a model, reviewed at a surface level for whether it seemed to work, and shipped. The mental model that usually gets built during implementation — the one where you understand why each piece is the way it is, what the edge cases are, what will break under load — was never built, because the implementation was not a thinking process. It was an approval process.

This is the vibe coding hangover. The morning after the productive, exciting, look-how-fast-we-shipped evening, when you have to debug something that’s broken in production and you realise you don’t have a clear enough mental model of the system to know where to look.

What understanding actually is

When a developer writes code themselves — not generated code, code they produced through deliberate thought — they build a mental model as a byproduct. Not as a separate exercise. As a consequence of the process of thinking through the problem, considering the edge cases, making decisions about structure and naming and error handling.

This mental model is the thing that lets you debug the system six months later. It’s what lets you explain to a new team member how the system works. It’s what lets you reason about the consequences of a change before you make it. It’s what lets you look at a production incident and form a hypothesis about the cause in minutes rather than hours.

Vibe coding produces code without producing this mental model. The model does the thinking. The developer does the steering. Steering is real work — framing the problem clearly, evaluating the output critically, redirecting when the approach is wrong — but it is a different cognitive activity from implementation, and it produces a different artifact in the developer’s head.

The code artifact is often good. The mental model artifact is often thin. And in production, you need both.

The specific failure modes

I want to be concrete about what goes wrong, because “you don’t fully understand your system” is abstract and easy to dismiss.

Debugging becomes archaeology. When something breaks, the normal process is: form a hypothesis based on your mental model, look at the evidence, confirm or refute, repeat. With vibe-coded systems, step one is blocked. You don’t have a strong prior about where the problem is. So you read the code like a stranger reads it, which is slower and produces more wrong hypotheses. Incidents that should take twenty minutes to diagnose take two hours.

Changes have unexpected consequences. Every non-trivial system has coupling — places where one component’s behavior depends on another’s in ways that aren’t obvious from local inspection. When you build a system yourself, you accumulate knowledge of this coupling as you build it. When you approve a system someone else designed, the coupling is hidden. You change one thing and something seemingly unrelated breaks, and you’re surprised because you didn’t know they were connected.

Security holes ship invisibly. Generated code often includes patterns that are functionally correct for the happy path and broken for adversarial inputs. SQL injection vectors in string-interpolated queries. API endpoints that don’t validate input because the model assumed validation would happen elsewhere. Authentication checks that are present but bypassable because the model didn’t understand the full security model. These don’t show up in casual review because they look fine. They require understanding the threat model to catch, and you can’t evaluate the threat model of a system you don’t understand.
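To make the string-interpolation case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, the column names, and both function names are hypothetical, made up for illustration. The interpolated version returns exactly the right rows for a well-behaved input, which is why it survives a casual review, and every user's rows for a crafted one.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database holding two users' orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id TEXT, item TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (user_id, item) VALUES (?, ?)",
        [("alice", "book"), ("alice", "pen"), ("bob", "laptop")],
    )
    return conn


def orders_interpolated(conn: sqlite3.Connection, user_id: str) -> list:
    # Vulnerable: the input becomes part of the SQL text itself.
    query = f"SELECT item FROM orders WHERE user_id = '{user_id}'"
    return [row[0] for row in conn.execute(query)]


def orders_parameterized(conn: sqlite3.Connection, user_id: str) -> list:
    # Safer: the driver binds user_id as data, never as SQL.
    query = "SELECT item FROM orders WHERE user_id = ?"
    return [row[0] for row in conn.execute(query, (user_id,))]


conn = make_db()
hostile = "x' OR '1'='1"

happy_path = orders_interpolated(conn, "alice")  # looks correct in a demo
leaked = orders_interpolated(conn, hostile)      # WHERE clause is always true
blocked = orders_parameterized(conn, hostile)    # treated as a literal user_id
```

Both functions behave identically for every input a demo is likely to use. Only a reviewer who asks "what if user_id is hostile?" sees the difference.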

Performance problems surface at scale. Generated code tends to be written for correctness on the inputs you tried, not for efficiency on the inputs you will get. The N+1 query that works fine for ten records becomes a production incident at ten thousand. The in-memory sort that’s imperceptible with a small dataset brings down the server when real users arrive. Catching these before they hit users requires understanding the code well enough to reason about its behavior at scale, which is exactly the understanding that wasn’t built during development.
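The N+1 shape is easy to miss because both versions return the same answer. The sketch below uses sqlite3 and a hypothetical users/orders schema. `CountingConnection` is a small wrapper invented here to count statements, not a library class. The point is only the statement count, which grows with the number of users in one version and stays flat in the other.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many statements get executed."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self.count = 0

    def execute(self, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        self.count += 1
        return self._conn.execute(sql, params)


def make_db(n_users: int) -> sqlite3.Connection:
    """Create n_users users with one 10.0 order each (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(i, f"user{i}") for i in range(n_users)],
    )
    conn.executemany(
        "INSERT INTO orders (user_id, total) VALUES (?, ?)",
        [(i, 10.0) for i in range(n_users)],
    )
    return conn


def totals_n_plus_one(db: CountingConnection) -> dict:
    """One query for the users, then one more query per user."""
    totals = {}
    for user_id, name in db.execute("SELECT id, name FROM users").fetchall():
        row = db.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        totals[name] = row[0]
    return totals


def totals_single_query(db: CountingConnection) -> dict:
    """Same result from a single aggregated JOIN."""
    rows = db.execute(
        "SELECT u.name, COALESCE(SUM(o.total), 0) "
        "FROM users u LEFT JOIN orders o ON o.user_id = u.id "
        "GROUP BY u.id, u.name"
    ).fetchall()
    return dict(rows)


raw = make_db(n_users=10)
n_plus_one = CountingConnection(raw)
single = CountingConnection(raw)

slow = totals_n_plus_one(n_plus_one)   # 1 + 10 statements
fast = totals_single_query(single)     # 1 statement
```

With ten users the difference is eleven statements versus one, which nobody notices. With ten thousand users it is ten thousand and one round trips versus one.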

Onboarding new developers fails. “Read the code and ask the model to explain anything you don’t understand” is not a substitute for a team that built the system together and can answer questions from experience. The institutional knowledge that usually lives in the heads of the people who built something is absent when the thing was built by a model. Documentation that was generated alongside the code is documentation of what the code does, not why it does it that way, what was considered and rejected, or what the tricky parts are.

This is not an argument against AI-assisted development

I want to be direct about this because it’s the obvious misreading.

Using AI to help write code is good. It is faster, it catches things you’d miss, it’s particularly good for boilerplate and for code in frameworks you know less well, and the productivity multiplier for developers who use it well is real.

The problem is not AI assistance. The problem is the mode of AI assistance where you stop being an engineer making decisions and become an editor approving output. Where you stop building mental models and start trusting that the model’s output is probably fine. Where the bar for shipping is “it seems to work” rather than “I understand it well enough to be confident it will keep working.”

The distinction is not about which tool you use. It’s about the cognitive work you do alongside the tool. Two developers can use the same AI assistant and end up in completely different places: one who used it to implement ideas they understood, and one who used it to generate ideas they approved. The code they ship may look similar. The mental models they carry are very different. And when something breaks at 2am, the difference becomes concrete.

What responsible AI-assisted development looks like

The principle I keep coming back to: the model generates, you understand. Not reviews. Understands.

This means that before any AI-generated code goes into a production system, you can answer the following questions without looking at the code:

If you cannot answer these questions, you don’t understand the code well enough to ship it. That’s true whether the code was written by a model, a contractor, a junior engineer, or your past self six months ago.

For code review specifically, AI-generated code should be held to the same standard as human-written code. That is a higher standard than most vibe-coded code gets, because vibe-coded code often isn’t reviewed by anyone except the person who generated it, who has a natural tendency to trust output they prompted for.

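# Generated code that passes a surface review.
# Assumption: `Session` is SQLAlchemy's ORM session and `Order` is a mapped
# model defined elsewhere in the application.
from sqlalchemy.orm import Session


def get_user_orders(user_id: str, db: Session):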
    return db.query(Order).filter(
        Order.user_id == user_id
    ).all()

# Questions you should be asking:
# - Is user_id validated before this is called?
# - Can any authenticated user query any user_id?
# - What happens if user_id is None?
# - How many orders could this return? Is there a limit?
# - Is this called in a loop anywhere? (N+1 risk)
# - What's the index situation on Order.user_id?

None of these questions require the code to be human-written. They require the reviewer to think like an engineer rather than like a proofreader.

Understand before you extend. The most dangerous moment in a vibe-coded system’s life is when it needs to grow. The first version was greenfield — the model had a blank page and produced something internally consistent. Extension requires understanding what’s already there. Prompting a model to add a feature to a system it generated, when you don’t understand the system yourself, produces code that is locally plausible and globally inconsistent. The coupling gets worse. The edge cases multiply. The system becomes progressively harder to reason about.

Before extending a vibe-coded system, read it. Actually read it. Understand the data model. Understand the control flow. Understand the error handling. Build the mental model you should have built during development. This takes time. It is not optional if the system is going to be maintained.

Write tests that encode your understanding. Tests are documentation of what you believe the system does. Writing tests for AI-generated code is not just about coverage — it’s about forcing yourself to articulate your understanding of the system’s behavior precisely enough to express it in assertions. If you can’t write the test, you don’t understand the behavior.

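import pytest

# Assumption: `db`, `create_user`, and `create_order` are fixtures or helpers
# provided by the project's test setup.


def test_get_user_orders_only_returns_own_orders():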
    """
    Critical: users must not be able to see other users' orders.
    This is a security boundary, not just a functional requirement.
    """
    user_a = create_user()
    user_b = create_user()
    order_a = create_order(user_id=user_a.id)
    order_b = create_order(user_id=user_b.id)

    result = get_user_orders(user_id=user_a.id, db=db)

    assert len(result) == 1
    assert result[0].id == order_a.id
    assert order_b.id not in [o.id for o in result]


def test_get_user_orders_with_no_orders():
    user = create_user()
    result = get_user_orders(user_id=user.id, db=db)
    assert result == []


def test_get_user_orders_with_none_user_id():
    with pytest.raises(ValueError, match="user_id cannot be None"):
        get_user_orders(user_id=None, db=db)

Writing these tests forced you to think about the security boundary, the empty case, and the null input. That is the understanding you needed before shipping the code, and the tests are the artifact that proves you built it.
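Note that the None test fails against get_user_orders as it was generated, because that version never raises. That is the test doing its job. Below is a minimal sketch of a version that would satisfy all three expectations and also answers the "is there a limit?" question from the review list. It is written against the standard-library sqlite3 module rather than the ORM session so that it runs on its own. The table layout, the `limit` parameter, and the `MAX_ORDERS` cap of 100 are assumptions for illustration.

```python
import sqlite3
from typing import Optional

MAX_ORDERS = 100  # assumed cap; choose one that fits your data


def get_user_orders(
    user_id: Optional[str], db: sqlite3.Connection, limit: int = MAX_ORDERS
) -> list:
    """Return up to `limit` orders belonging to user_id, lowest id first."""
    if user_id is None:
        # Fail loudly instead of silently matching nothing.
        raise ValueError("user_id cannot be None")
    rows = db.execute(
        "SELECT id, user_id FROM orders WHERE user_id = ? ORDER BY id LIMIT ?",
        (user_id, limit),
    )
    return [{"id": row[0], "user_id": row[1]} for row in rows]


# Throwaway database: two orders for user "a", one for user "b".
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id TEXT NOT NULL)")
db.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
db.executemany("INSERT INTO orders (user_id) VALUES (?)", [("a",), ("b",), ("a",)])

own = get_user_orders("a", db)              # only a's orders
none_found = get_user_orders("nobody", db)  # empty list, not an error
capped = get_user_orders("a", db, limit=1)  # bounded result size

try:
    get_user_orders(None, db)
    rejected = False
except ValueError:
    rejected = True
```

This still leaves one review question open: whether the caller is allowed to ask for this user_id at all. That check belongs wherever the requesting user's identity is known, and no query-level fix can supply it.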

The systems that will matter

In two years, there will be two categories of AI-assisted software in production.

The first category: systems that were built fast, shipped fast, and grew faster than the understanding of the people responsible for them. These systems will have accumulating incident debt, slow debugging cycles, and a growing reluctance to change anything because nobody is confident what a change will break. Some of them will have quiet security failures that nobody notices until someone looks.

The second category: systems where AI was used as a powerful implementation tool by developers who maintained engineering discipline — who understood what they were shipping, tested their understanding, and built the mental models that make systems maintainable. These systems will be faster to build than anything before AI, and more maintainable than most things built without it.

The difference between them is not which AI tools were used or how capable the models were. It is whether the humans in the loop were doing engineering or doing approval.

Vibe coding is a great way to get something working. Engineering is how you keep it working.

The hangover is optional.