ArchitectureDev ToolingAI

The Database Is Not Your Enemy

There is a opinion that became consensus so gradually that most developers I talk to don’t remember adopting it. It arrived with ORMs, accelerated with microservices, and was cemented by the NoSQL movement. It goes something like this: the database is for storage. Logic belongs in the application. The less your database knows about your business rules, the better.

I spent years believing this. I’ve watched teams build entire architectures around it. And I’ve spent the last few years quietly watching those architectures develop the same categories of problems, in the same order, for the same reasons.

The opinion is wrong. Not completely wrong — there are real costs to putting logic in the database, and they deserve honest acknowledgment. But as a default architectural principle, “keep the database dumb” has caused more production incidents, more data corruption, and more performance problems than almost any other idea I can think of that became standard practice.

This is my attempt to say that clearly.

How we got here

The case against database logic is not irrational. It has real arguments behind it, and it’s worth understanding them before explaining why I think they’ve been overweighted.

ORMs made database interaction feel like object manipulation. You stopped writing SQL and started calling methods on models. The database became an implementation detail of the persistence layer rather than a participant in the system’s logic. This felt cleaner. It also meant that every database interaction was mediated by a layer that had opinions about how data should be structured and accessed, and those opinions were not always aligned with what the database was good at.

The microservices movement added an architectural principle: services should own their data. If Service A needs data from Service B, it asks Service B via API, not by querying Service B’s database. This is a sound principle that was frequently extended into an unsound one: if services shouldn’t share databases, then cross-cutting concerns like referential integrity and consistency can’t live in the database either. They have to live in the application.

The NoSQL movement, at least in its most evangelical form, argued that relational databases with their schemas and constraints and transactions were obstacles to development speed. Move fast, denormalize, handle consistency in the application layer. The implicit message: the database’s job is to store documents, not to enforce rules.

Each of these movements had legitimate concerns it was addressing. The consensus they produced — the database is dumb storage, logic lives in applications — was an overcorrection that the industry is still paying for.

What happens when the database is dumb

Let me describe a system. It’s composite, but every element of it is something I’ve encountered in real production systems.

There’s a users table and an orders table. Orders have a user_id foreign key. There are no foreign key constraints in the database — “we handle referential integrity in the application layer.” There are no check constraints — “we validate in the application.” There are no default values for audit columns like created_at — “the ORM handles that.”

The application has been running for two years. There have been three major refactors. There are six services that write to the orders table through different code paths. There’s a data migration script someone wrote eighteen months ago that bypassed the application layer and wrote directly to the database because “it was just a one-time thing.”

Now the database has orders with null user_ids. It has orders with user_ids that reference deleted users. It has orders with created_at timestamps in 1970 because someone forgot to set the default in a new code path. It has orders in states that the application’s state machine considers impossible, because two services wrote to the same order concurrently and the application- layer lock wasn’t actually preventing concurrent writes the way the developer thought it was.

None of this is dramatic. None of it caused a single catastrophic incident. It accumulated slowly, over months, through individually reasonable decisions made by developers who were each working within a part of the system without visibility into the whole. And now the data is untrustworthy in ways that are hard to enumerate and harder to fix, because fixing it requires understanding every code path that could have produced the bad data, which requires understanding a system that has been built by multiple teams over two years.

This is what “keep the database dumb” produces at scale and over time.

What the database is actually good at

Relational databases — and I’m talking primarily about PostgreSQL, which is what I reach for — are not dumb storage. They are sophisticated systems built by very smart people over decades, optimised for exactly the problems that application-layer logic handles poorly.

Constraints are guarantees, not suggestions. A not-null constraint in the database means that column is never null. Not “is never null when the application works correctly.” Never null, regardless of which code path wrote the data, regardless of whether someone ran a migration script directly, regardless of whether a bug in a new service bypassed the validation layer. The constraint is enforced at write time by the database, and no amount of application-layer confusion can violate it.

This sounds obvious. It is obvious. And yet the majority of production databases I’ve seen have far fewer constraints than they should, because developers were taught that constraints belong in the application.

-- What most production schemas look like:
CREATE TABLE orders (
    id UUID,
    user_id UUID,
    status TEXT,
    total NUMERIC,
    created_at TIMESTAMP
);

-- What they should look like:
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    status TEXT NOT NULL CHECK (
        status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')
    ),
    total NUMERIC(10, 2) NOT NULL CHECK (total >= 0),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

The second schema makes it structurally impossible to insert an order without a valid user, with an invalid status, with a negative total, or without a timestamp. Not hard. Impossible. The database enforces this for every write from every code path from every service from every migration script from every developer who decides to “just quickly update this record.”

Transactions are the unit of truth. When you need to do multiple things atomically — debit one account and credit another, create an order and reserve the inventory, update the user and write the audit log — the database transaction is the correct tool. It either all happens or none of it happens. There is no state where the debit occurred and the credit didn’t.

Application-layer coordination of distributed writes is fundamentally harder and more failure-prone than database transactions. It requires compensating transactions for rollback. It requires careful handling of partial failures. It requires distributed locking that is complex to get right and expensive to operate. Most of the time, the distributed write was an attempt to coordinate things that should have stayed in the same database in the first place.

The query planner knows things you don’t. When you write a complex query in SQL and let the database execute it, the query planner analyzes the data distribution, the available indexes, the join order, and the available execution strategies, and picks the fastest one. When you write the same logic in application code — fetch these records, filter them, join them to these other records in memory, sort them, return the top ten — you get one execution strategy: the one you wrote.

The database’s query planner will almost always beat your in-application data processing for non-trivial queries over non-trivial data volumes. Not because you’re a bad programmer. Because the query planner has information you don’t have and has been optimised for exactly this problem for decades.

Generated columns and materialized views are underused. If you have a computed value that derives from other columns — a full name from first and last name, a total from line items, a status derived from timestamps — you can compute it in the application every time you read the record, or you can have the database maintain it automatically.

-- Generated column: maintained automatically by the database
ALTER TABLE users ADD COLUMN full_name TEXT
    GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED;

-- Materialized view: precomputed query result, refreshable on demand
CREATE MATERIALIZED VIEW order_summaries AS
SELECT
    u.id AS user_id,
    u.email,
    COUNT(o.id) AS order_count,
    SUM(o.total) AS lifetime_value,
    MAX(o.created_at) AS last_order_at
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.email;

CREATE UNIQUE INDEX ON order_summaries(user_id);

Querying order_summaries is now a single indexed lookup instead of an aggregation over the entire orders table. The database maintains it. You don’t write any application code to keep it consistent.

The AI angle that changes this conversation

Here’s something that’s becoming practically relevant right now and isn’t being discussed enough: AI-assisted development is changing the risk profile of application-layer logic in ways that make database constraints more important, not less.

When a developer wrote every line of code, there was at least a chance they understood the invariants their code was supposed to maintain. When a model generates the code and the developer reviews it at a surface level — which is, as I wrote last time, the failure mode of vibe coding — those invariants may not survive the transition from intent to implementation.

The model writes a new service that creates orders. The developer reviews it, it looks correct, they ship it. What the developer didn’t notice: the new service doesn’t call the same validation helper the other services call. It creates orders in a state that the application’s state machine considers impossible.

If the database has a check constraint on the status column, this write fails at the database level. The error is immediate and unambiguous. The developer finds out during testing, not in production.

If the validation lives only in the application, the write succeeds. The data is corrupt. Nobody finds out until someone queries the database and gets confused by the impossible state, which might be days or weeks later, at which point the corrupting code path may be difficult to identify.

Database constraints are compiler errors for data integrity. Application- layer validation is a runtime check that’s only as good as the test coverage for every code path that touches the database. Given that AI-generated code often has incomplete test coverage and may not correctly implement all the invariants of the existing system, the database constraint is a more reliable safety net than it’s ever been.

The legitimate costs

I said at the start that the costs of putting logic in the database deserve honest acknowledgment. They do.

Schema migrations require care. Adding a constraint to an existing column on a table with millions of rows is not instantaneous. It requires a table scan to validate existing data. On large tables, it requires careful use of NOT VALID followed by a separate validation step, and possibly a maintenance window. Application code can be deployed with a feature flag. Schema changes require planning.

-- Adding a constraint to a large table safely
-- Step 1: Add not-valid (doesn't scan existing rows, fast)
ALTER TABLE orders
    ADD CONSTRAINT orders_total_positive
    CHECK (total >= 0)
    NOT VALID;

-- Step 2: Validate separately (can be done without locking)
ALTER TABLE orders
    VALIDATE CONSTRAINT orders_total_positive;

Logic in the database is harder to test in isolation. You can unit test application code without a database. Testing check constraints and triggers requires a database. This is a real cost, and it means that database logic needs a different testing strategy — typically integration tests against a real database, which are slower than unit tests.

Portability is real for some teams. If you genuinely need to support multiple database backends, putting logic in database-specific features (PostgreSQL triggers, generated columns, specific constraint syntax) creates coupling. This is a real constraint for library authors and for some enterprise products. For the vast majority of applications that will run on one database for their entire lifetime, it’s a theoretical concern that shouldn’t drive architectural decisions.

These costs are real. They’re also manageable. And they’re substantially smaller than the cost of corrupt data, debugging production incidents without trustworthy data, and rebuilding mental models of systems where the invariants are distributed across six services and an application layer.

What I actually do

Concretely, here’s how I think about the division between database and application now:

Database owns: nullability, foreign key relationships, value constraints (valid statuses, positive amounts, valid ranges), uniqueness, default values for audit columns, referential integrity, and any invariant that must hold regardless of which code path touches the data.

Application owns: complex business logic that requires external context the database doesn’t have, multi-step workflows that span multiple systems, formatting and presentation, and validation that requires calling external services (checking a credit card, validating an address).

The heuristic I use: if the invariant can be violated by any code that touches the database — including migration scripts, direct SQL, future services you haven’t written yet — it belongs in the database. If it can only be violated through the application layer and you’re confident all writes go through the application layer, it can live in the application.

In practice, more invariants than you’d expect meet the first criterion.

The systems that age well

I’ve seen enough production systems at enough different stages of their lifecycle to have a strong prior about which ones age well.

The ones that age well have databases that are opinionated. Not every table has twenty constraints, but the critical ones — the ones that represent business invariants that must always hold — are enforced at the database level. When a new developer joins and writes a new code path, the database catches their mistakes. When someone runs a migration script to fix something quickly, the database catches the cases that would produce bad data.

The ones that age poorly have databases that are compliant. They store whatever you send them. The invariants live in application code that has been through multiple refactors, maintained by teams that have turned over, in services that have multiplied. The data has accumulated inconsistencies that are expensive to audit and expensive to fix. Every query that needs to be trusted has to be accompanied by defensive checks because you can’t fully trust the data it’s operating on.

The database is not your enemy. It’s not a dumb storage layer you should insulate yourself from. It’s the most reliable component in your system — running on battle-tested code, built by people who have thought about data integrity for longer than most of us have been writing software.

Let it do its job. It’s very good at it.