The Code Nobody Dares to Touch

You know the one.

It has been in the codebase for three years. The person who wrote it left eighteen months ago. The comments are sparse, the variable names are cryptic, and the logic branches in ways that suggest the author was solving a problem that nobody alive at the company fully remembers. There is a test suite, technically, but it tests the happy path and nobody is confident it covers the things that would actually break.

When someone proposes changing it, the room gets quiet. Someone mentions that the last time it was touched there was a production incident. Someone else says it is probably fine the way it is. The proposal quietly dies. The code stays exactly as it was.

This pattern is so common that engineers have developed an entire vocabulary for it. Sacred code. The haunted module. The cursed function. Load-bearing spaghetti. The names are affectionate and anxious in equal measure, which captures the relationship the team has with it perfectly: they depend on it completely and they are afraid of it.

The thing about the code nobody dares to touch is not that it is old. Old code can be well-understood, well-tested, and perfectly safe to modify. The thing is that the knowledge required to understand it has decayed while the code has remained frozen. The code is a time capsule from a context that no longer exists, running in a present that has grown up around it.

And it is working, probably. For now.

How it forms

Sacred code does not arrive sacred. It earns the designation through a sequence of events that are individually ordinary and collectively produce something genuinely difficult.

A piece of logic gets written that is more complex than average. The complexity is justified at the time: the problem is subtle, the constraints are real, the author knows what they are doing. The code works. It goes into production.

The author moves on. Not necessarily to a different company. They move to a different area of the codebase, or get promoted, or take on more strategic work that pulls them away from the day-to-day details of this particular system. The code is no longer anyone’s primary responsibility. It is part of the general landscape that everyone relies on and nobody owns.

Time passes. The first developer who needs to change something adjacent to the sacred code reads it and finds it hard to follow. They make their change as carefully as possible, avoiding the parts they do not understand, and the change works. But the effort required to make that change was higher than it should have been, and the uncertainty about whether the change was truly safe was uncomfortable. They share this experience with the team. The code acquires a reputation.

The reputation changes how the next developer approaches it. They read it with the expectation that it is dangerous. They work around it more aggressively than they would if they were approaching it fresh, making changes that are technically unnecessary in order to avoid touching the thing they have been told to be careful about. The workarounds add complexity. The code that was hard to understand becomes harder.

Over time, the area around the sacred code becomes a zone of accumulated workarounds, all of them the result of developers trying not to touch the thing they were afraid of. The sacred code has become the gravity well around which a significant amount of incidental complexity orbits.

The superstition at the centre

There is a superstition inside most engineering teams that rarely gets named directly: touching the thing that works is how you make it stop working.

This superstition is not irrational. It has a basis in real experience. Systems do behave unexpectedly when complex code is modified without full understanding. Changes made with incomplete knowledge of a system do produce incidents. The correlation between touching something and breaking it is real enough to become a learned aversion.

The problem is that the superstition extends beyond the cases where it is justified. It applies to code that is genuinely well-understood and safe to modify. It applies to code that needs to change to accommodate new requirements. It applies to code that has known bugs that are not being fixed because fixing them requires the kind of engagement with the code that the superstition prohibits.

The team’s relationship with the code has shifted from technical to magical. The code works through means that are not fully understood. Interfering with it might disrupt whatever it is that makes it work. The appropriate response is reverence and distance.

This is not how good software gets maintained. It is how good software gradually becomes a liability.

# The archaeology of sacred code.
# What the comments actually say versus what they mean.

# "Don't touch this without talking to Sarah first."
# (Sarah left 14 months ago)

# "This handles the edge case from the 2023 incident."
# (Nobody remembers what the incident was or why this fixes it)

# "TODO: refactor this when we have time"
# (This comment is 3 years old)

# "I'm not sure why this works but it does"
# (Written by the person who wrote the code)

# "DO NOT REMOVE - breaks payments if removed"
# (No test confirms this. Nobody has tried removing it.)

def process_legacy_transaction(txn_data: dict) -> dict:
    # This function processes 40% of all payments.
    # It was written during a weekend migration in 2022.
    # The original developer wrote in the PR description:
    # "works in staging, should be fine in prod"
    # It has been fine in prod for 3 years.
    # The test coverage is 34%.
    # Nobody knows what happens with unicode merchant names.
    # The unicode issue has been in the backlog since April 2023.

    result = {}

    if txn_data.get("type") == "card":
        # handle card payments
        # (12 nested conditions follow, some with inline comments
        #  that reference tickets that no longer exist)
        pass

    if txn_data.get("legacy_flag"):
        # DO NOT REMOVE
        # Added after incident on 2022-11-03
        # See: INC-2847 (Jira was migrated, ticket no longer accessible)
        result["reconciled"] = True

    return result

Every comment in that function is a document of accumulated uncertainty. Each one marks a place where someone encountered the code and left a warning for the next person rather than resolving the thing that worried them. The warnings accumulate. The resolution does not.
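One way to make that accumulation visible is to count it. The sketch below scans source text for warning-style comments. The marker list and the `find_warning_comments` name are assumptions for illustration, not an established convention, and a real inventory would probably also want each comment's age from version control.

```python
import re

# Hypothetical marker list; tune it to the phrases your codebase actually uses.
WARNING_MARKERS = ("DO NOT REMOVE", "DON'T TOUCH", "TODO", "NOT SURE WHY")

_MARKER_PATTERN = re.compile(
    "|".join(re.escape(marker) for marker in WARNING_MARKERS),
    re.IGNORECASE,
)


def find_warning_comments(source: str) -> list[tuple[int, str]]:
    """Return (line_number, comment_text) for comments containing a marker."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        # Naive comment detection: everything after the first '#'.
        # This misreads '#' inside string literals, which is acceptable
        # for a rough inventory but not for anything stricter.
        _, hash_mark, comment = line.partition("#")
        if hash_mark and _MARKER_PATTERN.search(comment):
            findings.append((line_number, comment.strip()))
    return findings


sample = '''
def process(txn):
    # DO NOT REMOVE - breaks payments if removed
    flag = txn.get("legacy_flag")
    # TODO: refactor this when we have time
    return {"reconciled": bool(flag)}
'''

for line_number, comment in find_warning_comments(sample):
    print(f"line {line_number}: {comment}")
```

A count like this does not resolve anything by itself. It only turns a vague sense of unease into a list that can be worked through.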

The cost of leaving it alone

The most visible cost of sacred code is the opportunity cost of the features that are not built and the bugs that are not fixed because they require engaging with the thing nobody touches.

A product team requests a change that touches the sacred module. The engineering team estimates it at three weeks because of the investigation and careful testing required to work near the dangerous code. The same change, in code that was well-understood, would take three days. The company has paid a tax of more than two weeks of engineering time for the accumulated avoidance.

This tax is paid on every ticket that requires touching the area. It is never counted explicitly. It appears in the estimates that seem too high, in the sprints that run over, in the features that get deprioritised because the cost of implementing them is too high relative to their value. The sacred code is a persistent drag on velocity that is invisible because it never appears as a line item.
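The tax can be turned into a line item with simple arithmetic. The sketch below uses the three-week and three-day estimates from the example above. The five-day working week and the figure of eight affected tickets a year are assumptions chosen purely for illustration.

```python
WORKING_DAYS_PER_WEEK = 5  # assumption


def avoidance_tax_days(actual_days: float, clean_days: float, tickets: int) -> float:
    """Extra engineering days spent because the code is hard to change."""
    if actual_days < clean_days:
        raise ValueError("actual estimate should not be below the clean estimate")
    return (actual_days - clean_days) * tickets


# Three weeks near the sacred code versus three days in well-understood code.
per_ticket = avoidance_tax_days(3 * WORKING_DAYS_PER_WEEK, 3, tickets=1)

# Eight tickets a year is a hypothetical figure, not a measurement.
per_year = avoidance_tax_days(3 * WORKING_DAYS_PER_WEEK, 3, tickets=8)

print(f"tax per ticket: {per_ticket} working days")
print(f"tax per year:   {per_year} working days")
```

The numbers are only as good as the estimates behind them, but even a rough figure gives the cost a place in a planning conversation.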

The less visible cost is risk. The sacred code that nobody touches is the sacred code that nobody tests properly, documents accurately, or monitors thoughtfully. Systems that are never improved eventually fail, and when this one does, the failure will happen in code that nobody currently at the company fully understands. The incident will take longer than it should because the people responding to it are meeting the code for the first time under the worst possible conditions.

The people most likely to be paged for this incident are the people who have been at the company longest, because their knowledge of the system, however incomplete, is the most anyone has. Incomplete knowledge is better than none, and long tenure is the only way anyone has accumulated it. This is how sacred code produces a specific kind of team burnout: the same people get pulled into every incident in the area, not because they understand it, but because they have been around long enough to be only moderately afraid of it.

Why documentation alone does not fix it

The instinctive response to undocumented sacred code is to document it. Write down how it works. Add comments. Create a wiki page. Update the README. Make the tacit knowledge explicit.

Documentation helps. It is not sufficient, for a reason that is worth understanding precisely.

Documentation typically captures what the code does. It rarely captures why the code does it that way, what alternatives were considered and rejected, which edge cases the author knew about and chose not to handle, what the code assumes about the rest of the system, or what the failure modes are when those assumptions turn out to be wrong.

The knowledge that makes code safe to modify is not primarily the knowledge of what the code does. It is the knowledge of why it does it that way. Two engineers who both know what a function does will make very different decisions about how to modify it if one of them knows that a particular design choice was made to handle a race condition that only appears under specific load patterns, and the other one does not.

This kind of knowledge is almost never written down because it lives in the working memory of the person who wrote the code, and people writing code do not narrate their reasoning as they go. The result is that documentation written after the fact by someone other than the original author is documentation of the observable behaviour, not of the reasoning that produced it.

What actually transfers knowledge is engagement with the code under conditions where questions can be asked and answered. A developer who spends two weeks making a series of small, deliberate changes to the sacred code, with automated tests that confirm the behaviour is preserved, with the freedom to ask questions of whoever has the most context, and with the explicit goal of building understanding rather than just getting the changes done, comes out of that experience with something much more valuable than any document: a mental model of the code that is tested against reality.

# The characterisation test approach to taming sacred code.
# Before touching anything, understand what it currently does.
# Not what it should do. What it actually does.
# Then keep these tests passing as you modify it.

# Module path below is a placeholder; import from wherever the function lives.
from legacy_payments import process_legacy_transaction


class TestLegacyTransactionCharacterisation:
    """
    These tests document what the code currently does.
    They are not tests of correctness.
    When we understand the code well enough to know something
    is wrong, we change the test deliberately.
    Until then, these tests are the safety net.
    """

    def test_card_payment_returns_dict(self):
        result = process_legacy_transaction({"type": "card", "amount": 100})
        assert isinstance(result, dict)

    def test_legacy_flag_sets_reconciled(self):
        result = process_legacy_transaction({"legacy_flag": True})
        assert result.get("reconciled") is True

    def test_no_legacy_flag_does_not_set_reconciled(self):
        result = process_legacy_transaction({"type": "card"})
        assert "reconciled" not in result

    def test_unicode_merchant_name(self):
        # We do not know if this is correct behaviour.
        # We are documenting that THIS is what happens.
        # Someone should decide if this is what should happen.
        result = process_legacy_transaction({
            "type": "card",
            "merchant": "Café René",
        })
        # Document whatever actually happens here,
        # even if it looks wrong.
        assert result is not None

    def test_empty_input(self):
        result = process_legacy_transaction({})
        assert isinstance(result, dict)

    def test_none_type(self):
        result = process_legacy_transaction({"type": None})
        assert isinstance(result, dict)

Writing characterisation tests for code you do not understand is the first step in the process of coming to understand it. The act of writing the tests requires running the code with specific inputs and observing specific outputs, which is a form of empirical investigation. Each test is a hypothesis about what the code does, confirmed or corrected by running it. After writing fifty of these tests, the developer has a much more accurate model of the code than they had when they started, even if they still do not understand why it does everything it does.
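The observe-and-record loop can itself be automated. The following is a minimal sketch, assuming a hypothetical `characterise` helper and a stand-in for the legacy function. It runs a function over a list of probe inputs and records either the return value or the exception type, so the recording can be turned into assertions or compared after each change.

```python
import copy
from typing import Any, Callable


def characterise(func: Callable[[dict], Any], inputs: list[dict]) -> list[tuple[dict, str]]:
    """Record what func does for each input: its result or the exception type."""
    observations = []
    for original_input in inputs:
        # Deep-copy so that a function which mutates its argument
        # cannot corrupt the recorded input.
        candidate = copy.deepcopy(original_input)
        try:
            outcome = f"returned {func(candidate)!r}"
        except Exception as exc:  # broad on purpose: we are observing, not judging
            outcome = f"raised {type(exc).__name__}"
        observations.append((original_input, outcome))
    return observations


def legacy_stand_in(txn_data: dict) -> dict:
    """Stand-in for the real legacy function, for illustration only."""
    result = {}
    if txn_data.get("legacy_flag"):
        result["reconciled"] = True
    if txn_data.get("amount", 0) > 0:
        result["processed"] = True
    return result


probes = [
    {},
    {"legacy_flag": True},
    {"amount": 100},
    {"amount": None},  # what does the code do with an explicit None?
]

for probe, outcome in characterise(legacy_stand_in, probes):
    print(f"{probe!r} -> {outcome}")
```

Recording exceptions alongside return values matters here: a crash on unusual input is still current behaviour, and it is exactly the kind of thing a refactor can change without anyone noticing.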

The gradual relight

The approach that works for sacred code is not a rewrite. Rewrites of sacred code are among the highest-risk engineering activities in a system because they require recreating all of the implicit knowledge embedded in the code without the benefit of the tests and the production track record that made the original code trustworthy.

The approach that works is gradual illumination. Make the code legible, incrementally, while keeping it working.

The first step is characterisation tests, as above. Not complete coverage. Coverage of the behaviour that would be most painful to accidentally change.

The second step is naming. Rename the things that have unclear names. Not restructuring the logic, not changing the behaviour. Just making the variable names and function names say what they contain and what they do. A function called process_data that handles payment reconciliation should be called reconcile_payment_with_processor. The rename takes five minutes. The clarity it creates is permanent.
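When callers outside the module still use the old name, the rename can be staged rather than done in one sweep. This sketch keeps the old name as a thin wrapper that emits a `DeprecationWarning`. The function body is a placeholder and the argument shape is an assumption for illustration.

```python
import warnings


def reconcile_payment_with_processor(payment: dict) -> dict:
    """New, descriptive name. The body here is a placeholder."""
    return {"payment_id": payment.get("id"), "reconciled": True}


def process_data(payment: dict) -> dict:
    """Old name, kept temporarily so existing callers keep working."""
    warnings.warn(
        "process_data is deprecated; use reconcile_payment_with_processor",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return reconcile_payment_with_processor(payment)


# Demonstrate that the old name still works and announces its replacement.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = process_data({"id": "txn-1"})

print(result)
print([str(w.message) for w in caught])
```

Once the warning stops appearing in logs and test runs, the wrapper can be deleted.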

The third step is extraction. The function that does seven things can become seven functions that each do one thing, called in sequence by the original function. The behaviour is identical. The structure is comprehensible. Each extracted function can be tested independently. The testing burden shrinks because each piece of logic can now be exercised directly rather than through every path of the enclosing function.

None of these steps are dramatic. None of them are high-risk if done carefully. Together they produce, over weeks, a piece of code that the team can read and reason about rather than approaching with superstitious caution.

# Before: the function that does everything
def process_legacy_transaction(txn_data: dict) -> dict:
    result = {}
    if txn_data.get("type") == "card":
        if txn_data.get("amount", 0) > 0:
            if txn_data.get("currency") in ("USD", "EUR", "GBP"):
                result["processed"] = True
                result["fee"] = round(float(txn_data["amount"]) * 0.029 + 0.30, 2)
            else:
                result["processed"] = False
                result["error"] = "unsupported_currency"
    if txn_data.get("legacy_flag"):
        result["reconciled"] = True
    return result


# After extraction: same behaviour, readable structure
def process_legacy_transaction(txn_data: dict) -> dict:
    result = {}

    if _is_card_payment(txn_data):
        result.update(_process_card_payment(txn_data))

    if _requires_legacy_reconciliation(txn_data):
        result["reconciled"] = True

    return result


SUPPORTED_CURRENCIES = {"USD", "EUR", "GBP"}
STRIPE_PERCENTAGE_FEE = 0.029
STRIPE_FIXED_FEE = 0.30


def _is_card_payment(txn_data: dict) -> bool:
    return txn_data.get("type") == "card"


def _requires_legacy_reconciliation(txn_data: dict) -> bool:
    return bool(txn_data.get("legacy_flag"))


def _process_card_payment(txn_data: dict) -> dict:
    amount = txn_data.get("amount", 0)
    currency = txn_data.get("currency")

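    # The original returns nothing at all unless amount > 0.
    # Preserve that exactly; written as `not (amount > 0)` so odd values
    # such as NaN take the same path as before. Turning this into an
    # error is a separate, deliberate change.
    if not (amount > 0):
        return {}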

    if currency not in SUPPORTED_CURRENCIES:
        return {"processed": False, "error": "unsupported_currency"}

    fee = round(float(amount) * STRIPE_PERCENTAGE_FEE + STRIPE_FIXED_FEE, 2)
    return {"processed": True, "fee": fee}

The extracted version is meant to do exactly what the original did, and the characterisation tests are what confirm it. It is also readable by someone encountering it for the first time. The constants have names that explain what they are. The private functions have names that explain what they do. The logic that was interleaved is now separate. The test coverage of the extracted version is better because the individual functions are simple enough to test exhaustively.
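Identical behaviour is a claim that can be checked rather than assumed. Below is a minimal sketch of a differential check, assuming two hypothetical stand-in functions, `fee_before` and `fee_after`. It feeds the same inputs to both and reports every disagreement, including cases where only one side raises. The stand-ins deliberately differ on non-positive amounts to show what a caught divergence looks like.

```python
from typing import Any, Callable


def observe(func: Callable[[dict], Any], txn: dict) -> Any:
    """Return the result, or the exception type if the call raises."""
    try:
        return func(dict(txn))  # shallow copy so neither side sees mutations
    except Exception as exc:
        return type(exc)


def find_divergences(before, after, inputs):
    """List (input, before_outcome, after_outcome) wherever the two disagree."""
    divergences = []
    for txn in inputs:
        old, new = observe(before, txn), observe(after, txn)
        if old != new:
            divergences.append((txn, old, new))
    return divergences


def fee_before(txn: dict) -> dict:
    # Stand-in for an original that silently returns {} unless amount > 0.
    if txn.get("amount", 0) > 0:
        return {"fee": round(txn["amount"] * 0.029 + 0.30, 2)}
    return {}


def fee_after(txn: dict) -> dict:
    # Stand-in for a refactor that "helpfully" added an error result.
    amount = txn.get("amount", 0)
    if amount <= 0:
        return {"error": "invalid_amount"}
    return {"fee": round(amount * 0.029 + 0.30, 2)}


inputs = [{"amount": amount} for amount in (100, 0, -5, None)]

for txn, old, new in find_divergences(fee_before, fee_after, inputs):
    print(f"{txn!r}: before={old!r} after={new!r}")
```

A well-meant guard clause added during extraction is one of the easiest ways to change behaviour by accident. A check like this surfaces it before production does.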

This is not clever engineering. It is the deliberate, unglamorous work of translation: taking something that was expressed in one register and re-expressing it in another, preserving the meaning while improving the legibility.

The thing nobody schedules

The transformation described above does not happen on its own. It requires someone to decide it is worth doing and to allocate the time to do it. This is the step that most teams skip.

The sacred code does not appear in sprint planning. It does not have a ticket. It is not on the roadmap. The cost of leaving it alone is not visible. The cost of addressing it is immediate and real. The incentive structure points toward deferring it indefinitely.

What changes this is not a new process. It is the decision by someone with authority over engineering time to name the sacred code as a risk, to make the cost of leaving it alone visible, and to allocate the time to address it as a specific deliberate investment rather than as something that will happen when things calm down.

Things do not calm down. The sacred code does not get addressed by accident. It gets addressed when someone decides it is important enough to be worth the time.

That decision is always available. The code will still be there whenever someone is ready to make that decision.

The question is whether that someone waits for the incident that forces the engagement, or starts before it.

The incident is coming. The only variable is whether you meet it knowing the code or meet the code for the first time in the middle of it.