AIArchitectureDevOps

Anthropic Just Quietly Rearranged the Entire AI Model Market. Most Teams Missed It.

Anthropic image

Last week Anthropic shipped Claude Opus 4.8. The benchmarks are extraordinary. The price is extraordinary too.

Most teams read the announcement, nodded approvingly, and kept running the same model they were running before on every single task in their stack.

That is a mistake. A specific, measurable, ongoing mistake that shows up as an invoice at the end of the month.

The tier problem nobody wants to admit

Here is how AI model selection actually works at most companies.

The founding engineer picks a model. Usually the best available at the time. Usually frontier. The team builds around it. Every feature that gets added inherits that model choice because changing it requires work nobody has time for.

Six months later the company has twelve different use cases running against the same frontier model. Document summarisation. Intent classification. Code completion. Customer support routing. Data extraction. Sentiment analysis.

Some of those tasks genuinely need Opus 4.8.

Most of them do not need Opus 4.8 anywhere close.

They need Haiku 4.5. At one-sixth of the price. With response times that are actually faster for the tasks that don’t require deep reasoning.

The team is paying for a Michelin-starred chef to make toast.

What the actual cost difference looks like

These are not rounding errors. This is the difference between a feature being profitable and a feature bleeding money.

Take a document summarisation feature. Ten thousand monthly active users. Each user summarises three documents a day. Average document: 4,000 tokens in, 400 tokens out.

Against Opus 4.8: roughly $47,000 per month.

Against Haiku 4.5: roughly $8,500 per month.

Same feature. Same user experience for 90% of documents. Difference of $38,500 per month, $462,000 annualised, for one feature at one company at modest scale.

Most teams have six features like this.

The dirty secret about model capability gaps

Here is the thing the AI benchmarks don’t tell you.

For structured tasks with clear rules, the capability gap between Haiku 4.5 and Opus 4.8 is small to nonexistent. Intent classification. Sentiment analysis. Entity extraction. Routing decisions. Data transformation. Template filling.

These are pattern-matching tasks. They are what every LLM, including the cheapest ones, is genuinely excellent at.

The gap opens up for tasks that require reasoning across ambiguous information. Complex code generation. Multi-step problem solving. Novel synthesis across large contexts. Tasks where the model needs to hold competing constraints in mind and navigate between them.

That describes maybe 20% of what most production AI features actually do.

The other 80% is structured, bounded, and does not require the model that just destroyed every benchmark in history.

Why nobody has fixed this yet

Fixing it requires a routing layer that didn’t exist eighteen months ago when these teams made their architecture decisions.

It also requires admitting you’ve been overpaying. Which, psychologically, is harder than it sounds.

And it requires running evaluations. Comparing outputs between model tiers on your specific use cases. Most teams have never done this. They assumed frontier was necessary and never tested the assumption.

The assumption is wrong for most of what they’re running.

The routing architecture that changes everything

The teams doing this right have a layer between their application and the model that makes the selection decision.

It is not complicated. It classifies the task type, matches it against a cost-quality matrix, and routes accordingly. The application doesn’t know which model ran. The user doesn’t know. The bill does know, and it is dramatically lower.

from enum import Enum
from dataclasses import dataclass
from typing import Optional


class TaskComplexity(Enum):
    SIMPLE = "simple"       # Classification, extraction, routing
    MODERATE = "moderate"   # Summarisation, drafting, analysis
    COMPLEX = "complex"     # Reasoning, synthesis, novel generation


@dataclass
class ModelConfig:
    model_id: str
    cost_per_input_token: float
    cost_per_output_token: float
    max_context: int


MODEL_TIER = {
    TaskComplexity.SIMPLE: ModelConfig(
        model_id="claude-haiku-4-5-20251001",
        cost_per_input_token=0.0000008,
        cost_per_output_token=0.000004,
        max_context=200_000,
    ),
    TaskComplexity.MODERATE: ModelConfig(
        model_id="claude-sonnet-4-6",
        cost_per_input_token=0.000003,
        cost_per_output_token=0.000015,
        max_context=200_000,
    ),
    TaskComplexity.COMPLEX: ModelConfig(
        model_id="claude-opus-4-8",
        cost_per_input_token=0.000015,
        cost_per_output_token=0.000075,
        max_context=200_000,
    ),
}


def classify_task(task_type: str, context_length: int, requires_reasoning: bool) -> TaskComplexity:
    SIMPLE_TASKS = {
        "intent_classification",
        "sentiment_analysis",
        "entity_extraction",
        "routing_decision",
        "template_filling",
        "data_validation",
        "language_detection",
    }

    COMPLEX_TASKS = {
        "code_architecture",
        "complex_reasoning",
        "novel_synthesis",
        "multi_constraint_optimization",
        "long_context_analysis",
    }

    if task_type in SIMPLE_TASKS and not requires_reasoning:
        return TaskComplexity.SIMPLE

    if task_type in COMPLEX_TASKS or requires_reasoning:
        if context_length > 100_000:
            return TaskComplexity.COMPLEX
        return TaskComplexity.COMPLEX

    return TaskComplexity.MODERATE


def get_model_for_task(
    task_type: str,
    context_length: int = 0,
    requires_reasoning: bool = False,
    force_tier: Optional[TaskComplexity] = None,
) -> ModelConfig:
    complexity = force_tier or classify_task(task_type, context_length, requires_reasoning)
    return MODEL_TIER[complexity]

Most teams could implement this in a day. The payback period is measured in weeks at moderate scale.

The companies getting this right

The teams that have figured out model tiering share one habit.

They built cost attribution before they built the routing. They know exactly what each feature costs per request. They can see in a dashboard that document summarisation costs $0.47 per request and intent classification costs $0.003.

Once those numbers are visible, the routing decision is obvious.

The teams still running everything on Opus do not have those numbers. They have a monthly bill that goes up. They have no visibility into which feature is driving the increase. They have no clear case for investing time in routing.

The invisible cost stays invisible. The bill keeps growing.

The gap between the teams with cost attribution and the teams without it is compounding every month in both directions.

The model you should actually be running on each task

Here is the practical version. Not a framework. A list.

Haiku 4.5: Intent classification. Sentiment analysis. Named entity extraction. Simple data transformation. Routing and triage. Language detection. Structured output generation from templates. Short text classification. Any task where the input is structured and the output is bounded.

Sonnet 4.6: Document summarisation under 50 pages. First-pass code review. Customer support drafts. RAG synthesis from retrieved chunks. Analysis with clear criteria. Most coding tasks that aren’t architecture decisions. The workhorse tier. This is where most features should live.

Opus 4.8: Complex multi-step reasoning. Long document analysis over 100 pages. Novel code architecture. Tasks where you genuinely cannot specify the criteria upfront because the model needs to figure out what matters. The tier for the 20% of tasks where frontier capability is actually necessary.

Run evaluations on your specific use cases before accepting any of these as final. The routing decision belongs in code based on measured quality, not in a blog post.

But the direction is clear. Most tasks running on Opus should be on Sonnet. Most tasks on Sonnet should be on Haiku. The model releases keep coming. The cost differences keep growing. The teams that route will compound an advantage that gets harder to close every quarter.

The next release changes this again

Anthropic ships on a cadence now. Opus 4.8 is not the last model. The next release will push the frontier further and drop the cost of everything below it.

Every model release is another reason to route. The Haiku of tomorrow will do things that Sonnet cannot today. Teams with routing infrastructure can take advantage of that automatically. Teams without it will continue paying frontier prices for tasks that no longer require frontier models.

The model you should be running today is almost certainly cheaper than the model you are running today.

Find out which one. The invoice you get next month will be the proof.