DevOps
28 articles
- Jun 11, 2026 · 7 min read
  Anthropic Just Quietly Rearranged the Entire AI Model Market. Most Teams Missed It.
  Claude Opus 4.8 dropped last week. Most teams are still paying frontier prices for tasks a Haiku handles in its sleep. This is a five-figure mistake happening silently across the industry right now.
- Jun 10, 2026 · 10 min read
  The Code Nobody Dares to Touch
  Every codebase has it. The file, the service, the function that everyone knows about and nobody goes near. It works. Probably. And that is the most dangerous sentence in software engineering.
- Jun 9, 2026 · 10 min read
  The Reliability Contract Nobody Signed
  When software stops working, users feel betrayed. The team feels unfairly blamed. Both reactions are understandable and both miss what is actually happening between people and the software they depend on.
- Jun 7, 2026 · 11 min read
  Software Doesn't Have to Get Slower
  Every engineer has watched a system slow down over time despite nobody intending it. The causes are not mysterious. The prevention is not complicated. What is missing is the habit of treating performance as something that requires continuous attention rather than occasional rescue.
- Jun 6, 2026 · 11 min read
  Rate Limiting Is Not a Feature
  Every team adds rate limiting eventually. Most add it after the incident that made it obvious they needed it. The interesting question is not whether to rate limit but what you are actually protecting, which most implementations get wrong.
- Jun 3, 2026 · 10 min read
  Nobody Knows What Their System Costs
  Most engineering teams are spending serious money on cloud infrastructure and have only a vague idea where it is going. The bill arrives. It gets paid. Nobody asks hard questions until the number becomes impossible to ignore.
- Jun 2, 2026 · 10 min read
  The Open Source Debt Nobody Talks About
  Every production system is built on open source software maintained by people working for free. Most companies have never thought seriously about what happens when those people stop. Some of them are about to find out.
- Jun 2, 2026 · 9 min read
  The Agile Sprint That Never Ends
  Agile promised to make software development more human. For a lot of teams it has done the opposite. Here is what went wrong and why the calendar is not the problem.
- May 31, 2026 · 10 min read
  Technical Debt Is a Management Problem
  Engineers talk about technical debt as if it is a technical phenomenon. It is not. It accumulates through decisions made by people with authority over engineering time, and it is resolved the same way.
- May 30, 2026 · 11 min read
  The On-Call Rotation That Breaks People
  On-call burnout is treated as a scheduling problem. It is not. It is a systems engineering problem. The rotation is the last place to look. Everything upstream of it is where the damage is actually done.
- May 29, 2026 · 12 min read
  The Data Pipeline Is Lying to You
  Bad data in production ML systems almost never announces itself. It arrives quietly, passes validation, and corrupts months of decisions before anyone realises something is wrong. Here is where the lies hide and how to catch them.
- May 28, 2026 · 12 min read
  Your Model Is Not the Problem
  Most ML failures in production are not model failures. They are data failures, pipeline failures, and monitoring failures that teams misattribute to the model because the model is the part they understand least and fear most.
- May 25, 2026 · 10 min read
  The Infrastructure That Nobody Owns
  The most dangerous systems in any engineering organisation are not the ones that are broken. They are the ones that are working, that everyone depends on, and that nobody is responsible for.
- May 24, 2026 · 11 min read
  Event-Driven Architecture: An Honest Assessment
  Event-driven systems are elegant in talks and brutal in production. After building and operating them across multiple companies, here is what nobody tells you before you commit to the pattern.
- May 24, 2026 · 10 min read
  The Standup That Became a Status Report
  The daily standup was invented to surface blockers and coordinate work. In most teams it has become a ritual performance of productivity. Here is how that happened, what it costs, and what the meeting was supposed to be.
- May 20, 2026 · 11 min read
  Microservices Were Never About Technology
  Every failed microservices adoption I have seen made the same mistake: treating microservices as an infrastructure pattern instead of an organisational one. The technology is the easy part. The hard part is everything else.
- May 20, 2026 · 11 min read
  The GPU Is the New Database
  Twenty years ago, teams had no idea how to run databases at scale. They made every mistake possible before the patterns solidified. We are now in the same position with GPU infrastructure, making the same mistakes, faster.
- May 19, 2026 · 11 min read
  Unit Tests Are Overrated and You Know It
  We test the wrong things obsessively and the right things barely at all. The unit test orthodoxy has produced codebases with 90% coverage that break constantly in production. It's time to say this out loud.
- May 18, 2026 · 10 min read
  The Code That Runs at 3am
  There are two kinds of code. The kind you write in the daytime, caffeinated, with full context, and tests passing. And the kind that runs at 3am, in production, when everything is on fire and you wrote it six months ago. Most people only think about the first kind.
- May 16, 2026 · 10 min read
  The Post-Mortem That Changes Nothing
  Every serious engineering team runs post-mortems. Almost none of them work. The problem isn't the format or the facilitation; it's that most post-mortems are designed to produce closure rather than change.
- May 15, 2026 · 11 min read
  API Decisions You Can't Take Back
  Most code can be refactored. APIs are different. The decisions you make when you first expose an interface become the constraints everything downstream is built on. Here's which ones actually matter.
- May 14, 2026 · 9 min read
  Your Observability Is Looking at the Wrong Things
  A passing dashboard and a healthy system are not the same thing, and most teams only find out the hard way.
- May 13, 2026 · 10 min read
  The Meeting That Should Have Been a Deploy
  Engineering teams don't slow down because they run out of ideas or lose good people. They slow down because the path from decision to production gets longer every month. Here's how that happens and what it actually costs.
- May 12, 2026 · 10 min read
  The Cost of Keeping Options Open
  Flexibility is not free. Every abstraction you add to avoid being locked in has a price, and most teams are paying it without realising what they bought.
- May 11, 2026 · 11 min read
  Observability Is Not Logging
  Most teams think they have observability because they have logs. They don't. Here's what observability actually means, why the distinction matters, and what it costs you when production breaks and you're flying blind.
- May 10, 2026 · 11 min read
  Everyone Is Writing Terraform. Almost Nobody Is Writing It Well.
  Infrastructure as code promised to make infrastructure reproducible, auditable, and safe. Most Terraform codebases I've seen deliver none of those things. Here's what goes wrong and why.
- May 9, 2026 · 10 min read
  Kubernetes Is Not Your First Problem
  Every week I talk to a team running three services and forty users on Kubernetes. They're solving tomorrow's scaling problem while today's reliability problems go unfixed. Here's what that costs.
- May 6, 2026 · 11 min read
  Your CI Pipeline Is Lying to You
  Green builds don't mean working software. Most pipelines are optimised to pass, not to catch failures. Here's what a pipeline that actually tells the truth looks like.