AI in Long-Lived Systems
Every AI integration is a bet that you can monitor something you do not fully control.
The conversation about AI in software systems is mostly about capability. What it can do. What it will do next.
The harder question is what happens when you depend on it. Not for a demo. Not for a prototype. For a system that needs to work reliably for years, maintained by people who did not build it, under conditions that shifted since the original design.
AI introduces a category of behavior that most engineering practices were not designed for: probabilistic output embedded in deterministic expectations. A function that returns a different result for the same input is not a function in the way most systems assume. When that behavior is buried inside a pipeline — making decisions, filtering data, generating content, routing requests — the system becomes harder to reason about in ways that do not show up in testing.
This is not a reason to avoid AI. It is a reason to treat it differently than other dependencies.
The first problem is observability. Most systems log what happened. AI components require logging enough context to reconstruct why. A model that rejects a transaction, summarizes a document, or classifies a support ticket is making a judgment. If you cannot inspect that judgment after the fact, you cannot debug it, audit it, or explain it to the person it affected.
Traditional observability assumes deterministic behavior. You trace a request, you see the path it took, and you understand the outcome. With AI, the path includes a decision that may not be reproducible. The same input tomorrow might produce a different output. Logging the input and output is necessary but not sufficient. You need the model version, the prompt template version, the context window — logged alongside each call, not reconstructed after the fact. This tooling tends to arrive late.
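Capturing that context is mostly a matter of discipline, not tooling. A minimal sketch of the idea, with an illustrative wrapper (the names `model_fn`, `model_version`, and `prompt_version` are stand-ins, not any vendor's API):

```python
import json
import time
import uuid

def call_model(model_fn, prompt, context,
               model_version="model-2024-06", prompt_version="v3"):
    """Invoke the model and record enough to reconstruct the judgment later.

    model_fn is any callable that takes a prompt string and returns text.
    """
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,    # exact model that served this call
        "prompt_version": prompt_version,  # which template built the prompt
        "context": context,                # the context window as assembled
        "input": prompt,
    }
    record["output"] = model_fn(prompt)
    return record

# One JSON line per decision, written at call time, not reconstructed later.
rec = call_model(lambda p: "APPROVE", "Approve claim 123?", {"claim_id": 123})
log_line = json.dumps(rec)
```

The point is that the record is assembled in the same code path as the call itself; anything stitched together afterward from separate logs will eventually disagree with what actually happened.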
The second problem is irreversibility. AI is increasingly used to make routing, filtering, and approval decisions that are difficult or impossible to undo. Approving or denying applications. Prioritizing work. Filtering what a user sees. Generating communications sent to real people. Each of these has downstream consequences that compound. A wrongly filtered email is not just a missed message — it is a missed message that influenced a decision that influenced a timeline no one can trace back to the filter.
The systems that handle this well treat AI decisions as proposals, not conclusions. A model suggests; a human confirms. A model classifies; a review queue catches edge cases. A model generates; a validation layer checks constraints. This is slower. It is also more durable. The speed advantage of AI is real, but it is not free. The cost is paid in the infrastructure required to keep the system accountable.
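The proposal-not-conclusion pattern can be made concrete with a small gate: a model output only takes effect when it passes an explicit validation check and clears a confidence bar; everything else lands in a review queue. A sketch, with illustrative names and thresholds:

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Proposal:
    """A model output treated as a suggestion, not a final decision."""
    action: str
    confidence: float

def decide(proposal: Proposal,
           validate: Callable[[Proposal], bool],
           threshold: float = 0.9) -> Tuple[str, str]:
    """Auto-apply only when the proposal is both valid and high-confidence;
    everything else is routed to human review."""
    if validate(proposal) and proposal.confidence >= threshold:
        return ("apply", proposal.action)
    return ("review", proposal.action)
```

The design choice worth noticing: the default path is review, and automation is the exception that must be earned on every call. That asymmetry is what makes the decision reversible.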
The third problem is containment. AI capabilities are general-purpose by nature, which makes them easy to spread across a system. A team integrates a language model for one use case. Another team sees it working and adopts it for a different use case. The model becomes a shared dependency with no clear owner, no consistent evaluation criteria, and no unified understanding of its failure modes. This is tool sprawl, but with a dependency that changes behavior when the provider updates it.
Containment means treating AI as a bounded component with explicit interfaces. What goes in. What comes out. What the acceptable range of behavior is. What happens when the behavior falls outside that range. These are the same questions you would ask about any critical dependency, but AI makes them harder to answer because the behavior is not specified in code. It is learned, and it shifts.
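One way to answer those questions in code is to wrap the model behind a narrow interface that enforces the acceptable range of outputs and names the fallback explicitly. A sketch under assumed requirements (the label set and `classify_ticket` are hypothetical):

```python
# The acceptable range of behavior, stated in code rather than assumed.
ALLOWED_LABELS = {"billing", "technical", "account", "other"}

def classify_ticket(model_fn, text, fallback="other"):
    """Bounded interface: one input, one of a fixed set of outputs,
    and an explicit fallback when the model drifts out of range."""
    raw = model_fn(text).strip().lower()
    if raw in ALLOWED_LABELS:
        return raw
    # Out-of-range output: fall back, and in a real system emit an alert
    # so drift is noticed rather than silently absorbed.
    return fallback
```

Because callers only ever see the bounded interface, the model behind it can be evaluated, versioned, or replaced without every downstream team renegotiating what "classify" means.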
The pattern that holds up is the same one that holds up for most long-lived system decisions: make it observable, make it reversible, make it replaceable.
These constraints feel like they slow things down. They do. That is the point. The speed of integration is not the bottleneck in long-lived systems. The bottleneck is the speed of understanding — how quickly someone unfamiliar with the system can figure out what it does and why.
AI does not make systems fragile by itself. Systems become fragile when AI is embedded without the same discipline applied to every other critical component: clear boundaries, monitoring that captures why and not just what, and the assumption that it will eventually be wrong in a way no one predicted.
The question is not whether to use AI. It is whether you are building the infrastructure to live with it.