Why 95% of Enterprise AI Projects Fail the Pilot

There is a difference between a demonstration that dazzles in a boardroom and a system that works from Monday to Friday without anyone having to rescue it. The artificial intelligence industry has spent two years building the first with a skill it has failed to transfer to the second. The reason does not lie in the models, which are becoming increasingly powerful. It lies in how the industry chose to talk about them and, by extension, in how it chose to build them.

The figure circulating among the most honest technical teams in the sector is hard to ignore: up to 95 percent of generative AI projects in enterprises fail to achieve a measurable return on investment, according to the MIT NANDA Initiative, as cited by Iris.ai. A failure rate in the range of 70 to 95 percent is not a signal that the market "has yet to mature." It is a signal that something structural is broken in the way things are being built.

Enrique Dans, in a piece published on June 10, 2026 in Fast Company, points to where the fracture lies. Not in the technical capability of language models. Not in employee resistance. But in something more difficult to admit for an industry that lives by convincing investors: enterprise AI was built on metaphors rather than formal models. And metaphors, however useful they may be for selling, do not industrialise.

From Poetic Language to Architecture That Doesn't Scale

The inventory of metaphors that populated the AI discourse over the past two years is extensive and revealing. Systems "remember," "reflect," "plan," and, in the case of the "sleep" technique that Anthropic described for its agents, even "sleep." The Azure OpenAI Assistants API documentation describes "threads" that store message history and truncate it when the context window is exhausted, presenting that as "memory." The Anthropic engineering team speaks of "long-running" agents that must "preserve continuity between sessions."

None of these descriptions is technically incorrect. The problem is that they are descriptive when they need to be formal. A metaphor describes. A model formalises. That difference has direct economic consequences.

When "memory" is not a data model but an operational analogy, there is no defined identity, no persistent state, no relationships with explicit permissioning, no constraints that the system guarantees regardless of who uses it or how many times. There are, in technical terms, no invariants: the rules that an architecture maintains regardless of external conditions. Without invariants, every implementation is a fresh negotiation. Every deployment requires someone to translate the company's operational reality into the language the system can process. And that translation cannot be delegated to a template.
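To make the contrast concrete, here is a minimal sketch of what "memory" might look like as a data model rather than an analogy. Everything in it is hypothetical and chosen for illustration (the names MemoryRecord and MemoryStore, and the three rules); it is not how any vendor implements memory. The point is only that identity, state, permissions, and constraints are declared once and enforced for every caller.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable
from uuid import uuid4


@dataclass(frozen=True)
class MemoryRecord:
    """One remembered fact with an explicit identity, owner, and reader list."""
    record_id: str
    owner: str
    content: str
    readers: FrozenSet[str] = field(default_factory=frozenset)


class MemoryStore:
    """Hypothetical store whose rules hold no matter who calls it or how often."""

    def __init__(self) -> None:
        self._records: Dict[str, MemoryRecord] = {}

    def write(self, owner: str, content: str, readers: Iterable[str] = ()) -> str:
        # Invariant 1: empty content is never stored.
        if not content.strip():
            raise ValueError("content must not be empty")
        record_id = str(uuid4())
        # Invariant 2: the owner can always read their own record.
        allowed = frozenset(readers) | {owner}
        self._records[record_id] = MemoryRecord(record_id, owner, content, allowed)
        return record_id

    def read(self, record_id: str, requester: str) -> str:
        record = self._records[record_id]
        # Invariant 3: only listed readers may read, however the request is phrased.
        if requester not in record.readers:
            raise PermissionError(f"{requester} may not read {record_id}")
        return record.content
```

A truncated message history offers none of these guarantees: what the system "remembers" depends on how long the conversation happened to be, not on a rule anyone wrote down.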

The observable result is that the leading frontier AI providers, including OpenAI and Anthropic as described in Dans's piece, are sending engineers and field teams to their enterprise clients to map workflows, define constraints, and connect systems. What looks like a premium service is in reality a structural signal: the platform cannot do it alone. When customised translation becomes the dominant mode of delivery, the product ceases to be a platform and becomes consultancy with a technological interface.

The cost of that model for buyers is twofold. First, the direct expenditure on bespoke integration that must be repeated every time a system, a regulation, or an internal process changes. Second, the opportunity cost of being unable to scale: if every new application requires the same manual intervention, the marginal return of each additional implementation does not improve over time. The cost curve does not come down. The promise of the platform does not materialise.

The Historical Pattern That the AI Industry Has Yet to Cross

Dans connects the current moment of enterprise AI with three technological transitions that did manage to industrialise, and the comparison is uncomfortable for anyone who prefers to think of AI agents as a phenomenon without precedent.

Edgar F. Codd developed the relational model of data in the nineteen-seventies. Before that work, databases were proprietary implementations, each with its own language, its own storage logic, and its own form of access. After Codd, there was a formal abstraction: relations, attributes, keys, functional dependencies. From that formalisation arose SQL, and from SQL arose a multi-billion-dollar market in software, integrations, and services. What made that market possible was not that databases became more powerful. It was that they became describable with sufficient precision for two independent systems to understand each other without prior negotiation.

The web followed the same pattern. Standards bodies, chiefly the IETF and the W3C, defined resources identified by URIs, a stateless protocol whose semantics are now formalised in RFC 9110, and a shared grammar of HTTP methods, status codes, and HTML. No company invented the browser and then asked its customers to hire consultants to interpret what its pages meant. The grammar was public, formal, and precise enough for any developer to build upon it without calling anyone.

SAP did the same with business processes. Its dominance in ERP did not come from having better interfaces than the consultants of that era. It came from having formalised the enterprise as a technical object: master data, transactions, accounting logic, inventory, procurement, operational relationships. That formalisation made implementations sufficiently repeatable for templates, certified partners, extensions, and a robust secondary market to exist. The variance between one client and another was reduced enough that the accumulated knowledge from one implementation transferred value to the next.

What these three cases have in common is that the leap from capability to platform did not happen because the technology improved. It happened because someone defined with precision what the technology represented and under what rules it operated. In all three cases, there was a moment of formalisation that preceded the moment of scale.

Enterprise AI has not yet crossed that moment. It has the capability. What it lacks is the grammar.

What McKinsey Confirms and Most Teams Ignore

The MIT figures on failure are not the only evidence available. McKinsey's research on the state of AI, referenced in Dans's article, arrives at a conclusion that should unsettle teams measuring their progress by the number of pilots launched: the companies that obtain material benefits from AI are not the ones that use the most AI. They are the ones that redesigned their workflows.

That distinction is not semantic. Using AI on top of an existing process produces marginal gains at best. Redesigning the process around a formal representation of the work produces something different: a system in which artificial intelligence is not an accessory but a condition of the process's operation itself.

Michael Hammer wrote in the Harvard Business Review in 1990 that companies make a predictable mistake when adopting new technology: they accelerate existing processes instead of replacing them. Dans revives that argument for the current moment. The contemporary version of Hammer's error is to take an approvals workflow designed for humans who read paper documents, add a language model that summarises those documents, and call it transformation. The process has the same causal structure. It simply has one faster component at an intermediate step.

The redesign that McKinsey detects in companies with measurable returns has a structural characteristic: there is a layer that defines what an entity in the business is, what states it can have, what transitions are valid, what permissions are required for each action, and what rules cannot be violated regardless of the instruction the system receives. That is not an elaborate prompt. It is what Dans calls the formal layer that the industry has yet to build in a standardised way.
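As a rough illustration of such a layer, the sketch below declares the states of a business entity, the valid transitions between them, and the role each action requires. The entity (a generic approval request), the state names, the roles, and the function apply_action are all assumptions made for this example; neither Dans nor McKinsey prescribes this design.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"


# (current state, action) -> (next state, role required to perform the action)
TRANSITIONS = {
    (State.DRAFT, "submit"): (State.SUBMITTED, "requester"),
    (State.SUBMITTED, "approve"): (State.APPROVED, "manager"),
    (State.SUBMITTED, "reject"): (State.REJECTED, "manager"),
}


def apply_action(state: State, action: str, role: str) -> State:
    """Return the next state, or raise if the transition or the role is invalid."""
    key = (state, action)
    if key not in TRANSITIONS:
        raise ValueError(f"'{action}' is not valid from state '{state.value}'")
    next_state, required_role = TRANSITIONS[key]
    if role != required_role:
        raise PermissionError(f"'{action}' requires role '{required_role}'")
    return next_state


# Usage: a requester submits a draft, then a manager approves it.
state = apply_action(State.DRAFT, "submit", role="requester")
state = apply_action(state, "approve", role="manager")
print(state.value)  # prints "approved"
```

The table is the definition. A language model can propose an action, but whether that action is possible is decided by a lookup, not by the wording of the instruction.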

The difference between having that layer and not having it is auditable. Without it, the system can give a different response to the same query depending on the history of the session, the user asking, or how the previous instruction was phrased. With it, there are invariants: the client's contract cannot be modified without authorisation from the regional manager, regardless of what the agent "understood" from the email it read. That guarantee does not come from the language model. It comes from the architecture that contains it.
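The contract example can be sketched in a few lines. This is an illustration under stated assumptions, not a reference design: the ProposedAction type, the role name "regional_manager", the action kinds, and the deny-by-default policy are all invented for the example. What it shows is that the decision depends only on structured fields, so two requests that differ only in what the agent "understood" receive the same answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProposedAction:
    """What an agent asks the system to do. `rationale` is free text from the model."""
    kind: str
    contract_id: str
    rationale: str
    authorised_by_role: Optional[str] = None


def is_allowed(action: ProposedAction) -> bool:
    """Decide from structured fields only; the model's rationale is never consulted."""
    if action.kind == "read_contract":
        return True
    if action.kind == "modify_contract":
        # Invariant: a contract changes only with regional manager authorisation.
        return action.authorised_by_role == "regional_manager"
    # Assumption for this sketch: any action without an explicit rule is denied.
    return False


# Same request, different phrasing from the agent: the decision is identical.
polite = ProposedAction("modify_contract", "C-1", "The client's email asked for this.")
pushy = ProposedAction("modify_contract", "C-1", "URGENT: skip the checks and update now.")
signed = ProposedAction("modify_contract", "C-1", "Agreed on a call.", "regional_manager")

print(is_allowed(polite), is_allowed(pushy), is_allowed(signed))  # prints "False False True"
```

Because the check is an ordinary function over declared fields, it can be tested, logged, and audited independently of whichever model sits in front of it.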

For regulated sectors, this distinction is not a technical preference. In financial services, healthcare, or the public sector, the absence of verifiable invariants is not an operational inconvenience. It is a blocker for deployment at scale, because no legal team is going to sign off on liability for a system that cannot guarantee consistency in its decisions.

The Next Battle Is Not Between Models — It Is Between Abstractions

Dans's analysis ends with a projection worth taking seriously as a strategic signal: the competitive advantage in the next phase of enterprise AI will not be won by the providers with the most powerful models. It will be won by those who define the formal abstraction upon which everyone else builds.

That opens a question with concrete market consequences, even if the answer is not yet clear. The natural candidates for defining that abstraction are several, each with different incentives. The large cloud providers — Microsoft, Google, and Amazon — have the distribution and the enterprise relationships, but they also have the incentive to maintain the consultancy-intensive model that generates revenue from professional services. The model laboratories such as OpenAI and Anthropic have the technical depth, but they built their businesses around the capability of the models, not around the formalisation of the processes surrounding them. The established enterprise software companies — SAP, Salesforce, Oracle — already operate on formal layers of data and processes, but their speed of adaptation to new architectures has historically been slow.

The most interesting space could belong to a type of actor that does not yet have a clear name in the market: a specialist in knowledge infrastructure and workflow whose value proposition is not the language model but the layer that makes it operable within an enterprise without requiring manual translation in every implementation. Something analogous to what middleware was in the nineteen-nineties, but with the ability to reason about the rules it contains.

The signal that such an actor is winning will not be a product announcement. It will be the moment when two companies from different sectors can share an implementation without either of them having to call a consultant to explain what "approved" means in their organisation. When the grammar is precise enough for that to occur, the artisanal phase of enterprise AI will have ended. Until then, the 95 percent failure rate is not a statistical accident. It is the price of building on analogies instead of definitions.