Databricks Ontology: Who Controls Enterprise AI Agents

Databricks bets on ontology and reveals who controls the brain of enterprise AI agents

The history of enterprise artificial intelligence can be measured in layers. First came vector databases, which enabled semantic similarity search across large volumes of text. Then came retrieval-augmented generation (RAG), which combined language models with external knowledge sources to reduce hallucinations. That architecture has dominated the last two years and become the de facto standard for building corporate assistants.

Now Databricks is betting that this architecture is not enough. At its annual Data + AI Summit, CEO Ali Ghodsi presented Genie Ontology, a context layer that automatically extracts business definitions from internal data, dashboards, SQL queries, documents, pipelines, and applications, and organizes them into a living graph that AI agents can consult to understand how an organization operates. The product is currently in preview and uses a ranking system inspired by Google's PageRank to decide which source deserves the most authority, weighing who created the information, how much it is used, whether it is linked to certified assets, and when it was last updated.
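The article does not describe how Genie Ontology's ranking actually works, so the following is only a minimal sketch of how a PageRank-style authority score could combine the four signals it names. It assumes a hypothetical reference graph between sources (which source cites which) and folds creator, usage, certification, and recency into the restart vector of a personalized PageRank. The `Source` fields, the weights, and the sample data are all illustrative inventions, not Databricks' API or formula.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Source:
    name: str
    creator_is_owner: bool  # hypothetical: created by the metric's designated owner
    monthly_uses: int       # hypothetical usage signal
    certified: bool         # hypothetical: linked to a certified asset
    last_updated: date


def prior(src: Source, today: date) -> float:
    """Hypothetical prior that folds the four signals into one positive weight."""
    age_years = (today - src.last_updated).days / 365.0
    recency = 1.0 / (1.0 + age_years)       # older sources weigh less
    usage = 1.0 + src.monthly_uses / 100.0  # heavier use weighs more
    owner = 2.0 if src.creator_is_owner else 1.0
    cert = 2.0 if src.certified else 1.0
    return owner * cert * usage * recency


def authority(sources, links, today, damping=0.85, iters=50):
    """Personalized PageRank; links[a] lists the sources that a references."""
    names = [s.name for s in sources]
    raw = {s.name: prior(s, today) for s in sources}
    total = sum(raw.values())
    teleport = {n: raw[n] / total for n in names}  # signals become the restart vector
    rank = dict(teleport)
    for _ in range(iters):
        new = {n: (1 - damping) * teleport[n] for n in names}
        for n in names:
            outs = links.get(n, [])
            if outs:
                share = damping * rank[n] / len(outs)
                for m in outs:
                    new[m] += share
            else:
                # Dangling source: spread its mass along the restart vector.
                for m in names:
                    new[m] += damping * rank[n] * teleport[m]
        rank = new
    return rank


today = date(2025, 1, 1)
sources = [
    Source("finance_dashboard", True, 400, True, date(2024, 12, 2)),
    Source("sales_sheet", False, 50, False, date(2024, 9, 23)),
    Source("old_google_doc", False, 5, False, date(2022, 1, 1)),
]
# Both secondary sources cite the finance dashboard as their reference.
links = {"sales_sheet": ["finance_dashboard"], "old_google_doc": ["finance_dashboard"]}
ranks = authority(sources, links, today)
```

Using the signals as the restart vector, rather than as a separate post-hoc score, means a well-governed source gains authority both from its own metadata and from other sources pointing to it, which is the PageRank intuition the article alludes to.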

The move is not purely technical. It is a declaration of intent about who will control the semantic infrastructure of the future enterprise, and that dispute has first-order economic consequences.

From archive to authority

The problem that Genie Ontology attempts to solve is not new. In any medium-sized or large company, the definition of "monthly recurring revenue" can differ between finance, sales, and the data team. Three departments, three different numbers for the same metric. Traditional RAG systems do not solve that: they retrieve what appears similar to the question, but they do not distinguish between an official definition and one that someone wrote in a Google Doc three years ago.

An ontology, by contrast, does not merely retrieve: it encodes hierarchical relationships between concepts, establishes which source holds authority over which definition, and lets different AI agents share the same business vocabulary. Michael Leone, an analyst at Moor Insights & Strategy, puts it plainly: a single definition feeding all agents means you stop getting three different answers to the same question. In organizations where critical decisions rest on automated reports, that consistency has high operational value.

Ashish Chaturvedi, a researcher at HFS Research, goes further and links this to the most persistent obstacle to corporate AI adoption: the lack of trust. According to his analysis, the central problem is not technical but one of knowledge governance. Decision-makers do not act on AI outputs because they cannot trace where they come from or verify whether the reasoning chain used the correct sources. An ontology anchored in official definitions with traceability back to the source directly addresses that deficit.
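To make the contrast with similarity retrieval concrete, here is a minimal sketch of the properties described above: one authoritative definition per concept, aliases so that differently phrased questions land on the same entry, a broader/narrower hierarchy, and a `source` field that traces each answer back to where it was certified. The `Ontology` class, its methods, and the SQL snippets are hypothetical illustrations, not Genie Ontology's or Unity Catalog's interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Definition:
    concept: str
    expression: str                # hypothetical: the SQL that computes the metric
    owner: str                     # accountable team
    source: str                    # certified asset the definition traces back to
    broader: Optional[str] = None  # parent concept in the hierarchy


class Ontology:
    """Hypothetical shared registry: exactly one authoritative definition per concept."""

    def __init__(self):
        self._defs = {}
        self._aliases = {}

    def register(self, d: Definition, aliases=()):
        key = d.concept.lower()
        if key in self._defs:
            raise ValueError(f"conflicting definition for {d.concept!r}")
        self._defs[key] = d
        for alias in (d.concept, *aliases):
            self._aliases[alias.lower()] = key

    def resolve(self, term: str) -> Definition:
        return self._defs[self._aliases[term.lower()]]

    def ancestors(self, term: str):
        """Walk up the hierarchy from a concept to its broadest parent."""
        d = self.resolve(term)
        chain = [d.concept]
        while d.broader is not None:
            d = self.resolve(d.broader)
            chain.append(d.concept)
        return chain


ont = Ontology()
ont.register(Definition("revenue", "SUM(amount)", "finance", "certified.revenue"))
ont.register(
    Definition("mrr", "SUM(monthly_subscription_amount)", "finance",
               "certified.mrr", broader="revenue"),
    aliases=["monthly recurring revenue", "MRR"],
)

# Two agents phrase the question differently but read the same definition.
finance_view = ont.resolve("Monthly Recurring Revenue")
sales_view = ont.resolve("mrr")
```

Rejecting a second definition for an existing concept is the design choice that matters here: a similarity index would happily store both and return whichever is closest to the query.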

Databricks also integrates Genie Ontology with its Unity Catalog Semantics platform, which allows organizations to upload their own definitions or corporate vocabularies and maintain control over what enters the graph. Internally, the company reports having generated around 4.5 million ontological fragments during its own testing process. That gives an idea of the scale of the problem they are attempting to solve and, at the same time, of the complexity of keeping it up to date.

The risk that the narrative of progress omits

Every architecture has its limits. Stephanie Walter, of HyperFRAME Research, identifies the missing link with precision: verification. An ontology improves the context in which an agent operates, but it does not guarantee that the answer is correct. An agent can consult the correct definition and still apply flawed logic, omit rows in a dataset, misinterpret a workflow, or take an unintended action. Semantic consistency is not the same as operational correctness.

That distinction matters especially because the horizon Databricks is targeting is not query assistants but agents that execute actions: modifying pipelines, generating regulatory reports, triggering alerts, or making automated decisions in business processes. In that context, a well-grounded semantic error can be more dangerous than an obvious ambiguity, because it travels further before anyone detects it.

Leone adds another dimension: most companies do not have the data maturity and governance that implementing an ontology layer with rigor requires. If data lineage is weak, metric owners are not defined, or the current definitions are contradictory, adding an ontology does not solve the problem — it accelerates it. The graph feeds on existing sources, and if those sources are inconsistent, the inconsistency propagates with greater speed and the appearance of authority.

Walter also points to the quietest risk of all: maintenance. An ontology is not a project that is configured once. It is a living asset that needs to be updated every time the business changes, every time a new product is launched, every time a metric is redefined or a unit is reorganized. Without update processes, clear ownership, and mechanisms for resolving conflicts between definitions, the graph becomes obsolete. And an obsolete ontology with algorithmic authority over agents is, according to Walter, "another stalled metadata project with a more sophisticated name."
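The governance gaps Leone and Walter describe (undefined owners, contradictory definitions, no update process) can be checked before anything enters the graph. The sketch below is a hypothetical pre-ingestion audit, assuming each candidate definition carries an owner and a last-reviewed date; the `Candidate` fields and the 180-day staleness threshold are illustrative choices, not a feature of any vendor's product.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Candidate:
    concept: str
    expression: str         # hypothetical: how this source computes the metric
    owner: Optional[str]    # None when nobody is accountable for the definition
    last_reviewed: date


def audit(candidates, today, max_age_days=180):
    """Hypothetical pre-ingestion audit; returns (concept, issue) pairs."""
    issues = []
    by_concept = defaultdict(set)
    for c in candidates:
        by_concept[c.concept].add(c.expression)
        if not c.owner:
            issues.append((c.concept, "no owner"))
        if (today - c.last_reviewed).days > max_age_days:
            issues.append((c.concept, "stale"))
    # Several distinct expressions for one concept means the sources disagree.
    for concept, exprs in by_concept.items():
        if len(exprs) > 1:
            issues.append((concept, f"{len(exprs)} conflicting definitions"))
    return issues


today = date(2025, 1, 1)
candidates = [
    Candidate("mrr", "SUM(monthly_subscription_amount)", "finance", date(2024, 12, 1)),
    Candidate("mrr", "SUM(contract_value) / 12", "sales", date(2024, 11, 1)),
    Candidate("churn", "lost_customers / start_customers", None, date(2023, 1, 1)),
]
issues = audit(candidates, today)
```

Running a check like this on a schedule, not just once, is one way to turn "update processes and clear ownership" into something measurable; otherwise the graph silently inherits whatever inconsistency its sources contain.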

That does not invalidate Databricks' bet, but it does define the terrain on which the product will have to demonstrate its value: not in a presentation on a stage, but in the operational maintenance within organizations with imperfect data and governance structures that are still maturing.

The dispute over the enterprise control plane

Genie Ontology does not exist in a vacuum. Snowflake has Horizon Context, its own semantic layer for agents. Microsoft is building equivalent capabilities within Copilot, Fabric, and its IQ family (Work IQ, Fabric IQ, Foundry IQ), integrating business context and governance into its broader infrastructure. The problem, Leone notes, is that each vendor has branded essentially the same idea with a different name, and that terminological fragmentation slows adoption because CIO teams cannot clearly compare what they are evaluating.

Beyond the names, what is in dispute is structurally significant. Chaturvedi describes it as the race to become the enterprise AI control plane: the place where data, governance, semantics, and agent execution converge. The historical analogy he uses is precise: ERP systems became the system of record for business transactions; data warehouses became the system of record for analytics. Now the question being decided is which platform becomes the system of record for AI agents.

Databricks is positioning Genie Ontology within a broader architecture that includes LTAP — its proposed foundation for agentic applications — and OpenSharing, designed to reduce integration costs in corporate AI environments. Connected together, these components point toward a vision that Ghodsi himself describes as an "agentic system of record": an authoritative source from which agents read, reason, and act. It is not an isolated product; it is a platform strategy.

The structural advantage of data providers in this race is real: they already own the data, governance controls, lineage, and permissions that agents need to operate safely. That puts them in a different position from a model provider or an orchestration tooling vendor. But that advantage has a less favorable side: it also makes them dependent on their customers already having their data in order. And for most companies, that is still not the case.

Chaturvedi offers a heuristic that simplifies the decision for teams currently evaluating these options: the context layer follows the gravity of the data. If the data lives in Databricks, Genie Ontology is the natural path. If it is in Snowflake, Horizon Context is. If the infrastructure is predominantly Microsoft, the IQ family is the route. Bhupendra Chopra, from the consulting firm Kanerika, reinforces that argument: beyond each platform's marketing, the real decision is dictated by where the data already resides.

Snowflake is attempting to differentiate its offering by betting on open semantic interoperability, which in theory allows business definitions to move between platforms without becoming trapped in a single vendor's data model. That bet directly targets the risk of semantic lock-in — the equivalent of platform lock-in, but applied to the corporate vocabulary — in environments where companies operate across multiple data systems simultaneously.

Value is captured where execution is verified

The dominant narrative around these platforms speaks of context, consistency, and trust. All of those dimensions matter, but there is one that still has no solid answer in any of the available proposals: how to verify that what the agent did was the right thing.

That is the real frontier. Not the quality of the context with which the agent begins a task, but the ability to audit — with complete traceability — what the agent did, which definitions it used, which data it processed, what logic it applied, and whether the result is reproducible. Walter summarizes it without ambiguity: the next battleground in enterprise AI is not context, but verifiable execution.

That has direct consequences for where economic value is captured in this race. An ontology that improves semantic consistency is a valuable asset, but it is not sufficient for an organization to be able to delegate operational decisions with real consequences — financial, regulatory, operational — to autonomous agents. For that level of delegation to occur, the platform needs to offer something more: an auditable record of decisions, correction mechanisms for when the agent makes a mistake, and guarantees about what happens when the context changes and the graph has not yet been updated.
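Neither the article nor the announcements describe such a record in detail, so the following is only a hypothetical sketch of what "verifiable execution" could mean in practice. Each agent action is logged with the fields listed above (definitions used, a fingerprint of the data processed, the logic applied, the output) plus the ontology version the agent saw, and entries are hash-chained so that later edits are detectable. A replay check re-runs a deterministic step to test reproducibility. `AuditLog`, `reproducible`, and all field names are illustrative assumptions.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(obj) -> str:
    """Stable SHA-256 fingerprint of any JSON-serializable value."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, action, definitions, inputs, logic, output, ontology_version):
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {
            "action": action,
            "definitions": definitions,          # concept -> definition id used
            "inputs_hash": _digest(inputs),      # fingerprint of the data processed
            "logic": logic,                      # the query or rule applied
            "output_hash": _digest(output),
            "ontology_version": ontology_version,  # graph state the agent saw
            "prev": prev,                        # links this entry to the previous one
        }
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; an edited or reordered entry breaks the chain."""
        prev = "genesis"
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True


def reproducible(entry, step, inputs) -> bool:
    """Re-run a deterministic step on the same inputs and compare to the record."""
    return (_digest(inputs) == entry["inputs_hash"]
            and _digest(step(inputs)) == entry["output_hash"])


def total_mrr(rows):
    return sum(r["mrr"] for r in rows)


rows = [{"account": "a", "mrr": 100}, {"account": "b", "mrr": 250}]
log = AuditLog()
entry = log.record("report_mrr", {"mrr": "finance.mrr.v3"}, rows,
                   "sum(mrr)", total_mrr(rows), "v42")
log.record("send_alert", {"mrr": "finance.mrr.v3"}, {"threshold": 300},
           "mrr > threshold", True, "v42")
```

Recording the ontology version alongside each action speaks to the last requirement in the paragraph above: when the graph changes after the fact, an auditor can still see which definitions the agent was actually working from.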

Databricks is building in that direction, although Genie Ontology alone does not yet answer that question. What the full set of announcements at the Data + AI Summit reveals is a coherent strategy toward that objective: data + governance + semantics + agentic execution as integrated layers within a single platform. The coherence of the vision is clear. The stress test will come when the ontology has to remain accurate within organizations that change faster than any graph can update itself.

That tension between the ambition of the architecture and the operational reality of the companies that will adopt it is where it will be decided whether this bet generates sustainable value — or whether it becomes sophisticated infrastructure built on foundations that are not yet ready to support it.