Google Redesigned Its Data Architecture So AI Stops Failing in Enterprises


Simón Arce · April 30, 2026 · 7 min read


For years, data teams and AI teams in large corporations operated like departments from different countries. The former built warehouses, catalogs, and pipelines. The latter deployed models, APIs, and agents. Both worlds communicated through manual exports, discontinuous processes, and a blind faith that "the other team will handle it." The result was predictable: AI agents reached the production environment and collapsed when confronted with data that nobody had prepared for an autonomous machine to read, interpret, and act upon.

At Google Cloud Next 2026, Google named that collapse with precision: the separation between the data platform and the AI platform is the single greatest obstacle to the enterprise deployment of autonomous agents. Its response was the Agentic Data Cloud, a deep reconfiguration of its data architecture that does not add an AI layer on top of what already exists, but rather redesigns the foundations so that agents become first-class users of enterprise data.

The difference in ambition is not trivial. We are not talking about new connectors or dashboards enriched with natural language. We are talking about a structural redesign that forces every Fortune 500 company — with data distributed across AWS, Azure, and Google Cloud — to rethink how it is going to govern, serve, and monetize the information it already possesses.

The Diagnosis That Executives Prefer to Ignore

There is a figure that makes people uncomfortable: according to the research accompanying the launch, around 70% of companies discover the failures of their data infrastructure after deploying agents, not before. This is not a technical problem. It is a leadership problem wearing a technical disguise.

Fragmented, ungoverned data trapped in silos across different clouds did not appear overnight. It accumulated over years of hasty decisions, poorly integrated corporate acquisitions, and a very human organizational tendency: deferring the difficult conversation about the real data architecture because "the business keeps running." Until it stops running.

The architecture Google presented is composed of six components that are not independent of one another, but rather form a system with sequential logic. At the base, the Multicloud Data Lakehouse, built on the open Apache Iceberg format, allows BigQuery to query data stored in AWS S3 and Azure ADLS without the need to move or replicate it, eliminating egress costs and the risk of incoherence between copies. Operating on top of that foundation is the Lightning Engine for Apache Spark, a vectorized execution layer written in C++ that delivers up to 4.9 times the performance of conventional Spark. The data is not only accessible; it is processable at a speed that makes it viable for an agent to generate, execute, and correct Spark code in continuous cycles without costs spiraling out of control.
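To make that concrete, here is a minimal sketch, not Google's implementation, of the kind of Spark job an agent might generate and run against Iceberg tables registered in a lakehouse catalog; the catalog URI, table, and column names below are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Register a hypothetical Iceberg REST catalog; the URI is a placeholder.
spark = (
    SparkSession.builder
    .appName("agent-generated-refund-summary")
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "rest")
    .config("spark.sql.catalog.lakehouse.uri", "https://catalog.example.com/iceberg")
    .getOrCreate()
)

# Read an Iceberg table through the catalog and aggregate refunds by region.
# Whether stock Spark or a faster vectorized engine executes this is
# transparent to the code itself.
orders = spark.table("lakehouse.sales.orders")
summary = (
    orders.where(F.col("status") == "refunded")
    .groupBy("region")
    .agg(F.sum("amount").alias("refunded_total"))
)
summary.show()
```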

On top of that execution infrastructure comes the contextual intelligence layer: the Knowledge Catalog, the evolution of Dataplex Universal Catalog announced on April 10, 2026. This piece is the one that should command the most attention from enterprise architects. The catalog does not require data teams to manually catalog assets. It examines query logs, profiles tables, analyzes semantic models from tools like Looker, and extracts relationships between entities from unstructured files. The result is a dynamic knowledge graph, maintained automatically, that answers the question any agent needs to resolve before acting: what data exists, what it means precisely, and whether it is trustworthy.
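The shape of that answer matters more than the interface. As a conceptual sketch only, with illustrative field names rather than the Knowledge Catalog's actual API, this is roughly the context an agent needs back from a lookup before it touches a table:

```python
from dataclasses import dataclass, field

@dataclass
class AssetContext:
    name: str                       # fully qualified table or file name
    description: str                # semantic meaning, inferred or curated
    freshness_hours: float          # how stale the data currently is
    quality_score: float            # 0..1 trust signal from profiling
    related_assets: list[str] = field(default_factory=list)  # knowledge-graph edges

def safe_to_act_on(asset: AssetContext) -> bool:
    """A minimal pre-flight check an agent might run before querying an asset."""
    return asset.quality_score >= 0.8 and asset.freshness_hours <= 24
```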

When Storage Stops Being Passive

The piece that most radically changes the operational geometry of data is Intelligent Storage, currently in preview. Until now, a file that entered a Google Cloud Storage bucket was inert until someone decided to process it. With this functionality, the moment a file arrives in the bucket, the system automatically tags it, generates embeddings, extracts relevant entities, and links it to the Knowledge Catalog. PDFs, contracts, support tickets, audio recordings: everything is converted into a searchable asset without any engineer intervening.
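For contrast, here is a minimal sketch of the kind of per-bucket ingestion hook teams have been writing by hand to get a similar effect, assuming a Cloud Functions-style trigger on new objects; the text-extraction and entity-extraction steps are placeholder stubs, not Google APIs.

```python
from google.cloud import storage

def extract_text(raw: bytes) -> str:
    # Placeholder: a real pipeline would call OCR or document parsing here.
    return raw.decode("utf-8", errors="ignore")

def extract_entities(text: str) -> list[str]:
    # Placeholder: a real pipeline would call an NER model or an LLM here.
    return sorted({word for word in text.split() if word.istitle()})

def on_object_finalized(event: dict, context=None) -> None:
    """Handler invoked when a new object lands in the bucket."""
    client = storage.Client()
    blob = client.bucket(event["bucket"]).blob(event["name"])

    text = extract_text(blob.download_as_bytes())
    entities = extract_entities(text)

    # Tag the object so downstream search and catalog jobs can discover it.
    blob.metadata = {"entities": ",".join(entities[:10]), "indexed": "true"}
    blob.patch()
```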

For executives who have been deferring unstructured data preparation projects — those that "will take six months" of extraction, OCR, indexing, and cataloging — this reconfigures the time-and-cost equation in a way that does not allow for comfortable postponement. What was previously a project with an executive sponsor, its own budget, and an uncertain delivery date becomes an automatic consequence of storage policy.

The Deep Research Agent, based on Gemini 3.1 Pro, illustrates the use case this entire infrastructure ultimately serves. It operates by combining internal sources from the Knowledge Catalog and the Lakehouse with open sources on the internet, generates structured research plans, and delivers reports with verifiable citations in minutes. Tasks that in fields such as competitive intelligence, life sciences, or financial services used to consume between one and three weeks of analyst work become the starting point, not the finishing line.

The Data Agents Kit completes the picture from the developer's side. It offers preconfigured MCP tools and three specialized agents: one that converts natural language instructions into managed pipelines by choosing among BigQuery, dbt, Spark, or Airflow; another that automates the complete cycle of data science models; and a third dedicated to infrastructure observability. The Model Context Protocol acts as an interoperability layer that allows agents from any provider — Gemini, Claude, proprietary models — to access data assets without custom connectors.
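As an illustration of that interoperability pattern rather than of the kit itself, the sketch below exposes a read-only BigQuery query as an MCP tool using the open-source MCP Python SDK; the project, dataset, and table names are placeholders.

```python
from google.cloud import bigquery
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("enterprise-data")
bq = bigquery.Client()

@mcp.tool()
def top_suppliers_by_spend(limit: int = 10) -> list[dict]:
    """Return the highest-spend suppliers from a curated reporting table."""
    sql = """
        SELECT supplier_name, SUM(amount) AS total_spend
        FROM `my-project.procurement.invoices`
        GROUP BY supplier_name
        ORDER BY total_spend DESC
        LIMIT @limit
    """
    job = bq.query(
        sql,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("limit", "INT64", limit)]
        ),
    )
    return [dict(row) for row in job.result()]

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for any MCP-capable agent
```

Any agent that speaks MCP can then discover and call the tool without knowing anything about BigQuery, which is the point of treating the protocol as the connector layer.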

Multicloud Stops Being a Complaint and Becomes an Architecture Decision

No company among the Fortune 500 operates exclusively on Google Cloud. SAP, Salesforce, Workday, and Oracle systems are distributed across AWS and Azure for historical, contractual, and operational reasons that no CTO mandate can resolve with a memo. For years, multicloud was the recurring argument for not advancing any AI initiative at scale: "first we need to consolidate the data."

The Multicloud Data Lakehouse dismantles that argument with technical specificity. Using the Iceberg REST Catalog, Multicloud Interconnect, and an intelligent cache layer, BigQuery can query data in AWS S3 and Azure ADLS with latency and cost comparable to those of native data in Google Cloud. A procurement agent can combine in a single query contract data stored in S3, inventory in Azure, and transactional records in BigQuery, all under a unified Iceberg catalog, without any engineering team having to manage an ETL process between clouds.
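A hedged sketch of that single-query pattern, using the BigQuery Python client: the table names are placeholders standing in for the S3-backed, ADLS-backed, and native tables an organization would register in a shared Iceberg catalog.

```python
from google.cloud import bigquery

client = bigquery.Client()

sql = """
    SELECT c.supplier_id,
           c.contracted_unit_price,
           i.units_on_hand,
           SUM(t.order_amount) AS trailing_90d_spend
    FROM `lakehouse.contracts_s3` AS c        -- contract data stored in AWS S3
    JOIN `lakehouse.inventory_adls` AS i      -- inventory stored in Azure ADLS
      ON i.supplier_id = c.supplier_id
    JOIN `lakehouse.transactions` AS t        -- native BigQuery records
      ON t.supplier_id = c.supplier_id
    WHERE t.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
    GROUP BY c.supplier_id, c.contracted_unit_price, i.units_on_hand
"""

for row in client.query(sql).result():
    print(dict(row))
```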

The implication for integration architects is strategic in nature. The conversation shifts from "how do we migrate everything to a single cloud" to "how do we govern a single catalog over the data distribution we already have." It is not the same conversation. The first carries a prohibitive political and financial cost in the majority of mature organizations. The second is executable without disrupting existing contracts with other providers.

What Google is proposing, taken as a whole, is a paradigm shift with organizational consequences that go far beyond the technical architecture. MCP as an agent governance layer demands being managed with the same discipline that is applied today to an API gateway: versioning, authentication, monitoring, usage limits. The Knowledge Catalog ceases to be a documentation project and becomes a real-time operational dependency, which implies service level agreements, continuous maintenance, and an operating model that data teams have not yet designed.

The culture of an organization is neither the framed poster in the boardroom nor the CEO's speech at the annual convention. It is the accumulated sum of all the decisions leaders made when it was more comfortable to defer than to decide, safer to delegate than to take responsibility, easier to blame technical debt than to acknowledge that the data architecture reflects, with surgical precision, the architecture of power, of fear, and of the conversations that management never had the courage to have.

