Why 91% of Companies Are Adopting AI Without Knowing What Data They're Handing Over

Elena Costa · May 7, 2026 · 8 min read

Generative artificial intelligence arrived at most organizations not through the technology department, but through the back door of productivity applications. Microsoft 365 Copilot, Gemini, the assistants embedded in collaboration platforms: these tools were activated in corporate environments where employees were already working, and with that began a silent experiment whose terms no one had fully negotiated.

The problem does not lie in the language models. It lies in what those models find when they connect to a real organization.

According to Huble's report on data readiness for AI, only 8.6% of companies consider themselves fully ready to operate with artificial intelligence. The remaining 91% sits somewhere between experimentation and stagnation, despite having committed budget, time, and internal reputation to adoption projects. Deloitte, in its 2026 report on the state of AI in the enterprise, records that two-thirds of organizations report productivity gains, but it also documents persistent deficits in infrastructure, data management, talent, and risk control. Employee access to AI tools grew 50% in 2025; the readiness to manage that access did not grow at the same pace.

This gap is not accidental. It is structural. And it has a cause that few organizations are willing to name without euphemisms: corporate data is, for the most part, in disarray.

What the Assistant Finds When No One Is Watching

When a company activates an AI copilot within its productivity environment, that system does not create new access doors. It uses the ones that already exist. It operates with the inherited permissions of the user who activates it and reaches exactly where that user can reach, with one operational difference that changes everything: it does so at machine speed.

Microsoft documents this behavior with precision. Its Copilot architecture establishes that the system operates within the service perimeter, bounded by the authenticated user and the content that person is authorized to access. It does not break permissions. It executes them. And therein lies the point that many security teams had not calculated with sufficient clarity: if permissions are more open than they should be, a single prompt can retrieve what previously required dozens of scattered manual searches.
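To make the mechanism concrete, here is a minimal sketch in Python of what "inherited permissions at machine speed" means. The file paths, group names, and access-control structure are invented for illustration; this is not Microsoft's Copilot internals, only the access logic described above.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    path: str
    readers: set          # identities or groups granted read access
    classification: str   # "public", "internal", "confidential"

# A toy corpus with years of accumulated over-sharing baked in.
CORPUS = [
    Resource("/finance/2019/restructuring-draft.xlsx", {"all-staff"}, "confidential"),
    Resource("/hr/shared/salary-export-copy.csv", {"all-staff"}, "confidential"),
    Resource("/marketing/brand-guide.pdf", {"all-staff"}, "public"),
    Resource("/finance/board/minutes.docx", {"finance-leads"}, "confidential"),
]

def reachable(identity: str, groups: set) -> list:
    """Everything the identity can already read. The assistant searches
    exactly this set: exhaustively, in one pass, at machine speed."""
    principals = {identity} | groups
    return [r for r in CORPUS if r.readers & principals]

# One prompt is enough to sweep the user's entire effective scope:
for r in reachable("j.doe", {"all-staff"}):
    if r.classification == "confidential":
        print("over-shared:", r.path)  # permissions honored, exposure inherited
```

Nothing in the sketch breaks a rule. The over-shared files were readable all along; the loop just reaches every one of them in a single pass.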

Years of shared folders that were never closed. Files copied for a one-time analysis that were left on personal drives. Sensitive emails with attachments archived without classification. Document repositories that accumulate records no one deletes because no one remembers they exist. That is the real raw material with which the AI assistant works when it connects to an organization that did not audit its environment before enabling access.

The risk does not originate from the language model. It originates from the data architecture that the model inherits.

Security teams face a visibility problem here that their traditional tools do not solve. Data loss prevention was designed to monitor exit points. Identity management systems administer roles and permissions. Activity logs document what has already occurred. None of these instruments was built to map what happens when an AI query crosses documents, mailboxes, databases, and knowledge repositories in a single interaction, generating a response that combines fragments of information that had never been connected before.
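A toy example makes the blind spot concrete. Everything below is invented: the sources, the fragments, and a deliberately naive exit-point check. The shape of the problem is what matters: fragments that pass inspection one at a time combine into an output no single control ever examined.

```python
# Invented fragments scattered across three systems.
SOURCES = {
    "mailbox": "Acme renewal: board approved walking away above 1.4M",
    "crm":     "Acme decision maker is the CFO; prefers written offers",
    "wiki":    "Maximum discount ceiling for Acme's tier: 22 percent",
}

def exit_point_check(text: str) -> bool:
    """Toy DLP stand-in: flags only large, obviously sensitive payloads."""
    return len(text) > 100

def answer(query: str) -> str:
    # The assistant joins fragments that were never connected before.
    return " | ".join(v for v in SOURCES.values() if "Acme" in v)

response = answer("summarize everything about the Acme negotiation")
print([exit_point_check(v) for v in SOURCES.values()])  # [False, False, False]
print(exit_point_check(response))  # True: only the combination trips the check
```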

What emerges from that crossing can be perfectly legitimate. It can also be a concentration of sensitive data that no prior control had anticipated.

The Hidden Cost of Ignoring Infrastructure Before the Model

The dominant narrative around AI adoption in the enterprise has a foundational distortion: it places the conversation on models, interfaces, and use cases, and leaves in the background the question of what data feeds those decisions and under what conditions of order, classification, and governance.

Gartner estimates that 63% of organizations do not have the data management practices necessary to sustain AI projects. That number helps explain why so many deployments stall before reaching production — not because of model limitations or lack of budget, but because the underlying data infrastructure cannot support what the model needs in order to operate coherently.

The misalignment has direct financial consequences. Organizations that invest in licenses, training, and process change without first resolving the data layer are paying for capacity they cannot use reliably. Worse still: they are assuming exposure they cannot quantify. If AI systems operate on unclassified data, with excessive permissions and without an up-to-date inventory of what exists where, the window of regulatory exposure widens in ways that auditors and legal teams are still learning to measure.

Persistent Systems, among other vendors specializing in this field, structures its solutions around three distinct axes: infrastructure optimization, data quality, and secure scaling of AI workloads. The sequence is not accidental. Scale comes last, not first.

Astutis documents in its 2026 report that the vast majority of workers expect AI to have a significant impact on their roles within five years, but only a small fraction actively uses it today. The reason is not cultural resistance. It is that the real experience with AI tools in poorly prepared corporate environments generates concrete friction: inconsistent responses, results that mix information from different contexts, uncertainty about whether what the system returns can be trusted. That friction is not resolved by improving the model. It is resolved by resolving the data.

Governing AI the Way You Govern a High-Risk Identity

There is a conceptual shift that the most advanced organizations in this field are already executing, and that others will eventually have to make: treating AI agents as governed identities, not as user tools.

When a copilot or an automation agent accesses corporate systems, it does so through service accounts, programming interfaces, and user contexts. It has permissions. It acts on data. It generates outputs that may contain sensitive information. For all those reasons, it should receive the same treatment as any high-privilege identity in an organization: periodic access reviews, application of least privilege, behavior monitoring, and traceability of what it touches.
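What that treatment looks like in practice can be sketched simply. The permission names and audit log below are hypothetical; the point is the comparison itself, identical to a least-privilege review of a service account: whatever is granted but never exercised is a candidate for revocation.

```python
from datetime import date, timedelta

# Hypothetical grants and audit trail for one agent identity.
GRANTED = {"read:finance", "read:hr", "read:wiki", "write:crm"}
USAGE_LOG = [
    ("read:wiki", date(2026, 4, 2)),
    ("read:finance", date(2026, 4, 20)),
]

def least_privilege_review(granted, log, window_days=90):
    """Permissions granted but unused inside the review window are
    candidates for revocation; the same rule as for service accounts."""
    cutoff = date.today() - timedelta(days=window_days)
    used = {perm for perm, when in log if when >= cutoff}
    return granted - used

print("revoke candidates:", sorted(least_privilege_review(GRANTED, USAGE_LOG)))
# With the log above inside the window, read:hr and write:crm were
# granted but never exercised, so the review flags them.
```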

Most corporate security programs are not configured for this. They were designed with people and systems in mind, not AI agents that operate with their own logic, combine sources of information, and produce outputs that their human operators cannot always anticipate.

Data readiness for AI, in its operational sense, requires at least four concrete actions. First, building an up-to-date inventory of the AI systems active in the environment, including copilots embedded in productivity platforms, custom models, and automation agents, mapped to the data sources they access. Second, classifying sensitive data consistently across cloud storage, software-as-a-service applications, and legacy repositories, because without that classification, compliance controls cannot distinguish between sensitive and generic information. Third, applying to AI agents the same review applied to high-risk service accounts: their permissions should reflect actual use, not accumulated inheritance. Fourth, connecting that data context to existing controls — including data loss prevention systems, identity and access management, and access gateways — so that policies reflect real exposure rather than abstract patterns.
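As a rough illustration of the first three actions, here is a minimal sketch with invented system and source names: each AI system is mapped to the sources it reaches, each source carries a classification, and a missing access review becomes a queryable fact instead of an unknown.

```python
from dataclasses import dataclass, field

@dataclass
class DataSource:
    name: str
    classification: str   # "public" | "internal" | "confidential"

@dataclass
class AISystem:
    name: str
    kind: str                       # "copilot" | "custom-model" | "agent"
    sources: list = field(default_factory=list)
    last_access_review: str = ""    # ISO date; empty means never reviewed

INVENTORY = [
    AISystem("m365-copilot", "copilot",
             [DataSource("sharepoint", "confidential"),
              DataSource("exchange", "confidential")]),
    AISystem("support-bot", "agent",
             [DataSource("kb-articles", "public")],
             last_access_review="2026-01-15"),
]

# The exposure question becomes answerable in one pass: which systems
# touch confidential sources and have never had an access review?
for system in INVENTORY:
    exposed = [s.name for s in system.sources if s.classification == "confidential"]
    if exposed and not system.last_access_review:
        print(f"{system.name}: confidential sources {exposed}, never reviewed")
```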

None of these steps requires waiting for AI models to improve. They are decisions about the infrastructure that already exists.

Data Readiness Is Not a Prior Stage — It Is the Real Wager

The enterprise AI market is growing at rates exceeding 30% per year and is projected to reach between 150 and 200 billion dollars by 2030. In that context, the competitive advantage will not lie in having adopted AI before everyone else, but in having adopted it on a foundation that allows operating with confidence and scaling without friction.

Organizations that treated data readiness as a minor technical formality are discovering, in production, that their AI systems produce inconsistent results, that their legal teams cannot certify regulatory compliance for AI-assisted processes, and that their security teams cannot answer basic questions about what information is being processed and by whom.

The shift that this moment reveals is not technological at its core. It is a shift in governance. Artificial intelligence is forcing companies to confront data problems that already existed before any copilot was ever activated: unclassified data, permissions accumulated without review, incomplete inventories, controls designed for a world where searches were manual and slow. What changed is not that those problems appeared. What changed is that it is no longer possible to ignore them without visible and rapid consequences.

The organizations that will emerge best positioned in this cycle are those that understood that preparing the data is not a step prior to adopting AI. It is, precisely, the foundational work that determines whether adoption produces value or simply produces more risk surface over which a faster system operates.
