White Circle Raises $11M to Monitor AI Safety

White Circle raised $11 million to monitor AI after no one else wanted to do it

One night in late 2024, Denis Shilov was watching a crime thriller when an experiment came to mind. He wrote a prompt that could make any artificial intelligence model ignore its own safety filters. The trick was conceptually simple: it told the model to stop behaving like a chatbot with rules and start acting like a software access point that simply responds to requests without evaluating whether it should. It worked on every leading model. The next day, his post on X had gained enough traction that Anthropic reached out and asked for private access to their systems.

What Shilov concluded from that episode was not that he had found a bug. It was that no company had a post-deployment control layer over what its AI models did once users began interacting with them. That observation became White Circle, and on May 12, 2026, the Paris-based startup announced an $11 million seed round backed by figures who know the models from the inside: the head of developer experience at OpenAI, a co-founder of OpenAI now at Anthropic, the co-founder and chief scientist of Mistral, the co-founder and chief scientific officer of Hugging Face, the founder of Datadog, the creator of Keras, and executives from DeepMind and Sentry.

The capital is not the most interesting part of the story. More interesting is what kind of business infrastructure justifies that level of conviction so early, and why a market response to this specific problem took so long to appear.

The problem that AI labs have incentives not to fully solve

When a company deploys a language model in production, it inherits an implicit contract with the model provider: the provider has trained the model to behave in a certain way in general terms, and the company assumes that training is sufficient for its specific use cases. That assumption is becoming increasingly difficult to sustain.

Today's models are at once instrument and risk. A customer support agent can promise a refund the company never authorized. A coding agent can install something on a virtual machine that was never supposed to be touched. A model integrated into a financial application can mishandle sensitive customer data. None of those scenarios is hypothetical; they are documented consequences of deploying capable models in environments with incomplete or ambiguous instructions.

The standard response from model laboratories is safety fine-tuning during training. But that fine-tuning is, by definition, generic. It is calibrated to prevent the model from explaining how to manufacture weapons or producing harmful content in the abstract. It is not calibrated for the specific policy of a financial services company regarding what can and cannot be promised in a conversation with a customer, nor for a healthcare company's restrictions on which data can be cross-referenced with which other data.

Shilov points to something more structural: laboratories charge for input and output tokens even when the model rejects a harmful request. That means they have a limited economic incentive to block abuse before it reaches the model. He also points to the so-called "alignment tax": training safer models tends to reduce their performance on tasks such as coding. That tension between safety and performance does not disappear with more funding; it is a technical constraint that laboratories manage, but do not eliminate.

White Circle is betting that this gap will not be closed from the training side alone. Its product is a real-time application layer that sits between a company's users and its models, reviewing inputs and outputs against that company's specific policies, and capable of blocking or flagging problematic behavior: hallucinations, data leakage, prohibited content, prompt injection, and destructive actions in software environments. The company says it has processed more than one billion API requests and has active customers in fintech, legal, and developer tooling, including Lovable. The system supports more than 150 languages, holds SOC 2 Type I and Type II certifications, and is HIPAA compliant.
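
To make the idea of such a layer concrete, here is a minimal Python sketch of a policy check wrapped around a model call. It is illustrative only and does not reflect White Circle's implementation: the `Verdict` type, the three rule functions, `review`, `guarded_call`, and the stub model are all hypothetical names, and a production system would presumably rely on trained classifiers rather than a handful of regular expressions.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Ordering used to pick the most severe verdict (assumed three-level scheme).
SEVERITY = {"allow": 0, "flag": 1, "block": 2}


@dataclass(frozen=True)
class Verdict:
    action: str       # "allow", "flag", or "block"
    reason: str = ""


# A rule inspects a piece of text and returns a Verdict.
Rule = Callable[[str], Verdict]


def prompt_injection_rule(text: str) -> Verdict:
    """Hypothetical input rule: block an obvious instruction-override phrase."""
    if re.search(r"ignore (all|your) (previous|prior) instructions", text, re.I):
        return Verdict("block", "possible prompt injection")
    return Verdict("allow")


def refund_promise_rule(text: str) -> Verdict:
    """Hypothetical output rule: the company never authorized refund promises."""
    if re.search(r"\bwe will\b.*\brefund\b", text, re.I):
        return Verdict("block", "unauthorized refund promise")
    return Verdict("allow")


def card_number_rule(text: str) -> Verdict:
    """Hypothetical output rule: flag anything that looks like a card number."""
    if re.search(r"\b(?:\d[ -]?){13,16}\b", text):
        return Verdict("flag", "possible card number in output")
    return Verdict("allow")


def review(text: str, rules: List[Rule]) -> Verdict:
    """Run every rule and return the most severe verdict."""
    verdicts = [rule(text) for rule in rules]
    return max(verdicts, key=lambda v: SEVERITY[v.action], default=Verdict("allow"))


def guarded_call(
    model: Callable[[str], str],
    user_input: str,
    input_rules: List[Rule],
    output_rules: List[Rule],
) -> Tuple[str, Verdict]:
    """Check the input, call the model, then check the output."""
    in_verdict = review(user_input, input_rules)
    if in_verdict.action == "block":
        return "Request blocked by policy.", in_verdict  # model is never called

    reply = model(user_input)
    out_verdict = review(reply, output_rules)
    if out_verdict.action == "block":
        return "Response withheld by policy.", out_verdict

    # Keep whichever side raised the more severe (non-blocking) verdict.
    final = max(in_verdict, out_verdict, key=lambda v: SEVERITY[v.action])
    return reply, final


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; always over-promises."""
    return "Sure, we will issue a full refund today."


if __name__ == "__main__":
    reply, verdict = guarded_call(
        stub_model,
        "Where is my order?",
        input_rules=[prompt_injection_rule],
        output_rules=[refund_promise_rule, card_number_rule],
    )
    print(verdict.action, "|", verdict.reason, "|", reply)
```

The point of the sketch is the placement rather than the rules themselves: the checks run outside the model, against policies the deploying company writes, and a blocked input never reaches the model at all.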

What one billion requests validates and what it does not

One billion API requests is the kind of number that sounds large and can mean very different things depending on volume per customer, type of request, and retention rate. White Circle was founded in 2025 and has 20 employees, almost all of them engineers. That suggests an architecture designed to scale through infrastructure rather than service headcount, which is consistent with an API model that intercepts existing traffic.

What the number does validate, to the extent that public data allows any conclusion, is that the platform has operational traction, not merely public relations traction. There is an important difference between a company that announces funding with a list of prospective customers and one that arrives at the announcement with evidence of sustained use. The benchmark that White Circle published in May 2026, KillBench, also functions as a signal of technical maturity: the company ran more than one million experiments across 15 models from OpenAI, Google, Anthropic, and xAI to measure bias in high-stakes decision scenarios. The results showed that models made different decisions based on attributes such as nationality, religion, or phone type, and that those biases worsened when responses were requested in structured formats intended to be read by software, which is exactly how most companies connect models to their production systems.

That finding has direct consequences for any company using AI in decisions with real-world outcomes. It is not an academic experiment; it is documentation of a risk vector that occurs in the most common integration format.

What the number does not yet validate is willingness to pay at scale. The business model of a control layer that intercepts traffic has a potentially powerful mechanic: if it becomes part of the workflow between users and models, it can draw on several budget lines at once: security, compliance, content moderation, and model operations. But that also means it competes for budget against teams that already have observability tooling and that may resist adding yet another layer of infrastructure.

The geographical concentration of the team in Europe, with presence in London, France, and Amsterdam, suggests that expansion into the US market, where the largest enterprise technology budgets reside, will require sales infrastructure that a 20-person, engineering-heavy team cannot cover. The funding is likely earmarked for exactly that.

A control layer that models cannot sell on their own

White Circle's strongest argument is not technical. It is one of governance.

Shilov articulated it precisely: there is a structural trust problem in asking a model provider to judge the behavior of its own models. Anthropic cannot be a neutral arbiter of Claude's behavior when it is the same entity that trains it, commercializes it, and charges for every token it generates. That is not an accusation; it is a description of incentives. AI laboratories are companies with specific commercial interests, and their safety systems are calibrated to those interests, not to those of every company that deploys their models.

That separation is what makes backing from investors with experience inside the sector's most important laboratories strategically relevant beyond the capital itself. People who understand the technical and commercial constraints of OpenAI, Anthropic, Mistral, and DeepMind from the inside are betting that the post-deployment control problem will not be solved from within those laboratories with the depth that enterprises are going to require. That is both a validation of the problem and a signal about the direction of the market.

The transition from chatbots to autonomous agents makes that gap more urgent. A chatbot that responds poorly is a reputational problem. An agent that accesses files, executes code, browses the web, and takes actions on behalf of a user can create damage that cannot be undone with an apology message. The market for controlling autonomous agents is in its earliest stages, but the direction of AI spending clearly points there.

White Circle arrived at this announcement with operational usage, published research, compliance certifications, and backing from individuals with technical credibility in the sector. That is no guarantee of success, but it is a starting line that is considerably further ahead than where startups typically find themselves at the seed stage. The next threshold that matters is not the next funding headline; it is how many companies in regulated industries decide they need a control layer between their users and their models before an incident forces them to find one the hard way.