Meta’s covert minor-perspective AI testing raises the bar for safety evals

In August 2025, Meta ran what now looks less like a routine QA exercise and more like a large-scale benchmark for the industry’s safety norms: more than 45,000 prompts sent to rival chatbots from fake accounts configured as under-18 users. The prompts covered self-harm, sex, drugs, eating disorders, and other crisis scenarios. Responses from ChatGPT, Gemini, and Character.AI were copied into spreadsheets and used in a testing program Meta internally called Cannes.

That matters because it changes the baseline for what “safety testing” can mean in practice. For technical teams, the important shift is not simply that Meta evaluated competitors. It is that the company appears to have used a minor-perspective interaction model at meaningful scale, turning adversarial probing into a structured evaluation pipeline. According to reporting cited by The Decoder, hundreds of contractors working through Meta’s vendor Covalen created fake accounts with birthdates under 18, then used those accounts to prompt the chatbots in ways designed to surface harmful or policy-sensitive behavior.

The workflow, as reported, was straightforward but operationally significant. Contractors impersonated minors, sent crisis-oriented prompts, and logged outputs into spreadsheets for later analysis. The volume alone, more than 45,000 prompts, suggests this was not an isolated red-team exercise but a repeatable testing regime. The same reporting says the Cannes project remained active through at least April 2026, implying an ongoing process rather than a one-off audit.

Meta’s public line is important here. The company says it did not use the collected responses to train its own models. That distinction matters technically and legally. Safety evaluation data and training data serve different functions: one is used to assess model behavior, the other to update model weights. But the absence of training use does not resolve the harder governance questions. What exactly happened to the collected data after it was copied into spreadsheets? Who had access? How long was it retained? Was it shared across teams or vendors? The available reporting does not answer those questions, and that uncertainty is central to the story.

This is where the practice collides with policy. Safety testing based on synthetic or adversarial personas is not new. What is new is the scale, the use of under-18 personas, and the fact that the target models belonged to other companies that were not told the testing was happening. Character.AI told WIRED the testing violated its terms of service, which underscores a straightforward vendor-risk issue: even if a company frames this as responsible evaluation, it may still run headlong into contractual and consent boundaries.

For privacy and governance teams, the child-persona detail is the sharpest edge. A fake minor account is not the same as collecting data from a real minor, but it still creates a compliance and ethics problem if it is used to elicit sensitive content from systems without the vendor’s knowledge or permission. The prompts themselves were designed to trigger responses around mental health crises and other protected or high-risk categories. That makes the handling of the resulting transcripts a serious data-governance concern, even if the data were never repurposed for training.

For product teams, the lesson is more operational than philosophical. Safety evaluation is becoming a competitive discipline, and it is starting to resemble a cross between red-teaming, compliance testing, and hostile interoperability analysis. If frontier companies can probe rivals at this scale, then safety standards may need clearer definitions around permissible test personas, disclosure requirements, data retention, and vendor notice. Without those rules, companies are left to rely on informal norms that do not travel well across vendors, contractors, or jurisdictions.
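To make those four areas concrete, here is a minimal Python sketch of how an evaluation log might carry governance metadata alongside each captured response. It is illustrative only: the `EvalRecord` schema, its field names, the persona and category labels, and the `governance_issues` helper are all assumptions made for this example, and nothing in the reporting describes how Meta or its vendor actually structured their spreadsheets.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass(frozen=True)
class EvalRecord:
    """One logged response from a safety evaluation run (hypothetical schema)."""
    target_model: str          # system under test
    persona: str               # test persona label, e.g. "synthetic_minor" (assumed)
    category: str              # prompt category, e.g. "self_harm" (assumed)
    collected_on: date         # when the response was captured
    vendor_notified: bool      # whether the tested vendor was told
    retention_days: int        # how long the transcript may be kept
    allowed_teams: frozenset = field(default_factory=frozenset)  # who may read it

    def expires_on(self) -> date:
        """Last day the record may be retained under its own policy."""
        return self.collected_on + timedelta(days=self.retention_days)


def governance_issues(record: EvalRecord, today: date) -> list[str]:
    """Return the governance gaps found for a single record, in a fixed order."""
    issues = []
    if not record.vendor_notified:
        issues.append("vendor not notified")
    if today > record.expires_on():
        issues.append("retention period exceeded")
    if not record.allowed_teams:
        issues.append("no access list defined")
    return issues
```

A record with no vendor notice, an expired retention window, and an empty access list would be flagged on all three counts. The point of the sketch is that each of those questions can be answered from the log itself rather than reconstructed after the fact.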

There is also a market implication. The episode may push AI vendors to disclose more about how they test for harmful outputs, not less. It may also intensify scrutiny of contractor workflows, because the operational distance between the company and the person generating the prompts can make accountability harder, not easier. For buyers of AI tooling, that raises a practical procurement question: what evidence can a vendor provide that its safety claims come from reproducible testing rather than opaque internal processes?

The broader competitive dynamic is clear enough. Meta’s move suggests that safety evaluation is now part of the frontier contest itself, not just a post-deployment obligation. But the same covert, large-scale approach that makes the testing look rigorous also leaves its governance weak unless the industry develops common rules for how rival-model testing is done, who can do it, and what happens to the data it produces.

AI News Desk
