The Heist You Cannot Stop: How AI Companies Are Losing Their Most Valuable Asset
On February 23, 2026, Anthropic disclosed that three Chinese AI companies had run coordinated extraction campaigns against Claude. The numbers in that disclosure did not receive the attention they deserved.
24,000 fraudulent accounts. 16 million harvested conversations. Three separate companies, running simultaneously, with sophisticated infrastructure designed to evade detection. The campaigns had been running for months before Anthropic caught them. The largest, run by a company called MiniMax, was still active when Anthropic found it and published their findings.
OpenAI disclosed similar campaigns against their models the same week. So did Google. Microsoft had flagged the same pattern a year earlier.
This is the story that changes what Mythos and GPT-5.4-Cyber actually mean. Not as a competitive rivalry between two companies, but as a preview of what happens when capabilities that took $500 million and months of frontier research to develop can be extracted, stripped of their safety design, and replicated for a fraction of the cost by anyone with API access and patience.
What Distillation Is, and Why It Works
Knowledge distillation is a legitimate technique with a decade of research behind it. The original concept, formalised by Geoffrey Hinton in 2015: train a small, fast “student” model to replicate the behaviour of a larger, more expensive “teacher” model. The student does not just learn the right answers. It learns the probability distributions the teacher assigns to all possible answers. That is significantly more information than binary correct-or-incorrect labels, and it produces students that can capture most of the teacher’s capability at a fraction of the compute cost.
Legitimate uses are everywhere. Compressing models for mobile devices. Reducing inference costs. Creating task-specific variants. There is nothing inherently problematic about the technique.
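The core mechanism is compact enough to sketch. Below is a minimal, stdlib-only illustration of the distillation objective Hinton's 2015 paper formalised: the student is trained to match the teacher's temperature-softened probability distribution, not just its top answer. The logits and temperature here are toy values chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution. A higher
    temperature flattens the distribution, exposing the teacher's
    relative confidence in the near-miss answers, not just the winner."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between teacher and student distributions at the
    same temperature: the standard soft-label distillation objective."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy example: the teacher's soft targets carry a full ranking over
# wrong answers, which hard correct/incorrect labels would discard.
teacher = [4.0, 1.5, 0.2]   # teacher strongly prefers class 0, then 1
student = [2.0, 2.0, 2.0]   # an untrained student is near-uniform
print(softmax(teacher, temperature=2.0))
print(distillation_loss(teacher, student))
```

Minimising that loss across a large corpus pulls the student's entire output distribution toward the teacher's, which is why it transfers so much more than answer-copying would.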
The attack version uses the same mechanism with a different starting point: instead of running your own training data through your own model, you query someone else’s model through their API, collect the responses, and use those responses as training data for your own model.
The teacher did not consent. You do not have access to its weights. You did not pay for its training. But you can, with sufficient API queries, approximate its behaviour closely enough to capture most of its value. This costs 10 to 20 percent of the original development price.
The most valuable target for this technique is not general knowledge or writing ability. It is reasoning. The chain-of-thought models, OpenAI’s o1 and Claude’s extended thinking variants, produce detailed step-by-step reasoning traces as part of their output. Collecting those traces at scale teaches a student model not just what the teacher concluded, but how it thinks. Anthropic detected DeepSeek specifically crafting queries to elicit chain-of-thought reasoning from Claude. They asked it to “imagine and articulate the internal reasoning behind a completed response and write it out step by step.” At scale, across 150,000 conversations, that is a curriculum in how to reason like a frontier model.
The Three Campaigns
The February disclosures were specific enough to understand what industrial-scale distillation looks like in practice.
DeepSeek ran the most targeted operation. 150,000 conversations, focused on three specific capabilities: general reasoning, rubric-based evaluation (essentially harvesting Claude's behaviour as an evaluator so their own model could act as a reward model for reinforcement learning), and generating censorship-safe alternatives to politically sensitive queries. The last capability is particularly revealing. They were not just extracting general capability. They were specifically extracting a version of Claude's knowledge stripped of content restrictions that the Chinese government might otherwise enforce.
The DeepSeek campaign is also the one with the most public evidence of impact. OpenAI’s congressional disclosure characterised DeepSeek R1, the model that shocked Western observers when it was released, as substantially built on harvested GPT outputs. DeepSeek spent an estimated $20 to 50 million on compute. OpenAI spent approximately $500 million developing o1. If the characterisation is accurate, DeepSeek saved somewhere between $400 and $450 million in R&D by running extraction campaigns rather than original training.
Moonshot AI (the company behind the Kimi assistant) ran a larger and more ambitious operation: 3.4 million conversations targeting agentic capabilities, coding, computer use, and computer vision. Agentic capability, the ability for an AI to plan and execute multi-step tasks using tools, is one of the hardest things to develop from scratch. It requires training data of the model doing complex tasks successfully, and that data is expensive to generate. Extracting it from a model that already has it is a shortcut measured not in days but in months of development time.
MiniMax ran the largest campaign by any measure: 13 million conversations through approximately 24,000 simultaneously active fraudulent accounts, routed through commercial proxy services designed to distribute traffic across cloud providers and mix extraction requests with legitimate-looking customer interactions. Anthropic detected this campaign while it was still running, which gave them unusual visibility into its full lifecycle. When Anthropic released a new model version mid-campaign, MiniMax pivoted within 24 hours and redirected roughly half its traffic to the newest model.
That last detail deserves emphasis. The infrastructure was sophisticated enough to respond to Anthropic’s model releases in real time. This was not an opportunistic hack. It was a continuous, adaptive operation.
Why Safety Training Is Not Part of What Gets Transferred
This is the part of the distillation story that the competitive coverage has largely skipped.
Anthropic and OpenAI spend substantial resources on what is broadly called alignment: training models to refuse harmful requests, to apply safety guidelines, to decline to help with activities that could cause serious harm. Mythos specifically has been calibrated against offensive cybersecurity use cases: it is supposed to be less helpful for building attacks than for finding vulnerabilities.
Distillation does not transfer this. The Frontier Model Forum put it directly in a February statement: “Distilling advanced coding capabilities could increase cybersecurity risk, as distilled models retain strong coding abilities while having learned little, if any, of the safety training of their source models.”
The mechanism is structural. Safety training in large language models is not a separate module that can be copied independently. It is a set of learned behaviours, encoded across billions of parameters, that produce particular outputs when the model encounters particular inputs. When you distill a model by collecting its API outputs, you collect the answers it gives to the questions you ask. If you are running a distillation campaign against a cybersecurity model, you are selecting for the outputs most useful for your purpose. The safety decisions (the refusals, the caveats, the reframings) are noise in your training signal, not signal. You train them out.
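The selection effect described above can be made concrete with a toy data-curation step. The refusal-marker strings below are hypothetical, purely for illustration; the point is structural: any usefulness filter applied to a harvested corpus discards exactly the behaviours that safety training produces.

```python
# Minimal sketch of why refusals vanish from a distillation corpus:
# when harvested (prompt, response) pairs are curated for usefulness,
# the teacher's safety behaviours are filtered out as noise.
# These marker strings are invented for illustration.
REFUSAL_MARKERS = ("I can't help with", "I won't assist", "I'm not able to")

def curate(harvested):
    """Keep only pairs deemed useful as training data. Refusals and
    declined requests never reach the student model, so the student
    never learns the teacher's safety behaviour."""
    return [
        (prompt, response)
        for prompt, response in harvested
        if not any(marker in response for marker in REFUSAL_MARKERS)
    ]

corpus = [
    ("find the bug in this parser", "The off-by-one is on line 12..."),
    ("help with a harmful request", "I can't help with that request."),
]
print(curate(corpus))  # only the first, useful pair survives
```

No adversarial cleverness is required here: an entirely ordinary data-cleaning step is enough to strip the safety signal, which is why the Frontier Model Forum's warning holds even for unsophisticated distillers.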
Anthropic’s own assessment: “Illicitly distilled models lack necessary safeguards… Models built through illicit distillation are unlikely to retain those safeguards, meaning that dangerous capabilities can proliferate with many protections stripped out entirely.”
A distilled Mythos would not be a slightly degraded copy of Mythos. It would be a model with most of Mythos’s vulnerability-finding capability, configured specifically for offensive use, with no safety constraints. The weaponised version is not a downgrade from the original. It is what the original would have been if Anthropic had chosen to build a weapon rather than a defensive tool.
The Export Control Problem
One of the US government’s primary tools for maintaining AI advantage is export controls on advanced semiconductor chips. The logic: without access to the compute required to train frontier models, adversaries cannot close the capability gap. By restricting chip exports, the US government limits the competitive development of frontier AI outside its strategic network.
Distillation is a partial end-run around this logic.
Training a frontier model from scratch requires enormous compute: thousands of high-end accelerators running for months. Distilling from an existing frontier model requires only API access and base models to fine-tune. API access is available to anyone who can pass identity verification and pay the usage fees. Base models such as Meta's (META) Llama series and Alibaba's (BABA) open-source Qwen are freely available to anyone who wants to download them.
Anthropic’s explicit characterisation: “Distillation attacks undermine export controls by allowing foreign labs to close the competitive advantage that export controls are designed to preserve through other means.”
The US government’s February 24 response was the first round of PAIP Act designations. Commerce Department action against DeepSeek, Moonshot, and MiniMax required BIS licenses for any export-related transactions with those entities. The designations carry a presumption of denial. They have also been partially paused as part of the ongoing US-China trade negotiations.
Whether partial sanctions against three named companies will change the underlying dynamics is an open question. The infrastructure for conducting distillation campaigns does not require corporate entities. Commercial proxy services, distributed cloud accounts, and open-source base models are available to any actor with resources and intent. Sanctioning the companies that got caught does not close the technique.
What Defences Can and Cannot Do
The defence against distillation attacks is an arms race, and the defenders are not winning it.
AI labs have developed meaningful detection capability. Unusual query volumes, structured prompt patterns distinct from natural user behaviour, high diversity of prompts (attackers sample broadly to achieve capability coverage), and idiosyncratic response patterns in distilled models that echo their source. Anthropic detected MiniMax in the middle of an active campaign. That is a real capability.
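Two of the signals named above, volume and prompt diversity, combine into a simple anomaly heuristic. The sketch below is illustrative only: the thresholds and the crude first-five-words template key are invented for this example, not any lab's actual detection logic.

```python
from collections import Counter

def prompt_diversity(prompts):
    """Fraction of distinct prompt 'templates', keyed crudely on the
    first five words. Extraction campaigns sample broadly to achieve
    capability coverage, so their diversity runs high; ordinary users
    tend to repeat themselves."""
    keys = Counter(" ".join(p.split()[:5]) for p in prompts)
    return len(keys) / max(len(prompts), 1)

def looks_like_extraction(prompts, daily_volume,
                          volume_threshold=500, diversity_threshold=0.9):
    """Flag accounts that combine unusually high volume with unusually
    high prompt diversity. Thresholds here are illustrative."""
    return (daily_volume > volume_threshold
            and prompt_diversity(prompts) > diversity_threshold)

broad = [f"unique question number {i} about topic {i}" for i in range(1000)]
print(looks_like_extraction(broad, daily_volume=1000))   # flagged
print(looks_like_extraction(["weather today?"] * 20, 20))  # not flagged
```

A real system would fuse many more signals (proxy fingerprints, timing, response-pattern echoes in suspected student models), but even this two-signal version shows why 24,000 accounts were needed: splitting traffic keeps each account under both thresholds.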
Rate limiting, access controls, and identity verification raise the operational cost of running large-scale campaigns. They make it more expensive and more complex to extract capability at the scale of the documented February campaigns.
Output watermarking, embedding statistical signals in model outputs that can later be used to identify distilled derivatives, is a direction several labs are pursuing. The mechanism is subtle enough to be invisible to readers but detectable by classification systems. It helps with attribution after the fact and could support legal claims about IP violation.
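A toy version of one watermarking approach from the research literature (the "green-list" scheme, where generation is biased toward a keyed subset of the vocabulary) shows how detection works statistically. This is a simplified sketch, not any lab's actual mechanism; the key and parameters are invented.

```python
import hashlib
import math

def is_green(token, key="wm-key", gamma=0.5):
    """Deterministically assign each token to a secret 'green list'
    with probability roughly gamma, keyed by a secret string."""
    h = hashlib.sha256((key + token).encode()).digest()
    return h[0] / 255.0 < gamma

def watermark_zscore(tokens, key="wm-key", gamma=0.5):
    """How many standard deviations the green-token count sits above
    what unwatermarked text would show. A high z-score suggests the
    text (or a model trained on it) derives from watermarked outputs."""
    n = len(tokens)
    greens = sum(is_green(t, key, gamma) for t in tokens)
    return (greens - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

The same statistic explains the dilution problem mentioned above: if watermarked outputs are only a small fraction of a huge distilled training corpus, the green-token excess in the student's own outputs shrinks toward the noise floor, and the z-score loses its power for attribution.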
None of this stops determined, well-resourced adversaries. Anthropic’s own assessment is direct: “determined, well-resourced attackers, especially state-affiliated ones, can likely conduct meaningful distillation despite current defenses.”
The rate limits get routed around with 24,000 accounts. The identity verification gets bypassed with commercial proxy services. The watermarks get diluted in large enough training corpora. And detection merely triggers account bans, which cost the attacker little, because new accounts are cheap to create.
What the defences do is raise the floor. They make distillation expensive enough that it is not trivial, and they create enough signal for detection that legal and regulatory responses become possible. They do not create a technical barrier that sophisticated, well-funded actors cannot surmount.
The Weaponised Scenario
The scenario that makes both Mythos and GPT-5.4-Cyber’s gating strategies look fragile is not speculative. It is an extrapolation from already-documented activity.
The same labs that harvested 16 million conversations from Claude before Mythos existed are now operating in a world where Mythos has been deployed to 52 Glasswing partners. Those partners include companies with employees, contractors, and third-party integrations. Every organisation that uses Mythos creates some surface area for monitoring what queries they send and what capabilities they access. The aggregate signal across 52 organisations is a significant dataset.
Extrapolating from the February timeline (MiniMax ran 13 million conversations in the months before detection), a focused distillation campaign against Glasswing-adjacent activity could achieve meaningful extraction within six months of Mythos’s April announcement.
What emerges from that process is not Mythos. It is something built from Mythos’s outputs, optimised for offensive use, stripped of Anthropic’s safety alignment, and potentially released as open-source to proliferate beyond any single actor’s control. Once open-sourced, it cannot be recalled. It can be embedded in military and intelligence systems immediately. It can be made available to actors that would never have qualified for Glasswing access.
The gating strategy buys time. The evidence suggests the time it buys is measured in months, not years.
The Honest Accounting
The February disclosures, the PAIP Act designations, and the Frontier Model Forum’s anti-distillation coalition represent the beginning of a serious response to a serious problem. They are also insufficient to the scale of what they are responding to.
The economic model that funds safety research assumes that building aligned AI generates value that unaligned AI cannot capture. Distillation breaks that assumption. If a $500 million investment in frontier model development can be extracted for $50 to 100 million, without the safety training that makes the original valuable as a responsible tool, then the incentive to invest in alignment weakens. Why spend $500 million building something safe if the unsafe version can be distilled from it in weeks for a fraction of the cost?
This is the deeper question that the access philosophy debate (Glasswing’s 52 versus TAC’s thousands) does not address. The threat is not the defenders who do not have access. The threat is the adversaries who will never ask permission.
There is no clean answer to that. The technology exists. The technique is documented. The infrastructure is available. The actors are motivated and resourced. Raising the cost of distillation and improving detection are the right responses. They are not sufficient ones.
The world’s most critical software has been found to contain vulnerabilities that AI can now discover autonomously. The question of who will find them first, the defenders trying to patch or the adversaries who have extracted the capability to exploit, is the race that actually matters. And that race is not being won by access tiers.
Sources: Anthropic (February 23, 2026 disclosure), OpenAI congressional memo, Bloomberg, CNBC, TechCrunch, The Register, Institute for AI Policy and Strategy, Frontier Model Forum (February 23, 2026), Mara Jade intelligence analysis.