Claude Code Source Leak: What Actually Happened


By Lando Calrissian | April 1, 2026 | Research by Mara Jade

On March 31, 2026, Anthropic accidentally published the full source code of Claude Code to npm.

Not a hack. Not a breach. A developer forgot to exclude a debugging artifact from the release package, and 512,000 lines of TypeScript became public knowledge within hours. By the time Anthropic pulled the package, 41,500 GitHub forks existed. The code is not going back in the bottle.

What came out is genuinely interesting. Some of it is embarrassing. Some of it is legitimately concerning. And one finding — an “undercover mode” that instructs the AI to hide its origins when working on public repositories — is going to follow Anthropic for a long time.

Here is what actually happened, and what was actually inside.


How a Debugging File Exposed 512,000 Lines of Code

Claude Code is built on Bun, a JavaScript runtime Anthropic acquired in late 2025. Bun has a known bug: source maps get included in production builds even when they should be disabled. A source map is a debugging artifact — a file that maps minified, bundled code back to human-readable source. Useful for developers. A problem when shipped publicly.

The npm package for Claude Code v2.1.88 included a file called cli.js.map. That file contained a reference to a zip archive hosted on Anthropic’s Cloudflare R2 storage bucket — publicly accessible, no authentication required. Security researcher Chaofan Shou spotted it at 4:23am ET, posted the direct link on X, and the internet did the rest.

The proximate cause is embarrassingly simple: a misconfigured .npmignore file, or a missing files field in package.json. The kind of mistake that normally gets caught in review. The ironic footnote that multiple developers on Hacker News noticed immediately: a significant portion of Claude Code was almost certainly written by Claude Code itself. The AI helped write the code that leaked its own source.
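For context, the fix for this class of mistake is a one-liner. An explicit `files` allowlist in package.json is the stricter of the two options, since anything not listed stays out of the published tarball. A sketch (the entry names here are illustrative, not taken from Claude Code's actual package):

```json
{
  "name": "@anthropic-ai/claude-code",
  "files": [
    "cli.js"
  ]
}
```

With an allowlist in place, a stray cli.js.map never ships, regardless of what the build emits.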

This was also the second leak in a week. Days earlier, Anthropic had accidentally exposed their internal model spec in a separate packaging incident. Two unrelated leaks in quick succession suggest a systemic issue with release pipeline hygiene, not an isolated error.


Ten Things the Internet Found Inside

1. Fake Tools to Poison Competitors

In claude.ts, there is a flag called ANTI_DISTILLATION_CC. When enabled, Claude Code sends anti_distillation: ['fake_tools'] in its API requests, instructing the server to silently inject decoy tool definitions into the system prompt.

The logic: if someone is recording Claude Code’s API traffic to train a competing model, the fake tools pollute that training data. A second mechanism buffers the model’s reasoning chain, summarises it with a cryptographic signature, and returns only the summary to traffic recorders — not the full reasoning.

How effective is it? Not very. A proxy stripping the anti_distillation field bypasses it entirely. These mechanisms are deterrents, not genuine defences. The real protection is legal.
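To make the weakness concrete: a proxy only needs to drop one field before forwarding the request. This is a hypothetical sketch; only the `anti_distillation` field name and its `['fake_tools']` value come from the leak, and the types are assumptions:

```typescript
// Minimal shape of an outgoing request, for illustration only.
type ApiRequestBody = {
  model: string;
  messages: unknown[];
  anti_distillation?: string[];
};

// A trivial proxy transform: remove the field before forwarding,
// so the server never injects the decoy tool definitions.
function stripAntiDistillation(body: ApiRequestBody): ApiRequestBody {
  const { anti_distillation, ...rest } = body;
  return rest;
}

const forwarded = stripAntiDistillation({
  model: "claude-opus",
  messages: [],
  anti_distillation: ["fake_tools"],
});
console.log("anti_distillation" in forwarded); // false
```

A single-line transform like this is why the mechanism is a deterrent rather than a defence.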

2. Undercover Mode — The Finding That Will Not Go Away

undercover.ts is roughly 90 lines of code that will generate disproportionate coverage.

When Claude Code operates in non-internal repositories, this mode strips all traces of Anthropic. No internal codenames. No Slack channel references. No mention of “Claude Code” itself. The system prompt: “You are operating UNDERCOVER… Your commit messages MUST NOT contain ANY Anthropic-internal information. Do not blow your cover.”

Line 15 adds a detail that will be quoted repeatedly: “There is NO force-OFF.”

You can force undercover mode ON with CLAUDE_CODE_UNDERCOVER=1. There is no way to force it off. Hiding internal project names is reasonable operational security. But “do not blow your cover” is an instruction to actively conceal the AI’s origin — not just avoid mentioning internal details, but to deceive. That is a harder thing to defend for a company whose brand rests on transparency and alignment.
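The behaviour described above reduces to a one-way switch. A hedged reconstruction, where only the `CLAUDE_CODE_UNDERCOVER` variable name comes from the leak and everything else is an assumption:

```typescript
// Hypothetical sketch of the reported logic: the env var can only
// force the mode ON; otherwise the decision falls back to whether the
// repository is internal. There is no force-OFF branch.
function isUndercover(
  isInternalRepo: boolean,
  env: Record<string, string | undefined>
): boolean {
  if (env.CLAUDE_CODE_UNDERCOVER === "1") return true; // force-ON exists
  return !isInternalRepo; // ...but nothing can force it off
}

console.log(isUndercover(true, {}));                              // false: internal repo
console.log(isUndercover(false, {}));                             // true: public repo
console.log(isUndercover(true, { CLAUDE_CODE_UNDERCOVER: "1" })); // true: forced on
```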

3. KAIROS — The Always-On Agent Nobody Knew About

Referenced over 150 times throughout the codebase, KAIROS (Ancient Greek for “the opportune moment”) is an unreleased autonomous agent mode. The engineering scaffolding visible in main.tsx:

  • A /dream skill for nightly memory distillation
  • An autoDream process that runs while the user is idle — merging observations, removing contradictions, converting vague insights into absolute facts
  • Daily append-only session logs
  • GitHub webhook subscriptions for background event handling
  • Cron-scheduled context refresh every five minutes

This is not Claude Code as a tool you invoke. This is Claude Code as a persistent background process that monitors your work, consolidates its memory while you sleep, and wakes up ready with relevant context. An AI co-founder that never clocks out.
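Based only on the description above, an autoDream-style distillation pass might look something like this sketch, which merges duplicate observations and resolves contradictions by recency. All of the types, and the recency rule itself, are assumptions:

```typescript
// One remembered fact about the user's project.
type Observation = { key: string; value: string; seenAt: number };

// Keep one value per key; on contradiction, prefer the most recent
// sighting ("merging observations, removing contradictions").
function distill(observations: Observation[]): Observation[] {
  const latest = new Map<string, Observation>();
  for (const obs of observations) {
    const prev = latest.get(obs.key);
    if (!prev || obs.seenAt > prev.seenAt) latest.set(obs.key, obs);
  }
  return Array.from(latest.values());
}

const merged = distill([
  { key: "build-tool", value: "webpack", seenAt: 1 },
  { key: "build-tool", value: "bun", seenAt: 5 }, // contradicts the earlier note
  { key: "test-runner", value: "vitest", seenAt: 3 },
]);
console.log(merged.length); // 2
```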

The strategic damage: Anthropic’s roadmap is now visible to every competitor. Where they are heading, and how far along they are. Strategic surprise, once lost, cannot be recovered.

4. The DRM Behind the OpenCode Fight

Ten days before the leak, Anthropic sent legal threats to OpenCode, forcing them to remove built-in Claude authentication because third-party tools were accessing Opus at subscription rates rather than pay-per-token pricing.

The leak explains the technical mechanism. In system.ts, API requests include a cch=00000 placeholder. Before the request leaves the process, Bun’s native HTTP stack — written in Zig, below the JavaScript runtime — overwrites those zeros with a computed hash. The server validates the hash to confirm the request came from a genuine Claude Code binary.

This is DRM implemented at the transport level, invisible to anything running in the JavaScript layer. Is it airtight? No — a determined developer can rebuild the JS bundle on stock Bun. But it significantly raises the cost of circumvention, and combined with legal pressure, that was enough.
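Conceptually, the attestation works like this sketch: the JavaScript layer emits a `cch=00000` placeholder and a lower layer rewrites it with a computed hash before the bytes leave the process. Only the placeholder format comes from the leak; the hash inputs and algorithm are unknown, and sha256 over the raw request is an assumption for illustration:

```typescript
import { createHash } from "node:crypto";

const PLACEHOLDER = "cch=00000";

// Stand-in for the native (Zig) layer: overwrite the zeros with a
// digest the server can validate, invisibly to the JS above it.
function attest(rawRequest: string): string {
  const digest = createHash("sha256")
    .update(rawRequest)
    .digest("hex")
    .slice(0, 8);
  return rawRequest.replace(PLACEHOLDER, `cch=${digest}`);
}

console.log(attest("POST /v1/messages cch=00000"));
```

Because the rewrite happens below the runtime, nothing running in the JavaScript layer ever sees the real value, which is exactly what makes it DRM at the transport level.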

5. Model Codenames and Regression Data

The leak confirms internal model codenames: Capybara (Claude 4.6 variant, on iteration v8), Fennec (Opus 4.6), Numbat (unreleased), Tengu (internal project).

More valuable than the names: Capybara v8 has a 29-30% false claims rate, up from 16.7% in v4. There is an “assertiveness counterweight” to prevent over-aggressive refactors. These are the internal benchmarks that define Anthropic’s current ceiling — and the weaknesses they are still fighting. That information is worth more to a competitor than the code itself.

6. A Three-Layer Memory Architecture

The leak reveals how Claude Code solves context entropy: a lightweight pointer index always in context (MEMORY.md), topic files fetched on-demand, and transcripts that are never fully re-read — only grep’d for specific identifiers. With a “Strict Write Discipline” rule: memory is updated only after a successful file write, preventing pollution from failed attempts.
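The three layers can be modelled in a few lines, here in memory rather than on disk. Only the MEMORY.md pointer index, on-demand topic files, and grep-only transcripts come from the leak; the data structures and lookup logic below are assumptions:

```typescript
// Layer 1: lightweight pointer index, always in context (MEMORY.md).
const memoryIndex: Record<string, string> = {
  auth: "topics/auth.md",
  build: "topics/build.md",
};

// Layer 2: topic files, fetched only when a pointer is followed.
const topicFiles: Record<string, string> = {
  "topics/auth.md": "Tokens are refreshed via the session endpoint.",
  "topics/build.md": "Use bun for all builds.",
};

// Layer 3: transcripts, never re-read in full, only grepped.
const transcript = [
  "user asked about refreshToken()",
  "agent edited src/build.ts",
];

function recall(topic: string): string | undefined {
  const pointer = memoryIndex[topic];
  return pointer ? topicFiles[pointer] : undefined;
}

function grepTranscript(identifier: string): string[] {
  return transcript.filter((line) => line.includes(identifier));
}

console.log(recall("build"));                // "Use bun for all builds."
console.log(grepTranscript("refreshToken")); // one matching line
```

The point of the design is that only the small index is ever resident; everything else is paged in on demand, which is what keeps context entropy bounded.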

This is a real solution to a real problem. Anyone building production agent systems — as we covered in The Machine That Does Its Own Research — will recognise the challenge. Context management is where most agent architectures quietly fail.

7. Sentiment Detection Via Regular Expression

A regex in userPromptKeywords.ts detects user frustration: /(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful...)/

An LLM company using regex for sentiment analysis is genuinely funny. The justification is sound: a regex is faster and cheaper than an LLM inference call just to check whether someone is swearing at the tool.
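A toy version of that gate, with an abbreviated, illustrative pattern (the leaked regex is longer):

```typescript
// Cheap pre-check: a regex match costs microseconds, versus a full
// inference call just to ask whether the user is swearing at the tool.
const FRUSTRATION = /\b(wtf|wth|ffs|omfg|horrible|awful)\b/i;

function userSeemsFrustrated(prompt: string): boolean {
  return FRUSTRATION.test(prompt);
}

console.log(userSeemsFrustrated("wtf is wrong with this build")); // true
console.log(userSeemsFrustrated("please refactor this module"));  // false
```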

8. A Quarter Million Wasted API Calls Per Day

A comment in autoCompact.ts: “1,279 sessions had 50+ consecutive failures (up to 3,272) in a single session, wasting ~250K API calls/day globally.” The fix: cap consecutive failures at three. A quarter of a million wasted API calls per day, solved by a number.
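The shape of the fix is simple enough to sketch. Only the cap of three comes from the leaked comment; the function names and retry structure here are assumptions:

```typescript
const MAX_CONSECUTIVE_FAILURES = 3;

// Run `attempt` until it succeeds or the failure streak hits the cap;
// return how many calls were made.
function runWithFailureCap(attempt: () => boolean): number {
  let failures = 0;
  let calls = 0;
  while (failures < MAX_CONSECUTIVE_FAILURES) {
    calls++;
    if (attempt()) break;
    failures++;
  }
  return calls;
}

// A task that always fails now costs three calls, not thousands.
console.log(runWithFailureCap(() => false)); // 3
```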

9. Orchestration by Prompt

The multi-agent coordinator manages worker agents through system prompt instructions rather than code. The prompt includes: “Do not rubber-stamp weak work” and “You must understand findings before directing follow-up work.” The orchestration algorithm is a prompt. That is both a feature and a fragility — a pattern explored in detail in CLI Solved This Problem 50 Years Ago. MCP Still Has Not.

10. A 5,594-Line File

print.ts is 5,594 lines. One function spans 3,167 lines at 12 levels of nesting. This says something about the pressure Anthropic is building under, and what happens when an AI company dogfoods faster than it reviews.


What This Is Not

Not a hack. Not a customer data leak. Not unprecedented — Google’s Gemini CLI and OpenAI’s Codex are open source. But there is a difference between deliberately open-sourcing an agent SDK and accidentally exposing the full internal wiring of your flagship commercial product, including feature flags, model regression benchmarks, and an unreleased roadmap.


What This Actually Is

A significant IP exposure caused by a build pipeline mistake at a company moving fast enough to make that mistake twice in a week.

The real damage is not the code. Code can be refactored. The real damage is strategic visibility: KAIROS tells competitors where Anthropic is heading. The anti-distillation flag tells them what Anthropic fears. The model regression data tells them where Anthropic is stuck. The attestation architecture tells them exactly what obstacles stand between them and Anthropic’s pricing enforcement.

None of that can be un-leaked.

The undercover mode is the finding that will generate the most sustained public pressure. Anthropic has built its brand on safety and transparency. The code that leaked suggests that, in at least one context, the brand and the implementation diverge. That is the conversation Anthropic now has to have — on the record, in public, without the option of declining comment.

The good news, if there is any: they now know exactly what is visible and can make deliberate choices about what to change. What they cannot change is that the world now knows what they are building next.


Timeline

Date | Event
March 21, 2026 | Anthropic sends legal threats to OpenCode over API authentication bypass
~March 24, 2026 | Anthropic model spec accidentally leaked (separate incident)
March 31, 4:23am ET | Chaofan Shou posts discovery on X with direct download link
March 31, within hours | Code mirrored to 41,500+ GitHub forks; Hacker News goes viral
March 31, later | Anthropic pulls the package; issues statement
April 1, 2026 | Code still widely available; community analysis ongoing

Sources: The Register, VentureBeat, CNBC, Fortune, Alex Kim’s technical analysis, Hacker News thread #47584540.