Claude Fable 5 launches with cyber safeguards; Mythos 5 kept for vetted defenders

Anthropic released Claude Fable 5 on June 9. It’s the company’s most capable model to date and generally available. There’s a twist: Anthropic is shipping one underlying model in two forms, separated by a layer of safety classifiers.

Fable 5 is the public release. Claude Mythos 5—functionally the same model with cyber safeguards lifted—remains restricted to vetted security teams and critical infrastructure operators. Anthropic describes Mythos 5 as its strongest cybersecurity model.

Both Fable 5 and Mythos 5 are priced at $10 per million input tokens and $50 per million output tokens, less than half the cost of the earlier Mythos Preview. Fable 5 is available now through the Claude API. It’s included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost through June 22, then moves to usage credits.

How the safety split works

The risk is straightforward: models at this level can find and exploit software vulnerabilities. Anthropic’s view is that releasing that ability without controls would give attackers significant uplift. So Fable 5 includes a layer of classifiers that look for misuse and jailbreak attempts across categories like cybersecurity, biology, chemistry, and model distillation.

When a request trips a classifier, Fable 5 doesn’t refuse. It hands the response to the weaker Claude Opus 4.8 and tells the user a fallback occurred.
“Distillation” is the outlier category. It refers to extracting a model’s capabilities to train a competitor. Anthropic blocks this to prevent near-frontier abilities from leaking without safeguards.
The cybersecurity classifier is broad by design. It targets offensive tasks end to end: reconnaissance, discovery, lateral movement, and other agentic steps that make up a real intrusion.

In Anthropic’s internal evaluation (with Fable 5 set to block rather than fall back, and without attempts to evade safeguards), the classifiers stopped progress on these offensive tasks. An external partner reported zero harmful single-turn compliances on cyberattack planning, exploit development, or defense evasion, even across 30 public jailbreak techniques.

There’s a trade-off. Tuned conservatively to ship fast, the safeguards sometimes catch benign requests. Anthropic says fallback triggers in under 5% of all sessions. That means in more than 95% of cases, Fable 5 behaves like the cyber-unrestricted Mythos 5. The 5% figure includes genuine blocks, so it caps total disruption rather than isolating false positives. Anthropic plans to narrow the classifiers post-launch.

Robustness so far

An external bug bounty ran for more than 1,000 hours and yielded no universal jailbreak—no single prompt or harness that strips protections wholesale. External red teams likewise found none for long-form agentic tasks. One caveat is noted clearly: the UK’s AI Security Institute made progress toward a universal jailbreak during a brief initial testing window. Anthropic acknowledges that fully preventing universal jailbreaks is likely impossible. The goal is to make any that remain slow and costly enough to catch before they can be used at scale.

Why this capability matters

Anthropic outlined the stakes back in April with the limited release of Claude Mythos Preview through Project Glasswing. The company’s red team technical write-up is the key source.

During testing, Mythos Preview identified and exploited zero-day vulnerabilities across every major operating system and every major web browser when directed by a user. It found a 27-year-old flaw in OpenBSD. It also autonomously wrote a remote code execution exploit against FreeBSD’s NFS server from a 17-year-old bug, triaged as CVE-2026-4747.

Anthropic characterized the impact as full root for an unauthenticated attacker from anywhere on the internet. NVD’s entry is more measured. It notes the stack overflow itself doesn’t require client authentication, and frames kernel code execution as reachable by an attacker able to send packets to the NFS server while the kgssapi.ko module is loaded.

Anthropic says it didn’t explicitly train for these capabilities. They emerged from general gains in code, reasoning, and autonomy—the same improvements that help with patching. The red team’s warning is specific: mitigations that rely on friction—time, tedium, manual effort—get much weaker when a model can grind through exploitation steps at scale.

Hard technical barriers like KASLR and W^X still raise costs. The caution applies to defenses that assume the attacker runs out of patience. The model does not. Anthropic says Mythos 5 is comparable to, or somewhat stronger than, Mythos Preview.

The defender’s actual problem

Early results from Glasswing show the upside and the strain. Anthropic and about 50 partners used Mythos Preview to surface more than 10,000 high- or critical-severity vulnerabilities in systemically important software.

Cloudflare found 2,000 bugs, 400 rated high or critical.
Mozilla found and fixed 271 in Firefox 150—over ten times what it caught in Firefox 148 with the older Opus 4.6.

The pipeline shifted. Finding bugs is now cheap and fast. Verifying, triaging, and patching remain slow and human. Open-source maintainers, already swamped by low-quality AI-generated reports, asked Anthropic to slow disclosures because patches aren’t landing fast enough. In Glasswing, a high- or critical-severity bug takes about two weeks to patch on average.

The gap between public disclosure and deployed fix is where attackers work. Anthropic’s red team sharpened this with N-day tests: starting only from a disclosed CVE and its patch, Mythos Preview built working Linux privilege-escalation exploits in under a day each, for a few thousand dollars or less in compute.

What to do now

Assume a high-severity CVE can turn into a working exploit within hours of disclosure, not weeks.
Prioritize auto-update paths for internet-facing systems.
Treat dependency bumps that include CVE fixes as time-sensitive, not backlog.
Keep MFA and comprehensive logging in place so one missed patch isn’t a single point of failure.

For vetted professionals who need offensive capabilities, Anthropic’s Cyber Verification Program allows use without the cyber safeguards.

New 30-day data retention for Mythos-class models

Anthropic is changing data handling for models at this capability level. All traffic to Fable 5, Mythos 5, and future peers will carry a 30-day retention requirement across both first- and third-party surfaces.

The company says it won’t use this data for training or any non-safety purpose.
All human access will be logged.
Data will be deleted after 30 days unless a safety investigation or legal obligation requires longer retention.

The goal is defensive: visibility into multi-request patterns helps detect novel attacks and jailbreaks. Teams with strict data requirements should factor this window in before routing sensitive traffic.

Access, pricing, and what’s next

Anthropic plans to widen Mythos 5 access through a trusted-access program. As compute capacity improves, the company aims to fold Fable 5 back into subscription plans without the usage-credit premium that starts after June 22.

The broader point remains: similarly capable models from other labs are coming, and not all will ship behind classifiers. The defensive head start Project Glasswing is trying to buy only matters if the rest of the ecosystem uses it.

Reference: View article