Anthropic's Project Glasswing: AI Now Finds Vulnerabilities Better Than Most Humans
What Is Project Glasswing?
Anthropic has launched Project Glasswing, an urgent initiative to secure the world's most critical software using its newest frontier model, Claude Mythos Preview. The initiative brings together 12 major technology organisations, including AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, Microsoft, NVIDIA, Palo Alto Networks, and the Linux Foundation, with a shared mission: find and fix vulnerabilities in widely deployed open-source software before attackers do.
"We have reached a watershed moment for security. AI models can now surpass all but the most skilled human researchers at finding and exploiting software vulnerabilities." — Anthropic
This is directly relevant to what we are building at Casky. We have been saying since day one that AI agents running security skills are the future of the craft. Project Glasswing is Anthropic putting that thesis into production at scale.
Claude Mythos Preview: The Model Behind It
Claude Mythos Preview is a new frontier model that Anthropic describes as "strikingly capable at computer security tasks." It is not generally available — access is being limited to Project Glasswing partners and open-source security researchers to ensure it is used for defence, not attack.
The benchmark numbers are striking:
| Task | Claude Opus 4.6 | Claude Mythos Preview |
|---|---|---|
| Vulnerability reproduction | 66.6% | 83.1% |
| Firefox JS engine exploits (working) | 2 | 181 |
| Corporate network attack scenarios | Partial | Solved in under 10 hrs |
What It Has Already Found
Using an agentic scaffold (isolated containers, source code access, automated severity filtering, and human validation before disclosure), Claude Mythos Preview has identified thousands of zero-day vulnerabilities across critical infrastructure. Notable examples:
- 27-year-old OpenBSD TCP SACK vulnerability — exploitable via signed integer overflow, allowing remote system crashes
- 16-year-old FFmpeg H.264 codec vulnerability — involving sentinel value collision, missed by automated fuzzing after 5 million attempts
- 17-year-old FreeBSD NFS vulnerability — remote code execution allowing unauthenticated root access
- Web browser sandbox escape — chaining four vulnerabilities, including a complex JIT heap spray that escaped both renderer and OS sandboxes
Less than 1% of the discovered vulnerabilities had been patched at the time of publication. Anthropic used cryptographic SHA-3 commitments to document findings while protecting unpatched systems during coordinated disclosure.
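To make the disclosure flow concrete, here is a minimal Python sketch of the two ideas described above: filtering findings by severity and human validation, then publishing only a SHA-3 commitment to each withheld report. This is not Anthropic's actual scaffold or commitment scheme, whose details are not given here. The `Finding` fields, the 0-10 severity scale, the 7.0 threshold, and the project names are all hypothetical, and the commitment is a generic salted-hash construction built on Python's standard `hashlib.sha3_256`.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass
class Finding:
    target: str               # hypothetical project name
    report: bytes             # private write-up, withheld until a patch ships
    severity: float           # hypothetical 0-10 score from automated triage
    human_validated: bool = False


def ready_for_disclosure(findings, min_severity=7.0):
    """Keep findings that pass the severity filter and a human review."""
    return [f for f in findings if f.severity >= min_severity and f.human_validated]


def commit(report: bytes) -> tuple[str, bytes]:
    """Return (public digest, secret nonce) for a private report."""
    nonce = secrets.token_bytes(32)  # random salt so the report cannot be guessed
    return hashlib.sha3_256(nonce + report).hexdigest(), nonce


def verify(digest: str, nonce: bytes, report: bytes) -> bool:
    """Check a later reveal against the digest published earlier."""
    expected = hashlib.sha3_256(nonce + report).hexdigest()
    return hmac.compare_digest(expected, digest)


# Placeholder data for illustration only, not real findings.
findings = [
    Finding("example-lib", b"placeholder report A", severity=9.1, human_validated=True),
    Finding("example-lib", b"placeholder report B", severity=9.4),  # not yet reviewed
    Finding("example-tool", b"placeholder report C", severity=3.0, human_validated=True),
]

for f in ready_for_disclosure(findings):
    digest, nonce = commit(f.report)
    print(f.target, digest[:16], verify(digest, nonce, f.report))
```

Only the digest would be published up front. Once a patch ships, revealing the report and nonce lets anyone confirm the finding existed at the time of the commitment, without exposing unpatched systems in the meantime.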
The Business Model of Defensive AI
Anthropic is backing this with real money:
- $100 million in model usage credits for Project Glasswing partners
- $4 million in direct donations to open-source security organisations
- Access extended to 40+ additional organisations for defensive capability development
Cost per vulnerability discovery ranges from $50 to $10,000+, with thousand-run evaluation campaigns costing under $20,000. That is orders of magnitude cheaper than a traditional penetration testing engagement for the same coverage.
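As a back-of-the-envelope reading of those figures, the short Python sketch below works out what they imply. The function names are our own, and combining the campaign budget with the per-finding range is an illustration on the quoted numbers, not data reported by Anthropic. Since $20,000 is a ceiling, the per-run figure is an upper bound on the average.

```python
def cost_per_run(campaign_cost_usd: float, runs: int) -> float:
    """Average spend per run for a campaign of the given size."""
    return campaign_cost_usd / runs


def findings_for_budget(budget_usd: float, cost_per_finding_usd: float) -> int:
    """Whole number of findings a budget covers at a given per-finding cost."""
    return int(budget_usd // cost_per_finding_usd)


print(cost_per_run(20_000, 1_000))          # 20.0
print(findings_for_budget(20_000, 50))      # 400
print(findings_for_budget(20_000, 10_000))  # 2
```

In other words, a thousand-run campaign averages at most $20 per run, and the same $20,000 would cover anywhere from 2 to 400 findings depending on where a target falls in the quoted per-finding range.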
Why This Matters for Casky Students
This is the exact convergence we built Casky for. The skills in our registry — mapped to MITRE ATT&CK, NIST CSF 2.0, and OWASP Top 10 — are the building blocks of what Mythos does at scale. Understanding how to construct an agentic security workflow, how to read raw tool output, how to chain findings into a coherent narrative — that is not going away. It becomes more valuable as AI amplifies what skilled practitioners can do.
The gap Anthropic is describing — between AI-assisted attackers and defenders who are still doing manual work — is exactly the gap Casky exists to close.


