
How AI Agent Tech Is Moving Through Time
The past 30 days delivered what once took a year of agent progress, with Anthropic, OpenAI, and Perplexity all shipping flagship agent products on overlapping release calendars. Anthropic launched Claude Opus 4.7 and revealed a cyber-capable frontier model it refuses to deploy broadly. OpenAI closed a $122B funding round at an $852B valuation [1] and rebuilt its Agents SDK around a native sandbox. Perplexity's agent-first pivot drove annual recurring revenue from roughly $305M to $450M in a single month [2]. The three companies are converging on the same architecture (long-horizon agents that control computers, browsers, and native apps) but diverging sharply on safety posture, distribution strategy, and business model. This report synthesizes what shipped, what the benchmarks say, and how the arc of agent development bent between March 18 and April 18, 2026.
Anthropic bifurcates its model line between shipping and withholding
Anthropic's defining move of the window was announcing two very different frontier models nine days apart. On April 7 the company unveiled Project Glasswing and its unreleased model Claude Mythos Preview (codename "Capybara"), characterized publicly as "by far the most powerful AI model we've ever developed" but too dangerous for general release because of its cyber-offense capability. Access was limited to roughly 40 critical-infrastructure partners, including AWS, Apple, Cisco, CrowdStrike, JPMorganChase, Microsoft, Nvidia, Palo Alto Networks, and the Linux Foundation, with up to $100M in credits. The UK AI Security Institute published an independent evaluation six days later finding that Mythos Preview succeeds on expert-level capture-the-flag tasks 73% of the time and can autonomously discover and exploit vulnerabilities, including a 17-year-old FreeBSD RCE (CVE-2026-4747) [3].
Nine days after Mythos, on April 16, Anthropic shipped Claude Opus 4.7 as the commercial flagship: deliberately less capable than Mythos on cyber tasks, with new automated classifiers that detect and block high-risk security use. Pricing held at $5/M input and $25/M output. Benchmark gains versus the four-month-old Opus 4.6 were substantial: SWE-bench Verified 87.6% (+6.8 pp), SWE-bench Pro 64.3% (+10.9 pp, ahead of GPT-5.4's 57.7%), OSWorld-Verified 78.0% for computer use, and MCP-Atlas 77.3% for scaled tool use. Anthropic emphasized real-world agent benchmarks over TAU-bench and GAIA and leaned heavily on customer quotes: Notion Agent reported "first model to pass our implicit-need tests"; Rakuten saw "3x more production tasks resolved"; Cognition's Devin "works coherently for hours." [4]
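As a quick arithmetic check on the deltas quoted above, the Python sketch below backs out the Opus 4.6 scores implied by the reported gains, plus the SWE-bench Pro margin over GPT-5.4. The implied baselines are derived here from the quoted numbers, assuming each gain is a simple difference in percentage points; they are not figures published by Anthropic, and the variable names are illustrative.

```python
# Scores (percent) and gains (percentage points) as quoted in this report.
opus_47 = {"SWE-bench Verified": 87.6, "SWE-bench Pro": 64.3}
gain_pp = {"SWE-bench Verified": 6.8, "SWE-bench Pro": 10.9}
gpt_54_swe_pro = 57.7

# Implied Opus 4.6 score = Opus 4.7 score minus the reported gain.
# Assumption: gains are simple differences; results rounded to one decimal.
implied_opus_46 = {
    name: round(opus_47[name] - gain_pp[name], 1) for name in opus_47
}

# Margin of Opus 4.7 over GPT-5.4 on SWE-bench Pro, in percentage points.
lead_over_gpt_54 = round(opus_47["SWE-bench Pro"] - gpt_54_swe_pro, 1)

for name, score in implied_opus_46.items():
    print(f"{name}: implied Opus 4.6 score {score}%")
print(f"SWE-bench Pro lead over GPT-5.4: {lead_over_gpt_54} pp")
```

Under that assumption, the implied Opus 4.6 baselines are 80.8% and 53.4%, and the SWE-bench Pro lead over GPT-5.4 is 6.6 pp.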
Around Opus 4.7, Anthropic shipped a dense stack of agent infrastructure. Computer use in Claude Code and Cowork reached macOS on March 23 and Windows on April 3. A new Advisor tool (API beta) pairs a fast executor model with a higher-intelligence advisor for long-horizon runs; Claude Managed Agents provides a fully managed harness with sandboxing and SSE streaming. A new /ultrareview slash command runs parallel multi-agent code review. A leaked source map of the Claude Code npm package on March 31 exposed unreleased features — UltraPlan, Voice Mode, Bridge mode for remote control, and a Coordinator tool for multi-agent orchestration — that Anthropic is preparing to ship.
The enterprise surface expanded decisively. Claude for Word entered public beta around April 10, completing a Word/Excel/PowerPoint trifecta with shared cross-app context aimed at legal, finance, and consulting work. Anthropic disclosed run-rate revenue above $30B, up from roughly $9B at the end of 2025, and said more than 1,000 customers now spend over $1M annually, a figure that has doubled in under two months [4]. Compute capacity was locked down aggressively through an April 6 deal with Google and Broadcom for 3.5 gigawatts of next-gen TPU capacity through 2031 and a multi-year NVIDIA GPU agreement with CoreWeave on April 10; Bloomberg also reported that Anthropic is exploring custom silicon [5]. CFO Krishna Rao called the Broadcom deal "a continuation of our disciplined approach to scaling infrastructure." Valuation offers are reportedly circulating at up to $800B, with an IPO rumored for Q4 2026.
Politically, Anthropic simultaneously sued the Department of Defense over a "supply-chain risk" designation and briefed the White House on Mythos. Dario Amodei met with Chief of Staff Susie Wiles and Treasury Secretary Scott Bessent on April 17. Co-founder Jack Clark framed the contracting dispute as narrow at the Semafor World Economy Summit: "There will be other systems just like this in a few months from other companies, and in a year to a year-and-a-half later, there will be open-weight models from China that have these capabilities." The Long-Term Benefit Trust also appointed Novartis CEO Vas Narasimhan on April 14, giving Trust-appointed directors a majority on the seven-person board.
OpenAI buys growth infrastructure while stacking agent plumbing
OpenAI's window opened and closed with capital and enterprise moves that reframed the company's scale. On March 31 it closed the largest private tech fundraise in history, $122B at an $852B valuation [1], with Amazon investing $50B (partly contingent on an IPO by 2028 or AGI), Nvidia $30B, and SoftBank $30B. CFO Sarah Friar told CNBC the company will reserve IPO shares for retail investors and that "it's good hygiene" for an $852B company to "look and feel and act like a public company." CRO Denise Dresser's April 8 memo disclosed that enterprise now exceeds 40% of revenue, that Codex reached 3M weekly active developers, and that OpenAI's APIs process more than 15 billion tokens per minute [6]. Annualized revenue is near $25B on $2B per month. Goldman Sachs, Phillips, and State Farm were named as new enterprise customers.
The model cadence was brisk. GPT-5.4 launched March 5 (just before the window) as the first OpenAI model with native computer-use and a 1M-token context window, with a new Tool Search mechanism cutting tool-heavy token costs about 47%. GPT-5.4 mini and nano followed on March 17, extending the family to sub-agents and free-tier users; mini scored 54.38% on SWE-bench Pro. GPT-5.3 Instant Mini became a new ChatGPT fallback on April 9, alongside a new $100/month ChatGPT Pro tier slotted between Plus and the existing $200 tier. On April 16 OpenAI unveiled GPT-Rosalind, a life-sciences reasoning model in research preview with Amgen, Moderna, Thermo Fisher, and the Allen Institute, scoring above the 95th percentile of human experts on RNA sequence-to-function prediction. GPT-6 ("Spud") pre-training reportedly completed March 24 at the Abilene Stargate data center; launch is unconfirmed and should be treated as rumor.
Agent infrastructure was the clearest thematic throughline. The April 15 "next evolution of the Agents SDK" moved to a model-native harness with native sandbox execution, configurable memory, Codex-like filesystem tools, standardized MCP integration, an AGENTS.md convention, an apply-patch tool, and a Manifest abstraction for portable workspaces. Sandboxing partners at launch included Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel; storage spanned AWS S3, GCS, Azure Blob, and Cloudflare R2, with snapshotting and rehydration for durable long runs. A day later, Codex gained native computer-use on macOS, an in-app browser, gpt-image-1.5 image generation, memory in preview, scheduled automations, and more than 90 plugins spanning Atlassian Rovo, CircleCI, GitLab Issues, the Microsoft Suite, Neon by Databricks, Remotion, Render, and Superpowers. SSH devbox connections entered alpha. To buttress the coding push, OpenAI announced on March 19 that it is acquiring Astral, the Python tooling team behind uv, Ruff, and ty.
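The snapshotting-and-rehydration idea mentioned above can be illustrated with a minimal Python sketch. This is not the Agents SDK API: `Workspace`, `snapshot`, `rehydrate`, and `demo` are hypothetical names, and a local JSON file stands in for an object store such as S3 or R2. It only shows the general pattern of persisting a run's state so a long job can resume after an interruption.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Workspace:
    """Hypothetical agent workspace: file contents plus a step counter."""
    files: dict = field(default_factory=dict)  # relative path -> text contents
    step: int = 0                              # last completed agent step


def snapshot(ws: Workspace, path: Path) -> None:
    """Serialize the workspace to durable storage (a local JSON file here)."""
    path.write_text(json.dumps(asdict(ws)))


def rehydrate(path: Path) -> Workspace:
    """Rebuild a workspace from a snapshot so a run can resume where it stopped."""
    return Workspace(**json.loads(path.read_text()))


def demo() -> Workspace:
    """Snapshot after step 3, then restore it as a fresh process would."""
    with tempfile.TemporaryDirectory() as tmp:
        snap = Path(tmp) / "run.json"
        snapshot(Workspace(files={"notes.txt": "draft"}, step=3), snap)
        return rehydrate(snap)


restored = demo()
print(restored.step, restored.files)
```

A real harness would presumably snapshot the sandbox filesystem rather than an in-memory dict, but the resume logic has the same shape: write state at a checkpoint, read it back, continue from the recorded step.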
Consumer surface area also grew more agentic. The Agentic Commerce Protocol expanded on March 24 to support merchant-fed catalogs across Target, Sephora, Nordstrom, Lowe's, Best Buy, The Home Depot, and Wayfair, with Walmart launching its own ChatGPT app that supports payments and loyalty. Sora, by contrast, is being shut down: the web and app versions close April 26 and the API on September 24, ending a Disney partnership that had involved a $1B investment.
Safety and security formed a distinct track. OpenAI announced the Safety Fellowship on April 6 (roughly $200K annualized stipends, ~$15K/month in compute, and residency at Constellation in Berkeley), hours after a Ronan Farrow New Yorker investigation reported the dissolution of its superalignment and AGI-readiness teams [7]. A Child Safety Blueprint landed April 8 in partnership with NCMEC, the Attorney General Alliance, and Thorn. On April 10 OpenAI disclosed that the Axios npm library compromise attributed to North Korea's Lazarus Group had reached its macOS signing workflow [8]; code-signing certificates were rotated and all macOS users must update by May 8. Most consequentially, GPT-5.4-Cyber, a defensive cyber-permissive variant, launched April 14 through an expanded Trusted Access for Cyber program, arriving one week after Anthropic's Mythos announcement and reframing frontier cyber capability as a competitive surface.
Perplexity turns agents into revenue faster than anyone expected
Perplexity's month was defined by a single headline: ARR rose roughly 50% to about $450M in March, per Financial Times reporting on April 8 [2]. The jump followed the late-February launch of Perplexity Computer and a shift to credits-based usage pricing, with subscriptions spanning $20 to $200 per month and more than 100M monthly active users. Aravind Srinivas posted April 15: "We just 5X'ed revenue from $100M to $500M with only 34% growth in team size." [9]
The product surface expanded on every axis. Comet for iPhone launched March 18, completing cross-platform coverage across iOS, Android, Mac, and Windows, and hit #3 on the US App Store within 48 hours. Weekly changelogs piled on: Computer gained inline editing of any generated asset, scheduled task management, live credit counters, multi-select bulk actions, and PDF/DOCX export; Deep Research began generating presentations, spreadsheets, dashboards, and full websites inline. Enterprise features added Vercel and Box connectors, a Snowflake connector with Data Map, Slack app onboarding, and admin controls for Computer sandbox internet access. A partnership with CrowdStrike, announced March 15, embedded Falcon protections into Comet Enterprise. Vertical agents filled out fast: Perplexity Health on March 19 with connectors to 1M+ providers plus Fitbit, Google Fit, and Clue; Computer for Taxes on April 2 with IRS form mapping and audit drafting; and Finance Computer with 40+ built-in tool calls spanning SEC filings, FactSet consensus, Polymarket data, and Plaid brokerage integration.
The biggest launch was Personal Computer for Mac on April 16, exclusive to the $200/month Max tier. It runs an always-on local agent that integrates with the Mac file system, iMessage, Apple Mail, Calendar, and the Comet browser; it's activated by double-tapping Command, can be triggered remotely from iPhone with 2FA, and orchestrates roughly 20 frontier models behind the scenes. Perplexity recommends a dedicated Mac mini for continuous operation. Srinivas's positioning was explicit: "A traditional operating system processes commands; an AI operating system focuses on goals." Safeguards include a kill switch, confirmation prompts for sensitive actions, and an audit trail.
The headwinds were equally notable. On April 1 Perplexity appealed to the Ninth Circuit to overturn an injunction barring Comet's shopping agent from accessing Amazon, arguing that directing an agent to shop on Amazon is no different from opening a browser. The case is the first major court test of agentic commerce. Security researchers at Zenity Labs, LayerX, and Guardio disclosed the "PerplexedBrowser" zero-click vulnerability and "CometJacking" prompt-injection attacks that could exfiltrate Gmail and 1Password vault data [10]; Comet's App Store ranking slid from #3 to unranked within days. Srinivas drew backlash for April 1 podcast remarks defending AI-driven layoffs ("most people don't enjoy their jobs… that sort of glorious future is what we should look forward to"), and X product head Nikita Bier accused him on April 17 of running "undisclosed promotion campaigns" around the Personal Computer launch. A rumored Apple acquisition at roughly $14B resurfaced via secondary reports, but no primary confirmation emerged, and no new funding round closed during the window.
Quietly consequential was the Agent API and broader Perplexity API Platform, re-framed March 11 as a four-API stack (Agent, Search, Embeddings, Sandbox-forthcoming) and marketed as the runtime "used across hundreds of millions of Samsung devices and by 6 out of 7 of the MAG7." An n8n native node and OpenClaw integration arrived in April, broadening developer distribution.
Five convergences that define the arc
1. Computer use has become table stakes. All three companies now ship agents that directly click, type, and manipulate native desktop apps. Anthropic extended Claude's computer use from Mac to Windows; OpenAI added native computer-use on macOS to Codex on April 16; Perplexity's Personal Computer for Mac integrates with iMessage, Mail, and Calendar. Benchmarks are converging on OSWorld-Verified (Opus 4.7 at 78.0%, Mythos at 79.6%, GPT-5.4 at 75.0%) as the shared yardstick, displacing TAU-bench and GAIA from headline positioning. The human-expert OSWorld baseline of roughly 72.4% has now been passed by all three of these models.
2. Coding agents are the commercial tip of the spear. Claude Code grew 300% since the Claude 4 family shipped and reported a 5.5x run-rate revenue jump; Codex hit 3M weekly developers; Perplexity Computer added a GPT-5.3-Codex coding subagent. SWE-bench Pro is the benchmark actually moving — Opus 4.7 jumped nearly 11 points to 64.3%, and OpenAI's acquisition of Astral signals that coding is where enterprise ARR gets minted.
3. Safety posture has become a product axis. Anthropic's explicit two-tier Mythos-vs-Opus-4.7 bifurcation is the starkest framing: a frontier cyber model withheld from general release, paired with a commercial flagship carrying "differentially reduced" cyber capability and automated misuse classifiers. OpenAI mirrored the move within seven days via GPT-5.4-Cyber and the Trusted Access for Cyber program. Perplexity's security partnerships with CrowdStrike — and the Comet vulnerabilities that followed — show the inverse risk: agentic browsers dramatically expand attack surface.
4. Multi-agent orchestration is emerging as the new abstraction. Anthropic's Advisor tool, /ultrareview, leaked Coordinator tool, and Claude Managed Agents all operationalize multi-agent patterns. OpenAI's rebuilt Agents SDK formalizes durable sandboxed runs with snapshotting. Perplexity's "Model Council" runs GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro in parallel. The shared architectural pattern is executor-plus-supervisor with persistent memory, task budgets, and portable workspaces.
5. Browser-native agents are now a distribution battleground. Comet completed iOS/Android/Mac/Windows coverage; Claude's Chrome extension continues expanding; Codex added an in-app browser. The Amazon-Perplexity injunction is the first legal stress test of whether an agent visiting a site on a user's behalf is authorized access — a question that will define web-scale agent economics.
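The executor-plus-supervisor pattern described in item 4 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any vendor's API: `AgentRun`, `run_with_advisor`, `toy_executor`, and `toy_advisor` are hypothetical names, and the two "models" are plain functions standing in for a fast executor and a slower, more capable advisor.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentRun:
    goal: str
    step_budget: int                                 # task budget: max executor steps
    memory: List[str] = field(default_factory=list)  # notes persisted across steps


def run_with_advisor(
    run: AgentRun,
    executor: Callable[[str, List[str]], str],
    advisor: Callable[[str, List[str]], str],
    consult_every: int = 3,
) -> str:
    """Executor acts on every step; advisor reviews every `consult_every` steps."""
    for step in range(1, run.step_budget + 1):
        action = executor(run.goal, run.memory)
        run.memory.append(f"step {step}: {action}")
        if action == "DONE":
            return "completed"
        if step % consult_every == 0:
            run.memory.append(f"advice: {advisor(run.goal, run.memory)}")
    return "budget_exhausted"


def toy_executor(goal: str, memory: List[str]) -> str:
    # Stand-in for a fast model: finishes only once it has seen advice.
    return "DONE" if any(m.startswith("advice:") for m in memory) else "work"


def toy_advisor(goal: str, memory: List[str]) -> str:
    # Stand-in for a higher-capability model reviewing progress so far.
    return "wrap up"


run = AgentRun(goal="tidy the repo", step_budget=5)
status = run_with_advisor(run, toy_executor, toy_advisor, consult_every=2)
print(status, run.memory)
```

The design choice worth noting is that the advisor is called only at intervals, so the expensive model's cost scales with the number of checkpoints rather than the number of steps, while the step budget bounds the whole run.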
Revenue curves tell the real story
| Company | Revenue signal | Trajectory |
|---|---|---|
| Anthropic | >$30B run rate; 1,000+ customers >$1M/yr (doubled in <2 months) | Locking up 3.5 GW of TPU capacity through 2031 |
| OpenAI | ~$25B annualized; $2B/mo; enterprise >40% | Closed $122B at $852B valuation; IPO signaled for 2H 2026 |
| Perplexity | ~$450M ARR (up ~50% in March); 100M+ MAU | Agent-first monetization working; no confirmed new round |
The economic subtext of the window is that agent products are monetizing faster than the models that power them. Anthropic's count of customers spending over $1M a year doubled in under two months. OpenAI's enterprise share passed 40%. Perplexity added roughly $145M of ARR in a single month by turning search into goal-execution. The differentiation is no longer model quality alone; it is how deeply agents reach into the file systems, browsers, and enterprise data stores where work actually happens.
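The revenue figures in this section can be cross-checked with simple arithmetic. The Python sketch below uses only numbers quoted in this report; the variable names are illustrative, and the Anthropic multiple is a lower bound because the run rate is reported as "above $30B".

```python
# Figures as quoted in this report, in USD.
perplexity_arr_start = 305e6
perplexity_arr_end = 450e6
openai_monthly_revenue = 2e9
anthropic_run_rate_prior = 9e9   # "roughly $9B at the end of 2025"
anthropic_run_rate_now = 30e9    # reported as "above $30B", so a lower bound

# Perplexity: absolute and percentage ARR change over the month.
arr_added = perplexity_arr_end - perplexity_arr_start
arr_growth_pct = round(100 * arr_added / perplexity_arr_start, 1)

# OpenAI: annualize the monthly figure.
openai_annualized = 12 * openai_monthly_revenue

# Anthropic: run-rate multiple since end of 2025 (at least this much).
anthropic_multiple = round(anthropic_run_rate_now / anthropic_run_rate_prior, 2)

print(f"Perplexity ARR added: ${arr_added / 1e6:.0f}M ({arr_growth_pct}%)")
print(f"OpenAI annualized from monthly: ${openai_annualized / 1e9:.0f}B")
print(f"Anthropic run-rate multiple: at least {anthropic_multiple}x")
```

The 47.5% result is what the report rounds to "roughly 50%", and 12 times $2B gives $24B, consistent with "near $25B".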
Conclusion: The pace itself is the story
Four weeks delivered a flagship Claude model, the largest private fundraise in tech history, an agent OS for Mac, two frontier cyber models announced a week apart, a completed computer-use stack on three platforms, and the first court test of agentic commerce. The direction is now clear: agents that operate computers, coordinate in teams, persist memory across sessions, and monetize via usage-based credits or enterprise seats. What is newly uncertain is governance: whether frontier models with autonomous vulnerability-exploitation capability can be released at all, whether web infrastructure will tolerate agent traffic without authorization frameworks, and whether safety-gated deployment (Mythos) becomes the industry norm or an Anthropic-specific posture. The companies are converging on architecture and diverging on values, and the next 30 days will test which bet compounds.
Footnotes
1. OpenAI: $122B funding round announcement, March 31, 2026. Amazon ($50B), Nvidia ($30B), SoftBank ($30B) named as lead investors.
2. Financial Times: "Perplexity AI revenue surges to $450M ARR", April 8, 2026. ARR growth from ~$305M to ~$450M reported for March 2026.
3. UK AI Security Institute: Independent evaluation of Claude Mythos Preview, April 13, 2026. Includes CTF success rate (73%) and CVE-2026-4747 FreeBSD RCE findings.
4. Anthropic: Claude Opus 4.7 launch and company disclosures, April 16, 2026. Benchmark figures, customer quotes, and revenue run-rate reported at time of release.
5. Bloomberg: "Anthropic explores custom AI chips", April 2026. Reporting on Anthropic's custom silicon exploration alongside CoreWeave and Broadcom deals.
6. OpenAI: CRO Denise Dresser internal memo, April 8, 2026. Enterprise >40% of revenue, Codex at 3M weekly developers, 15B tokens/min API throughput.
7. Ronan Farrow: The New Yorker, April 6, 2026. Investigation reporting dissolution of OpenAI superalignment and AGI-readiness teams.
8. OpenAI: Security disclosure: Axios npm compromise, April 10, 2026. North Korea Lazarus Group attribution; macOS signing workflow affected; cert rotation required.
9. Aravind Srinivas on X, April 15, 2026. "We just 5X'ed revenue from $100M to $500M with only 34% growth in team size."
10. Zenity Labs, LayerX, Guardio: PerplexedBrowser and CometJacking disclosures, April 2026. Zero-click vulnerability and prompt-injection attacks documented in Comet browser.


