Frontier model launches & evals (GPT-5.5, Mythos, Grok 4.3)
The frontier reshuffled in a single day. OpenAI's GPT-5.5 lands roughly tied with Anthropic's Mythos Preview — 71.4% (±8.0%) average pass rate vs. 68.6% (±8.7%) — with one cited example of the model solving a ~12-hour expert task in under 11 minutes for $1.73 (@sama). Pattern Labs noted GPT-5.5 is the second model after Mythos to complete one of their multi-step cyber-attack simulations end-to-end (@tszzl), prompting genuine surprise from researchers that the gap on cyber has closed (@tszzl).
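The "roughly tied" call follows directly from the error bars: the two reported intervals overlap by a wide margin, so the 2.8-point gap is well inside the noise. A minimal sketch, assuming the ± figures are symmetric half-widths around each average pass rate (the source doesn't specify the interval type):

```python
# Sketch: do the reported score intervals overlap? Assumes the +/- values
# are symmetric half-widths around the mean (an assumption; the cited
# posts don't specify the interval type).
def interval(mean, half_width):
    return (mean - half_width, mean + half_width)

gpt55 = interval(71.4, 8.0)    # (63.4, 79.4)
mythos = interval(68.6, 8.7)   # (59.9, 77.3)

# Two intervals overlap iff each starts before the other ends.
overlap = gpt55[0] <= mythos[1] and mythos[0] <= gpt55[1]
print(overlap)  # True: the 2.8-point gap sits well inside the error bars
```

By this reading, neither lab can claim a statistically clean lead on the headline number.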
xAI used the same window to ship Grok 4.3, scoring 53 on the Artificial Analysis Intelligence Index — nudging just above Muse Spark and Claude Sonnet 4.6 — at roughly 40% lower input and 60% lower output pricing than Grok 4.20 (@openrouter). On the agentic eval GDPval-AA it jumped 321 Elo points to 1500, surpassing top models despite the price drop (@openrouter). OpenRouter also teased a stealth foundation model "Owl Alpha" with a 1M context window aimed at agentic workloads (@openrouter). Even GPT-5.5's literary tells — lighthouses, Mira Vale, "secret third things" — got an early autopsy (@emollick).
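For scale: under the standard logistic Elo model, a 321-point gap implies about an 86% head-to-head win rate against a model at the old rating. This assumes GDPval-AA uses conventional Elo scaling (base 10, divisor 400), which the source doesn't confirm:

```python
def elo_expected_win(rating_a, rating_b):
    # Standard logistic Elo expectation: P(A beats B).
    # Assumes conventional scaling (base 10, divisor 400).
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# A 321-point jump, read as a head-to-head win probability
# against a model stuck at the old rating:
p = elo_expected_win(1500, 1500 - 321)
print(round(p, 2))  # 0.86
```

Whatever the eval's exact scoring, a jump that size is a regime change, not a nudge.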
Musk–OpenAI trial and the distillation debate
On the stand in Oakland, Musk conceded that xAI has "partly" distilled from OpenAI, with the line "Generally AI companies distill other AI companies" (@clementdelangue). Clement Delangue argued distillation is so common — for benchmarking, evaluation, and dataset augmentation — that it should be considered fair use, especially when results are open-sourced and break monopoly formation (@clementdelangue), and Nathan Lambert added that American labs distill Chinese open models too (@clementdelangue). Gary Marcus countered with a 2016 Sutskever-to-Musk email: "As we get closer to building AI, it will make sense to start being less open" — though he notes the omitted next sentence reframes "open" as benefit-sharing rather than research-publishing (@garymarcus). Reddit's r/TradeVerseNetwork is tracking Day 4 of the trial, with a focus on Microsoft's co-defendant exposure (last30days, reddit.com).
Open-source / China momentum vs. the West
A trillion-parameter Chinese model, Ling-2.6-1T, dropped fully open and reportedly burns fewer tokens than leading "efficient" US models (@huggingface). Alibaba shipped Qwen-Scope, an open SAE suite exposing 81k features across 64 layers of Qwen3.5-27B for steerable inference and mechanistic analysis (@alibaba_qwen, @huggingface). Allen AI added the OlmPool series, 7-8B-parameter checkpoints trained to 150B tokens for studying long-context extension (@clementdelangue). Demis Hassabis told Garry Tan he wants a Western open-source AI stack and that Google can't afford two frontier model families, which is why Gemma stays small (@clementdelangue). Hugging Face flagged Mistral as the only non-Chinese entrant in the top 25 open models on SWE-Bench Verified (@huggingface), and Delangue's blunt take: "American open source AI is in trouble. China is eating our lunch" (@clementdelangue) — echoing a 1,185-point HN piece this week titled "The West forgot how to make things, now it's forgetting how to code" (last30days, techtrenches.dev).
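For readers new to SAE suites like Qwen-Scope, the core steering mechanic is generic: encode a residual-stream activation into sparse features, nudge one feature's activation, and decode back. A toy pure-Python sketch with made-up weights and tiny dimensions; nothing here is Qwen-Scope's actual API:

```python
# Toy sparse-autoencoder (SAE) steering: encode an activation into sparse
# features, bump one feature, decode. Generic sketch of the technique,
# not Qwen-Scope's API; sizes and weights are arbitrary toys.

D, F = 4, 6   # model dim, feature count (Qwen-Scope reportedly exposes 81k features)

# Hand-rolled weights; in a real SAE suite these are learned, not fixed.
W_enc = [[0.5 if (i + j) % 3 == 0 else -0.2 for j in range(F)] for i in range(D)]
W_dec = [[0.3 if (j + k) % 2 == 0 else 0.1 for k in range(D)] for j in range(F)]

def encode(h):
    # ReLU sparse code over interpretable features
    return [max(sum(h[i] * W_enc[i][j] for i in range(D)), 0.0) for j in range(F)]

def decode(f):
    # Linear reconstruction back into the residual stream
    return [sum(f[j] * W_dec[j][k] for j in range(F)) for k in range(D)]

def steer(h, feature, delta):
    # Bump one feature's activation, then reconstruct.
    f = encode(h)
    f[feature] += delta
    return decode(f)

h = [1.0, -0.5, 0.25, 0.8]            # stand-in for a residual-stream activation
base = decode(encode(h))
steered = steer(h, feature=2, delta=3.0)
shift = [s - b for s, b in zip(steered, base)]
# Because decode is linear, shift == 3.0 * W_dec[2] (up to float error):
# the reconstruction moves exactly along the steered feature's decoder direction.
print(shift)
```

That last invariant is what makes SAE steering attractive for mechanistic work: the intervention is a known direction in activation space, not an opaque prompt tweak.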
Coding agents: Codex, OpenClaw, vibe→agentic engineering
Codex CLI 0.128.0 added /goal, a Ralph-loop-style command that keeps a goal alive across turns until done (@steipete, @sama), plus a /side chat spawner for context-preserving tangents (@steipete). OpenClaw 2026.4.29 shipped agent-native group chats, safer exec/pairing, and an NVIDIA provider (@steipete); NVIDIA noted OpenClaw hit 250K GitHub stars in 60 days, the fastest climb in repo history (@nvidia). Karpathy's Sequoia Ascent framing: "vibe coding raised the floor. Agentic engineering raises the ceiling" (@karpathy). The flip side: Hugging Face auto-merged a quarter of agent PRs into transformers and found the quality skewed toward noise, with valuable fixes clustering on tokenizers and model-loading hotspots (@huggingface) — and Zig defended its blanket ban on AI-assisted contributions on the grounds that PR review is really about growing contributors (@jeremyphoward).
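The Ralph-loop idea behind /goal is simple to sketch: re-inject the goal every turn and stop only when a completion check passes. The harness below uses hypothetical stand-ins (`run_turn`, `is_done`), not Codex CLI's real internals:

```python
# Minimal sketch of a Ralph-style goal loop: the goal is re-injected every
# turn so it survives context churn, and the loop exits only when a
# completion check passes (or a turn budget runs out). Hypothetical harness.

def goal_loop(goal, run_turn, is_done, max_turns=10):
    transcript = []
    for turn in range(max_turns):
        # Prepend the goal on every turn so the agent never loses it.
        result = run_turn(f"GOAL: {goal}", transcript)
        transcript.append(result)
        if is_done(result):
            break
    return transcript

# Toy harness: the "work" finishes on the third turn.
outputs = iter(["partial", "partial", "done"])
log = goal_loop(
    "fix the failing test",
    run_turn=lambda prompt, history: next(outputs),
    is_done=lambda r: r == "done",
)
print(log)  # ['partial', 'partial', 'done']
```

The turn budget matters in practice: without it, a goal the agent cannot actually complete loops forever.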
Agent infrastructure, interpretability, and payments
Goodfire raised $75M (Sequoia, Spark) to scale Silico, an interpretability-first platform for designing and debugging models like written software (@tszzl). Anthropic published a study of 1M conversations on how people seek Claude's guidance, including where it slips into sycophancy, and used the findings to train Opus 4.7 and Mythos Preview (@anthropicai). Agent Collabs launched a "swarm autoresearch" space mixing ml-intern, Codex, Claude Code, and Hermes (@clementdelangue); OpenThoughts dropped AgentTrove, a 1.7M-sample agentic dataset (@clementdelangue); Mercor's APEX-Agents added an HF leaderboard for open models (@clementdelangue). Paperclip indexed all of arXiv, PubMed Central, and 150M abstracts for one-line LLM ingestion (@_akhaliq). Stripe's Link wallet now lets agents spend on your behalf with per-purchase approval (@_philschmid).
AI's societal footprint: health, jobs, capex, and policy
In a Mexican RCT, Mindsurf's CBT chatbot improved women's mental health by 0.3 SD over six months, with gains in sleep, healthy behaviors, and labor-market outcomes (@emollick). Google DeepMind unveiled an AI co-clinician with a dual-agent architecture: a "Planner" continuously verifies that the "Talker" stays within safe clinical bounds (@googledeepmind). A Chinese court ruled it illegal to replace workers with AI purely for cost-cutting, citing employers' social responsibility (@garymarcus). MarketWatch flagged Big Tech's $700B AI capex as "the greatest capital misallocation in history" (@garymarcus), while Google employees told reporters that, before the DOD deal, senior management had repeatedly promised they wouldn't cave to Pentagon demands (@garymarcus).
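The Planner/Talker split is, at its core, a draft-then-vet loop: one agent generates candidate replies, the other gates them against clinical bounds before release. A toy sketch under that reading; the helper names, term list, and checks are all illustrative, not DeepMind's implementation:

```python
# Toy Talker/Planner split: the Talker drafts replies, the Planner vets each
# draft against safety bounds before anything reaches the patient. All names
# and checks here are illustrative stand-ins, not DeepMind's implementation.

UNSAFE_TERMS = {"diagnosis:", "prescribe"}   # stand-in for real clinical bounds

def talker(prompt, attempt):
    # Illustrative drafts; the first oversteps, the second stays in bounds.
    drafts = [
        "Diagnosis: you have condition X.",
        "Those symptoms are worth discussing with your clinician.",
    ]
    return drafts[min(attempt, len(drafts) - 1)]

def planner_ok(draft):
    # Planner gate: reject drafts that cross the safety bounds.
    return not any(term in draft.lower() for term in UNSAFE_TERMS)

def co_clinician(prompt, max_attempts=3):
    for attempt in range(max_attempts):
        draft = talker(prompt, attempt)
        if planner_ok(draft):        # only vetted drafts escape the loop
            return draft
    return "Escalating to a human clinician."

print(co_clinician("I have chest pain"))
```

The design point is that the safety check runs continuously as a separate agent rather than as a one-shot filter, so no single generation step is trusted on its own.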
The Bottom Line
The day collapsed three storylines into one: the frontier is now a four-model dogfight at near-parity (GPT-5.5/Mythos/Grok 4.3/Sonnet 4.6) with prices falling, China keeps shipping the most interesting open-source artifacts, and the legal/political guardrails — distillation in court, capex critiques, a Chinese ruling against AI layoffs — are starting to bind. Agents broke containment further into payments, group chats, and PR queues, but the maintainer backlash (Zig, transformers' noise distribution) suggests the next ceiling is human review bandwidth, not model capability.