AI Wire · Friday, May 22, 2026

Qwen3.7-Max & frontier model benchmarks

Alibaba launched Qwen3.7-Max, scoring 56.6 on the Artificial Analysis Intelligence Index — a 4.8-point jump over Qwen3.6-Max-Preview and the closest Alibaba has come to the frontier (@alibaba_qwen). The flagship is explicitly pitched at the agent era: end-to-end coding, MCP-integrated office workflows, and long-horizon autonomy demonstrated by a 35-hour, 1,158-tool-call run that auto-optimized an Extend Attention kernel to a 10x geometric-mean speedup (@alibaba_qwen). It went live on OpenRouter the same day with explicit prompt caching (@openrouter).
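The per-workload numbers behind the 10x figure aren't given here. By convention, though, a geometric-mean speedup is the n-th root of the product of per-case speedup ratios. Unlike an arithmetic mean, it keeps one outlier case from dominating the headline number. Here is a minimal sketch; the timings are made-up placeholders, not Alibaba's data.

```python
import math

def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-case speedups (baseline time / optimized time)."""
    if len(baseline_ms) != len(optimized_ms) or not baseline_ms:
        raise ValueError("need matching, non-empty timing lists")
    ratios = [b / o for b, o in zip(baseline_ms, optimized_ms)]
    # Average in log space: exp(mean(log r)) == (r1 * r2 * ... * rn) ** (1/n)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical timings for three kernel shapes (illustrative only).
baseline = [40.0, 90.0, 500.0]
optimized = [20.0, 10.0, 10.0]  # per-case speedups: 2x, 9x, 50x
print(round(geometric_mean_speedup(baseline, optimized), 2))  # about 9.65
```

For comparison, the arithmetic mean of those three ratios would be about 20.3, pulled up almost entirely by the single 50x case.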

On the closed-frontier side, an OpenAI model reportedly disproved Erdős's unit distance conjecture using algebraic number theory, with Mark Chen and Roon framing mathematics as the field most amenable to AI breakthroughs (@tszzl, @markchen90), though Gary Marcus pushed back that math-benchmark gains don't automatically generalize (@garymarcus). Separately, a 45-scientist, 469-hour study found GPT-5.2 competitive with top-rated Nature peer reviewers (@emollick).

OpenAI Codex Thursday & developer agent updates

OpenAI shipped a heavy Codex Thursday wave (@sama, @openai, @gdb): Appshots (Cmd-Cmd to attach a Mac window with screenshot + offscreen text), Remote Computer Use from Codex Mobile to a locked Mac, ChatGPT for PowerPoint, advanced annotation mode, and enterprise token analytics. Peter Steinberger called Codex compaction "the single biggest UX improvement in AI in the last 6 months" (@steipete).

The broader agent-tooling beat was loud: Simon Willison released a Datasette Agent alpha for SQLite Q&A (@simonw), Boris Cherny previewed /usage token attribution in Claude Code (@bcherny), Alex Finn shipped seven new Hermes features including persistent session memory and background tasks (@alexfinn), and swyx surfaced Cursor's internal /thermo-nuclear-code-quality-review skill that deletes complexity and blocks >1k-line files (@swyx). Latent Space profiled Daytona's agent-native sandboxes: 60ms spin-up, 850K daily runs, and RL workloads now ~50% of usage (@swyx, @latentspacepod). Philipp Schmid demoed a GitHub triage agent built with a single curl call to the Gemini API and no agent framework (@_philschmid).
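How Cursor's skill enforces its file-size rule isn't described here, so what follows is only a hypothetical stand-in. It is a small check that reports source files above a line limit and could run as a CI or pre-commit step. The function name `oversized_files`, the 1,000-line default and the suffix filter are all our assumptions.

```python
from pathlib import Path

MAX_LINES = 1_000  # assumed threshold, mirroring the ">1k-line" rule

def oversized_files(root, suffixes=(".py", ".ts", ".go"), max_lines=MAX_LINES):
    """Return (path, line_count) for source files under `root` longer than `max_lines`."""
    offenders = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # Count lines without loading the whole file into memory.
            with path.open(encoding="utf-8", errors="replace") as fh:
                count = sum(1 for _ in fh)
            if count > max_lines:
                offenders.append((str(path), count))
    return offenders
```

A hook would call `oversized_files(".")` and exit nonzero when the list is non-empty. That blocks the commit, but unlike an agent skill it can't refactor anything.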

Open-source model releases, architectures & research

Stability AI released Stable Audio 3 with three open variants: a 2B Medium model plus 0.6B Music and VFX models (@huggingface). Tencent open-sourced Hy-MT2, a multilingual translation model covering 33 languages, with SOTA claims at both the 7B and 30B-A3B sizes (@huggingface). Cohere put Command A+ on the Hub with W4A4 quantization and an Apache 2.0 license, which Nick Frosst championed internally (@huggingface, @aidangomez). vLLM shipped Elastic Expert Parallelism: live DP/EP resizing of MoE deployments without a restart, using NVLink weight transfers (@vllm_project). llama.cpp now has a built-in model router that the community is positioning as an Ollama replacement (@huggingface). fal launched FLUX Erase from Black Forest Labs (@fal).
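By convention, W4A4 means both the weights and the activations are stored as 4-bit integers. Cohere's exact scheme isn't described here. The pure-Python sketch below shows the basic idea using one symmetric scale per tensor. The round-to-nearest rule and the [-8, 7] int4 range are our assumptions, and production kernels often use finer-grained (per-channel or per-group) scales.

```python
def quantize_int4(values):
    """Symmetric per-tensor int4 quantization: returns (ints in [-8, 7], scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # largest magnitude maps to +/-7
    # The clamp is defensive; with this scale, |v / scale| never exceeds 7.
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def w4a4_dot(weights, activations):
    """Dot product with both operands quantized to int4 (the 'W4A4' setup)."""
    qw, sw = quantize_int4(weights)
    qa, sa = quantize_int4(activations)
    acc = sum(a * b for a, b in zip(qw, qa))  # integer accumulate, as a kernel would
    return acc * sw * sa                       # rescale back to real units

w = [0.5, -1.2, 0.3, 0.9]
x = [1.0, 0.2, -0.7, 0.4]
exact = sum(a * b for a, b in zip(w, x))
print(exact, w4a4_dot(w, x))
```

The gap between the two printed values is the quantization error that finer-grained scales and calibration are meant to shrink.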

Architecture research clustered around linear-attention edits: Gated DeltaNet-2 decouples erase and write and beats KDA/Mamba-3 at 1.3B (@jeremyphoward), and CODA folds memory-bound ops into matmul epilogues, with LLMs themselves writing near-SoL CODA kernels (@jeremyphoward). Alibaba's MIGA does train-free infinite-frame video with dual consistency (@_akhaliq), HF's physics-intern harness lifted Gemini 3.1 Pro's score from 17.7 to 31.4, beating GPT-5.5 Pro (@huggingface), and Mosaic delivered 24-member, 10-day global ensemble weather forecasts in under 12s on a single H100 (@huggingface).
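The original Gated DeltaNet updates a matrix-valued state S with S_t = α_t S_{t-1}(I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ. A single β_t sets both how much of the old association for key k_t is erased and how much of the new value v_t is written. We haven't seen Gated DeltaNet-2's exact equations. As a hypothetical illustration of what "decoupling" could mean, the sketch below splits β_t into separate `erase` and `write` strengths. That split is our assumption, not the paper's formulation.

```python
def matvec(S, k):
    """S k for a list-of-rows matrix S."""
    return [sum(sij * kj for sij, kj in zip(row, k)) for row in S]

def delta_step(S, k, v, alpha, erase, write):
    """One gated delta-rule step with separate erase/write strengths (hypothetical split).

    S_new = alpha * S (I - erase * k k^T) + write * v k^T,
    computed without forming I - k k^T, since S (I - e k k^T) = S - e (S k) k^T.
    """
    Sk = matvec(S, k)
    return [
        [alpha * (S[i][j] - erase * Sk[i] * k[j]) + write * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(S))
    ]

def read(S, q):
    """Output o = S q."""
    return matvec(S, q)

# d_k = d_v = 2, state starts at zero.
S = [[0.0, 0.0], [0.0, 0.0]]
k = [1.0, 0.0]  # unit-norm key
S = delta_step(S, k, v=[3.0, 4.0], alpha=1.0, erase=1.0, write=1.0)
S = delta_step(S, k, v=[5.0, 6.0], alpha=1.0, erase=1.0, write=1.0)
print(read(S, k))  # full erase + full write with a unit key overwrites: [5.0, 6.0]
```

Setting `erase=0` recovers plain additive linear attention. Setting `erase == write` recovers the single-β Gated DeltaNet update. The `alpha` gate decays the whole state between steps.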

LeRobot Humanoid open-source bipedal robot

Hugging Face's LeRobot team released LeRobot Humanoid — a mostly 3D-printed bipedal robot for roughly $2,500 (@LeRobotHF, @huggingface, @clementdelangue). The release is a full stack: hardware/CAD, runtime + calibration, simulation environments, system-identification tools, and a training zoo for locomotion. Clément Delangue framed it as a deliberate bet that lowering the entry point matters because "robotics is too hard to solve alone," invoking ROS as the open-ecosystem-becomes-standard precedent (@clementdelangue).

AI economics, tokenomics & compute scarcity

The day's loudest macro story was the end of the AI subsidy era: Microsoft reportedly killed internal Claude Code licenses because token billing was untenable even for them, Uber's CTO told staff they burned the entire 2026 AI budget in four months, US AI software prices are up 20–37%, and GitHub is dropping flat-rate plans (@clementdelangue). Ethan Mollick's framing: cheap chatbots for everyone, expensive agents reserved for those who can pay — bad for democratization (@emollick). In separate conversations with Anthropic's Krishna, Dylan Patel, and Gavin Baker, Clément heard the same claim — frontier tokens capture an overwhelming majority of model-layer economic value (@clementdelangue).

The financial picture is uglier underneath. Gary Marcus surfaced an OpenAI Q1 operating margin of -122% even excluding stock-based compensation, that is, an operating loss of about $1.22 for every $1 of revenue (@garymarcus, @amir), along with Dario Cpx's data showing ~95% of NVIDIA's operating cash flow now absorbed by circular financing, vs ~57% a year ago (@garymarcus). On the labor side, Gavin Newsom ordered a 90-day dashboard tracking AI's employment impact via state unemployment insurance (UI) data, though Ara Kharazian flagged that UI systems can't actually identify AI-caused layoffs (@arakharazian). ClickUp cut 22% of headcount, with the CEO framing it as a productivity-mode change rather than cost-cutting (@alexfinn).

Cybersecurity vulnerabilities & threat bulletin

A brutal CVE day. Cisco Secure Workload has an unauthenticated CVSS 10.0 REST API flaw (CVE-2026-20223) that crosses tenant boundaries with Site Admin privileges, has no workarounds, and affects both SaaS and on-prem deployments (@thehackersnews). Microsoft Defender has two actively exploited bugs, a local SYSTEM escalation (CVE-2026-41091) and a DoS (CVE-2026-45498), both added to CISA KEV with a June 3 deadline (@thehackersnews). CISA also added a critical Langflow RCE (CVE-2025-34291, CVSS 9.4) and a Trend Micro Apex One directory traversal (@thehackersnews). A public PoC dropped for DirtyDecrypt (CVE-2026-31635), affecting Fedora, Arch, and openSUSE kernels built with CONFIG_RXGK (@thehackersnews); Microsoft released mitigations for BitLocker YellowKey (CVE-2026-45585); and Drupal patched a critical core flaw affecting PostgreSQL-backed sites (@thehackersnews).
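Under the CVSS v3.1 base-score equations in FIRST's specification, a perfect 10.0 needs several things at once. The exploitability metrics must all sit at their worst (network access, low complexity, no privileges, no user interaction), impact must be high, and Scope must be rated Changed. A Changed scope is consistent with a flaw that crosses tenant boundaries. The sketch below implements the v3.1 base equation. We assume v3.1 here, and the vector we score is simply one that yields 10.0, not Cisco's published vector, which isn't quoted in this item.

```python
import math

# Base-metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.50}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}

def roundup(x):
    """CVSS v3.1 Roundup: smallest one-decimal number >= x, robust to float noise."""
    int_input = round(x * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0

def base_score(vector):
    """Score a base vector like 'AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H'."""
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))

print(base_score("AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H"))  # 10.0
print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

The second vector differs only in Scope (Unchanged), which shows why 9.8 is the ceiling for flaws that stay inside one security authority.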

The ThreatsDay bulletin highlights 47 zero-days, Gunra ransomware (ex-Conti) going full RaaS with 32 victims, a Composer 2.9.8/2.2.28 fix for GitHub Actions token leakage, the OrBit Linux rootkit still active in 2026, "Vibe Hacking" campaigns using AI agents against LatAm gov and banks, and Showboat malware in Middle East telecom (@thehackersnews). On the consumer side, Discord defaulted voice/video to E2EE, and the FTC fined Cox Media Group and two firms nearly $1M over the "active listening" adtech claims that Simon Willison had publicly doubted from day one (@simonw).

The Bottom Line

Today three trendlines converged. At the capability frontier, open source pushed hard into humanoids, audio, translation, and linear-attention research, while OpenAI and Alibaba flexed on agents and proofs. Underneath the launches, the economics turned visibly sour: Microsoft killing Claude Code seats, OpenAI's -122% margin, and NVIDIA's circular-financing share all point to subsidies ending. And the security side reminded everyone that the attack surface keeps growing faster than the patch cycle.

Dispatch № 29 · Filed Friday at dawn from Pensive — a second-brain publication.
Set in Bevan, Old Standard TT, Cormorant Garamond & Courier Prime.
