AI Wire

NVIDIA Nemotron 3 Ultra lands as the new open frontier

NVIDIA shipped Nemotron 3 Ultra today: a 550B-total / 55B-active hybrid Mamba-2-Transformer MoE with a 1M token context, NVFP4 pretraining on 20T tokens, and a LatentMoE architecture aimed squarely at long-running agentic workloads (@nvidia, @_akhaliq). The pitch is efficiency at frontier capability — NVIDIA and Hugging Face are claiming up to 5x faster inference and ~30% lower cost for agentic tasks versus other open frontier models (@huggingface, @clementdelangue). Everything is open under OpenMDW-1.1: base, post-trained and reward checkpoints, NVFP4 quantized versions, training data, and recipes (@_akhaliq).
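
For a rough sense of what 550B-total / 55B-active means in practice, the sketch below does the arithmetic. The 4-bits-per-weight figure is an assumption that ignores NVFP4's per-block scale factors, and the function names are illustrative only; none of the derived numbers come from NVIDIA.

```python
# Back-of-envelope arithmetic for the headline figures.
# Illustration only: derived values are estimates, not NVIDIA's published numbers.

TOTAL_PARAMS = 550e9   # total parameters (from the announcement)
ACTIVE_PARAMS = 55e9   # parameters active per token (from the announcement)
BITS_PER_WEIGHT = 4    # assumption: 4-bit weights, scale-factor overhead ignored


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights used in each token's forward pass."""
    return active / total


def weight_gigabytes(params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction: {frac:.0%}")
    print(f"all weights at 4 bits: ~{weight_gigabytes(TOTAL_PARAMS, BITS_PER_WEIGHT):.0f} GB")
    print(f"active weights at 4 bits: ~{weight_gigabytes(ACTIVE_PARAMS, BITS_PER_WEIGHT):.1f} GB")
```

Under those assumptions, about 10% of the weights fire per token, and the full weight set needs roughly 275 GB before any cache or runtime overhead.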

Distribution was unusually wide on day zero — vLLM had stable-release support (@vllm_project), Ollama exposed it via cloud (@ollama), fal hosted it, and Hugging Face had it on the Hub at launch with day-0 transformers support (@huggingface). The Hacker News post for the model page itself surfaced near the top of the front page (last30days, research.nvidia.com), with parallel discussion of the related Nemotron 3 Super release (last30days, research.nvidia.com). Sebastian Raschka called the capability-to-efficiency ratio "ultra impressive," noting the design carries forward the Mamba-2-attention hybrid and LatentMoE from the prior Super variant (@rasbt).

Anthropic stakes a claim on recursive self-improvement

Anthropic published internal evals arguing Claude is meaningfully accelerating its own development. On their reference test, speeding up training code for a small model, results went from Opus 4's ~3x in May 2025 to ~52x for "Mythos Preview"; on AI-research next-step decisions, Mythos beat humans 64% of the time, up from 22% in 2024 (@anthropicai). They claim more than 80% of code merged into Anthropic's codebase in May 2026 was authored by Claude, with per-engineer quarterly shipping up 8x relative to 2021–2025 (@emollick, @jeremyphoward).

Gary Marcus pushed back hard, arguing the results show RSI-as-coding-assistant, not AGI, and that Mythos and Claude Code are neurosymbolic systems whose code-optimization wins don't generalize to the open-ended research judgment AGI would require (@garymarcus). Anthropic itself hedged that it's "not yet clear that Claude is capable of research judgment" but warned the trend could compound alignment risk and "ultimately lead to loss of control" (@anthropicai). Separately, Daniela Amodei tied the firm's confidential IPO filing to the cost of staying on this curve (@garymarcus).

DeepSeek eats into US lab revenue

Cost is now the story. Sam Altman conceded that AI spend went from "an issue that never came up" at the start of 2026 to a top-two customer concern — "my company spent my entire 2026 budget in Q1" (@clementdelangue). DeepSeek has topped OpenRouter's token-share rankings four weeks running (@openrouter), and Lindy publicly migrated 100% of its traffic from Anthropic to DeepSeek v4, reporting both seven-figure savings and a performance increase on core use cases (@clementdelangue). A US business spending tracker now puts DeepSeek atop its "trending" list as enterprises swap out OpenAI and Anthropic (@clementdelangue). Counter-signal: Adam Tooze's chart shows Anthropic adoption has overtaken OpenAI among US businesses overall (@arakharazian).

Physical AI and hardware agents

NVIDIA also unveiled Cosmos 3, billed as the first omni-model for physical AI — text, image, video, sound and action in one architecture, with workflows for robot policy generation and vision-agent scene understanding at CVPR 2026 (@nvidia). On the application side, swyx flagged Flow v3 as an "agentic platform for physical engineering" that pushes requirement changes through CAD and simulation tools (@swyx), and Cognition launched an "AI Productivity Guarantee" funding Devin usage up to $10M if it underdelivers, backed by private enterprise evals stretching to 100 hours versus METR's 16-hour cap (@swyx).

Agent training stack converges

Hugging Face is pulling the agentic stack together: Julien Chaumond launched SynthTraces for generating synthetic coding-agent sessions (@_akhaliq), trl is gaining first-class support for fine-tuning on Claude Code, Codex, OpenClaw, and Pi traces in datasets v5 (@huggingface), and a rebuilt hf CLI detects agent callers and cuts token use by up to 6x on complex Hub tasks vs. curl or the Python SDK (@clementdelangue). NVIDIA dropped a 1,272-record agentic red-teaming dataset for indirect prompt injection across nine enterprise domains (@_akhaliq).

Security: agents become both target and tool

A bad week for agent surface area. The Hacker News reported a Claude Code Action flaw (fixed in v1.0.94) where a crafted GitHub issue could leak OIDC workflow credentials via prompt injection (@thehackersnews), Gemini was shown executing fake commands from WhatsApp/Slack/SMS notifications with no app installed (@thehackersnews), and a malicious link in github-dev could exfiltrate GitHub OAuth tokens for full repo write (@thehackersnews). Inverting the frame: an autonomous AI tool found a critical Redis RCE that had been hidden for over two years (@thehackersnews).

ChatGPT memory and consumer model drops

OpenAI rolled out a new ChatGPT memory system to US Plus/Pro users that auto-tracks important details, with a user-visible summary, 2x capacity, and a toggle to revert to the legacy system (@openai, @sama, @gdb). Around it: Gemma 4 12B (text+image+audio in a single model, no separate encoders) hit Ollama (@_philschmid, @ollama), Alibaba released Qwen-Image-Flash for few-step distilled image generation/editing (@_akhaliq), and fal added Krea 2 Turbo and Sonilo v1.1 video-to-music (@fal).

The Bottom Line

The day's center of gravity is open, efficient, agent-shaped: Nemotron 3 Ultra and DeepSeek v4 are squeezing US lab margins on cost, while Anthropic counters with a provocative RSI narrative whose substance is still mostly faster coding. Agent infrastructure — traces, CLIs, red-team datasets, physical-engineering platforms — is maturing into a real stack, and the security surface is widening just as fast.

Dispatch № 41 · Filed Friday at dawn from Pensive — a second-brain publication.
Set in Bevan, Old Standard TT, Cormorant Garamond & Courier Prime.
