AI Wire

Open-weight model deluge: Gemma 4 QAT, Nemotron 3 Ultra & more

This week was, as Victor Mustar put it, "one of the most INSANE week[s] ever for open AI," with 25+ notable open-weight drops across every modality (@_akhaliq, @huggingface). The headliner was Google's Gemma 4 Quantization-Aware Training checkpoints, which Hugging Face and Unsloth say cut memory usage by roughly 3× while preserving near-original quality — Gemma 4 26B-A4B now fits on 16GB RAM (@huggingface). Ollama shipped QAT variants from E2B up to 31B the same day (@ollama), and vLLM is Google's recommended serving engine for them (@vllm_project).
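A quick weight-only estimate shows why the 16GB figure is plausible. This is a back-of-envelope sketch, not Google's or Unsloth's methodology. It assumes a 16-bit (bf16) baseline, uniform 4-bit weights after QAT, and roughly 26B total parameters (read from the model name). It ignores KV cache, activations, and per-group scale overhead. Because 26B-A4B is a mixture-of-experts model, all experts generally have to stay resident, so memory scales with total parameters rather than the ~4B active ones. The helper `weight_gb` is hypothetical.

```python
"""Weight-only memory sketch for Gemma 4 26B-A4B.

Assumptions (not from Google or Unsloth): bf16 baseline, uniform 4-bit
weights after QAT, ~26B total parameters, and no KV cache, activations,
or quantization-scale overhead.
"""


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Hypothetical helper: approximate weight storage in GB (1 GB = 1e9 bytes)."""
    # billions of params * bits per param / 8 bits per byte = gigabytes
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # MoE: size by *total* params, since every expert is normally kept in memory
    total_b = 26
    bf16 = weight_gb(total_b, 16)
    q4 = weight_gb(total_b, 4)
    print(f"bf16 ~{bf16:.0f} GB, 4-bit ~{q4:.0f} GB, naive ratio {bf16 / q4:.1f}x")
    print("fits in 16 GB (weights only):", q4 < 16)
```

The naive 4× weight ratio is an upper bound. The reported "roughly 3×" presumably reflects overheads this sketch leaves out, such as runtime buffers and any tensors kept at higher precision.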

On the frontier-scale side, NVIDIA's Nemotron 3 Ultra (a 550B hybrid Mamba-MoE with 55B active params, 1M context, and a claimed MMLU of 89.1) is the first open-weight 550B hybrid Mamba-Transformer, with NVFP4 variants promising ~5× throughput on Blackwell (@_akhaliq). Red Hat AI followed with FP8 and W4A16 quantized checkpoints ready to serve in vLLM (@vllm_project). Ideogram also published a technical writeup of Ideogram 4.0, a 9.3B Diffusion Transformer paired with a frozen 8B VLM text encoder; its nf4 checkpoint runs on a single 24GB consumer GPU (@clementdelangue, @_akhaliq).
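The same arithmetic puts the quantized releases in perspective. This is a rough sketch under stated assumptions: weight storage only, 1 GB = 1e9 bytes, and no scale overhead. Real NVFP4 and nf4 formats store per-block scales, so their effective bits per weight sit somewhat above 4. For Ideogram 4.0, this item does not say whether the text encoder is also quantized. The sketch therefore assumes, hypothetically, that both components share one precision. `weight_gb` and `active_fraction` are illustrative helpers, not vendor APIs.

```python
"""Back-of-envelope sizing for Nemotron 3 Ultra and Ideogram 4.0.

Assumptions (not from NVIDIA, Red Hat, or Ideogram): weight storage only,
no quantization-scale overhead, no KV cache or activations, and the
Ideogram DiT and text encoder stored at the same precision.
"""


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Hypothetical helper: approximate weight storage in GB."""
    return params_billion * bits_per_weight / 8


def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of parameters a MoE routes each token through."""
    return active_billion / total_billion


if __name__ == "__main__":
    # Nemotron 3 Ultra: 550B total, 55B active (figures from the item above)
    print(f"Nemotron active share: {active_fraction(55, 550):.0%}")
    for label, bits in [("BF16", 16), ("FP8", 8), ("4-bit weights (W4A16)", 4)]:
        print(f"Nemotron {label}: ~{weight_gb(550, bits):,.0f} GB")

    # Ideogram 4.0: 9.3B DiT + 8B VLM text encoder
    ideogram_b = 9.3 + 8
    for bits in (16, 4):
        print(f"Ideogram @ {bits}-bit: ~{weight_gb(ideogram_b, bits):.1f} GB vs a 24 GB card")
```

MoE routing cuts per-token compute to roughly the active share, but not resident memory. Even at 4 bits, the Nemotron weights alone exceed a single 80GB GPU, which is why serving it implies multiple GPUs. At 16 bits the two Ideogram components would overflow a 24GB card, while a 4-bit checkpoint leaves headroom.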

OpenAI seeks US government stake, AI stocks tank

Reports that the Trump administration is discussing a possible government equity stake in OpenAI sparked a sharp selloff, with NVDA down 6.2%, AVGO -7.92%, CoreWeave -7.07%, Nebius -12.27%, and Oracle -9.59% on the day (@garymarcus). Gary Marcus framed the rumor as "a huge sign of weakness" for OpenAI and warned that taxpayers should not be asked to bail out a company he characterized as lacking a realistic road to profitability (@garymarcus).

The geopolitical read was equally pointed: Marcus argued that a US-government-owned lab would be distrusted abroad the way Washington distrusts Huawei, handing momentum to Mistral and other sovereign-AI rivals (@garymarcus). Against that backdrop, Anthropic's call in the Wall Street Journal for top labs to weigh pausing development landed awkwardly. Peter Steinberger quipped that "asking your competitors to pause development right after you file your S-1 is the single most effective moat-building exercise I've seen pitched as ethics" (@steipete).

Security onslaught: supply-chain attacks, AI-found zero-days, and Project Glasswing

It was a brutal day for software supply chains. The Hacker News flagged a Claude Code agent-mode flaw where a single crafted GitHub issue was treated as a trusted command, leaking OIDC workflow credentials replayable for repo write access — patched in v1.0.94 (@thehackersnews). A self-replicating worm dubbed Miasma reportedly darkened 73 Microsoft GitHub repos, including Azure and MicrosoftDocs assets, while a parallel wave used 50+ poisoned npm packages to spread the Rust-based IronWorm stealer (@thehackersnews).

Network gear was hit too: Cisco SD-WAN's CVE-2026-20245 is under active exploitation with no patch, and a separate Cisco Unified CM bug allows unauthenticated arbitrary file write leading to root, with a public PoC (@thehackersnews). On the defense side, a widely shared report said "AI just found 21 zero-days in FFmpeg," some of them sitting untouched for 15–20 years, and Chrome shipped a record 429 patches (@thehackersnews). MongoDB joined Anthropic's Project Glasswing alongside Apple, Google, Microsoft, and NVIDIA to harden critical software for the AI era (@mongodb).

Coding agents, dev tools, and agent engineering practice

Practitioners traded craft tips: swyx argued that appending "?" to prompts — framing tasks as questions — beats "always use plan mode," since it invites the model to push back rather than blindly execute (@swyx). Clement Delangue published Hugging Face benchmark data (~1,000 graded Claude Code and Codex runs) arguing token costs mean there will be no "SaaS apocalypse" — agents using cached dev tools like the hf CLI beat agents hitting raw APIs (@clementdelangue). Ethan Mollick highlighted Anthropic's new Agent-Teams-vs-Workflows decision chart, while noting the AI often picks combinations itself (@emollick).

Tooling kept shipping: Codex gained a Build iOS Apps plugin with in-app SwiftUI previews and hot reload (@steipete), and Boris Cherny announced Claude Cowork doubled 5-hour usage limits for the next month (@bcherny). The week's open-source flashpoint was Ladybird closing public PRs — Charlie Marsh said "the dynamics of open source are changing rapidly," while Steinberger countered that the answer is "using more agents to maintain it," not going closed (@steipete).

AI compute as strategic commodity and sovereign AI

SpaceX disclosed a Cloud Service Agreement under which Google will pay it about $920M/month — roughly $11B/year — for compute at xAI data centers, which Gary Marcus called fresh evidence that "AI compute is becoming a strategic commodity like launch capacity or energy" (@garymarcus). His sharper jab: "If scale was 'all you need,' Elon would be hoarding LLMs, not leasing them" (@garymarcus).

Sovereign AI moved from theory to shipping. NVIDIA spotlighted Sarvam AI's "Made in India" stack, which trains 100B+ MoE models across 4,096+ H100 GPUs and delivers millisecond multilingual voice inference for Aadhaar, KYC, and telephony for Tata Capital and Infosys (@nvidia). Hugging Face's Julien Chaumond reminded teams that HF storage and egress now undercut S3, GCS, and Backblaze for AI data at scale (@clementdelangue, @_akhaliq).

Product launches: ChatGPT email, Riverflow 2.5, and Anthropic science

Anthropic's science team showed Opus 4.7 matching — and on some tasks beating — dedicated NMR spectroscopy software for molecular structure work (@anthropicai). ChatGPT on the web can now draft and send emails directly from writing blocks without leaving the conversation (@gdb). OpenRouter launched Riverflow 2.5 from Riverflow AI — an image model with a user-controlled scoring rubric and tunable reasoning effort, free through June 9 (@openrouter).

OpenAI is staffing up a London pretraining team led by Nikolay Savinov, focused on long-context work (@sama). Meta's SAM 3D took a Best Paper Honorable Mention at CVPR 2026 (@aiatmeta), and Google Magenta's MRT2 real-time music model is now playable in-browser on Hugging Face Spaces and ported to transformers (@huggingface).

The Bottom Line

The day's center of gravity was open weights getting dramatically cheaper to run (Gemma 4 QAT on 16GB, Nemotron 3 Ultra quantized for vLLM) just as the closed-frontier business model wobbled under rumors of a US government stake in OpenAI. Underneath both stories, supply-chain security degraded sharply (prompt injection in Claude Code, npm worms, Cisco zero-days), while compute economics (SpaceX selling xAI data-center compute to Google, Sarvam scaling on 4,096+ H100s) underscored that infrastructure, not model novelty, is now the binding constraint.

Dispatch № 42 · Filed Saturday at dawn from Pensive — a second-brain publication.
Set in Bevan, Old Standard TT, Cormorant Garamond & Courier Prime.
