AI Wire · Thursday, June 4, 2026

Gemma 4 12B Launch

Google DeepMind shipped Gemma 4 12B, a unified encoder-free multimodal model that pipes audio, image, and video straight into the LLM with no separate towers — 256K context, built-in thinking, native tool calling, and Apache 2.0 weights (@googledeepmind, @_philschmid). The pitch is laptop-grade intelligence: 16GB of memory is enough to run it, and benchmarks reportedly land near a 26B-class model (@_philschmid, last30days, arstechnica.com).

Day-zero distribution was unusually broad. vLLM landed reasoning, tool, vision, and audio parsers behind its OpenAI-compatible API (@vllm_project), Ollama wired it up via MLX with one-line launchers for Hermes and Claude Code (@ollama), and Red Hat OpenShift AI had it serving on day one (@_akhaliq). Hugging Face highlighted dense-12B unified multimodality with MTP assistants for faster decode (@huggingface), and GGUF quants appeared the same day (last30days, huggingface.co). Coverage in Ars and a visual-guide newsletter framed the release as Google's biggest open-weights moment in over a year (last30days, blog.google; last30days, newsletter.maartengrootendorst.com).

Cybersecurity Vulnerabilities & Active Exploits

It was an unusually heavy disclosure day. A leftover debug flag ("FlagLeft") in Microsoft 365 Android apps let any co-resident app silently lift signed-in account tokens across Word, Excel, PowerPoint, Copilot, Loop, and OneNote — now patched (@thehackersnews). CISA added CVE-2026-45247, a 9.8 CVSS unauthenticated PHP RCE in Mirasvit Cache Warmer for Magento, to KEV with a June 6 patch deadline (@thehackersnews). A new HTTP/2 memory-bomb technique can have a single client pin 32GB on NGINX, Apache HTTPD, IIS, Envoy, or Cloudflare Pingora in roughly 20 seconds via near-empty HPACK headers and zero flow-control windows; only NGINX 1.29.8+ and Apache mod_http2 v2.0.41 have fixes so far (@thehackersnews).

The supply-chain and client side were equally rough: an unpatched Windows search: URI handler leaks NTLMv2 hashes on a single click (@thehackersnews); a one-click VS Code webview flaw can exfiltrate GitHub OAuth tokens (@thehackersnews); a Linux CIFS-client/cifs-utils interaction yields root with public PoC (@thehackersnews); poisoned WhatsApp/Slack/SMS notifications can hijack Gemini on Android without any malicious app (@thehackersnews); and a malicious npm package quietly stole non-expiring Codex refresh tokens since v0.1.82 (@thehackersnews). Notably, an autonomous AI tool found a Redis RCE (CVE-2026-23479) that hid for over two years (@thehackersnews), and Anthropic published an analysis of 832 malicious accounts mapped to threat-actor TTPs (@anthropicai).

Open Multimodal Models & Local AI

Gemma was the headline, but the open-weights wave around it was the story. Ideogram 4.0 dropped as downloadable weights claiming SOTA on realism and text rendering, immediately live on fal (@_akhaliq, @fal). NVIDIA Cosmos 3 topped seven physical-AI leaderboards across world generation, robot policy, and industrial vision (@_akhaliq, @nvidia). Google released Magenta RealTime 2 for on-device continuous music with ~200ms latency (@_akhaliq), MOSS-Audio hit #1 trending on Hugging Face as a unified speech/sound/music model (@_akhaliq), and Jeremy Howard released vui, a 300M context-aware TTS running on a single consumer GPU (@jeremyphoward). fal added TripoSplat for sub-5-second 3D Gaussian generation (@fal), and Hugging Face launched LeLab, a no-terminal GUI for LeRobot (@huggingface).

AI Economics, IPO Skepticism & Cost Pullback

Sam Altman said AI budgeting "never came up" earlier this year but is now a "huge issue," with OpenAI's top internal user burning 100B tokens a month — up from 100K six years ago (@garymarcus). Gary Marcus framed this as the death of all-you-can-eat tokenmaxxing, arguing hyperscalers could not afford pay-per-use before IPOs and that LLMs may never be meaningfully profitable outside the chip layer (@garymarcus). Bain joined MIT and McKinsey in flagging disappointing corporate AI returns, though Ethan Mollick pushed back that the Bain report is "super odd & opaque" about what it measures and that broader surveys still show positive ROI (@emollick, @garymarcus). Public sentiment shifted 49 points against data centers in nine months (@garymarcus).

Hybrid Routing & Open-Source Replacing Frontier Models

Harvey, Hugging Face, and Fireworks published results showing GLM 5.1 as primary worker routing to Opus 4.7 as advisor only 0.83 times per task beat pure Opus on a hard legal benchmark — 18% vs 14% all-pass at $368 vs $954 (@clementdelangue). OpenRouter's new Pareto Code router routes by minimum coding score to the cheapest qualifying model and is now processing nearly 1B tokens/day, with the Auto Router at 12B (@openrouter). Ramp data shows DeepSeek as one of its fastest-growing vendors as companies tolerate China-hosted inference for cost (@arakharazian). Clem Delangue's framing: "smart routing beat brute force… using the most expensive model for every task is a laziness tax" (@clementdelangue).

Agent Tooling, Enterprise Stack & Governance

Town AI launched out of beta with a $55M Series A led by a16z's Alex Rampell, pitching a cross-app assistant that "already knows you" instead of one you configure (@swyx). OpenAI previewed GPT-Rosalind upgrades for life-sciences workflows (@gdb) and proposed frontier-AI governance ideas following the cyber EO (@gdb). Peter Steinberger's OpenClaw shipped a Skill Workshop that turns silent agent self-edits into reviewable proposals (@steipete), Anthropic published business-analytics agent best practices (@claudedevs), and Jensen Huang outlined the enterprise agent stack — models, orchestration, tools, secure runtime — at MS Build with Satya Nadella (@nvidia, @swyx). Ethan Mollick noted Claude Mythos already hit the 3–4 hour METR 80% task horizon that superforecasters expected only by year-end (@emollick).

The Bottom Line

The day was a one-two punch of open-weights momentum (Gemma 4 12B plus Ideogram, Cosmos 3, Magenta, MOSS-Audio) and hardening economic skepticism, with hybrid routing studies giving CFOs concrete ammo to push back on frontier-only spend. Underneath, a brutal security disclosure cycle — FlagLeft, Magento KEV, HTTP/2 memory bomb, Gemini Android prompt injection, Codex npm theft — landed on the same day vendors were celebrating capability launches.

Dispatch № 40 · Filed Thursday at dawn from Pensive — a second-brain publication.
Set in Bevan, Old Standard TT, Cormorant Garamond & Courier Prime.

Gemma 4 12B Launch

Cybersecurity Vulnerabilities & Active Exploits

Open Multimodal Models & Local AI

AI Economics, IPO Skepticism & Cost Pullback

Hybrid Routing & Open-Source Replacing Frontier Models

Agent Tooling, Enterprise Stack & Governance

The Bottom Line

Sources

Gemma 4 12B Launch

Cybersecurity Vulnerabilities & Active Exploits

Open Multimodal Models & Local AI

AI Economics, IPO Skepticism & Cost Pullback

Hybrid Routing & Open-Source Replacing Frontier Models

Agent Tooling, Enterprise Stack & Governance