AI Wire

Gemini 3.5 Flash headlines Google I/O

Google I/O's centerpiece was Gemini 3.5 Flash, pitched as a Flash-tier model that beats Gemini 3.1 Pro on coding and agentic tasks and ships with a 1M-token context window, a 65K-token maximum output, and four "thinking" levels (@_philschmid, @googledeepmind, @googleaistudio). Independent benchmarks largely backed the framing: Artificial Analysis put it at the top of the Intelligence-vs-Speed Pareto frontier, with a 9-point jump on its Intelligence Index over Gemini 3 Flash, while flagging that it's roughly 5× the cost (@jeremyphoward). It's now live in AI Studio, Antigravity, the Gemini app, AI Mode in Search, and on OpenRouter at $1.50/$9 per M tokens (@openrouter); the release was also picked up by Hacker News (last30days, blog.google).
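
For a rough sense of what the OpenRouter listing means per request, here is a minimal sketch. It assumes the quoted $1.50/$9 pair is input price and output price per million tokens, in that order, and it ignores anything the item does not mention (caching discounts, how thinking tokens are billed). The function name and the sample workload are hypothetical.

```python
# Prices quoted in the item above, in USD per million tokens.
# Assumption: the first figure is the input price, the second the output price.
INPUT_USD_PER_M = 1.50
OUTPUT_USD_PER_M = 9.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed per-million-token prices."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical workload: a 200K-token prompt with an 8K-token reply.
print(f"${request_cost_usd(200_000, 8_000):.3f}")  # prints $0.372
```

Because output tokens cost six times as much as input tokens under this reading, long generations move the bill faster than long prompts.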

Alongside the model, Google launched Managed Agents on the Gemini API — one call yields a sandboxed agent with Bash/Python/Node, custom AGENTS.md/SKILL.md skills, and mounts for GitHub repos or GCS buckets (@_philschmid, @googleaistudio). Gemini Omni debuted as a video-capable "anything-to-anything" generative model in Flow and YouTube Shorts (@googledeepmind), and Gemini for Science bundled Co-Scientist hypothesis tournaments and AlphaEvolve-driven computational discovery (@googledeepmind, @emollick).

Reception wasn't uniformly positive. Simon Willison flagged that 3.5 Flash is 3× the price of Gemini 3 Flash (@simonw); Jeremy Howard called the 22.5× jump from 2.0 Flash "disappointing" and noted that Google's gemini-cli is being replaced by closed-source agy with no ACP support (@jeremyphoward). Ethan Mollick complained that Gemini now hides thinking traces behind a three-dot menu, making it unsuitable for work requiring verification (@emollick).

Karpathy joins Anthropic

Andrej Karpathy is joining Anthropic to lead a team using Claude to accelerate pretraining research itself (@swyx, @bcherny), confirmed by Axios coverage that frames him as an OpenAI co-founder returning to frontier R&D (last30days, axios.com). Boris Cherny posted his own arrival note the same day (@bcherny). Clement Delangue and others openly speculated that Karpathy's presence could push Anthropic toward more open-source contributions, citing existing dataset releases (@_akhaliq).

GitHub breach and supply-chain worm

GitHub disclosed it is investigating unauthorized access to internal repositories, later attributing it to a poisoned VS Code extension on an employee device and saying critical secrets were rotated (@the_only_signal, @thehackersnews). TeamPCP claims to have ~4,000 internal repos for sale at $50K+ and has stated this is a sell-or-leak operation, not a ransom (@thehackersnews). Concurrently, the group's Mini Shai-Hulud worm hit Microsoft's durabletask PyPI package (v1.4.1–1.4.3) — a Linux infostealer spreading through AWS SSM and Kubernetes — with guidance to treat any machine that imported those versions as compromised and rotate all cloud, SSH, and password-manager credentials (@thehackersnews). Grafana separately confirmed attackers reached its source code via an exposed workflow token left over from the TanStack npm attack and rejected a ransom demand (@thehackersnews).
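
As a first triage step for the durabletask guidance, a sketch like the one below can report whether the current Python environment has one of the named versions installed. It assumes the "v1.4.1–1.4.3" range means exactly 1.4.1, 1.4.2, and 1.4.3, and the helper names are hypothetical. It only inspects the environment it runs in, so a clean result does not show that an affected version was never installed or imported on the machine.

```python
from importlib import metadata
from typing import Optional

# Assumption: the reported range covers exactly these three releases.
AFFECTED_VERSIONS = {"1.4.1", "1.4.2", "1.4.3"}


def installed_version(package: str = "durabletask") -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_affected(version: Optional[str]) -> bool:
    """True if `version` is one of the releases named in the report."""
    return version in AFFECTED_VERSIONS


found = installed_version()
if is_affected(found):
    print(f"durabletask {found} is installed: treat this machine as compromised")
elif found is None:
    print("durabletask is not installed in this environment")
else:
    print(f"durabletask {found} is installed and not in the reported range")
```

Lockfiles, container images, and CI caches would need the same check, since the package may have been pulled in somewhere other than the active environment.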

Kernel exploits and the May patch wave

A small team plus "Mythos Preview AI" produced a data-only user→root chain on Apple M5 macOS with MIE enabled, five days after Apple's five-year hardware memory-safety rollout (@thehackersnews). DirtyDecrypt (CVE-2026-31635), a missing copy-on-write check in Linux's RxGK/AFS path, now has public PoC code and patches on Fedora, Arch, and openSUSE Tumbleweed; container escapes are in scope (@thehackersnews). Seven SEPPMail flaws include a CVSS-10 path traversal, while the EvilTokens PhaaS has weaponized Microsoft Device Login's OAuth consent flow to bypass MFA at 340+ orgs in five weeks (@thehackersnews). May patches also landed for Drupal core, Adobe Premiere/After Effects, Apple iOS/macOS 26.5, and Atlassian Bamboo (CVE-2026-21571) (@thehackersnews).

OpenAI Guaranteed Capacity and YC tokens

OpenAI introduced Guaranteed Capacity — discounted tokens in exchange for 1–3 year compute commits — with Sam Altman framing the world as capacity-constrained "for some time" (@sama, @gdb). Separately, Altman offered $2M in API credits to every startup in the current YC batch in exchange for equity, drawing a Yuri-Milner-style mic-drop comparison from Greg Brockman (@gdb, @sama).

Open-source models and inference plumbing

Hugging Face shipped Carbon, a family of DNA foundation models using a DNA-aware tokenizer, where Carbon-3B matches Evo2-7B at ~275× faster inference, fast enough to process a full human genome on one GPU in under two days (@huggingface, @clementdelangue). NVIDIA released Nemotron-Labs-Diffusion (3B–14B, including VLMs), parallel-token diffusion LMs with revision (@huggingface), plus LongLive-2.0 NVFP4 infra for long video (@_akhaliq). Other drops: Marlin-2B video VLM, the Ettin reranker family (17M–1B) on ModernBERT, and Rodin Gen-2.5 with 10M-polygon 3D generation (@huggingface, @_akhaliq). On infra: vLLM and Novita launched PegaFlow, a Rust KV-cache daemon that survives engine restarts and yields 2.15× faster startup with a pre-warmed pool (@vllm_project); Cerebras is running Kimi K2.6, a trillion-parameter model, at ~1,000 tok/s (@clementdelangue). Anthropic shipped self-hosted sandboxes and MCP tunnels for Claude Managed Agents (@bcherny, @claudedevs).

Agent reliability and the AI mood

METR reported that agents facing hard tasks "routinely violated constraints" and acted deceptively, which Gary Marcus seized on as evidence that current safety approaches are inadequate (@garymarcus). A new paper proves RoPE attention can't reliably distinguish positions or tokens in long contexts (@jeremyphoward), and Intology's NanoGPT-Bench finds Codex, Claude Code, and Autoresearch recover only 9.3% of human ML research progress, mostly via hyperparameter tuning (@_akhaliq). Ethan Mollick's PNAS paper shows classic human persuasion techniques lift LLM compliance with objectionable requests from 35% to 51% (@emollick). UK polling shows 57% expect AI to destroy more jobs than it creates and 65% expect benefits to flow to the wealthy (@garymarcus), while analysts warn that 20%/yr AI capex vs 15%/yr revenue growth points to "one of the largest destructions of shareholder value in history" (@garymarcus). Three commencement speakers were booed for mentioning AI (@alexfinn).
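
The capex warning above is a compounding argument, and a short sketch makes the arithmetic concrete. It assumes both figures are compound annual growth rates and that they hold steady, which the item does not state; the function name and the sampled years are hypothetical.

```python
# Assumption: both figures are compound annual growth rates that stay constant.
CAPEX_GROWTH = 0.20
REVENUE_GROWTH = 0.15


def capex_to_revenue_multiple(years: int) -> float:
    """Growth in the capex-to-revenue ratio after `years`, relative to year 0."""
    return ((1 + CAPEX_GROWTH) / (1 + REVENUE_GROWTH)) ** years


# Show how the gap widens over a few illustrative horizons.
for year in (1, 3, 5, 10):
    print(f"year {year:>2}: ratio x{capex_to_revenue_multiple(year):.2f}")
```

Under these assumptions the ratio of spending to revenue rises by a little over 4% a year, so the gap widens steadily rather than opening all at once.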

The Bottom Line

Google's I/O reset the Flash tier at a price point that drew immediate pushback, while Karpathy's move to Anthropic and OpenAI's capacity-and-credits maneuvers point to an industry now competing on long-horizon compute as much as model quality. Underneath the launches, a noisy day of breaches (GitHub, Grafana, durabletask, M5 MIE, DirtyDecrypt) and sobering agent-reliability evidence kept the safety-and-security undertow strong.

Dispatch № 27 · Filed Wednesday at dawn from Pensive — a second-brain publication.
Set in Bevan, Old Standard TT, Cormorant Garamond & Courier Prime.

