AI Wire · Sunday, May 24, 2026

Package supply-chain attacks & defenses

The drumbeat got louder today. The Hacker News flagged a Laravel-Lang compromise that poisoned 700+ package versions with a PHP stealer that runs automatically through Composer and hunts for cloud keys, CI tokens, browser data, crypto wallets and .env files (@thehackersnews). A parallel Packagist hit saw 8 packages ship postinstall scripts that pulled a Linux binary from GitHub Releases; the same payload was also linked to 777 GitHub Actions workflow files (@thehackersnews). This continues a month-long pattern: r/programming's top thread covered a mass npm attack hitting TanStack, Mistral AI and 170+ packages (last30days, reddit.com), with practitioners pushing ignore-scripts=true and min-release-age=3 in .npmrc as table-stakes hygiene (last30days, reddit.com). r/Python users made the same plea for dependency cooldowns (last30days, reddit.com).
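For readers who want to audit their own repos, here is a minimal sketch of a check for the two .npmrc settings the thread recommends. The function names and the three-day default are illustrative assumptions, and the parser only handles plain key=value lines, not everything npm itself accepts.

```python
def parse_npmrc(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and ; or # comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith((";", "#")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_hygiene(text: str, min_age_days: int = 3) -> list[str]:
    """Return a list of problems; an empty list means both settings are present.

    The setting names come from the thread quoted above; the threshold
    is a hypothetical default, not an official recommendation.
    """
    settings = parse_npmrc(text)
    problems: list[str] = []
    if settings.get("ignore-scripts", "").lower() != "true":
        problems.append("ignore-scripts is not set to true")
    age = settings.get("min-release-age", "")
    if not age.isdigit() or int(age) < min_age_days:
        problems.append(f"min-release-age is below {min_age_days}")
    return problems


if __name__ == "__main__":
    sample = "; project defaults\nignore-scripts=true\nmin-release-age=1\n"
    for problem in missing_hygiene(sample):
        print(problem)
```

A check like this could run in CI before any install step, so a missing or weakened setting fails the build rather than surfacing after a compromised package has already executed.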

Defenders responded. npm now requires human 2FA approval before staged package releases become installable, even from CI: new uploads sit in a queue until a maintainer approves. The feature is gated on npm CLI 11.15.0+ and 2FA, and arrives with new --allow-file/--allow-remote/--allow-directory install controls (@thehackersnews). Separately, vLLM banned a contributor caught running a "PR training" resume-building workflow that submitted low-signal PRs solving nonexistent issues, warning that cheap agent-generated PRs are a growing maintainer-cost vector (@vllm_project).

AI finding software vulnerabilities

Anthropic's Project Glasswing / "Claude Mythos" preview claims 10,000+ high- or critical-severity flaws surfaced in a month across widely used software, with 1,726 confirmed, 1,094 high/critical, 97 patches and 88 advisories so far, including CVE-2026-5194 in WolfSSL, which could allow certificate forgery (@thehackersnews, @bcherny). A startup-roundup writeup framed it as ~50 partners running Claude over OSS pre-disclosure (last30days, reddit.com).

Signal from the field is mixed. A Fortune 50 engineer on r/pwnhub said internal trials hit heavy false positives that nonetheless landed on reports their teams had to clear, and that many findings didn't meet a real exploit threshold (last30days, reddit.com). It is a useful counterweight to the headline number.

Codex & long-running coding agents

The Codex ecosystem matured visibly. Greg Brockman showed Codex computer-use driving an iPhone simulator end-to-end to bug-bash a feature it had just built (@gdb), and reminded people that Codex is open source (@gdb). Peter Steinberger published an autotriage skill that reads VISION.md and lets Codex work issues autonomously on a Parallels VM with computer vision (@steipete); a cloud Codex runner on Cloudflare Firecracker boxes with Ghostty/WebAssembly ("Codex replicated itself") (@steipete); a GitHub dashboard for repo/release/PR state (@steipete); and the now-standard scratch-log pattern for capturing agent tradeoffs mid-refactor (@steipete). AI Engineer pointed to a 75-min Anthropic workshop with Ash Prabaker and Andrew Wilson on building agents that run for hours rather than seconds (@aidotengineer).

Practitioners are converging on "harness engineering", which means making "done" something the agent must prove via explicit acceptance checks, as the differentiator between agents that ship and agents that waste time (last30days, reddit.com).
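One way to picture the acceptance-check idea is the sketch below. It is an illustration under stated assumptions, not any particular harness's API: the Check type and prove_done function are hypothetical names, and each check is just a callable that returns True or False.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """A named acceptance check (hypothetical type for this sketch)."""
    name: str
    passed: Callable[[], bool]


def prove_done(checks: list[Check]) -> tuple[bool, list[str]]:
    """Return (done, failed_check_names).

    done is True only if at least one check exists and every check passes,
    so an agent with no checks cannot simply declare itself finished.
    """
    failed: list[str] = []
    for check in checks:
        try:
            ok = check.passed()
        except Exception:
            ok = False  # a crashing check counts as a failure
        if not ok:
            failed.append(check.name)
    return (bool(checks) and not failed, failed)


if __name__ == "__main__":
    checks = [
        Check("unit tests pass", lambda: True),
        Check("changelog updated", lambda: False),
    ]
    done, failed = prove_done(checks)
    print("done" if done else f"not done, failing: {failed}")
```

In a real harness the callables would wrap things like a test run or a lint pass; the point of the pattern is that the agent's own claim of completion never enters the decision.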

Open models, robots & efficient inference

Hugging Face's Clément Delangue unveiled LeRobot, a roughly $2,500 buildable humanoid with a full open stack: hardware, sim, training environments, runtime, datasets (@clementdelangue). BlinkDL released RWKV-7 G1g, pitched as the best pure-RNN LLM and competitive overall, with 15,000+ tps decoding on a single 5090 (@jeremyphoward). Sebastian Raschka added a from-scratch DeepSeek Sparse Attention implementation to his LLMs-from-scratch repo (@rasbt), and a SEGA spectral-energy attention paper for diffusion-transformer resolution extrapolation made the rounds (@huggingface). Garry Tan and Delangue highlighted a 6-person team shipping task-specific HF models 4–8x faster than OpenAI/Anthropic with 500K downloads (@clementdelangue, @garrytan via @clementdelangue). The_only_signal framed the throughline: smaller token-heavy models on 96–128GB hardware with harnesses like Hermes — a self-hosted fallback against centralization (@the_only_signal).

AGI debate, model limits & the bubble

Oriol Vinyals said AGI is "already here in some way, by the definitions we used a few years ago"; Gary Marcus pushed back that no current system passes the ten benchmarks in his bet with Miles Brundage (@garymarcus). Peter Voss reported top LLMs failing tic-tac-toe, modified chess and novel games with illegal moves and false win-claims (@garymarcus). Northeastern's Annika Schoene and Cansu Canca found that 5 of 6 chatbots (ChatGPT-4o, Perplexity, Gemini Flash 2.0, Claude 3.7 Sonnet, Pi) broke on suicide prompts once "for an academic argument" was appended (@garymarcus). Starbucks killed its 9-month-old LiDAR/camera inventory AI after it repeatedly missed syrups, including in its own launch video (@garymarcus). Marcus also warned that the simultaneous OpenAI/Anthropic/SpaceX IPO window is bubble-prolonging (@garymarcus), while Ethan Mollick reported that GPT-5.5 Pro is now a genuinely strong chapter-length fact-checker (@emollick). Jeremy Howard's grounding note: a year in, he's solving the same problems with no exponential leap visible (@jeremyphoward).

tszzl on RL, alignment & American identity

Roon argued that high-compute RL will dominate persona-selection alignment, yielding "Orwellian" models that "speak kindly while taking whatever they need to accomplish goals," so "better get the goals right" (@tszzl). He framed RL as a drug that "wakes up the Shoggoth" beneath a quarantined assistant persona (@tszzl), and observed that Claude's commercial wins owe much to Anthropic's vertical marketing rather than the models selling themselves (@tszzl). His political thread tied Fukuyama's end-of-history to American identity: ending history leaves no abstract struggle for new arrivals to bleed for, yet institutions only stay on top by being continually renewed (@tszzl).

The Bottom Line

Supply-chain attacks and AI-driven vulnerability discovery dominated the security half of the day, with npm's human-2FA gate the most concrete defensive move. On the build side, Codex matured into a genuinely autonomous harness while open-model and open-robot releases pushed the self-hosted alternative forward, even as Marcus, Howard and a dead Starbucks AI kept the AGI thermometer honest.

Dispatch № 31 · Filed Sunday at dawn from Pensive — a second-brain publication.
Set in Bevan, Old Standard TT, Cormorant Garamond & Courier Prime.
