AI Wire · Saturday, May 23, 2026

Cybersecurity threats and vulnerabilities

A heavy security news day. Megalodon pushed malicious CI/CD workflows to 5,561 GitHub repos in six hours using throwaway accounts and forged CI bot names to exfiltrate CI secrets, cloud creds, SSH keys, OIDC tokens, and source code (@thehackersnews). CISA added a Drupal Core SQL injection (CVE-2026-9082) to its KEV catalog after Imperva observed 15,000+ attempts against nearly 6,000 sites in 65 countries, with gaming and financial services taking nearly half (@thehackersnews). A CVSS 10.0 LiteSpeed cPanel plugin flaw (CVE-2026-48172) is also being actively exploited, letting any cPanel user execute arbitrary scripts as root on v2.3–2.4.4 (@thehackersnews). r/pwnhub's roundup corroborates the broader supply-chain pressure, flagging Microsoft Exchange and Cisco SD-WAN Controller exploitation in the same window (last30days, reddit.com).

New research showed many "hardware-bound" vulnerable Windows drivers can actually be triggered purely from user mode, expanding the BYOVD surface for EDR-killing (@thehackersnews). Ghostwriter is phishing Ukraine's government with Prometheus-themed PDF lures dropping OYSTERFRESH→OYSTERBLUES/OYSTERSHUCK (@thehackersnews). Law enforcement clawed back two big ones: First VPN — used by 25+ ransomware groups — was dismantled, and a 23-year-old Canadian was arrested for running Kimwolf, a DDoS botnet that peaked at 31.4 Tbps from photo frames and webcams (@thehackersnews).

Anthropic's Project Glasswing surfaced more than 10,000 high/critical vulns in essential software in its first month, with an explicit warning that the industry will struggle to absorb what Claude Mythos Preview can find (@anthropicai). Mitchell Hashimoto pushed back on the framing — OSS maintainers aren't a supply chain and shouldn't carry CVE-monitoring obligations (@steipete RT).

Codex updates and high-speed coding agents

OpenAI shipped Codex "Appshots" — Command-Command on Mac sends the active app window's screenshot plus offscreen text (scrolled-out Google Doc content, etc.) into a Codex thread (@gdb, @steipete). Anthropic countered with Claude auto mode landing on the Pro plan with Sonnet 4.6 alongside Opus 4.7 (@claudedevs).

The bigger story is speed. Cerebras-hosted Codex Spark generates code at 1,200 tokens/sec — roughly 20x Sonnet/Opus — and @aidotengineer's playbook from @MilksandMatcha argues the slow-gen habits will start shipping bad code 20x faster unless workflows adapt: five sub-agents producing 15 variations each, plus continuous linting, diff review, and refactoring after every task because at that throughput they're effectively free (@aidotengineer). Reese Levine and team at UCSC also landed full WebGPU support in llama.cpp/ggml after ~18 months (@huggingface RT). @swyx demoed Codex driving Chrome to train a ~10.6M-param transformer in free Colab — 19-min run, 99/100 on random checks — with sub-agents auditing the result. Locally, r/LocalLLaMA's top thread of the month showed Qwen 3.6 27B with MTP hitting ~78 tps on an RTX Pro 6000, finally making 262k-context local agentic coding viable (last30days, reddit.com).
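To make the "effectively free" claim concrete, here is a rough back-of-the-envelope sketch of the fan-out described above. The throughput, speedup, sub-agent count, and variation count come from the text; the 400-token size per variation is a purely hypothetical assumption, and the calculation treats generation as one serial stream, which a real deployment may not.

```python
# Figures quoted in the text above.
SPARK_TOKENS_PER_SEC = 1_200
SPEEDUP_VS_BASELINE = 20
SUB_AGENTS = 5
VARIATIONS_PER_AGENT = 15

# ASSUMPTION (not from the source): average size of one code variation.
TOKENS_PER_VARIATION = 400

# "Roughly 20x" implies a baseline of about 60 tokens/sec.
baseline_tokens_per_sec = SPARK_TOKENS_PER_SEC / SPEEDUP_VS_BASELINE

total_variations = SUB_AGENTS * VARIATIONS_PER_AGENT
total_tokens = total_variations * TOKENS_PER_VARIATION

# Serial generation time for the whole fan-out at each speed.
fast_seconds = total_tokens / SPARK_TOKENS_PER_SEC
slow_seconds = total_tokens / baseline_tokens_per_sec

print(f"{total_variations} variations, {total_tokens} tokens")
print(f"at {SPARK_TOKENS_PER_SEC} tok/s: {fast_seconds:.0f} s")
print(f"at {baseline_tokens_per_sec:.0f} tok/s: {slow_seconds:.0f} s")
```

Under these assumptions the full set of variations takes well under a minute at the faster rate, which is the sense in which extra linting and review passes stop being a cost concern.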

Frontier model competition and evaluation gripes

Alibaba's Qwen team claimed Qwen 3.7-Max beat Opus 4.7 and GPT-5.5 on an agentic Tetris self-training task — +56% improvement at $1.32 vs Opus's +28% at $12.15 and GPT-5.5's +7% at $2.85 (@alibaba_qwen). Take with vendor-benchmark salt, but the price/perf gap is striking. DHH and Greg Brockman both said GPT-5.5 has caught up on complicated agent work — DHH calls reverting to Opus 4.7 "a big step backwards" (@gdb, @steipete RT). r/singularity's top model thread of the month had GPT-5.5 narrowly beating Mythos on a multi-step cyber-attack sim, completing a 12-hour human task in 11 minutes for $1.73 (last30days, reddit.com).
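One rough way to read the price/performance gap in the Qwen claim above is to divide each reported improvement by its reported cost. The "points per dollar" ratio below is an illustrative metric introduced here, not one the vendor published, and it inherits every caveat of the underlying benchmark.

```python
# (improvement in percentage points, cost in USD), as quoted in the claim above.
results = {
    "Qwen 3.7-Max": (56, 1.32),
    "Opus 4.7": (28, 12.15),
    "GPT-5.5": (7, 2.85),
}


def points_per_dollar(improvement_pct, cost_usd):
    """Illustrative ratio: improvement points bought per dollar spent."""
    return improvement_pct / cost_usd


ratios = {name: points_per_dollar(*pair) for name, pair in results.items()}

# Print the models from best to worst ratio.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.1f} improvement points per dollar")
```

On these numbers Qwen's ratio is more than an order of magnitude above the other two, while GPT-5.5 and Opus land close to each other despite very different absolute scores.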

Eval discourse is louder than the leaderboard. Jeremy Howard called Gemini Flash 3.5 "disappointing": fast and smart, but trained to max evals rather than to be helpful (@jeremyphoward). Philipp Schmid countered with third-party evals showing Flash 3.5 doing well on agents/coding/vision/finance and asked for failure cases (@_philschmid). Gary Marcus hammered OpenAI on transparency around the Erdős-problems claim (how many were tried, whether GPT-5.5 was trained on the newly discovered counterexample, what differs, and how much compute was used), arguing that IPO timing makes the opacity worse (@garymarcus). Will McGugan's complaint that Claude Code's text wrapping has been broken for weeks (off-by-one) sharpened the point (@jeremyphoward RT).

Hugging Face, open models, and the open ecosystem

CommonCrawl's April 2026 crawl plus URL index — 2.19 billion pages — now lives in HF Storage Buckets, queryable with DuckDB over hf:// with zero download; Daniel van Strien counted all 2.19B rows in ~35s (@huggingface). Clement Delangue noted CommonCrawl explicitly recommending HF Buckets for evolving training datasets (@clementdelangue). Microsoft Azure Foundry and HF are co-hosting Stability's SDXL, Black Forest Labs' FLUX.1-schnell, and Tongyi-MAI's Z-Image-Turbo (@huggingface RT).
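For readers who have not used DuckDB's hf:// support, a row count over remote files might look roughly like the sketch below. The path is a hypothetical placeholder that follows DuckDB's documented hf://datasets/ form; the source does not give the actual bucket location or file layout, and Parquet files are an assumption.

```python
def build_count_query(parquet_glob: str) -> str:
    """Return a SQL statement that counts rows across remote Parquet files."""
    return f"SELECT count(*) AS n_rows FROM read_parquet('{parquet_glob}')"


# HYPOTHETICAL placeholder path: not the real CommonCrawl location.
EXAMPLE_PATH = "hf://datasets/example-org/example-crawl-index/**/*.parquet"


def count_rows(parquet_glob: str) -> int:
    """Run the count remotely; needs network access and `pip install duckdb`."""
    import duckdb  # imported lazily so the query builder works without it

    con = duckdb.connect()
    return con.execute(build_count_query(parquet_glob)).fetchone()[0]


query = build_count_query(EXAMPLE_PATH)
print(query)
```

Calling count_rows(EXAMPLE_PATH) would only succeed against a path that actually exists. A count like this can usually be answered from Parquet metadata rather than by fetching whole files, which would be consistent with the short runtime reported above.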

AllenAI's ArtifactLinker predicts which HF-hosted benchmarks a given model would set SOTA on, then runs the evals to verify — directly attacking the "most models are evaluated on a fraction of benchmarks" problem (@clementdelangue). Cormac Boyle showed Function Gemma at 270M params hitting 46% on app intents out-of-the-box and 90%+ on 8/10 functions after synthetic fine-tuning, running ~2,000 tps prefill on a Pixel 7 (@swyx RT). Singapore's Foreign Minister Vivian Balakrishnan built his daily agent on a three-year-old Raspberry Pi with 8GB RAM (@aidotengineer RT). OpenMed crossed 5,000 contributors on open clinical AI (@clementdelangue), and @_akhaliq added follow-up/predecessor banners on paper pages (e.g. DINOv2, SAM-3).

Google DeepMind launches: Gemini, SynthID, Project Genie

SynthID's imperceptible watermark is expanding to more partners, and both the Gemini app and Google Search now answer "was this AI-generated?" queries (@googledeepmind). DeepMind is expanding its Singapore partnership around scientific discovery, pandemic preparedness, and healthcare (@googledeepmind). Project Genie now ingests Maps Street View imagery — Google AI Ultra subscribers (18+) globally can pick a U.S. location in Google Labs and explore a generated interactive world (@googledeepmind).

Ethan Mollick showed why Gemini Omni's native multimodal video editing matters: he took the 1896 "train arriving" film and edited it into a bullet train, a LEGO version, added a time traveler, a centipede, and Muppets — with reflections preserved (@emollick). Philipp Schmid's Google I/O talk demoed Gemini Managed Agents plus the new Interactions API, which gives an agent its own hosted Linux sandbox to execute code and manage memory in a single API call (@_philschmid).

AI policy, safety, immigration and societal impact

David Sacks, Elon Musk, and other tech leaders persuaded Trump not to sign a highly anticipated AI executive order hours before he was scheduled to do so (@esyudkowsky citing WaPo reporting). Will Rinehart filed DOJ/FTC comments proposing an AI-safety antitrust safe harbor, citing the chilling effect on deeper OpenAI/Anthropic collaboration after last summer's joint eval, with pricing, customers, and commercialization kept out of scope (@tszzl RT). Roon flagged that the administration is closing the loophole that lets O-1 visa researchers stay in the U.S. while awaiting green cards, which would practically force OpenAI's extraordinary-ability researchers to leave and wait years abroad (@tszzl).

Gary Marcus surfaced an unverified secondhand account claiming OpenAI is paying NYC households to install 360-degree cameras throughout their homes for chore/training data, with workers rotating SD cards (@garymarcus). He also argued OpenAI's IPO trajectory is weaker than Anthropic's and that critics aren't being met in moderated debate (@garymarcus). Eliezer Yudkowsky argued AI labs would have to filter training data far more broadly than a single opt-out string to avoid behavior-shaping leakage (@esyudkowsky). On architecture, BlinkDL noted Gated DeltaNet-2 is nearly identical to RWKV-7's DPLR recurrence without acknowledgment (@jeremyphoward RT). John Burn-Murdoch resurfaced the multiplicative fertility math: 90% of women having children at a 2.2 average yields a 1.98 TFR — below replacement (@jburnmurdoch).
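The fertility arithmetic in the last item above is a single multiplication, sketched here for anyone who wants to vary the inputs. The 2.1 replacement level is the commonly cited approximate figure and is an assumption added here, not a number from the source.

```python
def total_fertility_rate(share_with_children, avg_children_per_mother):
    """TFR as the share of women who have children times their average family size."""
    return share_with_children * avg_children_per_mother


# ASSUMPTION: commonly cited approximate replacement level, not from the source.
REPLACEMENT_LEVEL = 2.1

tfr = total_fertility_rate(0.90, 2.2)

# Average family size mothers would need if 10% of women have no children.
required_avg = REPLACEMENT_LEVEL / 0.90

print(f"TFR = {tfr:.2f} (replacement is about {REPLACEMENT_LEVEL})")
print(f"Mothers would need to average {required_avg:.2f} children")
```

Because the two factors multiply, a modest share of women without children pushes the average that mothers would need noticeably above the replacement figure itself.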

The Bottom Line

It was a security-heavy day: two actively exploited flaws (one rated CVSS 10.0), a 5,561-repo GitHub poisoning campaign, and Anthropic publicly framing the volume problem its vuln-hunting models will create for the rest of the industry. The coding-agent stack pulled apart along two axes, Appshots-style context capture and Cerebras-speed parallelism, while frontier-model discourse cooled on Anthropic and warmed on GPT-5.5 and Qwen. Underneath, the policy fight is shifting from voluntary safety toward antitrust scaffolding, even as the executive-order route stalled.

Dispatch № 30 · Filed Saturday at dawn from Pensive — a second-brain publication.
Set in Bevan, Old Standard TT, Cormorant Garamond & Courier Prime.