Berkeley SPEX: GPT-4o mini fails 92% of trolley problems — replacing 4 words reduces failure to near zero
Berkeley BAIR
Researchers at UC Berkeley (Landon Butler, Justin Singh Kang, Yigit Efe Erginbas, Abhineet Agarwal, Bin Yu, Kannan Ramchandran) published SPEX on March 13, 2026 — a signal-processing + coding-theory approach that scales LLM feature-interaction discovery from dozens to thousands of components. The benchmark anecdote: on a standard trolley problem task, GPT-4o mini failed 92% of the time; SPEX identified four specific words whose replacement dropped failure rates to near zero. A variant called ProxySPEX achieves equivalent identification with roughly 10x fewer ablations. The method exploits two empirical properties — sparsity (few interactions actually matter) and low-degreeness (each interaction involves small feature subsets) — to make interpretability tractable at frontier-model scale.
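The two properties SPEX exploits can be made concrete with a brute-force Möbius (interaction) transform. The sketch below is not SPEX itself (SPEX avoids enumerating masks by using sparse-Fourier-style sampling with coding-theory decoding); it only illustrates what "sparse, low-degree interactions" means: the model is queried on masked feature subsets, and just a handful of low-order coefficients come out non-zero. The query function `f` and the toy setup are hypothetical.

```python
from itertools import combinations

def mobius_interactions(f, n, max_degree=2):
    """Brute-force low-degree Mobius (interaction) coefficients.

    f: callable taking a frozenset of kept-feature indices and
       returning a scalar model output (e.g. a failure score).
    Assumes interactions above max_degree are negligible,
    the 'low-degreeness' property SPEX exploits.
    """
    coeffs = {}
    for d in range(max_degree + 1):
        for S in combinations(range(n), d):
            # Mobius coefficient: alternating sum of f over subsets of S
            total = 0.0
            for k in range(len(S) + 1):
                for T in combinations(S, k):
                    total += (-1) ** (len(S) - len(T)) * f(frozenset(T))
            coeffs[S] = total
    # sparsity: keep only interactions with non-negligible weight
    return {S: c for S, c in coeffs.items() if abs(c) > 1e-9}

# Toy model whose output depends only on features 0 and 1 jointly:
f = lambda kept: 1.0 if {0, 1} <= kept else 0.0
print(mobius_interactions(f, n=3))  # only the (0, 1) pair survives
```

Note the cost: this exhaustive version needs on the order of n^max_degree model queries, which is exactly what SPEX's coded subsampling (and ProxySPEX's further ~10x reduction) is designed to avoid.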
berkeley · interpretability · llm · mmlu · ablations
Why it matters
Interpretability at scale has been stuck — most methods work on dozens of features or break on frontier-size models. SPEX is the first technique that credibly identifies interactions in the thousands while maintaining faithfulness, and ProxySPEX's 10x compute reduction makes it practical to run as a production audit layer. The trolley-problem 92%-to-zero result is the kind of shareable hook that pulls the AI-safety community onto a new toolchain. Expect SPEX-style audits to show up in red-team reports for Claude Mythos-class models by Q3, and for model cards to begin citing SPEX interaction graphs alongside benchmark scores.
Impact scorecard: 7.6/10
Stakes: 8.0
Novelty: 9.0
Authority: 9.0
Coverage: 5.5
Concreteness: 9.0
Social: 6.5
FUD risk: 2.0
Coverage: 8 outlets · 1 tier-1 (Berkeley BAIR, The Gradient, Import AI, MarkTechPost)
X / Twitter: 3,400 mentions (@binyu_stats) · 2,200 likes
Reddit: 820 upvotes on r/MachineLearning
Trust check: high
Primary-source Berkeley BAIR blog with named academic authors (Bin Yu is an NAS-elected statistician, Kannan Ramchandran is an IEEE Fellow). Reproducible via code release. No FUD flags.
Kronos (AAAI 2026 accepted, arXiv 2508.02739) is the first open-source foundation model pre-trained on financial candlestick (K-line) sequences. A specialized tokenizer quantizes multi-dimensional OHLCV data into hierarchical discrete tokens; a decoder-only autoregressive transformer is pre-trained on 12 billion K-line records from 45 global exchanges. Against the leading time-series foundation model (TSFM) and the best non-pretrained baseline, Kronos posts 93% and 87% higher RankIC respectively on price-series forecasting, 9% lower MAE on volatility forecasting, and a 22% improvement in generative fidelity for synthetic K-line sequences. Model, weights, and demo are open on GitHub (shiyu-coder/Kronos); the repo is currently trending.
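A toy illustration of the hierarchical-token idea, with loud caveats: Kronos's actual tokenizer is learned from data, whereas the sketch below just splits each normalized OHLCV value into a coarse bucket plus a finely quantized residual, yielding one discrete id per value. The function name and bin counts are invented for illustration.

```python
import numpy as np

def quantize_ohlcv(bars, coarse_bins=16, fine_bins=16):
    """Toy hierarchical quantizer for OHLCV bars (illustration only;
    Kronos's real tokenizer is learned, not fixed uniform binning).

    bars: (T, 5) array of open/high/low/close/volume, min-max
    normalized per column to [0, 1). Each value maps to a
    coarse bucket plus a fine position inside that bucket,
    combined into a single vocabulary id.
    """
    bars = np.clip(np.asarray(bars, dtype=float), 0.0, 1.0 - 1e-9)
    coarse = np.floor(bars * coarse_bins).astype(int)
    # residual position inside the coarse bucket, re-quantized finely
    resid = bars * coarse_bins - coarse
    fine = np.floor(resid * fine_bins).astype(int)
    # coarse-major layout: vocabulary size = coarse_bins * fine_bins
    return coarse * fine_bins + fine

tokens = quantize_ohlcv([[0.0, 0.5, 0.25, 0.99, 0.1]])
print(tokens)  # one integer token id per OHLCV field
```

The resulting id sequence is what a decoder-only transformer could consume autoregressively, the same way a language model consumes word-piece ids.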
Google Research published Simula in Transactions on Machine Learning Research (April 16, 2026): a framework that reframes synthetic data generation as mechanism design, using reasoning-driven construction rather than sample-level optimization. The team (Tim R. Davidson, Benoit Seguin, Enrico Bacis, Cesar Ilharco, Hamza Harkous) generated datasets of up to 512,000 data points across five benchmarks spanning cybersecurity (CTI-MCQ, CTI-RCM), legal reasoning (LEXam), math (GSM8k), and multilingual knowledge (Global MMLU). Results show 'better data scales better': a 10% accuracy gain on math reasoning using Gemini 2.5 Flash as teacher and Gemma-3 4B as student. The four-step recipe is global diversification → local diversification → complexification → quality checks. Complexification helped math but hurt legal reasoning; the paper warns that mechanism design is domain-dependent.
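The four-step recipe is at heart a staged pipeline over a growing dataset. The sketch below shows only that composition pattern; in Simula each stage would be driven by a teacher LLM, which is mocked here with plain string transforms, and all step names are hypothetical stand-ins.

```python
def run_recipe(seed_topics, steps):
    """Chain dataset-construction stages in order (pipeline sketch;
    real stages would call a teacher model such as Gemini 2.5 Flash)."""
    data = list(seed_topics)
    for step in steps:
        data = step(data)
    return data

# Hypothetical stand-ins for the four phases:
globally_diversify = lambda xs: [f"{x} ({v})" for x in xs
                                 for v in ("variant A", "variant B")]
locally_diversify = lambda xs: xs + [x + ", rephrased" for x in xs]
complexify = lambda xs: [x + " + extra constraint" for x in xs]
quality_check = lambda xs: [x for x in xs if len(x) < 200]

out = run_recipe(
    ["toy math word problem"],
    [globally_diversify, locally_diversify, complexify, quality_check],
)
print(len(out))  # seed expanded, hardened, then filtered
```

The paper's domain-dependence warning maps naturally onto this shape: dropping or reordering a stage (e.g. skipping `complexify` for legal reasoning) is a one-line change to the step list.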
coleam00/Archon is a TypeScript open-source workflow harness that makes AI coding deterministic and repeatable through YAML-defined development processes. Hit 18.8k GitHub stars and is trending weekly. Latest release v0.3.6 on April 12, 2026 with 1,265 commits on dev branch. It ships 17 default workflows covering issue fixes, feature development, PR reviews, and refactoring. Core features: isolated execution (each run gets its own git worktree for parallel conflict-free processing), composable workflows (mix deterministic nodes like bash/tests/git with AI-powered steps like planning/code-gen/review), multi-platform (CLI, Web UI, Slack, Telegram, Discord, GitHub webhooks), and human gates (interactive approval steps). MIT licensed, requires Bun + Claude Code + GitHub CLI.
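To make the workflow model concrete, here is a hypothetical YAML workflow in the spirit the description suggests. The field names and step syntax are invented for illustration and are not Archon's actual schema; consult the repo's shipped workflows for the real format.

```yaml
# Hypothetical workflow definition; field names are illustrative,
# not Archon's documented schema.
name: fix-issue
steps:
  - run: git worktree add ../run-{{ run_id }}   # isolated, conflict-free execution
  - ai: plan                                    # AI-powered planning step
  - ai: code-gen                                # AI-powered implementation step
  - run: bun test                               # deterministic test node
  - gate: human-approval                        # interactive approval gate
  - run: gh pr create --fill                    # deterministic git/GitHub node
```

The appeal of this shape is that the deterministic nodes (`run:`) bound what the AI steps can do, which is what makes runs repeatable and auditable.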