Quanta Magazine: The AI revolution in math has arrived
Quanta Magazine
Quanta's feature frames the last 12 months as a phase change in AI-assisted mathematics. Google DeepMind's AlphaProof and AlphaGeometry 2 reached Olympiad silver in July 2025. By Q1 2026 their successor, 'AlphaMath', ranked 4th in the Putnam Competition under exam conditions (117/120, against a human median of 2/120). Terence Tao's Lean-project collaborations produced the first formally verified resolution of a Bourbaki-listed open problem, a 1953 conjecture on symmetric Diophantine equations, using a DeepMind-trained proof search that ran on 1024 TPU v5 chips for 11 days. The mathematicians Quanta interviewed (Tao, Scholze, Gowers) describe a shift from 'helpful assistant' to 'research collaborator that occasionally finds the key idea'. Author: Alex Wilkins.
Mathematics · AlphaProof · DeepMind · Lean · Tao · Research
Why it matters
Formal mathematics has always been the hardest test of reasoning: unlike chess or Go, there is no reward-model ambiguity, and a proof is conclusively verified once a checker accepts it. An LLM-based system placing 4th at Putnam and closing a 70-year-old Bourbaki problem suggests the techniques transfer to any domain with a machine-checkable correctness oracle: program synthesis, chip verification, theorem-driven security proofs. Practically, Lean proof engineering becomes the next bottleneck career inside AI labs, and open-source proof corpora (mathlib, Isabelle AFP) become strategic data assets.
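To make "machine-checkable correctness oracle" concrete, here is a toy Lean 4 statement. It is unrelated to the results in the article and was chosen only for illustration. The Lean kernel either accepts the proof term or rejects the file, so the checker leaves nothing to score subjectively.

```lean
-- Toy illustration only: commutativity of addition on natural numbers.
-- `Nat.add_comm` is a lemma from Lean 4's core library; the kernel
-- accepts this file only if the term really has the stated type.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A proof search system can generate many candidate terms like the last line and keep only those the kernel accepts, which is what makes the acceptance check usable as a training or search signal.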
Impact scorecard
7.49/10
Stakes: 7.5
Novelty: 8.0
Authority: 8.5
Coverage: 6.5
Concreteness: 8.0
Social: 8.5
FUD risk: 2.5
Coverage: 12 outlets · 4 tier-1
Quanta, Nature News, MIT Tech Review, The Guardian, Ars Technica
Quanta is tier-1 science journalism with a track record of careful sourcing. The Putnam 4th-place claim can be reconstructed from the published DeepMind technical report; the formal Bourbaki proof is in the Lean mathlib commit log. Named mathematicians (Tao, Scholze, Gowers) are quoted on the record. FUD risk is minimal: the results are falsifiable by inspecting the Lean code.
Kronos (accepted at AAAI 2026, arXiv 2508.02739) is the first open-source foundation model pre-trained on financial candlestick (K-line) sequences. A specialized tokenizer quantizes multi-dimensional OHLCV data into hierarchical discrete tokens, and a decoder-only autoregressive transformer is pre-trained on 12 billion K-line records from 45 global exchanges. Reported results, measured against the leading time-series foundation model (TSFM) and the best non-pretrained baseline: 93% higher RankIC on price-series forecasting than the TSFM and 87% higher than the non-pretrained baseline; 9% lower MAE on volatility forecasting; and a 22% improvement in generative fidelity for synthetic K-line sequences. Model, weights, and demo are open on GitHub (shiyu-coder/Kronos), and the repo is currently GitHub-trending.
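For readers unfamiliar with RankIC, the headline metric above: it is commonly computed as the Spearman rank correlation between a model's predicted returns and the returns that were actually realized across a set of assets. The sketch below is a minimal standard-library version of that common definition, not code from the Kronos repository; the function names and the sample numbers are made up for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def rank_ic(predicted, realized):
    """Spearman rank correlation between predictions and outcomes."""
    if len(predicted) != len(realized):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(predicted), average_ranks(realized))


# Hypothetical predicted vs. realized returns for four assets.
predicted = [0.02, -0.01, 0.03, 0.00]
realized = [0.01, -0.02, 0.05, 0.015]
print(round(rank_ic(predicted, realized), 3))  # prints 0.8
```

Because only the ordering matters, a model can score a high RankIC while being badly calibrated on magnitudes, which is why the summary reports MAE separately for volatility.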
Google Research published Simula in Transactions on Machine Learning Research (April 16, 2026): a framework that reframes synthetic data generation as mechanism design, using reasoning-driven construction rather than sample-level optimization. The team (Tim R. Davidson, Benoit Seguin, Enrico Bacis, Cesar Ilharco, Hamza Harkous) generated datasets of up to 512,000 data points across five domains: cybersecurity (CTI-MCQ, CTI-RCM), legal reasoning (LEXam), math (GSM8k), and multilingual knowledge (Global MMLU). Results show 'better data scales better': a 10% accuracy gain on math reasoning using Gemini 2.5 Flash as teacher and Gemma-3 4B as student. The four-step recipe is global diversification → local diversification → complexification → quality checks. Complexification helped math but hurt legal reasoning, and the paper warns that mechanism design is domain-dependent.
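The four-step recipe can be read as a pipeline in which each stage feeds the next. The sketch below shows only that structure. It is not Simula's code: in the paper the stages are driven by a reasoning model, whereas every function, topic list, and string template here is a hypothetical placeholder. The `use_complexification` switch reflects the summary's point that the third step helped on math but hurt on legal reasoning.

```python
# Hypothetical stand-ins for what a reasoning model would propose.
HYPOTHETICAL_TOPICS = {
    "math": ["arithmetic", "ratios"],
    "legal": ["contracts", "liability"],
}
HYPOTHETICAL_SETTINGS = ["shopping", "travel"]


def global_diversification(domain):
    """Step 1 (placeholder): pick coarse topics so the domain is covered."""
    return list(HYPOTHETICAL_TOPICS[domain])


def local_diversification(topic):
    """Step 2 (placeholder): vary each topic into distinct scenarios."""
    return [
        f"{topic} question set in a {setting} scenario"
        for setting in HYPOTHETICAL_SETTINGS
    ]


def complexify(spec):
    """Step 3 (placeholder): make an item harder."""
    return spec + ", requiring two reasoning steps"


def quality_check(specs):
    """Step 4 (placeholder): drop empty and duplicate items, keep order."""
    seen = set()
    kept = []
    for spec in specs:
        if spec and spec not in seen:
            seen.add(spec)
            kept.append(spec)
    return kept


def build_dataset(domain, use_complexification):
    """Chain the four steps; complexification is optional per domain."""
    specs = []
    for topic in global_diversification(domain):
        for spec in local_diversification(topic):
            specs.append(complexify(spec) if use_complexification else spec)
    return quality_check(specs)


for item in build_dataset("math", use_complexification=True):
    print(item)
```

Keeping complexification behind a per-domain flag is one simple way to act on the domain dependence the paper warns about; which domains benefit would have to be measured rather than assumed.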
coleam00/Archon is an open-source TypeScript workflow harness that makes AI coding deterministic and repeatable through YAML-defined development processes. It has reached 18.8k GitHub stars and is trending weekly. The latest release is v0.3.6 (April 12, 2026), with 1,265 commits on the dev branch. It ships 17 default workflows covering issue fixes, feature development, PR reviews, and refactoring. Core features: isolated execution (each run gets its own git worktree for parallel, conflict-free processing), composable workflows (mixing deterministic nodes such as bash/tests/git with AI-powered steps such as planning/code-gen/review), multi-platform access (CLI, Web UI, Slack, Telegram, Discord, GitHub webhooks), and human gates (interactive approval steps). MIT licensed; requires Bun, Claude Code, and GitHub CLI.
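The isolation idea, one git worktree per run, can be illustrated with plain git. The sketch below is not Archon's implementation (Archon is written in TypeScript, and its internals are not described here); the function names, the `run/<id>` branch naming, and the directory layout are assumptions made for illustration. The only external interface used is the standard `git worktree add -b <branch> <path> <commit-ish>` command.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo, worktrees_root, run_id, base_ref="HEAD"):
    """Build the git command that gives one run its own checkout.

    Assumed layout: <worktrees_root>/<run_id> on a new branch run/<run_id>.
    """
    path = Path(worktrees_root) / run_id
    branch = f"run/{run_id}"
    return [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", branch,   # create a fresh branch for this run
        str(path),      # directory for the new working tree
        base_ref,       # commit the run starts from
    ]


def create_worktree(repo, worktrees_root, run_id, base_ref="HEAD"):
    """Run the command; raises CalledProcessError if git reports failure."""
    cmd = worktree_add_command(repo, worktrees_root, run_id, base_ref)
    subprocess.run(cmd, check=True)
    return Path(worktrees_root) / run_id


# Show the commands two parallel runs would use (nothing is executed here).
for run_id in ("fix-issue-1", "review-pr-2"):
    print(" ".join(worktree_add_command("/path/to/repo", "/tmp/worktrees", run_id)))
```

Because each run edits files in its own directory and branch, two runs can modify the same file without touching each other's working state; conflicts surface later, at merge time, where a human gate can review them.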