Quanta Magazine: The AI revolution in math has arrived
Quanta's feature documents the past 12 months as a phase change in AI-assisted mathematics. Google DeepMind's AlphaProof and AlphaGeometry 2 hit Olympiad silver in July 2025; by Q1 2026 their successor 'AlphaMath' ranked 4th in the Putnam Competition under exam conditions (117/120, versus a median human score of 2/120). Terence Tao's Lean-project collaborations produced the first formally verified resolution of a Bourbaki-listed open problem (a 1953 conjecture on symmetric diophantine equations), using a DeepMind-trained proof search running on 1,024 TPU v5 chips for 11 days. The mathematicians Quanta interviewed (Tao, Scholze, Gowers) describe a shift from 'helpful assistant' to 'research collaborator that occasionally finds the key idea'. Author: Alex Wilkins.
Mathematics · AlphaProof · DeepMind · Lean · Tao · Research
Why it matters
Formal mathematics has always been the hardest test of reasoning: unlike chess or Go, there is no reward-model ambiguity, and every proof is mechanically verifiable end-to-end. An LLM-based system placing 4th at the Putnam and closing a 70-year-old Bourbaki problem means the techniques transfer to any domain with a machine-checkable correctness oracle: program synthesis, chip verification, theorem-driven security proofs. Practically, Lean proof engineering becomes the next bottleneck career inside AI labs, and open-source proof corpora (mathlib, the Isabelle AFP) become strategic data assets.
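The 'machine-checkable correctness oracle' point can be made concrete with a toy Lean 4 example (ours, not from the article): the kernel either certifies the proof term or rejects it, with no partial credit.

```lean
-- A trivial theorem: Lean's kernel either accepts this proof or rejects it.
-- There is no reward-model ambiguity — exactly the property that makes
-- formal math a clean training signal for proof-search systems.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```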
Impact scorecard
7.49/10
Stakes 7.5 · Novelty 8.0 · Authority 8.5 · Coverage 6.5 · Concreteness 8.0 · Social 8.5 · FUD risk 2.5
Coverage: 12 outlets · 4 tier-1
Quanta, Nature News, MIT Tech Review, The Guardian, Ars Technica
Quanta is tier-1 science journalism with a track record of careful sourcing. The 4th-place Putnam claim is reconstructible from the published DeepMind technical report, and the formal Bourbaki proof is on the Lean mathlib commit log. Named mathematicians (Tao, Scholze, Gowers) are quoted on the record. FUD risk is minimal: these results are falsifiable by inspecting the Lean code.
OpenAI publishes 'Codex for almost everything', a major capability expansion for its Codex coding agent. The post details how Codex now handles a far broader range of software engineering tasks end-to-end, including autonomous debugging and deployment steps. A companion demo, 'Codex Hacked a Samsung TV', shows the agent autonomously reverse-engineering and exploiting a consumer device, drawing 100+ HN points on its own. HN main thread: 874 pts, 449 comments on launch day.
A developer reports an unexpected €54,000 billing spike in just 13 hours after a Firebase browser key without API restrictions was used to make Gemini API requests, presumably by a malicious third party. The Google AI developer forum post goes viral with 386 HN pts and 281 comments. The incident exposes a critical gap in Google's abuse detection and billing caps for Gemini APIs: client-side Firebase keys often ship with no API restrictions by default, and Gemini does not enforce spending caps out of the box.
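Since Gemini enforces no spending cap out of the box, teams that proxy API calls through their own server can at least hard-stop at a budget. A minimal sketch of such a guard (the class, its name, and the per-call cost estimates are illustrative assumptions, not a Google API):

```python
class SpendCapGuard:
    """Hard budget cap for outbound API calls; refuses once the cap is hit.

    This is a client-side sketch only — it does not replace restricting the
    API key itself or configuring billing alerts in the cloud console."""

    def __init__(self, cap_eur: float):
        self.cap_eur = cap_eur
        self.spent_eur = 0.0

    def authorize(self, estimated_cost_eur: float) -> bool:
        """Reserve budget and return True, or False if the call would exceed the cap."""
        if self.spent_eur + estimated_cost_eur > self.cap_eur:
            return False
        self.spent_eur += estimated_cost_eur
        return True


guard = SpendCapGuard(cap_eur=100.0)
allowed = [guard.authorize(30.0) for _ in range(5)]  # five calls at ~€30 each
print(allowed)  # only the first three fit under the €100 cap
```

The deeper fix remains removing unrestricted keys from client code entirely; a guard like this only bounds the damage from your own server's traffic.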
Alibaba's Qwen team releases Qwen3.6-35B-A3B as fully open source on Hugging Face (Apache license). The model uses a Mixture-of-Experts architecture with 35B total parameters but only 3B active per token, making it runnable on consumer hardware. Simon Willison's post 'Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7' lands 404 HN pts and 84 comments, while the original release thread hits 100+ on r/LocalLLaMA. Pitched as 'agentic coding power, now open to all.'
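The '35B total, 3B active' figure is the defining MoE trick: a gating network picks the top-k experts per token, so only a small fraction of the weights run on any forward pass. A toy sketch of top-k gating (the expert count and k=2 are illustrative, not Qwen's actual configuration):

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(gate_logits, k=2):
    """Pick the top-k experts for one token and renormalize their gate weights.

    Only these k experts run a forward pass, so compute per token scales
    with k, not with the total number of experts — which is why a 35B-total
    model can cost roughly as much to run as a ~3B dense one."""
    topk = sorted(range(len(gate_logits)),
                  key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in topk])
    return list(zip(topk, weights))


# 8 experts in total, but only 2 are activated for this token.
routes = route_token([0.1, 2.0, -1.0, 0.5, 3.0, 0.0, 1.5, -0.5], k=2)
print(routes)  # expert 4 and expert 1, with renormalized gate weights
```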