Anthropic brings "advisor strategy" to Claude Platform: Opus advises Sonnet/Haiku at inference
X · @claudeai
Anthropic announced the advisor strategy on the Claude Platform: Opus 4.6 acts as a planning and critique advisor, paired with Sonnet 4.6 or Haiku 4.5 as the executing model. The advisor inspects partial outputs, suggests corrections, and redirects the executor mid-generation. On SWE-bench Multilingual, Sonnet with an Opus advisor scores 2.7 percentage points higher than Sonnet alone, at roughly 1.3x the cost, versus 7x for running Opus end-to-end. It is generally available today via the Claude Console and CLI, and both models are billed at existing Claude API rates with no advisor premium. Anthropic positions this as the first first-class multi-model inference primitive in any frontier-lab API: not just routing or cascading, but explicit advisor and executor roles with shared context.
Anthropic · Claude · Opus · Sonnet · SWE-bench · Multi-Model
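The advisor/executor split described in the summary can be pictured as a simple control loop. The sketch below is illustrative only: `run_with_advisor`, `Executor`, `Advisor`, and the toy stubs are hypothetical names invented here, and nothing in it reflects the actual Claude Platform API or how Anthropic implements the feature. It only shows the general shape of an executor producing output in steps while an advisor reviews the partial result and feeds guidance back through shared context.

```python
from typing import Callable, List, Optional

# Hypothetical stand-ins for model calls; NOT the Claude Platform API.
Executor = Callable[[str, List[str]], str]     # (task, guidance so far) -> next chunk
Advisor = Callable[[str, str], Optional[str]]  # (task, partial output) -> note or None


def run_with_advisor(task: str, executor: Executor, advisor: Advisor,
                     max_steps: int = 4) -> str:
    """Alternate executor steps with advisor reviews over a shared context."""
    partial = ""
    guidance: List[str] = []
    for _ in range(max_steps):
        partial += executor(task, guidance)
        note = advisor(task, partial)  # advisor inspects the partial output
        if note is not None:
            guidance.append(note)      # redirects the executor on its next step
    return partial


# Toy stubs so the sketch runs without any network access.
def toy_executor(task: str, guidance: List[str]) -> str:
    return "[fixed]" if guidance else "[draft]"


def toy_advisor(task: str, partial: str) -> Optional[str]:
    return "add a regression test" if partial.endswith("[draft]") else None


result = run_with_advisor("fix bug", toy_executor, toy_advisor, max_steps=3)
print(result)
```

In a real deployment the two callables would wrap a cheaper executing model and a stronger advising model; how often the advisor is consulted is the main lever on cost.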
Why it matters
Advisor mode is the first API-level primitive for multi-model inference at a frontier lab, and it is interesting because the economics finally make sense. A 2.7pp gain on SWE-bench Multilingual for 1.3x cost (vs 7x for pure Opus) is exactly the kind of unit economics that lets enterprise buyers say yes. Expect OpenAI and DeepMind to fast-follow with analogous APIs within 90 days; expect evals to shift toward reporting advised and unadvised numbers separately. Longer term, this normalizes a pattern where models are judged on quality per dollar rather than price per token, which is what the enterprise market actually wants.
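The unit-economics argument above reduces to two small ratios. The following back-of-the-envelope sketch uses only the three figures quoted in the summary; it assumes (the source does not say so explicitly) that both cost multiples are measured against running Sonnet alone. The variable names are made up for illustration.

```python
# Figures quoted in the summary. Assumption: both cost multiples are
# relative to running Sonnet alone (baseline = 1.0).
GAIN_PP = 2.7         # SWE-bench Multilingual gain, in percentage points
ADVISED_COST = 1.3    # Sonnet with an Opus advisor
OPUS_ONLY_COST = 7.0  # Opus end-to-end

# Extra spend, in multiples of the Sonnet baseline, per percentage point gained.
extra_cost_per_pp = (ADVISED_COST - 1.0) / GAIN_PP

# Advised setup as a fraction of the Opus-only bill.
advised_share_of_opus = ADVISED_COST / OPUS_ONLY_COST

print(f"extra cost per pp: {extra_cost_per_pp:.3f}x baseline")
print(f"advised cost as share of Opus-only: {advised_share_of_opus:.1%}")
```

Under that assumption, each percentage point costs about 0.11x of the Sonnet baseline, and the advised setup comes to under a fifth of the Opus-only cost. The summary gives no Opus-only score, so the comparison says nothing about how much accuracy the full 7x would buy.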
Impact scorecard
Overall: 7.06/10
Stakes: 7.0
Novelty: 7.5
Authority: 9.0
Coverage: 5.5
Concreteness: 8.5
Social: 8.0
FUD risk: 2.0
Coverage: 10 outlets · 3 tier-1
@AnthropicAI, The Verge, TechCrunch, The Information, SemiAnalysis, Latent Space podcast
Anthropic is a primary vendor source announcing its own product, so the facts (availability, pricing model, advisor/executor architecture) are high-confidence. The 2.7pp SWE-bench delta is vendor-reported: credible, but not yet independently replicated, with the methodology published on Anthropic's blog. FUD risk is low, but watch for independent eval teams (Latent Space, Artificial Analysis) confirming or contradicting the numbers in the next 2 weeks.
Kronos (accepted at AAAI 2026, arXiv 2508.02739) is the first open-source foundation model pre-trained on financial candlestick (K-line) sequences. A specialized tokenizer quantizes multi-dimensional OHLCV data into hierarchical discrete tokens, and a decoder-only autoregressive transformer is pre-trained on 12 billion K-line records from 45 global exchanges. Reported results: 93% higher RankIC on price-series forecasting than the leading time-series foundation model (TSFM) and 87% higher than the best non-pretrained baseline; 9% lower MAE on volatility forecasting; and a 22% improvement in generative fidelity for synthetic K-line sequences. Model, weights, and demo are open on GitHub (shiyu-coder/Kronos), and the repo is currently GitHub-trending.
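For readers unfamiliar with the headline metric: RankIC is conventionally the Spearman rank correlation between predicted and realized returns across a set of assets, usually averaged over dates. The sketch below computes it for a single cross-section in plain Python. The function names and the sample numbers are illustrative, and the Kronos paper's exact evaluation protocol is not described here, so treat this as the textbook definition rather than the authors' code.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def rank_ic(predicted: Sequence[float], realized: Sequence[float]) -> float:
    """Spearman rank correlation between predicted and realized returns."""
    return pearson(average_ranks(predicted), average_ranks(realized))


# Made-up returns for four assets on one date.
predicted = [0.02, -0.01, 0.03, 0.00]
realized = [0.01, -0.02, 0.05, 0.015]
ic = rank_ic(predicted, realized)
print(round(ic, 3))
```

Because only the ordering matters, RankIC rewards a model for ranking assets correctly even when the predicted magnitudes are off, which is why it is popular for cross-sectional stock selection.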
Google Research published Simula in Transactions on Machine Learning Research (April 16, 2026): a framework that reframes synthetic data generation as mechanism design, using reasoning-driven construction rather than sample-level optimization. The team (Tim R. Davidson, Benoit Seguin, Enrico Bacis, Cesar Ilharco, Hamza Harkous) generated datasets of up to 512,000 data points across five benchmarks in four domains: cybersecurity (CTI-MCQ, CTI-RCM), legal reasoning (LEXam), math (GSM8k), and multilingual knowledge (Global MMLU). Results show 'better data scales better': a 10% accuracy gain on math reasoning using Gemini 2.5 Flash as teacher and Gemma-3 4B as student. The four-step recipe is global diversification → local diversification → complexification → quality checks. Complexification helped math but hurt legal reasoning, and the paper warns that mechanism design is domain-dependent.
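The four-step recipe can be read as a staged pipeline with a per-domain switch on the complexification step. The toy sketch below mirrors only that ordering. Every function body is a placeholder invented for illustration (string templates instead of model calls, a length-and-duplicate filter instead of a real critic), and none of it is taken from the Simula paper or code.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]


def global_diversification(topics: List[str]) -> List[Item]:
    """Cover the topic space broadly: one seed item per topic."""
    return [{"topic": t, "prompt": f"Question about {t}", "level": 1} for t in topics]


def local_diversification(items: List[Item], variants: List[str]) -> List[Item]:
    """Vary each seed within its own topic."""
    return [{**it, "prompt": f"{it['prompt']} ({v})"} for it in items for v in variants]


def complexification(items: List[Item], enabled: bool) -> List[Item]:
    """Optionally raise difficulty; per the summary this is domain-dependent."""
    if not enabled:
        return items
    return [{**it, "prompt": it["prompt"] + " with an extra reasoning step",
             "level": it["level"] + 1} for it in items]


def quality_checks(items: List[Item], max_len: int = 120) -> List[Item]:
    """Placeholder filter: drop duplicates and overlong prompts."""
    seen, kept = set(), []
    for it in items:
        if it["prompt"] not in seen and len(it["prompt"]) <= max_len:
            seen.add(it["prompt"])
            kept.append(it)
    return kept


def build_dataset(topics: List[str], variants: List[str], complexify: bool) -> List[Item]:
    items = global_diversification(topics)
    items = local_diversification(items, variants)
    items = complexification(items, enabled=complexify)
    return quality_checks(items)


math_data = build_dataset(["fractions", "ratios"], ["word problem", "symbolic"], complexify=True)
legal_data = build_dataset(["contracts"], ["case study"], complexify=False)
print(len(math_data), len(legal_data))
```

Making complexification a flag rather than a fixed stage reflects the reported finding that it helped math but hurt legal reasoning.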
coleam00/Archon is an open-source TypeScript workflow harness that makes AI coding deterministic and repeatable through YAML-defined development processes. It has reached 18.8k GitHub stars and is trending weekly; the latest release, v0.3.6, shipped on April 12, 2026, and the dev branch has 1,265 commits. It ships 17 default workflows covering issue fixes, feature development, PR reviews, and refactoring. Core features: isolated execution (each run gets its own git worktree for parallel, conflict-free processing), composable workflows (mixing deterministic nodes such as bash, tests, and git with AI-powered steps such as planning, code generation, and review), multi-platform access (CLI, Web UI, Slack, Telegram, Discord, GitHub webhooks), and human gates (interactive approval steps). It is MIT licensed and requires Bun, Claude Code, and the GitHub CLI.
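The idea of composing deterministic nodes, AI-powered steps, and human gates can be sketched as a tiny sequential runner. This is a conceptual illustration in Python, not Archon's implementation (which is TypeScript and YAML-driven): `Node`, `run_workflow`, the node kinds, and the stub steps are all hypothetical names chosen here, and Archon's real workflow schema is not shown in the summary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, str]


@dataclass
class Node:
    name: str
    kind: str  # hypothetical kinds: "deterministic", "ai", or "gate"
    run: Callable[[Context], Context]


def passthrough(ctx: Context) -> Context:
    return ctx


def run_workflow(nodes: List[Node], ctx: Context,
                 approve: Callable[[str, Context], bool]) -> Context:
    """Run nodes in order; a gate halts the workflow unless it is approved."""
    for node in nodes:
        if node.kind == "gate":
            if not approve(node.name, ctx):
                return {**ctx, "status": f"stopped at {node.name}"}
            continue
        ctx = node.run(ctx)
    return {**ctx, "status": "done"}


# Stub steps: the "ai" node stands in for a model call, the rest are fixed logic.
workflow = [
    Node("plan", "ai", lambda c: {**c, "plan": f"fix {c['issue']}"}),
    Node("approve-plan", "gate", passthrough),
    Node("run-tests", "deterministic", lambda c: {**c, "tests": "passed"}),
]

approved = run_workflow(workflow, {"issue": "#1"}, approve=lambda name, c: True)
rejected = run_workflow(workflow, {"issue": "#1"}, approve=lambda name, c: False)
print(approved["status"], "|", rejected["status"])
```

Keeping the approval decision as an injected callable means the same workflow definition can be driven from a CLI prompt, a chat button, or a test double.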