Taking on CUDA with ROCm: 'One Step After Another'
EE Times (via HN)
EE Times deep-dive on AMD's ROCm 7.0 and whether it can finally dent NVIDIA's CUDA moat. AMD's MI400 (96GB HBM4, 5.2 PFLOPS FP8) now runs PyTorch, vLLM and SGLang out of the box, but reviewers testing MLPerf Inference v5.1 still see 1.6–2.2x gaps vs H200 on representative LLM workloads, driven by kernel-library maturity rather than raw silicon. Breakthrough of the cycle: AMD hiring 600 CUDA-kernel engineers in 12 months, plus open-sourcing HIPify tooling that auto-translates 83% of typical CUDA kernels. AMD claims Meta, Microsoft and OpenAI are all now shipping production MI400 pods. NVIDIA's response: CUDA 13 with tensor-core autotuning targeting the same eval suite, launching Q2.
AMD · ROCm · NVIDIA · CUDA · MI400 · PyTorch · MLPerf
Why it matters
For the first time since CUDA hit critical mass in 2016, there is a credible second source for frontier-scale AI training. A 1.6–2.2x MLPerf gap at 60% of the price per GPU is economically defensible for hyperscalers with their own kernel teams, which is exactly who is buying MI400. This matters for anyone forecasting AI infra spend: if AMD's share of AI accelerators moves from <5% to 20% by end-2027, NVIDIA's pricing power and datacenter gross-margin mix change materially, and so does the $800B AI capex cycle that is underwriting NVDA's valuation.
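A back-of-the-envelope check on that trade-off, as a minimal sketch: it assumes the 1.6–2.2x gap is a per-GPU throughput ratio and the 60% figure is purchase price only, and it ignores power, networking, and software-engineering cost. The function name and parameters are illustrative, not taken from the EE Times piece.

```python
def relative_perf_per_dollar(gap: float, price_ratio: float) -> float:
    """Throughput per dollar of the cheaper part, relative to the baseline.

    gap: how many times faster the baseline is (1.6 means 1.6x faster).
    price_ratio: cheaper part's price divided by the baseline price (0.60).
    """
    if gap <= 0 or price_ratio <= 0:
        raise ValueError("gap and price_ratio must be positive")
    # The cheaper part delivers 1/gap of the throughput for price_ratio of the cost.
    return (1.0 / gap) / price_ratio


for gap in (1.6, 2.2):
    ratio = relative_perf_per_dollar(gap, price_ratio=0.60)
    print(f"gap {gap:.1f}x -> {ratio:.2f}x baseline throughput per dollar")
# gap 1.6x -> 1.04x baseline throughput per dollar
# gap 2.2x -> 0.76x baseline throughput per dollar
```

Under these assumptions the break-even gap is 1/0.60, about 1.67x, so the quoted range straddles parity on purchase price alone. That fits the argument that the case rests on buyers who can narrow the gap with their own kernel work.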
Impact scorecard
7.15/10
Stakes: 8.0
Novelty: 6.5
Authority: 7.5
Coverage: 6.5
Concreteness: 8.5
Social: 7.0
FUD risk: 2.5
Coverage: 14 outlets · 4 tier-1
EE Times, Tom's Hardware, ServeTheHome, HPCwire, SemiAnalysis, The Register
X / Twitter: 3,400 mentions @dylan522p · 2,900 likes
Reddit: 1,600 upvotes r/LocalLLaMA
r/LocalLLaMA, r/MachineLearning, r/hardware
Trust check
high
EE Times is tier-1 trade press with direct vendor access; the MLPerf numbers come from published v5.1 results, not vendor claims. The Meta/Microsoft/OpenAI production deployments are vendor-stated but consistent with reporting from SemiAnalysis and The Information throughout Q1. Concrete numbers and multiple primary sources: high trust.
Kronos (AAAI 2026 accepted, arxiv 2508.02739) is the first open-source foundation model pre-trained on financial candlestick (K-line) sequences. A specialized tokenizer quantizes multi-dimensional OHLCV data into hierarchical discrete tokens; a decoder-only autoregressive transformer is then pre-trained on 12 billion K-line records from 45 global exchanges. Results against the leading time-series foundation model (TSFM) and the best non-pretrained baseline: RankIC on price-series forecasting is 93% higher than the TSFM and 87% higher than the non-pretrained baseline; MAE on volatility forecasting is 9% lower; generative fidelity for synthetic K-line sequences improves by 22%. Model, weights, and demo are open on GitHub (shiyu-coder/Kronos); the repo is currently GitHub-trending.
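For readers unfamiliar with the headline metric: RankIC is conventionally the Spearman rank correlation between predicted and realized returns across a set of assets. The sketch below implements that standard definition in plain Python; the summary does not say exactly how the Kronos authors compute it, and the sample numbers are made up for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def rank_ic(predicted, realized):
    """Spearman rank correlation between predicted and realized returns."""
    if len(predicted) != len(realized) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    return pearson(average_ranks(predicted), average_ranks(realized))


# Illustrative (made-up) returns for four assets on one day.
predicted = [0.02, -0.01, 0.05, 0.00]
realized = [0.01, -0.02, 0.03, 0.02]
print(round(rank_ic(predicted, realized), 2))  # 0.8
```

Because only the ordering matters, a model can score well on RankIC while being poorly calibrated on magnitudes, which is why the volatility MAE figure is reported separately.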
Google Research published Simula in Transactions on Machine Learning Research (April 16, 2026): a framework that reframes synthetic data generation as mechanism design, using reasoning-driven construction rather than sample-level optimization. The team (Tim R. Davidson, Benoit Seguin, Enrico Bacis, Cesar Ilharco, Hamza Harkous) generated datasets of up to 512K data points across five datasets in four domains: cybersecurity (CTI-MCQ, CTI-RCM), legal reasoning (LEXam), math (GSM8k), and multilingual knowledge (Global MMLU). Results show that 'better data scales better': a 10% accuracy gain on math reasoning using Gemini 2.5 Flash as teacher and Gemma-3 4B as student. The four-step recipe is global diversification → local diversification → complexification → quality checks. Complexification helped math but hurt legal reasoning, so the paper warns that mechanism design is domain-dependent.
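The shape of that four-step recipe can be sketched as a staged pipeline with complexification as a per-domain switch. This is a toy illustration only: every function here is a hypothetical string-manipulating stand-in, whereas the stages described in the paper are driven by a teacher model, and none of the names come from the Simula code or paper.

```python
from typing import Callable

Stage = Callable[[list[str]], list[str]]


def global_diversify(topics: list[str]) -> list[str]:
    # Toy stand-in: spread each seed topic across hypothetical sub-areas.
    return [f"{t} / {sub}" for t in topics for sub in ("basics", "edge cases")]


def local_diversify(prompts: list[str]) -> list[str]:
    # Toy stand-in: produce several variants of each prompt.
    return [f"{p} (variant {i})" for p in prompts for i in (1, 2)]


def complexify(prompts: list[str]) -> list[str]:
    # Toy stand-in: mark each prompt as a harder version of itself.
    return [f"{p} [harder]" for p in prompts]


def quality_check(prompts: list[str]) -> list[str]:
    # Drop blank entries and exact duplicates, preserving order.
    seen: set[str] = set()
    kept = []
    for p in prompts:
        if p.strip() and p not in seen:
            seen.add(p)
            kept.append(p)
    return kept


def build_pipeline(use_complexify: bool) -> list[Stage]:
    """Assemble the stages in recipe order; complexification is optional."""
    stages: list[Stage] = [global_diversify, local_diversify]
    if use_complexify:
        stages.append(complexify)
    stages.append(quality_check)
    return stages


def run(seeds: list[str], stages: list[Stage]) -> list[str]:
    items = list(seeds)
    for stage in stages:
        items = stage(items)
    return items


math_set = run(["fractions"], build_pipeline(use_complexify=True))
legal_set = run(["contract law"], build_pipeline(use_complexify=False))
print(len(math_set), math_set[0])    # 4 fractions / basics (variant 1) [harder]
print(len(legal_set), legal_set[0])  # 4 contract law / basics (variant 1)
```

Keeping complexification behind a flag mirrors the reported finding that it helped math but hurt legal reasoning, so the choice is made per domain rather than globally.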
coleam00/Archon is a TypeScript open-source workflow harness that makes AI coding deterministic and repeatable through YAML-defined development processes. It has reached 18.8k GitHub stars and is trending weekly. The latest release, v0.3.6, shipped on April 12, 2026, with 1,265 commits on the dev branch. It ships 17 default workflows covering issue fixes, feature development, PR reviews, and refactoring. Core features: isolated execution (each run gets its own git worktree for parallel conflict-free processing), composable workflows (mix deterministic nodes like bash/tests/git with AI-powered steps like planning/code-gen/review), multi-platform (CLI, Web UI, Slack, Telegram, Discord, GitHub webhooks), and human gates (interactive approval steps). MIT licensed; requires Bun, Claude Code, and the GitHub CLI.
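The composable-workflow idea (deterministic nodes, AI-powered steps, and human gates in one ordered run) can be modeled in a few lines. This is a conceptual sketch in Python, not Archon's YAML schema or TypeScript API: the `Node` type, the `run_workflow` function, and the node names are all hypothetical, and the "AI" step is a plain function standing in for a model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    name: str
    kind: str  # "deterministic", "ai", or "gate"
    action: Optional[Callable[[dict], dict]] = None  # unused for gates


def run_workflow(nodes, context, approve):
    """Run nodes in order; a rejected gate stops the run and records where."""
    state = dict(context)
    trace = []
    for node in nodes:
        if node.kind == "gate":
            if not approve(node.name, state):
                return {**state, "trace": trace, "stopped_at": node.name}
        else:
            state = node.action(state)
        trace.append(node.name)
    return {**state, "trace": trace, "stopped_at": None}


workflow = [
    Node("plan", "ai", lambda s: {**s, "plan": f"fix {s['issue']}"}),
    Node("run-tests", "deterministic", lambda s: {**s, "tests_passed": True}),
    Node("approve-merge", "gate"),
    Node("open-pr", "deterministic", lambda s: {**s, "pr_opened": True}),
]

# The approval callback stands in for an interactive human decision.
result = run_workflow(
    workflow,
    {"issue": "example-issue"},
    approve=lambda name, state: state["tests_passed"],
)
print(result["trace"])  # ['plan', 'run-tests', 'approve-merge', 'open-pr']
```

Fixing the node order and putting approval behind an explicit gate is what makes a run repeatable: the same definition yields the same sequence of steps, with only the AI-step outputs varying.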