Taking on CUDA with ROCm: 'One Step After Another'
EE Times (via HN)
EE Times deep-dive on AMD's ROCm 7.0 and whether it can finally dent NVIDIA's CUDA moat. AMD's MI400 (96GB HBM4, 5.2 PFLOPS FP8) now runs PyTorch, vLLM, and SGLang out of the box, but reviewers testing MLPerf Inference v5.1 still see 1.6–2.2x gaps vs the H200 on representative LLM workloads, driven by kernel-library maturity rather than raw silicon. Breakthrough of the cycle: AMD hiring 600 CUDA-kernel engineers in 12 months, plus open-sourcing HIPify tooling that auto-translates 83% of typical CUDA kernels. AMD claims Meta, Microsoft, and OpenAI are all now shipping production MI400 pods. NVIDIA's response: CUDA 13 with tensor-core autotuning targeting the same eval suite, launching in Q2.
AMD · ROCm · NVIDIA · CUDA · MI400 · PyTorch · MLPerf
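The HIPify tooling mentioned above is, at its core, a source-to-source translator that renames CUDA runtime calls to their HIP equivalents. The real hipify-clang works on the clang AST, not regexes; the toy sketch below, with a deliberately tiny and incomplete rename table, is only meant to give intuition for why a large fraction of typical kernels translate mechanically.

```python
import re

# Toy illustration of the rename-style translation HIPify performs.
# This mapping is a tiny, illustrative sample, not the real tool's table.
CUDA_TO_HIP = {
    r"\bcudaMalloc\b": "hipMalloc",
    r"\bcudaMemcpy\b": "hipMemcpy",
    r"\bcudaFree\b": "hipFree",
    r"\bcudaDeviceSynchronize\b": "hipDeviceSynchronize",
    r"\bcuda_runtime\.h\b": "hip/hip_runtime.h",
}

def toy_hipify(cuda_source: str) -> str:
    """Apply the API renames; kernel-launch syntax (<<<...>>>) is left
    alone because HIP accepts the same triple-chevron launch form."""
    out = cuda_source
    for pattern, replacement in CUDA_TO_HIP.items():
        out = re.sub(pattern, replacement, out)
    return out

src = "#include <cuda_runtime.h>\ncudaMalloc(&buf, n); cudaFree(buf);"
print(toy_hipify(src))
```

The remaining ~17% of kernels are the ones that lean on warp-size assumptions, inline PTX, or CUDA-only libraries, where a mechanical rename is not enough.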
Why it matters
For the first time since CUDA hit critical mass in 2016, there is a credible second source for frontier-scale AI training. A 1.6–2.2x MLPerf gap at 60% of the price per GPU is economically defensible for hyperscalers with their own kernel teams, which is exactly who is buying MI400. This matters for anyone forecasting AI infra spend: if AMD's share of AI accelerators moves from under 5% to 20% by end-2027, NVIDIA's pricing power and datacenter gross-margin mix change materially, and so does the $800B AI capex cycle underwriting NVDA's valuation.
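The "economically defensible" claim reduces to performance per dollar. A back-of-envelope check, assuming the MLPerf gap translates linearly to deliverable throughput (an assumption, not something the article states):

```python
# Relative perf-per-dollar of MI400 vs H200 under the article's numbers:
# H200 is 1.6-2.2x faster, MI400 costs ~60% as much per GPU.
def relative_perf_per_dollar(perf_gap: float, price_ratio: float) -> float:
    """perf_gap: how many times faster the H200 is (1.6-2.2).
    price_ratio: MI400 price as a fraction of H200 price (0.6)."""
    return (1.0 / perf_gap) / price_ratio

for gap in (1.6, 2.2):
    print(f"gap {gap}x -> {relative_perf_per_dollar(gap, 0.6):.2f}x H200 perf/$")
```

At the low end of the gap, MI400 roughly matches or slightly beats H200 on perf/$ before any kernel-team tuning; at the high end it trails at ~0.76x, which is why the buyers are hyperscalers who can close the software gap themselves.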
Impact scorecard: 7.15/10
Stakes 8.0 · Novelty 6.5 · Authority 7.5 · Coverage 6.5 · Concreteness 8.5 · Social 7.0 · FUD risk 2.5
Coverage: 14 outlets · 4 tier-1
Outlets: EE Times, Tom's Hardware, ServeTheHome, HPCwire, SemiAnalysis, The Register
X / Twitter: 3,400 mentions · @dylan522p · 2,900 likes
Reddit: 1,600 upvotes (r/LocalLLaMA)
Subreddits: r/LocalLLaMA, r/MachineLearning, r/hardware
Trust check: high
EE Times is tier-1 trade press with direct vendor access; MLPerf numbers come from published v5.1 results, not vendor claims. Meta/Microsoft/OpenAI production deployments are vendor-stated but consistent with reporting from SemiAnalysis and The Information throughout Q1. Concrete numbers, multiple primary sources: high trust.
@hardmaru (David Ha) flagged a paper adapting Sora-style video-diffusion architectures to build a learned world model of an actual Linux desktop. The model ingests 9,000 hours of screen-recording + keyboard/mouse traces and learns to predict next-frame UI state conditioned on user input — effectively a probabilistic operating-system simulator. On a held-out eval of 50 common tasks (opening files, running commands, navigating web UIs), the model achieves 73% next-event accuracy at 2-second horizons and 41% at 30-second horizons, beating the prior SOTA (Meta AI Habitat-UI) by 18pp. Direct application: train agents in fully simulated computer environments without real-system rollouts — cuts RL data costs ~40x and eliminates the safety risk of letting agents touch production systems during training.
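The headline metric above (73% next-event accuracy at 2-second horizons, 41% at 30 seconds) can be sketched as a horizon-bucketed position-wise accuracy. The event/trace format below is invented for illustration; the paper's actual eval protocol may differ.

```python
# Hedged sketch of a horizon-bucketed next-event accuracy metric.
def horizon_accuracy(predicted, actual, horizons=(2.0, 30.0)):
    """predicted/actual: aligned lists of (timestamp_s, event_label).
    For each horizon h, score the fraction of actual events occurring
    within h seconds of trace start whose label the model predicted
    at the same position."""
    results = {}
    for h in horizons:
        pairs = [(p, a) for p, a in zip(predicted, actual) if a[0] <= h]
        if not pairs:
            results[h] = None
            continue
        hits = sum(1 for p, a in pairs if p[1] == a[1])
        results[h] = hits / len(pairs)
    return results

actual = [(0.5, "click:file_menu"), (1.2, "click:open"), (25.0, "type:path")]
pred   = [(0.5, "click:file_menu"), (1.2, "click:save"), (25.0, "type:path")]
print(horizon_accuracy(pred, actual))
```

The longer horizon naturally scores lower because early mis-predictions compound: once the simulated UI state diverges, every subsequent event is conditioned on the wrong screen.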
Anthropic announced the advisor strategy on the Claude Platform: pair Opus 4.6 as a planning/critique advisor with Sonnet 4.6 or Haiku 4.5 as the executing model. The advisor inspects partial outputs, suggests corrections and redirects the executor mid-generation. On SWE-bench Multilingual, Sonnet+Opus-advisor scores 2.7 percentage points higher than Sonnet alone, at roughly 1.3x the cost vs 7x the cost of running Opus end-to-end. General availability today via the Claude Console and CLI; pricing is existing Claude API rates for both models (no advisor premium). Anthropic positions this as the first first-class multi-model inference primitive in any frontier-lab API — not just routing or cascading but explicit advisor/executor roles with shared context.
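The advisor/executor pattern described above can be sketched as a simple control loop. The model calls here are stubbed with plain functions; in real usage `executor` would route to Sonnet and `advisor` to Opus via the Anthropic API. The function names and the approve/redirect protocol are assumptions for illustration, not Anthropic's actual interface.

```python
# Sketch of the advisor/executor control flow (names and protocol assumed).
def run_with_advisor(task, executor, advisor, max_rounds=3):
    """Executor drafts; advisor critiques; executor revises until the
    advisor approves (returns None) or the round budget runs out."""
    draft = executor(task, feedback=None)
    for _ in range(max_rounds):
        critique = advisor(task, draft)
        if critique is None:  # advisor approves the current draft
            return draft
        draft = executor(task, feedback=critique)
    return draft

# Stub models standing in for Sonnet (executor) and Opus (advisor).
def executor(task, feedback):
    return f"{task}: fixed" if feedback else f"{task}: first attempt"

def advisor(task, draft):
    return "missing edge case" if "first attempt" in draft else None

print(run_with_advisor("patch bug #42", executor, advisor))
```

The ~1.3x cost figure follows from the advisor reading and critiquing short excerpts rather than generating the full output: most tokens still flow through the cheaper executor.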
Techmeme surfaced a profile of Biological Computing Company, a startup using real living neurons cultivated on silicon substrates to build AI accelerator chips. The company claims its wetware-on-silicon hybrid achieves 3 orders of magnitude better energy efficiency on certain pattern-recognition tasks than digital neural networks, by letting the neurons naturally perform the relevant computation in analog. Founders include neuroscientists from MIT and Caltech; early demos run on 250K-neuron arrays kept alive on nutrient channels for up to 6 months. First commercial pilots expected with a DOD-adjacent customer in 2027. Genuine neuromorphic breakthrough or hype? Independent verification still pending.