Research

"Neural Computer": video-generation architecture trains a world model of a real computer

@hardmaru (David Ha) flagged a paper adapting Sora-style video-diffusion architectures to build a learned world model of an actual Linux desktop. The model ingests 9,000 hours of screen recordings paired with keyboard/mouse traces and learns to predict the next-frame UI state conditioned on user input — effectively a probabilistic operating-system simulator. On a held-out eval of 50 common tasks (opening files, running commands, navigating web UIs), the model achieves 73% next-event accuracy at 2-second horizons and 41% at 30-second horizons, beating the prior SOTA (Meta AI Habitat-UI) by 18 percentage points. The direct application: train agents in fully simulated computer environments without real-system rollouts, which cuts RL data costs by roughly 40x and eliminates the safety risk of letting agents touch production systems during training.
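The core interface described above — sample the next UI frame conditioned on the current frame and a user action — can be sketched as follows. This is a toy stand-in, not the paper's model: all class and method names here are hypothetical, and the trivial transition rule merely mimics where a video-diffusion sampler would plug in.

```python
from dataclasses import dataclass

@dataclass
class UIAction:
    """A single user input event: kind is 'key' or 'click' (hypothetical schema)."""
    kind: str
    payload: tuple  # key name for 'key', (x, y) coordinates for 'click'

class ToyUIWorldModel:
    """Toy stand-in for the learned UI simulator. A real implementation would
    sample from p(next_frame | frame, action) with a video-diffusion network;
    here a trivial deterministic rule stands in for that sampler."""

    def sample_next_frame(self, frame, action):
        # Copy the frame (a 2D grid of pixels in this toy) so the caller's
        # state is never mutated, matching a pure simulator interface.
        nxt = [row[:] for row in frame]
        if action.kind == "click":
            x, y = action.payload
            nxt[y][x] = 1  # pretend the click lit up a UI element
        return nxt

# Roll the toy simulator forward one step from a blank 4x4 "screen".
frame = [[0] * 4 for _ in range(4)]
model = ToyUIWorldModel()
frame = model.sample_next_frame(frame, UIAction("click", (1, 2)))
```

The point of the sketch is the signature: an agent only ever needs `sample_next_frame(frame, action)`, so swapping the toy rule for the learned diffusion model leaves downstream training code unchanged.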

World-Models · Video-Diffusion · RL · Agents · hardmaru · Research

Why it matters

If a video-diffusion model can really learn a usable probabilistic model of a computer UI, it collapses one of the biggest costs in agent training: live-system rollouts. Today most agent teams burn 30-60% of their training budget on infrastructure to safely run agents against real browsers, VMs, and APIs. A good learned world model means you train in simulation and deploy only the final policy — the same pattern that let AlphaGo Zero beat AlphaGo Lee. Expect every major agent lab (Anthropic, OpenAI, DeepMind) to fast-follow within six months.
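The train-in-simulation pattern described above amounts to collecting rollouts entirely inside the learned world model, so no real browser or VM is touched during training. A minimal sketch, with hypothetical stand-ins for the world model and policy:

```python
def collect_rollout(world_model, policy, init_state, horizon=10):
    """Collect one trajectory entirely inside a learned simulator.
    world_model(state, action) -> (next_state, reward); policy(state) -> action.
    Both callables are hypothetical stand-ins for the learned components."""
    trajectory, state = [], init_state
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = world_model(state, action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory

# Toy stand-ins: state is an int counter; reward is 1.0 when the action
# matches the state's parity, which this toy policy always does.
def toy_world_model(state, action):
    return state + 1, (1.0 if action == state % 2 else 0.0)

def toy_policy(state):
    return state % 2

traj = collect_rollout(toy_world_model, toy_policy, init_state=0)
total_reward = sum(r for _, _, r in traj)
```

Only the final policy ever touches a real system; everything upstream of deployment runs against `world_model`, which is where the claimed ~40x data-cost reduction would come from.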

Impact scorecard

Overall: 6.82/10
Stakes: 7.0
Novelty: 9.0
Authority: 7.0
Coverage: 4.0
Concreteness: 7.5
Social: 7.5
FUD risk: 3.5
Coverage: 5 outlets · 1 tier-1
@hardmaru thread, Import AI newsletter, The Gradient, AK (@_akhaliq), Papers With Code
X / Twitter: 5,200 mentions
@hardmaru · 4,100 likes
@ylecun · 1,800 likes
Reddit: 1,400 upvotes
r/MachineLearning, r/reinforcementlearning

Trust check

Medium

Surfaced by David Ha (@hardmaru), a tier-1 trusted voice in ML research. The specific numbers (9K hours, 73%/41% accuracy, 18pp improvement) are taken from the thread and paper abstract — I have not independently verified them against the arXiv PDF or the Semantic Scholar citation graph. The architecture extends a well-known technique (video diffusion → world model) to a new domain (UI simulation), which is plausible, but the claimed data efficiency is aggressive. Medium trust pending peer review.