"Neural Computer": video-generation architecture trains a world model of a real computer
X · @hardmaru
@hardmaru (David Ha) flagged a paper adapting Sora-style video-diffusion architectures to build a learned world model of an actual Linux desktop. The model ingests 9,000 hours of screen recordings paired with keyboard/mouse traces and learns to predict the next-frame UI state conditioned on user input, making it effectively a probabilistic operating-system simulator. On a held-out eval of 50 common tasks (opening files, running commands, navigating web UIs), the model achieves 73% next-event accuracy at 2-second horizons and 41% at 30-second horizons, beating the prior SOTA (Meta AI Habitat-UI) by 18pp. Direct application: train agents in fully simulated computer environments without real-system rollouts, which cuts RL data costs ~40x and eliminates the safety risk of letting agents touch production systems during training.
If a video-diffusion model can really learn a usable probabilistic model of a computer UI, it collapses one of the biggest costs in agent training: live-system rollouts. Today most agent teams burn 30-60% of training budget on infrastructure to safely run agents against real browsers, VMs and APIs. A good learned world model means you train in simulation and only deploy the final policy. That is loosely the pattern that let AlphaGo Zero beat AlphaGo Lee through self-play, with the caveat that Go's simulator is the exact rules of the game, whereas here the simulator itself is learned and can be wrong. Expect every major agent lab (Anthropic, OpenAI, DeepMind) to fast-follow within 6 months.
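To make the "train in simulation" idea concrete, here is a minimal sketch of how an agent loop could sit on top of a learned next-frame predictor. Everything in it is an illustrative assumption rather than the paper's interface: the `Frame` and `Action` encodings, `toy_world_model`, and `SimulatedDesktopEnv` are hypothetical names, and the toy model is a hand-written rule standing in for a trained video-diffusion network.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types: a "frame" is a tuple of open window names, standing in
# for the pixel frames a real model would predict.
Frame = Tuple[str, ...]
Action = str  # made-up encoding, e.g. "open:terminal" or "close:terminal"


def toy_world_model(frame: Frame, action: Action) -> Frame:
    """Stand-in for a learned predictor of the next frame given an action."""
    verb, _, target = action.partition(":")
    if verb == "open" and target not in frame:
        return frame + (target,)
    if verb == "close":
        return tuple(w for w in frame if w != target)
    return frame  # unknown actions leave the screen unchanged


@dataclass
class SimulatedDesktopEnv:
    """Gym-like wrapper: the agent only touches the model, never a real OS."""
    model: Callable[[Frame, Action], Frame]
    frame: Frame = ()

    def reset(self) -> Frame:
        self.frame = ()
        return self.frame

    def step(self, action: Action) -> Frame:
        self.frame = self.model(self.frame, action)
        return self.frame


def rollout(env: SimulatedDesktopEnv, actions: List[Action]) -> List[Frame]:
    """Run a fixed action sequence and collect the predicted frames."""
    frames = [env.reset()]
    for action in actions:
        frames.append(env.step(action))
    return frames
```

Calling `rollout(SimulatedDesktopEnv(model=toy_world_model), ["open:terminal", "open:browser"])` yields the sequence of predicted screens without any real process being launched. In a real setup the model's prediction errors would compound over long rollouts, which is consistent with the accuracy drop from 2-second to 30-second horizons reported above.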
Impact scorecard: 6.82/10
Stakes: 7.0
Novelty: 9.0
Authority: 7.0
Coverage: 4.0
Concreteness: 7.5
Social: 7.5
FUD risk: 3.5
Coverage: 5 outlets · 1 tier-1
@hardmaru thread, Import AI newsletter, The Gradient, AK (@_akhaliq), Papers With Code
Surfaced by David Ha (@hardmaru), a tier-1 trusted voice in ML research. Specific numbers (9K hours, 73%/41% accuracy, 18pp improvement) are taken from the thread and paper abstract — I have not independently verified them against the arXiv PDF or Semantic Scholar citation graph. Architecture extends a well-known technique (video diffusion → world model) to a new domain (UI simulation), which is plausible but aggressive on data efficiency. Medium trust pending peer review.
Kronos (AAAI 2026 accepted, arxiv 2508.02739) is the first open-source foundation model pre-trained on financial candlestick (K-line) sequences. A specialized tokenizer quantizes multi-dimensional OHLCV data into hierarchical discrete tokens; a decoder-only autoregressive transformer is pre-trained on 12B (12 billion) K-line records from 45 global exchanges. Results against the leading time-series foundation model (TSFM) and best non-pretrained baseline: 93% higher RankIC on price-series forecasting over TSFM and 87% over the non-pretrained baseline; 9% lower MAE on volatility forecasting; 22% improvement in generative fidelity for synthetic K-line sequences. Model, weights, and demo are open on GitHub (shiyu-coder/Kronos) — repo is currently GitHub-trending.
Google Research published Simula in Transactions on Machine Learning Research (April 16, 2026): a framework that reframes synthetic data generation as mechanism design, using reasoning-driven construction rather than sample-level optimization. The team (Tim R. Davidson, Benoit Seguin, Enrico Bacis, Cesar Ilharco, Hamza Harkous) generated datasets of up to 512K (512,000) data points for five benchmarks spanning four domains: cybersecurity (CTI-MCQ, CTI-RCM), legal reasoning (LEXam), math (GSM8k), and multilingual knowledge (Global MMLU). Results show 'better data scales better': a 10% accuracy gain on math reasoning using Gemini 2.5 Flash as teacher and Gemma-3 4B as student. The four-step recipe is global diversification → local diversification → complexification → quality checks. Complexification helped math but hurt legal reasoning, and the paper warns that mechanism design is domain-dependent.
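The four-step recipe can be pictured as a chain of stages in which complexification is optional, reflecting the finding that it helped on math but hurt on legal reasoning. The sketch below is only a structural illustration under that reading: `run_pipeline` and the toy stage functions are hypothetical, use string templates instead of a teacher model, and are not Simula's actual implementation.

```python
from typing import Callable, Iterable, List, Optional

Stage = Callable[[List[str]], List[str]]


def run_pipeline(seeds: Iterable[str], global_div: Stage, local_div: Stage,
                 quality: Stage, complexify: Optional[Stage] = None) -> List[str]:
    """Chain the four stages; complexification is skipped when not supplied."""
    items = global_div(list(seeds))
    items = local_div(items)
    if complexify is not None:  # toggled per domain
        items = complexify(items)
    return quality(items)


# Toy stand-ins for each stage (a real system would call a teacher model).
def toy_global(topics: List[str]) -> List[str]:
    """Spread each seed topic across coarse subtopics."""
    return [f"{t}/{s}" for t in topics for s in ("basics", "edge cases")]


def toy_local(items: List[str]) -> List[str]:
    """Vary the phrasing within each subtopic."""
    return [f"{style} question on {x}" for x in items for style in ("short", "worked")]


def toy_complexify(items: List[str]) -> List[str]:
    """Mark every item as a harder, multi-step variant."""
    return [f"{x} (multi-step)" for x in items]


def toy_quality(items: List[str]) -> List[str]:
    """Drop exact duplicates while preserving order."""
    seen, kept = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            kept.append(x)
    return kept
```

Keeping complexification as an optional argument makes the domain-dependent choice explicit at the call site: a math run would pass `complexify=toy_complexify`, while a legal-reasoning run would omit it.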
coleam00/Archon is an open-source TypeScript workflow harness that makes AI coding deterministic and repeatable through YAML-defined development processes. It has reached 18.8k GitHub stars and is trending weekly. The latest release, v0.3.6, shipped on April 12, 2026; the dev branch has 1,265 commits. It ships 17 default workflows covering issue fixes, feature development, PR reviews, and refactoring. Core features: isolated execution (each run gets its own git worktree for parallel conflict-free processing), composable workflows (mix deterministic nodes like bash/tests/git with AI-powered steps like planning/code-gen/review), multi-platform (CLI, Web UI, Slack, Telegram, Discord, GitHub webhooks), and human gates (interactive approval steps). It is MIT licensed and requires Bun, Claude Code, and the GitHub CLI.
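The isolated-execution idea rests on a standard git feature: `git worktree add` checks out a branch into a separate directory that shares the same repository, so parallel runs never edit the same files. The sketch below builds the relevant commands in Python. The `.worktrees/<run_id>` layout, the `run/<run_id>` branch naming, and the function names are assumptions for illustration, not Archon's actual conventions (Archon itself is written in TypeScript).

```python
from pathlib import Path
from typing import List


def worktree_path(repo_root: Path, run_id: str) -> Path:
    """Hypothetical layout: one directory per run under .worktrees/."""
    return repo_root / ".worktrees" / run_id


def worktree_add_cmd(repo_root: Path, run_id: str, base: str = "HEAD") -> List[str]:
    """Build the git command that creates an isolated worktree for one run."""
    branch = f"run/{run_id}"  # hypothetical branch naming scheme
    return ["git", "-C", str(repo_root), "worktree", "add",
            "-b", branch, str(worktree_path(repo_root, run_id)), base]


def worktree_remove_cmd(repo_root: Path, run_id: str) -> List[str]:
    """Build the git command that cleans up the worktree after the run."""
    return ["git", "-C", str(repo_root), "worktree", "remove",
            str(worktree_path(repo_root, run_id))]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command separately from running it keeps the logic easy to test without touching a real repository.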