Tencent Robotics X releases HY-Embodied-0.5: 2B + 32B open-source embodied AI foundation models, leads 16 of 22 benchmarks
arXiv / Tencent Robotics X
Tencent Robotics X and the HY Vision Team released HY-Embodied-0.5, a family of open-source foundation models built for real-world robotic agents (arXiv 2604.07430). The 2B model targets edge devices and outperforms state-of-the-art alternatives on 16 of 22 benchmarks; the 32B variant matches Gemini 3.0 Pro on embodied understanding tasks. Both use a Mixture-of-Transformers (MoT) architecture with modality-specific computing paths, plus latent tokens for fine-grained visual perception, which is critical for manipulation and navigation. A Vision-Language-Action (VLA) model trained on this foundation enables real-world robot control. Full code and model weights were made public on April 7, 2026.
Embodied AI has been bottlenecked by the lack of open foundation models trained specifically for physical interaction — most robotics work fine-tunes LLMs not designed for the task. HY-Embodied-0.5's MoT architecture and open weights lower the barrier for robotics labs globally. Matching Gemini 3.0 Pro with an open 32B model signals the frontier of embodied AI is becoming accessible.
Impact scorecard: 7.76/10
Stakes: 8.0
Novelty: 8.0
Authority: 8.0
Coverage: 6.0
Concreteness: 8.0
Social: 8.0
FUD risk: 2.0
Coverage: 5 outlets · 0 tier-1
arXiv, Hugging Face Papers (182 upvotes), GitHub
Reddit: 0 upvotes (r/MachineLearning)
r/MachineLearning, r/LocalLLaMA
Trust check: high
arXiv preprint (2604.07430, April 7 2026) from Tencent Robotics X — credible industrial research lab with prior publications. 182 Hugging Face upvotes in 10 days, open weights on Hugging Face Hub for independent verification. Benchmark claims are specific and reproducible.
Kronos (accepted at AAAI 2026, arXiv 2508.02739) is the first open-source foundation model pre-trained on financial candlestick (K-line) sequences. A specialized tokenizer quantizes multi-dimensional OHLCV data into hierarchical discrete tokens, and a decoder-only autoregressive transformer is pre-trained on 12B (12 billion) K-line records from 45 global exchanges. Reported results, measured against the leading time-series foundation model (TSFM) and the best non-pretrained baseline: RankIC on price-series forecasting is 93% higher than the leading TSFM and 87% higher than the best non-pretrained baseline; MAE on volatility forecasting is 9% lower; generative fidelity for synthetic K-line sequences improves by 22%. Model, weights, and demo are open on GitHub (shiyu-coder/Kronos), and the repo is currently trending on GitHub.
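For readers unfamiliar with the RankIC metric cited above: it is commonly computed as the Spearman rank correlation between predicted and realized returns across a set of assets. The sketch below illustrates that common definition in plain Python. It is an assumption about the metric in general, not the Kronos evaluation code, and the function names and sample values are made up for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def rank_ic(predicted, realized):
    """Spearman rank correlation between predictions and realized returns."""
    if len(predicted) != len(realized) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(predicted), average_ranks(realized))


# Hypothetical predicted scores and realized returns for four assets.
scores = [0.3, 0.1, 0.2, 0.4]
returns = [0.02, -0.01, 0.03, 0.05]
print(round(rank_ic(scores, returns), 3))
```

Because only ranks enter the calculation, the metric rewards getting the ordering of assets right rather than the magnitude of each forecast.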
Google Research published Simula in Transactions on Machine Learning Research (April 16, 2026). The framework reframes synthetic data generation as mechanism design, using reasoning-driven construction rather than sample-level optimization. The team (Tim R. Davidson, Benoit Seguin, Enrico Bacis, Cesar Ilharco, Hamza Harkous) generated datasets of up to 512K (512,000) data points across five benchmark datasets: cybersecurity (CTI-MCQ, CTI-RCM), legal reasoning (LEXam), math (GSM8K), and multilingual knowledge (Global MMLU). Results show that 'better data scales better': a 10% accuracy gain on math reasoning with Gemini 2.5 Flash as teacher and Gemma-3 4B as student. The four-step recipe is global diversification → local diversification → complexification → quality checks. Complexification helped math but hurt legal reasoning, and the paper warns that mechanism design is domain-dependent.
coleam00/Archon is an open-source workflow harness written in TypeScript that makes AI coding deterministic and repeatable through YAML-defined development processes. It has reached 18.8k GitHub stars and is trending weekly. The latest release, v0.3.6, shipped on April 12, 2026, and the dev branch has 1,265 commits. It ships 17 default workflows covering issue fixes, feature development, PR reviews, and refactoring. Core features: isolated execution (each run gets its own git worktree for parallel, conflict-free processing), composable workflows (mix deterministic nodes like bash/tests/git with AI-powered steps like planning/code-gen/review), multi-platform support (CLI, Web UI, Slack, Telegram, Discord, GitHub webhooks), and human gates (interactive approval steps). MIT licensed; requires Bun, Claude Code, and the GitHub CLI.
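The isolated-execution idea can be illustrated with plain git: `git worktree add -b <branch> <path> <commit-ish>` checks out a new branch into its own directory, so parallel runs never share a working tree. The sketch below only builds that command line. It is written in Python for brevity and is not Archon's implementation; the `.worktrees/` directory, the `run/<id>` branch naming, and the function name are hypothetical.

```python
from pathlib import Path


def worktree_add_command(repo_root, run_id, base_ref="HEAD"):
    """Build the argv for creating one isolated worktree per run.

    Assumed layout (hypothetical): <repo_root>/.worktrees/<run_id>,
    checked out on a fresh branch named run/<run_id>.
    """
    if not run_id or any(ch in run_id for ch in "/\\ "):
        raise ValueError("run_id must be non-empty, with no slashes or spaces")
    worktree_path = Path(repo_root) / ".worktrees" / run_id
    branch = f"run/{run_id}"
    return [
        "git", "-C", str(repo_root),
        "worktree", "add",
        "-b", branch,          # create a new branch for this run
        str(worktree_path),    # directory that will hold the checkout
        base_ref,              # commit-ish the new branch starts from
    ]


# Two runs get distinct directories and branches, so they cannot collide.
print(worktree_add_command("/repo", "fix-42"))
print(worktree_add_command("/repo", "feature-7"))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` inside a real repository would create the worktree; `git worktree remove <path>` cleans it up afterwards.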