Tencent Robotics X releases HY-Embodied-0.5: 2B + 32B open-source embodied AI foundation models, leads 16 of 22 benchmarks
arXiv / Tencent Robotics X
Tencent Robotics X and the HY Vision Team released HY-Embodied-0.5, a family of open-source foundation models built for real-world robotic agents (arXiv 2604.07430). The 2B model targets edge devices and outperforms state-of-the-art alternatives on 16 of 22 benchmarks; the 32B variant matches Gemini 3.0 Pro on embodied understanding tasks. Both use a Mixture-of-Transformers (MoT) architecture with modality-specific computing paths and latent tokens for fine-grained visual perception, which is critical for manipulation and navigation. A VLA (Vision-Language-Action) model trained on this foundation enables real-world robot control. Full code and model weights were released publicly on April 7, 2026.
Embodied AI has been bottlenecked by the lack of open foundation models trained specifically for physical interaction — most robotics work fine-tunes LLMs not designed for the task. HY-Embodied-0.5's MoT architecture and open weights lower the barrier for robotics labs globally. Matching Gemini 3.0 Pro with an open 32B model signals the frontier of embodied AI is becoming accessible.
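One way to picture the "modality-specific computing paths" mentioned above is a layer in which each token is transformed by weights selected according to its modality, while the order of tokens in the mixed sequence is preserved. The toy Python sketch below illustrates only that routing idea. The function names, the 2-dimensional tokens, and the weight matrices are all hypothetical; nothing here is taken from the HY-Embodied-0.5 code, which may differ substantially.

```python
# Illustrative only: a toy modality-routed projection, not the HY-Embodied-0.5 implementation.
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def route_by_modality(
    tokens: List[Vector], modalities: List[str], weights: Dict[str, Matrix]
) -> List[Vector]:
    """Apply each token's own modality-specific weights, keeping sequence order."""
    return [matvec(weights[m], t) for t, m in zip(tokens, modalities)]


# Hypothetical input: two vision tokens followed by one text token.
tokens = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
modalities = ["vision", "vision", "text"]
weights = {
    "vision": [[2.0, 0.0], [0.0, 2.0]],  # assumed vision path: doubles each feature
    "text": [[0.0, 1.0], [1.0, 0.0]],    # assumed text path: swaps the two features
}

out = route_by_modality(tokens, modalities, weights)
print(out)  # [[2.0, 4.0], [1.0, 1.0], [-1.0, 3.0]]
```

The point of the design is that vision and text tokens never share these weights, so each path can specialize without interfering with the other.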
Impact scorecard: 7.76/10
Stakes: 8.0
Novelty: 8.0
Authority: 8.0
Coverage: 6.0
Concreteness: 8.0
Social: 8.0
FUD risk: 2.0
Coverage: 5 outlets · 0 tier-1
arXiv, Hugging Face Papers (182 upvotes), GitHub
Reddit: 0 upvotes on r/MachineLearning
r/MachineLearning, r/LocalLLaMA
Trust check: high
arXiv preprint (2604.07430, April 7 2026) from Tencent Robotics X — credible industrial research lab with prior publications. 182 Hugging Face upvotes in 10 days, open weights on Hugging Face Hub for independent verification. Benchmark claims are specific and reproducible.
Stanford's 2026 AI Index, released April 16, shows China has nearly erased the US lead: the Arena score gap between the top models collapsed from 1300+ points (May 2023) to just 39 (March 2026). Meanwhile, the number of AI scholars emigrating to the US has dropped 89% since 2017, with an 80% decline in the past year alone. China leads in industrial robot installations (295,000 vs 34,200 for the US) and in share of AI research citations (20.6% vs 12.6%). US private AI investment reached $285.9B in 2025 vs China's $12.4B, but the money gap isn't translating into capability dominance.
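To put the figures above on a common footing, the short Python sketch below turns them into ratios. The numbers are copied from the paragraph as quoted and are not independently verified; the variable names are illustrative, and the 1300-point starting gap is treated as a lower bound because the source gives it as "1300+".

```python
# Figures as quoted in the summary above (not independently verified).
robots_cn, robots_us = 295_000, 34_200   # industrial robot installations
invest_us, invest_cn = 285.9, 12.4       # private AI investment in 2025, $B
cite_cn, cite_us = 20.6, 12.6            # share of AI research citations, %
gap_2023, gap_2026 = 1300, 39            # Arena score gap; 1300 is a lower bound

robot_ratio = robots_cn / robots_us      # China relative to US
invest_ratio = invest_us / invest_cn     # US relative to China
cite_ratio = cite_cn / cite_us           # China relative to US
gap_shrink = 1 - gap_2026 / gap_2023     # fraction of the gap that closed

print(f"Robot installations, China/US: {robot_ratio:.1f}x")   # 8.6x
print(f"Private investment, US/China:  {invest_ratio:.1f}x")  # 23.1x
print(f"Citation share, China/US:      {cite_ratio:.1f}x")    # 1.6x
print(f"Arena gap closed (at least):   {gap_shrink:.0%}")     # 97%
```

On these numbers, the US outspends China roughly 23 to 1 in private investment while the model-quality gap has closed by about 97%, which is the contrast the summary draws.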
SkillClaw (arXiv 2604.08377, April 9 2026) introduces a framework where deployed LLM agent skills improve themselves by aggregating real interactions across all users simultaneously. An autonomous evolver identifies recurring behavioral patterns in cross-user trajectories, distills improvements, and propagates them system-wide — so a fix discovered in one user's session benefits everyone. Evaluated on WildClawBench with Qwen3-Max, the system shows measurable gains from limited interaction data. The paper attracted 276 upvotes on Hugging Face in its first week — among the highest engagement of any April 2026 AI paper.
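The cross-user aggregation step described above can be pictured with a very small Python sketch: collect (skill, issue) observations from many users' sessions and keep only those that recur across distinct users. This is a simplified assumption about what "identifying recurring behavioral patterns" might look like, not SkillClaw's actual evolver; the record format, the `recurring_issues` function, and the `min_users` threshold are all hypothetical.

```python
# Illustrative only: a toy cross-user pattern counter, not the SkillClaw implementation.
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical trajectory record: (user_id, skill_name, observed_issue)
Trajectory = Tuple[str, str, str]


def recurring_issues(
    trajectories: List[Trajectory], min_users: int = 2
) -> Dict[Tuple[str, str], int]:
    """Return (skill, issue) pairs observed by at least `min_users` distinct users."""
    users_by_issue: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for user, skill, issue in trajectories:
        users_by_issue[(skill, issue)].add(user)
    return {
        key: len(users)
        for key, users in users_by_issue.items()
        if len(users) >= min_users
    }


logs = [
    ("alice", "web_search", "missing_date_filter"),
    ("bob", "web_search", "missing_date_filter"),
    ("bob", "web_search", "missing_date_filter"),  # repeat by the same user counts once
    ("carol", "file_edit", "wrong_encoding"),      # seen by only one user, so filtered out
]

candidates = recurring_issues(logs)
print(candidates)  # {('web_search', 'missing_date_filter'): 2}
```

Counting distinct users rather than raw occurrences keeps one noisy session from dominating; a real system would then need a further step to turn each candidate into a skill update and roll it out.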
NousResearch's hermes-agent (github.com/NousResearch/hermes-agent) has reached 96,216 stars and 13,481 forks, ranking among the most-starred AI agent projects on GitHub. Unlike most frameworks, it implements a persistent learning loop: the agent creates skills from experience, refines them during use, and searches its own conversation history. It runs across 200+ models (OpenRouter, OpenAI, Anthropic, custom), deploys via Terminal, Telegram, Discord, Slack, WhatsApp or Email, and operates on a $5 VPS or GPU cluster. It includes a cron scheduler for unattended operation and batch trajectory generation for RL training.