Research

Tencent Robotics X releases HY-Embodied-0.5: 2B + 32B open-source embodied AI foundation models, leads 16 of 22 benchmarks

Tencent Robotics X and the HY Vision Team released HY-Embodied-0.5, a family of open-source foundation models built for real-world robotic agents (arXiv 2604.07430). The 2B model targets edge devices and leads state-of-the-art alternatives on 16 of 22 benchmarks; the 32B variant matches Gemini 3.0 Pro on embodied-understanding tasks. Both use a Mixture-of-Transformers (MoT) architecture with modality-specific computing paths and latent tokens for fine-grained visual perception, which is critical for manipulation and navigation. A VLA (Vision-Language-Action) model trained on this foundation enables real-world robot control. Full code and model weights were released publicly on April 7, 2026.
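The summary above does not spell out the exact architecture, but the general Mixture-of-Transformers idea it names can be sketched: attention runs globally over the mixed-modality token sequence, while each modality (text, latent-vision, action) is routed through its own feed-forward weights. The sketch below is a minimal, illustrative numpy version of that pattern; all names, dimensions, and the choice of modalities are assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # model width (illustrative, not from the paper)


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


class MoTBlock:
    """One Mixture-of-Transformers block (sketch): self-attention is
    shared across the whole sequence, but each modality gets its own
    feed-forward weights -- a "modality-specific computing path"."""

    def __init__(self, d, modalities=("text", "vision", "action")):
        self.wq = rng.normal(0, d**-0.5, (d, d))
        self.wk = rng.normal(0, d**-0.5, (d, d))
        self.wv = rng.normal(0, d**-0.5, (d, d))
        # one FFN (two linear layers) per modality
        self.ffn = {m: (rng.normal(0, d**-0.5, (d, 4 * d)),
                        rng.normal(0, (4 * d)**-0.5, (4 * d, d)))
                    for m in modalities}

    def __call__(self, x, modality_ids):
        # global self-attention over the mixed-modality sequence
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        att = softmax(q @ k.T / np.sqrt(x.shape[1]))
        h = x + att @ v
        # route each token through its modality's FFN
        out = np.empty_like(h)
        for m, (w1, w2) in self.ffn.items():
            idx = [i for i, mid in enumerate(modality_ids) if mid == m]
            if idx:
                z = np.maximum(h[idx] @ w1, 0.0)  # ReLU
                out[idx] = h[idx] + z @ w2
        return out


# mixed sequence: 4 text tokens, 3 latent-vision tokens, 1 action token
x = rng.normal(size=(8, D))
mods = ["text"] * 4 + ["vision"] * 3 + ["action"]
y = MoTBlock(D)(x, mods)
print(y.shape)  # (8, 16): same sequence, per-modality compute paths
```

The design point the release emphasizes is that routing happens by modality rather than by a learned gate, so every token type always sees specialized weights while attention still mixes information across modalities.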

embodied-ai · robotics · foundation-model · tencent · mixture-of-transformers · open-source · vla

Why it matters

Embodied AI has been bottlenecked by the lack of open foundation models trained specifically for physical interaction — most robotics work fine-tunes LLMs not designed for the task. HY-Embodied-0.5's MoT architecture and open weights lower the barrier for robotics labs globally. Matching Gemini 3.0 Pro with an open 32B model signals the frontier of embodied AI is becoming accessible.

Impact scorecard

Overall: 7.76/10
Stakes: 8.0
Novelty: 8.0
Authority: 8.0
Coverage: 6.0
Concreteness: 8.0
Social: 8.0
FUD risk: 2.0
Coverage: 5 outlets · 0 tier-1
arXiv, Hugging Face Papers (182 upvotes), GitHub
Reddit: 0 upvotes
r/MachineLearning, r/LocalLLaMA

Trust check

high

arXiv preprint (2604.07430, April 7 2026) from Tencent Robotics X — credible industrial research lab with prior publications. 182 Hugging Face upvotes in 10 days, open weights on Hugging Face Hub for independent verification. Benchmark claims are specific and reproducible.

Primary source ↗