Research

OpenBMB releases VoxCPM2 — 2B-param tokenizer-free TTS, 1.84% WER in English, 30 languages, Apache 2.0

OpenBMB released VoxCPM2 in April 2026 — a 2B-parameter speech synthesis model trained on over 2 million hours of multilingual audio and released under Apache 2.0. Unlike standard TTS systems that quantize speech into discrete tokens, VoxCPM2 skips quantization entirely: it generates continuous speech representations through an end-to-end diffusion autoregressive pipeline operating in AudioVAE V2's latent space (LocEnc-TSLM-RALM-LocDiT). Coverage spans 30 languages plus nine Chinese dialects (Cantonese, Sichuan, Wu, Northeast, Henan, Shaanxi, Shandong, Tianjin, Minnan). On Seed-TTS-eval English it hits 1.84% WER with 75.3% speaker similarity; on CV3-eval multilingual it logs 3.65% CER in Chinese and 5.00% WER in English across 11 tested languages.
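The tokenizer-free idea can be sketched in a few lines: instead of emitting discrete codec tokens, the model autoregresses over continuous latent frames, with a short diffusion loop producing each frame conditioned on the history. The toy below is a minimal sketch of that loop only; the component names (LocEnc, TSLM, RALM, LocDiT), latent width, and step count are stand-ins, not the released API.

```python
# Toy sketch of a tokenizer-free diffusion-AR generation loop.
# All names and sizes are illustrative assumptions, not VoxCPM2's real interface.
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 8   # stand-in for the AudioVAE V2 latent width (assumed)
STEPS = 4        # denoising steps per frame (assumed)

def denoise_step(x, context, t):
    """One toy 'LocDiT-style' update: pull the noisy latent toward a
    context-conditioned target as t counts down STEPS -> 1."""
    target = np.tanh(context)   # stand-in for the conditioned prediction
    return x + (target - x) / t

def generate(n_frames):
    """Autoregress over continuous latent frames: no quantization step --
    each frame starts as noise and is denoised conditioned on history,
    then fed back as-is (continuous) into the context."""
    frames = [np.zeros(LATENT_DIM)]
    for _ in range(n_frames):
        context = np.mean(frames, axis=0)    # stand-in for the LM hidden state
        x = rng.standard_normal(LATENT_DIM)  # start each frame from noise
        for t in range(STEPS, 0, -1):
            x = denoise_step(x, context, t)
        frames.append(x)                     # continuous latent, no codebook
    return np.stack(frames[1:])

latents = generate(5)
print(latents.shape)  # (5, 8) -- a decoder (AudioVAE) would map these to audio
```

The contrast with Encodec-style pipelines is the feedback path: a discrete-token model would quantize `x` to a codebook index before appending it, losing information at every step; here the raw continuous vector is carried forward.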

openbmb · tts · diffusion · open-source · multilingual

Why it matters

VoxCPM2 lands four days after Gemini 3.1 Flash TTS and sets the Apache-2.0 open-weight bar where Google set the hosted-API bar. A 2B tokenizer-free diffusion-AR architecture is the clearest departure from the Encodec-style discrete-token pipeline that has dominated TTS since 2022, and 1.84% WER puts the model within reach of closed competitors on English. For any team that was blocked on ElevenLabs pricing or Google's licensing, VoxCPM2 is the first serious multilingual open alternative — expect a wave of self-hosted voice deployments in customer support, accessibility and audiobook production within Q2.

Impact scorecard

7.7/10
Stakes
7.5
Novelty
8.5
Authority
8.0
Coverage
6.0
Concreteness
9.5
Social
7.5
FUD risk
2.0
Coverage: 9 outlets · 1 tier-1
GitHub Trending, HuggingFace, MarkTechPost
X / Twitter: 3,200 mentions
@OpenBMB · 2,600 likes
Reddit: 680 upvotes
r/MachineLearning, r/LocalLLaMA

Trust check

High

Weights and training methodology public on GitHub; benchmark numbers reproducible on Seed-TTS-eval and CV3-eval harnesses. Apache 2.0 license verifiable. No FUD flags; open-weight releases are self-authenticating.

Primary source ↗