Research

NVIDIA's Nemotron OCR v2: 28x faster than PaddleOCR, trained on 12.2M synthetic images, 34.7 pages/sec

NVIDIA released Nemotron OCR v2 on April 17: an 84M-parameter unified multilingual OCR model trained primarily on 12.2 million synthetic images generated via a modified SynthDoG pipeline, plus ~680K real-world scans. It handles English, Simplified and Traditional Chinese, Japanese, Korean, and Russian in a single model, with no separate language-detection step. On OmniDocBench it processes 34.7 pages per second on a single A100, roughly 28x faster than PaddleOCR v5's server mode at 1.2 pages/s, while holding competitive normalized-edit-distance (NED) accuracy. On the SynthDoG multilingual benchmark it dominates: 0.046 NED in Japanese vs v1's 0.723, and 0.047 in Korean vs v1's 0.923 (lower is better). Weights and the training dataset are public under the NVIDIA Open Model License and CC BY 4.0.
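For readers unfamiliar with the metric: normalized edit distance (NED) is typically Levenshtein distance divided by the length of the longer string, so 0.0 is a perfect transcription and 1.0 is maximally wrong. A minimal sketch (the exact normalization OmniDocBench or the SynthDoG benchmark uses may differ):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def ned(pred: str, ref: str) -> float:
    # Normalize by the longer string; define NED of two empty strings as 0.
    if not pred and not ref:
        return 0.0
    return levenshtein(pred, ref) / max(len(pred), len(ref))

# Example: 3 edits over a 7-character reference -> NED ~= 0.43
print(ned("kitten", "sitting"))
```

Under this definition, Nemotron's reported 0.046 NED in Japanese corresponds to getting roughly 95% of characters right per page.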

nvidia · ocr · synthetic-data · huggingface · multilingual

Why it matters

Nemotron OCR v2 is the clearest demonstration yet that synthetic-data pipelines can beat heavily curated real-world datasets on a classic vision benchmark. A 28x speedup plus open weights means every downstream product doing document ingestion (legal, finance, healthcare, back-office RPA) now has a credible alternative to paid OCR APIs. Longer term, the SynthDoG recipe generalizes: if a 12.2M-sample synthetic pipeline produced this on OCR, expect the same pattern to reshape speech recognition, medical imaging and code-screenshot comprehension within 6 months.

Impact scorecard

Overall: 7.6/10
Stakes: 7.5
Novelty: 8.0
Authority: 9.0
Coverage: 6.0
Concreteness: 9.5
Social: 7.0
FUD risk: 2.0
Press coverage: 11 outlets · 1 tier-1 (HuggingFace, NVIDIA Blog, MarkTechPost, VentureBeat)
X / Twitter: 4,200 mentions; top post @huggingface · 3,800 likes
Reddit: 780 upvotes (r/MachineLearning, r/LocalLLaMA)

Trust check

high

First-party NVIDIA + HuggingFace release with full benchmark tables, public model weights, public dataset. License and authors named. Reproducible; no FUD flags.
