NVIDIA's Nemotron OCR v2: 28x faster than PaddleOCR, trained on 12.2M synthetic images, 34.7 pages/sec
Source: HuggingFace
NVIDIA released Nemotron OCR v2 on April 17. It is an 84M-parameter unified multilingual OCR model trained primarily on 12.2 million synthetic images generated with a modified SynthDoG pipeline, plus ~680K real-world scans. A single model handles English, Simplified and Traditional Chinese, Japanese, Korean and Russian, with no language detection step. On OmniDocBench it processes 34.7 pages per second on a single A100, roughly 28x the 1.2 pages/s of PaddleOCR v5's server mode, while holding competitive normalized edit distance (NED, lower is better). On the SynthDoG multilingual benchmark the gain over v1 is large: 0.046 NED in Japanese vs v1's 0.723, and 0.047 in Korean vs v1's 0.923. Weights and the training dataset are public under the NVIDIA Open Model License and CC-BY-4.0.
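For readers unfamiliar with the metric, here is a minimal sketch of how a normalized edit distance and the quoted speedup ratio can be computed. It assumes NED is the character-level Levenshtein distance divided by the length of the longer string; the benchmarks named above may normalize differently, and the function names are illustrative rather than taken from any NVIDIA tooling.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (free on a match)
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction: str, reference: str) -> float:
    """NED in [0, 1]; 0.0 means the OCR output matches the reference exactly.

    Assumption: normalize by the longer of the two strings.
    """
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 0.0
    return levenshtein(prediction, reference) / longest


# Throughput ratio from the two figures quoted in the summary.
nemotron_pages_per_sec = 34.7
paddle_pages_per_sec = 1.2
speedup = nemotron_pages_per_sec / paddle_pages_per_sec

print(f"NED example: {normalized_edit_distance('kitten', 'sitting'):.3f}")
print(f"Speedup: {speedup:.1f}x")
```

Under this definition a drop from 0.723 to 0.046 means the share of characters needing correction falls from about 72% to under 5%. The computed throughput ratio comes out slightly above 28, consistent with the rounded-down "28x" in the headline.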
nvidia, ocr, synthetic-data, huggingface, multilingual
Why it matters
Nemotron OCR v2 is the clearest demonstration yet that synthetic-data pipelines can beat heavily curated real-world datasets on a classic vision benchmark. A 28x speedup plus open weights gives every downstream product that ingests documents (legal, finance, healthcare, back-office RPA) a credible alternative to paid OCR APIs. Longer term, the SynthDoG recipe generalizes: if a 12.2M-sample synthetic pipeline produced this result on OCR, expect the same pattern to reshape speech recognition, medical imaging and code-screenshot comprehension within six months.
X / Twitter: 4,200 mentions @huggingface · 3,800 likes
Reddit: 780 upvotes r/MachineLearning
r/MachineLearning, r/LocalLLaMA
Trust check
high
First-party NVIDIA + HuggingFace release with full benchmark tables, public model weights and a public dataset. License and authors are named. Reproducible; no FUD flags.
Converge Bio closed a $25M oversubscribed Series A led by Bessemer Venture Partners with TLV Partners, Saras Capital and Vintage Investment Partners participating; execs from Meta, OpenAI and Wiz joined as individual LPs. The company builds generative models trained on DNA, RNA and protein sequence data, with three commercial systems: antibody design, protein yield optimization, and biomarker/target discovery. Traction: 40+ programs with over a dozen pharma/biotech customers across the US, Canada, Europe and Israel, expanding into Asia. Case studies include a 4-to-4.5x protein-yield boost in a single computational pass, and antibodies with single-nanomolar binding affinity. Headcount grew from 9 in Nov 2024 to 34 today. Prior seed was $5.5M in 2024. CEO Dov Gertz founded the company.
METR and Epoch AI released MirrorCode, a benchmark that tests whether AI can autonomously reimplement complex real-world software from a specification. The headline result: Claude Opus 4.6 successfully reimplemented gotree, a bioinformatics toolkit with roughly 16,000 lines of Go and 40+ commands, an effort estimated to take a human engineer 2 to 17 weeks. The benchmark spans 20+ programs across Unix utilities, cryptography and compression. The release also previews a Google DeepMind taxonomy of six attack genres on AI agents (content injection, semantic manipulation, cognitive state, behavioral control, systemic, human-in-the-loop) and Ryan Greenblatt's revised estimate that full AI R&D automation by end-2028 now has 30% probability, up from 15%, citing verifiable-software-task self-improvement loops.
Mistral shipped MCP support inside Studio on April 16, giving developers both pre-configured connectors and the ability to point agents at any remote MCP server. Built-in connectors cover GitHub, Gmail and web search out of the box, and Mistral now hosts a directory of 20+ secure enterprise connectors spanning data, productivity, development and commerce, including Databricks, Snowflake, Atlassian, Asana, Outlook, Box, Stripe and Zapier. Custom MCP servers are wired in through the API/SDK with direct tool calling and human-in-the-loop approval gates. All connectors work across both model calls and agent calls, and the connector inventory supports programmatic CRUD.
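To make the tool-calling and approval-gate idea concrete, here is a minimal sketch of a generic MCP `tools/call` request wrapped in a human-in-the-loop check. It does not use Mistral's API or SDK. The message shape follows the public Model Context Protocol (JSON-RPC 2.0), while the tool name `create_issue`, its arguments, the helper names and the stand-in transport are all hypothetical.

```python
import json
from typing import Callable


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def call_with_approval(
    request: dict,
    approve: Callable[[dict], bool],
    send: Callable[[dict], dict],
) -> dict:
    """Forward the request only if the reviewer callback approves it."""
    if not approve(request):
        # -32000 sits in JSON-RPC's implementation-defined server error range.
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32000, "message": "rejected by reviewer"},
        }
    return send(request)


def fake_send(request: dict) -> dict:
    """Stand-in transport: returns a canned result instead of calling a server."""
    text = f"ran {request['params']['name']}"
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"content": [{"type": "text", "text": text}]},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(1, "create_issue", {"title": "Fix OCR timeout"})
approved = call_with_approval(request, lambda r: True, fake_send)
rejected = call_with_approval(request, lambda r: False, fake_send)

print(json.dumps(approved, indent=2))
print(json.dumps(rejected, indent=2))
```

In a real deployment the `approve` callback would surface the pending call to a person and `send` would talk to the remote MCP server. Keeping both as injected callables leaves the gate logic testable without a network.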