How important is this news?

Composite impact score: 7.9/10. Breakdown — Stakes 9, Novelty 9, Authority 8, Coverage 5.5, Concreteness 9, Social 7, FUD risk 2.
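The page does not say how the seven dimensions combine into the 7.9 composite. The sketch below is a hypothetical reconstruction, not the site's published formula. Variant A takes an unweighted mean of the six positive dimensions. Variant B also averages in (10 minus FUD risk) as a seventh term. Both happen to round to 7.9, so the breakdown alone does not pin down the weighting.

```python
# Hypothetical reconstruction: the site does not publish its scoring formula.
from statistics import mean

scores = {
    "stakes": 9.0,
    "novelty": 9.0,
    "authority": 8.0,
    "coverage": 5.5,
    "concreteness": 9.0,
    "social": 7.0,
}
fud_risk = 2.0  # reported separately on the page

def composite_a(dims: dict[str, float]) -> float:
    """Variant A (assumption): unweighted mean of the six positive dimensions."""
    return round(mean(dims.values()), 1)

def composite_b(dims: dict[str, float], fud: float) -> float:
    """Variant B (assumption): also average in (10 - FUD risk) as a seventh term."""
    return round(mean([*dims.values(), 10.0 - fud]), 1)

print(composite_a(scores))            # 7.9  (47.5 / 6 = 7.92)
print(composite_b(scores, fud_risk))  # 7.9  (55.5 / 7 = 7.93)
```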

Research

MirrorCode: Claude Opus 4.6 reimplemented a 16,000-line Go bioinformatics toolkit estimated to take a human 2-17 weeks

Q: Can you trust this reporting on MirrorCode and Claude Opus 4.6's 16,000-line reimplementation?

Trust verdict: high. Primary-source newsletter from Jack Clark (Anthropic policy lead) summarizing published research by METR, Epoch AI and Google DeepMind, with named authors (David Krueger, Ryan Greenblatt). Specific numeric claims (16,000 lines, 2-17 weeks, 15% to 30%) are verifiable in the linked research.

Apr 18, 2026 · jack-clark.net

METR and Epoch AI released MirrorCode, a benchmark that tests whether AI can autonomously reimplement complex real-world software from a specification. The headline result: Claude Opus 4.6 successfully reimplemented gotree, a bioinformatics toolkit of roughly 16,000 lines of Go with 40+ commands. A human engineer would need an estimated 2 to 17 weeks for the same job. The benchmark spans 20+ programs across Unix utilities, cryptography and compression.

The same newsletter issue also previews a Google DeepMind taxonomy of six attack genres against AI agents: content injection, semantic manipulation, cognitive state, behavioral control, systemic, and human-in-the-loop. It also reports Ryan Greenblatt's revised estimate that full AI R&D automation by the end of 2028 has a 30% probability, up from 15%. He cites self-improvement loops on verifiable software tasks.

metr · epoch-ai · claude · benchmark · ai-safety

Why it matters

MirrorCode is the first benchmark that operationalizes autonomous software reimplementation as a proxy for AI R&D automation, and Claude Opus 4.6 has now succeeded on a 16,000-line codebase. If the 30%-by-end-2028 estimate is even directionally right, two things shift. First, recursive self-improvement loops stop being theoretical and become a planning risk for every lab. Second, the DeepMind six-genre taxonomy becomes the de facto agent threat model that enterprises must test against. Expect regulated buyers (banks, defense) to demand MirrorCode-style evals in procurement by Q3 2026.

Coverage: 8 outlets · 1 tier-1

Import AI, METR Blog, Epoch AI, MarkTechPost

X / Twitter: 5,600 mentions
@jackclarkSF · 6,200 likes
@METR_Evals · 3,100 likes

Reddit: 540 upvotes · r/MachineLearning
Subreddits: r/MachineLearning, r/ControlProblem

Keep reading

Startups

Apr 18, 2026 · TechCrunch

Converge Bio raises $25M Series A — 4.5x protein yields, Bessemer + Meta/OpenAI/Wiz execs back it

Impact 7.1/10 · Trust: high · 12 outlets · 1,800 X mentions · 420 upvotes on r/startups

Converge Bio closed an oversubscribed $25M Series A led by Bessemer Venture Partners, with TLV Partners, Saras Capital and Vintage Investment Partners participating. Executives from Meta, OpenAI and Wiz joined as individual LPs. The company builds generative models trained on DNA, RNA and protein sequence data and sells three commercial systems: antibody design, protein yield optimization, and biomarker/target discovery. Traction: 40+ programs with over a dozen pharma/biotech customers across the US, Canada, Europe and Israel, with expansion into Asia under way. Case studies include a 4-to-4.5x protein-yield boost in a single computational pass and antibodies with single-nanomolar binding affinity. Headcount grew from 9 in Nov 2024 to 34 today. The previous round was a $5.5M seed in 2024. CEO Dov Gertz is the company's founder.

Research

Apr 18, 2026 · Hugging Face

NVIDIA's Nemotron OCR v2: 28x faster than PaddleOCR, trained on 12.2M synthetic images, 34.7 pages/sec

Impact 7.6/10 · Trust: high · 11 outlets · 4,200 X mentions · 780 upvotes on r/MachineLearning

NVIDIA released Nemotron OCR v2 on April 17. It is an 84M-parameter unified multilingual OCR model, trained primarily on 12.2 million synthetic images generated with a modified SynthDoG pipeline, plus ~680K real-world scans. A single model handles English, Simplified and Traditional Chinese, Japanese, Korean and Russian, with no separate language-detection step. On OmniDocBench it processes 34.7 pages per second on a single A100. That is 28x faster than PaddleOCR v5's server mode (1.2 pages/s), and accuracy stays competitive on normalized edit distance (NED, lower is better). On the SynthDoG multilingual benchmark it improves sharply on v1: 0.046 NED in Japanese vs v1's 0.723, and 0.047 in Korean vs v1's 0.923. Weights and the training dataset are public under the NVIDIA Open Model License and CC-BY-4.0.
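Normalized edit distance (NED) is usually the Levenshtein distance between the predicted and reference text, divided by the longer string's length. A score of 0 means a perfect transcription. Conventions vary between benchmarks, so the divisor below is an assumption, not the exact OmniDocBench or SynthDoG definition. The final lines check the quoted throughput ratio.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]

def ned(pred: str, ref: str) -> float:
    """NED in [0, 1]; divisor is the longer string's length (assumed convention)."""
    longest = max(len(pred), len(ref))
    return 0.0 if longest == 0 else levenshtein(pred, ref) / longest

print(ned("こんにちは", "こんにちは"))         # 0.0, exact match
print(round(ned("kitten", "sitting"), 4))  # 0.4286, i.e. 3 edits / 7 chars

# Throughput ratio from the summary: 34.7 vs 1.2 pages/s on one A100.
speedup = 34.7 / 1.2
print(f"{speedup:.1f}x")  # 28.9x; the headline figure is 28x
```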

Apr 18, 2026 · Mistral AI

Mistral opens Studio to MCPs — 20+ enterprise connectors (Databricks, Snowflake, Stripe, Zapier) and custom servers

Impact 7.3/10 · Trust: high · 14 outlets · 3,200 X mentions · 620 upvotes on r/LocalLLaMA

Mistral shipped Model Context Protocol (MCP) support in Studio on April 16. Developers get pre-configured connectors and can also point agents at any remote MCP server. Built-in connectors cover GitHub, Gmail and web search out of the box. Mistral also hosts a directory of 20+ secure enterprise connectors spanning data, productivity, development and commerce, including Databricks, Snowflake, Atlassian, Asana, Outlook, Box, Stripe and Zapier. Custom MCP servers are connected through the API/SDK, with direct tool calling and human-in-the-loop approval gates. All connectors work across both model calls and agent calls. The connector inventory can be created, read, updated and deleted programmatically.
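The summary does not show Mistral's SDK calls, so the sketch below does not use them. It shows the general pattern behind a human-in-the-loop approval gate. The code builds an MCP `tools/call` request (MCP messages are JSON-RPC 2.0) and sends it only after an approver accepts it. The tool name `create_refund`, its arguments, and the echo transport are hypothetical placeholders.

```python
import json
from itertools import count
from typing import Callable, Optional

_request_ids = count(1)  # monotonically increasing JSON-RPC request ids

def make_tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

def gated_call(
    request: dict,
    approve: Callable[[dict], bool],
    send: Callable[[str], str],
) -> Optional[str]:
    """Forward the request only if the human approver accepts it.

    Returns the transport's raw response, or None if the call was rejected,
    in which case the tool is never invoked.
    """
    if not approve(request):
        return None
    return send(json.dumps(request))

# Hypothetical tool, arguments and transport, for illustration only.
req = make_tool_call("create_refund", {"charge_id": "ch_123", "amount": 500})
echo_transport = lambda payload: payload  # stands in for a real MCP client session

print(gated_call(req, lambda r: False, echo_transport))  # None (rejected)
print(json.loads(gated_call(req, lambda r: True, echo_transport))["method"])  # tools/call
```

The gate sits outside the transport, so a rejected call never leaves the client. In practice `approve` would show the tool name and arguments to a person before returning.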