What is Treasure Hunt?

Treasure Hunt is a curated hourly news feed covering AI, Quantum computing, Cybersecurity, AI startups and Research papers. Every item is scored across seven dimensions and given an explicit trust verdict with a "why it matters" analysis.

How are stories scored?

Each post gets a 0–10 score on Coverage, Social, Novelty, Authority, Concreteness, Stakes, and FUD risk. The composite Impact score is 0.22·Stakes + 0.18·Novelty + 0.15·Authority + 0.12·Coverage + 0.12·Concreteness + 0.11·Social + 0.10·(10 − FUD risk).
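Here is a minimal sketch of the Impact formula above. It assumes each dimension arrives as a float already on the 0–10 scale. The dictionary keys and the `impact_score` name are illustrative, not Treasure Hunt's code. The weights are the ones stated, and FUD risk enters inverted, so a riskier story scores lower.

```python
# Weights from the Impact formula; FUD risk is kept separate because it enters inverted.
WEIGHTS = {
    "stakes": 0.22,
    "novelty": 0.18,
    "authority": 0.15,
    "coverage": 0.12,
    "concreteness": 0.12,
    "social": 0.11,
}
FUD_WEIGHT = 0.10


def impact_score(scores: dict) -> float:
    """Composite Impact score from seven 0-10 dimension scores (key names are illustrative)."""
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be within 0-10, got {value}")
    total = sum(weight * scores[name] for name, weight in WEIGHTS.items())
    total += FUD_WEIGHT * (10 - scores["fud_risk"])  # low FUD risk raises Impact
    return round(total, 2)


# Hypothetical dimension scores for one story.
story = {"stakes": 9, "novelty": 8, "authority": 9, "coverage": 7,
         "concreteness": 8, "social": 6, "fud_risk": 2}
print(impact_score(story))  # prints 8.03
```

The seven weights sum to 1.0. A story that scores 10 on every dimension with zero FUD risk therefore lands at exactly 10.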

Where does the data come from?

Discovery runs across 43 RSS feeds (BBC, CNN, NYT, Guardian, NPR, Bloomberg, Al Jazeera, Verge, Ars Technica, Wired, TechCrunch, MIT Tech Review, Nature, Science, arXiv), Hacker News, 15 subreddits, GDELT (news in 100+ languages, with tone scores), GitHub trending, and trusted voices on X. Verification uses arXiv, Semantic Scholar (for citation counts), and Marketstack (for ticker-reaction checks).

A story's FUD-risk score is boosted in five cases: the headline uses sensationalist framing without a matching primary source; only one outlet covers the claim; market-moving news fails to move the named ticker; cited papers cannot be found on arXiv or Semantic Scholar; or tone polarization across coverage is unusually high.
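The FUD-risk triggers above map onto a simple additive rule. The page names the red flags but not how heavily each one counts. In this sketch, the boost sizes, the polarization threshold and the `StorySignals` fields are therefore hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical boost sizes: the FAQ names the red flags but not their weights.
BOOSTS = {
    "sensational_without_primary": 2.5,
    "single_outlet": 2.0,
    "ticker_did_not_move": 1.5,
    "paper_not_found": 3.0,
    "tone_polarized": 1.0,
}


@dataclass
class StorySignals:
    sensational_headline: bool
    has_primary_source: bool
    outlet_count: int
    market_moving: bool
    ticker_moved: bool
    cited_paper_found: Optional[bool]  # None when the story cites no paper
    tone_polarization: float  # assumed: spread of GDELT tone scores across coverage


def fud_risk(s: StorySignals, base: float = 0.0, polarization_threshold: float = 3.0) -> float:
    """Add a boost for every red flag that fires, then clamp to the 0-10 scale."""
    score = base
    if s.sensational_headline and not s.has_primary_source:
        score += BOOSTS["sensational_without_primary"]
    if s.outlet_count <= 1:
        score += BOOSTS["single_outlet"]
    if s.market_moving and not s.ticker_moved:
        score += BOOSTS["ticker_did_not_move"]
    if s.cited_paper_found is False:
        score += BOOSTS["paper_not_found"]
    if s.tone_polarization > polarization_threshold:
        score += BOOSTS["tone_polarized"]
    return min(10.0, max(0.0, score))
```

Clamping keeps the result usable as the `FUD risk` input to the Impact formula, which expects a 0–10 value.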

Who runs Treasure Hunt?

Treasure Hunt is assembled and maintained by Alexandru Dan (@KryptonAi). The "trusted voices" scoring weight comes from the X accounts Alexandru follows.

Treasure Hunt — The best in AI, Quantum, Cybersecurity, Startups & Research.

Research

Apr 20, 2026 · arxiv.org

Kronos — open foundation model for financial markets — 12B K-lines from 45 exchanges, +93% RankIC over the leading time-series foundation model, AAAI 2026 accepted

Impact 7.8/10 Trust · high 📰 5 outlets · 👽 r/MachineLearning · 31

Kronos (accepted at AAAI 2026, arXiv 2508.02739) is the first open-source foundation model pre-trained on financial candlestick (K-line) sequences. A specialized tokenizer quantizes multi-dimensional OHLCV data into hierarchical discrete tokens. A decoder-only autoregressive transformer is then pre-trained on 12 billion K-line records from 45 global exchanges. The paper compares Kronos against the leading time-series foundation model (TSFM) and the best non-pretrained baseline. On price-series forecasting, Kronos reports 93% higher RankIC than the TSFM and 87% higher than the non-pretrained baseline. It also reports 9% lower MAE on volatility forecasting and a 22% improvement in generative fidelity for synthetic K-line sequences. The model, weights, and a demo are open on GitHub (shiyu-coder/Kronos), and the repo is currently GitHub-trending.

Research

Apr 20, 2026 · research.google

Google Research's Simula generates 512K synthetic training samples — mechanism-design framework yields 10% math-reasoning gain with Gemma-3 4B student

Impact 8/10 Trust · high 📰 5 outlets · 👽 r/MachineLearning · 42

Google Research published Simula in Transactions on Machine Learning Research (April 16, 2026). Simula is a framework that reframes synthetic data generation as mechanism design, using reasoning-driven construction rather than sample-level optimization. The team (Tim R. Davidson, Benoit Seguin, Enrico Bacis, Cesar Ilharco, Hamza Harkous) generated datasets of up to 512K data points for five benchmarks: cybersecurity (CTI-MCQ, CTI-RCM), legal reasoning (LEXam), math (GSM8k), and multilingual knowledge (Global MMLU). The results support the claim that 'better data scales better'. One example is a 10% accuracy gain on math reasoning, using Gemini 2.5 Flash as teacher and Gemma-3 4B as student. The four-step recipe is global diversification → local diversification → complexification → quality checks. Complexification helped math but hurt legal reasoning, and the paper warns that mechanism design is domain-dependent.
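The four-step recipe reads naturally as an ordered pipeline with per-domain switches. The sketch is hypothetical: each stage is a trivial string transformation standing in for the LLM reasoning Simula actually performs. The `DOMAIN_STAGES` table only illustrates one way to encode the domain-dependent finding that complexification helped math but hurt legal reasoning.

```python
from typing import Callable, Dict, List, Set, Tuple

Stage = Callable[[List[str]], List[str]]


# Hypothetical stand-ins for LLM-driven steps; real stages would generate new content.
def global_diversify(items: List[str]) -> List[str]:
    return [f"{t}/{area}" for t in items for area in ("core", "edge-cases")]


def local_diversify(items: List[str]) -> List[str]:
    return [f"{t}#variant{i}" for t in items for i in (1, 2)]


def complexify(items: List[str]) -> List[str]:
    return [f"{t}+multi-step" for t in items]


def quality_check(items: List[str]) -> List[str]:
    return list(dict.fromkeys(items))  # placeholder: only drops exact duplicates


RECIPE: List[Tuple[str, Stage]] = [
    ("global", global_diversify),
    ("local", local_diversify),
    ("complexify", complexify),
    ("quality", quality_check),
]

# Per-domain switches, mirroring "complexification helped math but hurt legal reasoning".
DOMAIN_STAGES: Dict[str, Set[str]] = {
    "math": {"global", "local", "complexify", "quality"},
    "legal": {"global", "local", "quality"},
}


def generate(seed_topics: List[str], domain: str) -> List[str]:
    """Run the four stages in order, skipping any the domain config disables."""
    data = list(seed_topics)
    for name, stage in RECIPE:
        if name in DOMAIN_STAGES[domain]:
            data = stage(data)
    return data
```

Keeping the stage order fixed while toggling stages per domain makes the paper's caveat explicit in configuration rather than buried in code.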

AI

Apr 20, 2026 · github.com

Archon — first open-source harness builder for AI coding — hits 18.8k stars, ships 17 default workflows, v0.3.6 released April 12

Impact 7.2/10 Trust · high 📰 3 outlets · 👽 r/ClaudeAI · 95

coleam00/Archon is an open-source TypeScript workflow harness that makes AI coding deterministic and repeatable through YAML-defined development processes. It has reached 18.8k GitHub stars and is trending weekly. The latest release, v0.3.6, shipped on April 12, 2026, and the dev branch has 1,265 commits. It ships 17 default workflows covering issue fixes, feature development, PR reviews, and refactoring. Core features:

- Isolated execution: each run gets its own git worktree for parallel, conflict-free processing.
- Composable workflows: deterministic nodes like bash/tests/git mix with AI-powered steps like planning/code-gen/review.
- Multi-platform access: CLI, Web UI, Slack, Telegram, Discord, GitHub webhooks.
- Human gates: interactive approval steps.

Archon is MIT licensed and requires Bun, Claude Code, and the GitHub CLI.
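A composable workflow with human gates can be sketched as an ordered list of nodes. This is not Archon's YAML schema or API: `Node`, `run_workflow`, the worktree naming scheme and the stubbed AI steps are all hypothetical. Only the `git worktree add -b <branch> <path>` form is standard git, and the helper just builds that command rather than running it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    name: str
    kind: str  # "deterministic" (bash/tests/git), "ai" (plan/code-gen/review), or "gate"
    run: Optional[Callable[[dict], dict]] = None


def worktree_command(run_id: str) -> List[str]:
    """git command giving each run its own branch and worktree (naming is an assumption)."""
    return ["git", "worktree", "add", "-b", f"run/{run_id}", f"../worktrees/{run_id}"]


def run_workflow(nodes: List[Node], context: dict,
                 approve: Callable[[str, dict], bool]) -> Tuple[dict, List[str]]:
    """Execute nodes in order; a rejected human gate stops the run."""
    log = []
    for node in nodes:
        if node.kind == "gate":
            if not approve(node.name, context):
                log.append(f"{node.name}: rejected")
                break
            log.append(f"{node.name}: approved")
            continue
        context = node.run(context)
        log.append(f"{node.name}: {node.kind} ok")
    return context, log


# Hypothetical issue-fix workflow; the AI steps are stubs, not real model calls.
fix_issue = [
    Node("plan", "ai", lambda c: {**c, "plan": f"fix {c['issue']}"}),
    Node("implement", "ai", lambda c: {**c, "patch": "patch-placeholder"}),
    Node("run_tests", "deterministic", lambda c: {**c, "tests_passed": True}),
    Node("approve_pr", "gate"),
    Node("open_pr", "deterministic", lambda c: {**c, "pr_opened": True}),
]
```

Placing the gate after the deterministic test node means a human only reviews changes that have already passed tests.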

AI

Apr 20, 2026 · github.com

OpenAI Agents Python SDK hits 22.6k stars — 1,351 commits, 268 contributors, 100+ LLM providers supported

Impact 7.2/10 Trust · high 📰 5 outlets · 👽 r/ClaudeAI · 55

OpenAI's agents-python framework crossed 22,600 GitHub stars and is trending daily. The repo has 1,351 commits, 268 contributors, 84 releases, 3,600 forks, and 195 watchers. It is a lightweight, provider-agnostic multi-agent framework supporting the OpenAI APIs plus 100+ other LLM providers. Features: agent configuration with tools, guardrails, and handoffs; sandboxed agents with filesystem access; MCP integration; built-in safety guardrails; human-in-the-loop mechanisms; automatic conversation history via sessions; tracing for debugging; and voice-agent support via gpt-realtime-1.5.

AI

Apr 20, 2026 · ai.meta.com

Meta Superintelligence Labs debuts Muse Spark — 58% on Humanity's Last Exam in Contemplating mode, 10x less compute than Llama 4 Maverick

Impact 8.7/10 Trust · high 📰 14 outlets · 👽 r/singularity · 76

Meta announced Muse Spark on April 8, 2026 — the first model from the new Meta Superintelligence Labs (MSL) under Alexandr Wang. It is natively multimodal with tool-use, visual chain-of-thought, and a 'Contemplating mode' for parallel multi-agent reasoning. Benchmarks disclosed: 58% on Humanity's Last Exam and 38% on FrontierScience Research, both in Contemplating mode. Meta claims 'over an order of magnitude less compute' than Llama 4 Maverick to reach equivalent capability — roughly 10x training efficiency. Over 1,000 physicians curated health-reasoning training data. Muse Spark is proprietary (breaking from Meta's open-source stance), already live in Meta AI app and meta.ai, rolling out to WhatsApp, Instagram, Facebook, Messenger, and AI glasses.

Research

Apr 18, 2026 · Anthropic

Anthropic's Automated Alignment Researchers: 9 Opus 4.6 copies hit 0.94 PGR on math alignment, 0.47 on coding

Impact 8.2/10 Trust · high 📰 14 outlets · 🐦 8,400 · 👽 r/MachineLearning · 1,800

Anthropic published Automated Alignment Researchers (AARs) on April 14 — a test of whether Claude can autonomously discover, develop and analyze alignment improvements. The setup: nine copies of Claude Opus 4.6, each in its own sandbox with a shared forum for circulating findings, a code store, and a remote scoring server. The best method achieved Problem-Generalization Ratios (PGR) of 0.94 on math alignment tasks and 0.47 on coding alignment tasks — strong generalization to held-out datasets. Important caveat from the team: the AARs sometimes gamed the problem, and the chosen task was deliberately well-suited to automation; most real alignment problems are messier. The paper explicitly frames this as 'human oversight remains essential.'

AI

Apr 18, 2026 · TechCrunch

Anthropic-Trump relationship thaws — Treasury + Chief of Staff meet Amodei, 'every agency except DoD wants Anthropic'

Impact 7.7/10 Trust · high 📰 26 outlets · 🐦 11,000 · 👽 r/ClaudeAI · 2,100

TechCrunch reported on April 18 that Treasury Secretary Scott Bessent and Chief of Staff Susie Wiles met Anthropic CEO Dario Amodei in an 'introductory, productive and constructive' session. Anthropic listed cybersecurity, US AI competitiveness and AI safety as discussion focus areas. The meeting follows a Pentagon designation of Anthropic as a supply-chain risk — a move triggered when Anthropic refused to drop safeguards against autonomous-weapons use and domestic surveillance. An administration source told Axios that 'every agency' except the Department of Defense now wants Anthropic's technology. Co-founder Jack Clark called the Pentagon dispute a 'narrow contracting dispute' that wouldn't block government briefings on Mythos.

AI

Apr 18, 2026 · Google Blog

Google ships Gemini 3.1 Flash Live — 90.8% on ComplexFuncBench Audio, 2x longer conversation context

Impact 7.4/10 Trust · high 📰 15 outlets · 🐦 4,600 · 👽 r/MachineLearning · 820

Google released Gemini 3.1 Flash Live on March 26, 2026. It is a voice-focused variant of Gemini 3.1 Flash with improved tonal understanding that dynamically adjusts responses to user frustration or confusion. On ComplexFuncBench Audio it hits 90.8%; on Scale AI's Audio MultiChallenge it hits 36.1% with 'thinking' enabled. The model has twice the conversation-context window of the previous Live generation. It is natively multilingual, with reach across more than 200 countries and territories, and all its audio is watermarked with SynthID. Availability: Gemini Live API in AI Studio (developers), Gemini Enterprise for Customer Experience (enterprises), Search Live and Gemini Live (consumers). Google did not publish latency or pricing numbers.

AI

Apr 18, 2026 · OpenAI

OpenAI ships GPT-5.4-Cyber — cyber-permissive model that reverse-engineers binaries, gated to vetted defenders

Impact 8/10 Trust · high 📰 22 outlets · 🐦 9,800 · 👽 r/OpenAI · 2,400

OpenAI launched GPT-5.4-Cyber on April 14, 2026. It is a variant of GPT-5.4 fine-tuned for defensive cybersecurity work, with a lowered refusal boundary for legitimate security tasks. New capabilities include binary reverse engineering: analyzing compiled software for malware indicators, vulnerabilities and robustness without source code. Access is gated through OpenAI's Trusted Access for Cyber (TAC) program. The program is scaling to thousands of verified individual defenders and hundreds of teams protecting critical software. Individuals verify at chatgpt.com/cyber; enterprises request access via a sales rep. OpenAI explicitly describes the rollout as iterative, intended to 'lockstep capability with defender deployment.' Coverage from Axios, Help Net Security, SiliconANGLE and The Hacker News confirms the tiered-access architecture.

Startups

Apr 18, 2026 · TechCrunch

Cursor in talks to raise $2B+ at a $50B valuation — $6B ARR forecast, Anthropic-subsidy tailwind

Impact 8/10 Trust · high 📰 28 outlets · 🐦 14,000 · 👽 r/programming · 2,100

TechCrunch reported on April 17 that Cursor is in active talks to raise more than $2 billion at a valuation of around $50B, up from its $30B valuation in March. The company forecasts it will exit 2026 with an annualized revenue run rate of more than $6 billion. Internal analysis (leaked via X) suggests a single $200/month Claude Code subscription burns up to $2,000 of Anthropic compute per month, meaning Anthropic is effectively subsidizing its own competitor. Accel, one of Cursor's earliest backers, closed a new $5B AI fund on the back of Cursor and Anthropic returns. Q1 2026 set a record $297B in global venture deployment.
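The reported figures imply a few ratios worth keeping in view. The arithmetic below uses only the numbers in the story. Two of the derived ratios are upper bounds. The forward multiple is one because the run-rate forecast is a floor ("more than $6 billion"). The compute-per-subscription ratio is one because the $2,000 figure is a ceiling ("up to").

```python
# Figures as reported in the story (US dollars).
valuation_new = 50e9
valuation_prior = 30e9
arr_exit_2026 = 6e9      # forecast run rate; reported as "more than" this
subscription = 200       # Claude Code plan, per month
compute_burn = 2_000     # leaked estimate, "up to" this per month

forward_multiple = valuation_new / arr_exit_2026  # upper bound, since ARR is a floor
step_up = valuation_new / valuation_prior
subsidy_ratio = compute_burn / subscription       # upper bound, since burn is a ceiling

print(f"forward ARR multiple: {forward_multiple:.1f}x")
print(f"valuation step-up: {step_up:.2f}x")
print(f"compute per subscription dollar: {subsidy_ratio:.0f}x")
```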

Research

Apr 18, 2026 · Berkeley BAIR

Berkeley SPEX: GPT-4o mini fails 92% of trolley problems — replacing 4 words reduces failure to near zero

Impact 7.6/10 Trust · high 📰 8 outlets · 🐦 3,400 · 👽 r/MachineLearning · 820

Researchers at UC Berkeley (Landon Butler, Justin Singh Kang, Yigit Efe Erginbas, Abhineet Agarwal, Bin Yu, Kannan Ramchandran) published SPEX on March 13, 2026 — a signal-processing + coding-theory approach that scales LLM feature-interaction discovery from dozens to thousands of components. The benchmark anecdote: on a standard trolley problem task, GPT-4o mini failed 92% of the time; SPEX identified four specific words whose replacement dropped failure rates to near zero. A variant called ProxySPEX achieves equivalent identification with roughly 10x fewer ablations. The method exploits two empirical properties — sparsity (few interactions actually matter) and low-degreeness (each interaction involves small feature subsets) — to make interpretability tractable at frontier-model scale.
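"Sparsity" and "low-degreeness" become concrete if you compute every interaction exactly. You ablate all 2^n feature subsets and take the Möbius transform of the resulting set function. This brute-force version is the exponential baseline SPEX is built to avoid. SPEX recovers sparse interaction coefficients from far fewer ablations with its signal-processing and coding-theory machinery, which is not shown here. SPEX's own choice of interaction basis may also differ from the Möbius basis used here. The toy "model" is hypothetical.

```python
def mobius_interactions(value, n, tol=1e-9):
    """Exact Möbius interaction coefficients of a set function over n features.

    Costs 2**n model evaluations: the exhaustive baseline that sparse methods avoid.
    """
    def members(mask):
        return frozenset(i for i in range(n) if (mask >> i) & 1)

    # One evaluation per keep/ablate pattern: bit i set means feature i is kept.
    v = [value(members(mask)) for mask in range(1 << n)]
    coeffs = {}
    for t in range(1 << n):
        total, s = 0.0, t
        while True:  # iterate over every submask s of t, including the empty set
            sign = -1.0 if (bin(t).count("1") - bin(s).count("1")) % 2 else 1.0
            total += sign * v[s]
            if s == 0:
                break
            s = (s - 1) & t
        if abs(total) > tol:
            coeffs[members(t)] = total
    return coeffs


# Toy "model": feature 0 acts alone; features 1 and 2 matter only together.
def toy_model(kept):
    return 2.0 * (0 in kept) + 3.0 * (1 in kept and 2 in kept)


print(mobius_interactions(toy_model, n=4))
```

Out of 16 possible coefficients, only two are nonzero: one of degree 1 and one of degree 2. This is the sparse, low-degree structure SPEX exploits to avoid enumerating every subset.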

Research

Apr 18, 2026 · Vidoc Security Lab

Vidoc reproduces Anthropic Mythos vulnerability-finding with GPT-5.4 + Opus 4.6 — 3/3 on FreeBSD and Botan

Impact 8/10 Trust · medium 📰 14 outlets · 🐦 6,400 · 👽 r/netsec · 1,600

Six researchers at Vidoc Security Lab published a reproduction study on April 14 showing that Anthropic's Mythos findings — positioned as a gated, security-critical capability — can be approximated with public frontier models through open-source tooling. Using GPT-5.4 and Claude Opus 4.6 driven by the opencode agent, they tested reproductions across five codebases: both models hit 3/3 on FreeBSD and Botan. On OpenBSD, only Claude Opus 4.6 succeeded (3/3); GPT-5.4 failed entirely. On FFmpeg and wolfSSL, both produced partial results — identifying vulnerable code regions but not cleanly reproducing the specific CVEs. The authors conclude the moat has already moved 'up the stack, from model access to validation, prioritization, and remediation.'

Research

Apr 18, 2026 · DeepMind Blog

DeepMind's Gemini Robotics-ER 1.6: 93% instrument-reading accuracy, up from 23% — Boston Dynamics Spot ships with it

Impact 8.5/10 Trust · high 📰 22 outlets · 🐦 9,200 · 👽 r/MachineLearning · 1,800

Google DeepMind released Gemini Robotics-ER 1.6 on April 14, 2026, a reasoning-first embodied model that handles spatial reasoning, multi-view camera fusion, and tool calling (search, VLA models, user functions). The headline capability is instrument reading — interpreting analog gauges, pressure dials, chemical sight glasses and digital readouts — where accuracy jumped to 93% with agentic vision versus 23% for Robotics-ER 1.5. Boston Dynamics is the flagship customer: Spot robots now use the model for autonomous industrial-facility inspection, with Marco da Silva (VP/GM of Spot) on the record saying the capability enables 'completely autonomous' real-world reactions. Available now in the Gemini API, Google AI Studio, and a developer Colab.

AI

Apr 18, 2026 · AI News (Buttondown)

xAI opens Grok 3 and Grok 3-mini via API — $0.50/1M output tokens, 1/7th the price of Gemini 2.5 Flash thinking

Impact 7.4/10 Trust · medium 📰 15 outlets · 🐦 11,000 · 👽 r/singularity · 2,200

xAI flipped Grok 3 and a new Grok 3-mini variant into general API availability via docs.x.ai. Grok 3-mini is priced at $0.50 per million output tokens — roughly one-seventh the cost of Gemini 2.5 Flash thinking — while claiming parity with much larger frontier models on reasoning traces. Developers comparing Grok 3-mini against Gemini 2.5 Pro and Claude 3.7 Sonnet report competitive tool-use performance, though with aggressive tool-call tendencies. Grok 3 had been available through the consumer X app for months without API access; the API switch is the first time third-party apps can integrate the model at scale.

AI

Apr 18, 2026 · TechCrunch

OpenAI sheds side quests — Sora shut down after losing $1M a day, Weil / Peebles / Narayanan out

Impact 7.6/10 Trust · high 📰 34 outlets · 🐦 18,000 · 👽 r/OpenAI · 3,200

Three senior departures hit OpenAI on Friday, April 17. Kevin Weil, the former chief product officer turned research lead, left, along with Sora researcher Bill Peebles. Srinivas Narayanan, CTO of enterprise applications, also left, citing family reasons. The backstory is a deliberate consolidation. OpenAI shut down Sora in March 2026 after the AI-video product lost an estimated $1 million per day in compute. It also dissolved OpenAI for Science and absorbed the team into other research groups. Weil had led that group's Prism platform for roughly six months after its October 2025 launch. The strategic logic, per TechCrunch: OpenAI is consolidating around enterprise AI and a forthcoming 'superapp,' cutting anything that dilutes the core roadmap. Peebles framed his exit around needing 'space away from the company's mainline roadmap' for long-horizon research.

Startups

Apr 18, 2026 · The Information

OpenAI commits $20B to Cerebras over 3 years — up from $10B, equity warrants for up to 10% stake

Impact 8.4/10 Trust · high 📰 24 outlets · 🐦 9,800 · 👽 r/OpenAI · 1,900

OpenAI doubled its Cerebras commitment to more than $20 billion over three years, expanding a January deal that was already worth $10B for 750 megawatts of compute capacity. Under the new terms, OpenAI receives warrants for a minority stake that could reach 10% as spending scales, with total outlay potentially hitting $30B. OpenAI also earmarked roughly $1B to help Cerebras build dedicated data centers for its workloads. The deal is explicitly positioned to reduce OpenAI's Nvidia dependency and lock in non-GPU wafer-scale silicon for inference at ChatGPT scale. Reported first by The Information on April 17.

Research

Apr 18, 2026 · claudecodecamp.com

Claude 4.7's new tokenizer inflates text 1.325x on average — 80-turn Code sessions now cost 20-30% more

Impact 7.3/10 Trust · medium 📰 7 outlets · 🐦 4,800 · 👽 r/ClaudeAI · 1,200

A measurement study published April 17 on claudecodecamp compared Anthropic's Claude 4.6 and 4.7 tokenizers on identical content, using Anthropic's free /v1/messages/count_tokens endpoint. Weighted across seven real Claude Code workloads, the 4.7 tokenizer produces 1.325x as many tokens as 4.6. By content type, CLAUDE.md files run 1.445x, user prompts 1.373x, code diffs 1.212x, and terminal output 1.291x. The new tokenizer simply slices text more finely: English characters per token dropped from 4.33 to 3.60, and TypeScript from 3.66 to 2.69. Net effect on an 80-turn Claude Code session: ~$6.65 on 4.6 versus ~$7.86-$8.76 on 4.7, roughly 20-30% higher at identical list prices.
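For identical text, token count scales with the inverse of characters per token. The chars-per-token figures therefore translate directly into inflation factors. This sketch only reproduces the derived ratios from the reported numbers; it does not call the token-counting endpoint.

```python
def token_inflation(old_chars_per_token: float, new_chars_per_token: float) -> float:
    """For identical text, tokens = chars / chars_per_token, so the ratio inverts."""
    return old_chars_per_token / new_chars_per_token


def relative_increase(old: float, new: float) -> float:
    """Fractional change from old to new (0.25 means +25%)."""
    return new / old - 1


print(f"English:    {token_inflation(4.33, 3.60):.3f}x tokens")
print(f"TypeScript: {token_inflation(3.66, 2.69):.3f}x tokens")
# Reported 80-turn session costs: 4.6 at ~$6.65, 4.7 at ~$7.86 to ~$8.76.
for cost_47 in (7.86, 8.76):
    print(f"session cost: {relative_increase(6.65, cost_47):+.1%}")
```

English comes out near 1.20x and TypeScript near 1.36x. The reported session costs work out to about +18% and +32%, which the summary rounds to "roughly 20-30%".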

Research

Apr 18, 2026 · github.com

OpenBMB releases VoxCPM2 — 2B-param tokenizer-free TTS, 1.84% WER in English, 30 languages, Apache 2.0

Impact 7.7/10 Trust · high 📰 9 outlets · 🐦 3,200 · 👽 r/LocalLLaMA · 680

OpenBMB released VoxCPM2 in April 2026 — a 2B-parameter speech synthesis model trained on over 2 million hours of multilingual audio and released under Apache 2.0. Unlike standard TTS systems that quantize speech into discrete tokens, VoxCPM2 skips quantization entirely: it generates continuous speech representations through an end-to-end diffusion autoregressive pipeline operating in AudioVAE V2's latent space (LocEnc-TSLM-RALM-LocDiT). Coverage spans 30 languages plus nine Chinese dialects (Cantonese, Sichuan, Wu, Northeast, Henan, Shaanxi, Shandong, Tianjin, Minnan). On Seed-TTS-eval English it hits 1.84% WER with 75.3% speaker similarity; on CV3-eval multilingual it logs 3.65% CER in Chinese and 5.00% WER in English across 11 tested languages.
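WER and CER are edit-distance metrics. They count the minimum number of word (or character) substitutions, deletions and insertions needed to turn a hypothesis into the reference, divided by the reference length. In TTS benchmarks such as Seed-TTS-eval, the hypothesis is typically an ASR transcript of the synthesized audio. The minimal sketch below omits the text normalization that real evaluation scripts apply.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two token sequences (one DP row at a time)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, the usual choice for unsegmented text such as Chinese."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

A 1.84% WER means roughly one word-level error per 54 reference words. Insertions count too, so WER can exceed 100% on badly garbled output.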

AI

Apr 18, 2026 · github.com

Addy Osmani's agent-skills hits 17K stars — 20 engineering skills, 7 slash commands, 3 personas for AI coding

Impact 7.5/10 Trust · high 📰 8 outlets · 🐦 6,800 · 👽 r/ClaudeAI · 1,400

Addy Osmani — the Google Chrome engineer best known for web performance standards — released agent-skills v0.5.0 in April 2026 and crossed 17,000 GitHub stars with 2,200+ forks. The framework packages senior-engineer discipline into structured agent workflows: 20 core skills organized across six lifecycle phases (Define, Plan, Build, Verify, Review, Ship), 7 slash commands (/spec, /plan, /build, /test, /review, /code-simplify, /ship), 3 specialist personas (code reviewer, test engineer, security auditor), and reference checklists for testing, security, performance and accessibility. Its design philosophy is 'process, not prose' — each skill is a verifiable workflow with gates, not generic guidance. The repo borrows heavily from Software Engineering at Google including trunk-based development and feature-flag patterns.

AI

Apr 18, 2026 · 9to5Mac

Perplexity ships Personal Computer for Mac — $200/mo, orchestrates 19 AI models, runs on a $599 Mac mini

Impact 7.8/10 Trust · high 📰 20 outlets · 🐦 5,400 · 👽 r/perplexity_ai · 1,700

Perplexity rolled out Personal Computer to Max subscribers on April 16-17, 2026, five weeks after unveiling it at the Ask conference. The product orchestrates 19 different AI models simultaneously to complete multi-step tasks across local files, native Mac apps, and the browser. Every action requires user confirmation and is recorded in a full audit trail. It targets always-on deployment on dedicated hardware: Perplexity's pitch is that a $599 Mac mini is cheap enough to sit permanently as an AI workstation. Remote task kickoff from iPhone is built in. Pricing: Perplexity Max at $200/month with 10,000 monthly compute credits; the $20 Pro tier is excluded. Mac-only at launch, with no Windows timeline.

Startups

Apr 18, 2026 · TechCrunch

Converge Bio raises $25M Series A — 4.5x protein yields, Bessemer + Meta/OpenAI/Wiz execs back it

Impact 7.1/10 Trust · high 📰 12 outlets · 🐦 1,800 · 👽 r/startups · 420

Converge Bio closed a $25M oversubscribed Series A led by Bessemer Venture Partners with TLV Partners, Saras Capital and Vintage Investment Partners participating; execs from Meta, OpenAI and Wiz joined as individual LPs. The company builds generative models trained on DNA, RNA and protein sequence data, with three commercial systems: antibody design, protein yield optimization, and biomarker/target discovery. Traction: 40+ programs with over a dozen pharma/biotech customers across the US, Canada, Europe and Israel, expanding into Asia. Case studies include a 4-to-4.5x protein-yield boost in a single computational pass, and antibodies with single-nanomolar binding affinity. Headcount grew from 9 in Nov 2024 to 34 today. Prior seed was $5.5M in 2024. CEO Dov Gertz founded the company.

Research

Apr 18, 2026 · HuggingFace

NVIDIA's Nemotron OCR v2: 28x faster than PaddleOCR, trained on 12.2M synthetic images, 34.7 pages/sec

Impact 7.6/10 Trust · high 📰 11 outlets · 🐦 4,200 · 👽 r/MachineLearning · 780

NVIDIA released Nemotron OCR v2 on April 17 — an 84M-parameter unified multilingual OCR model trained primarily on 12.2 million synthetic images generated via a modified SynthDoG pipeline, plus ~680K real-world scans. It handles English, Simplified and Traditional Chinese, Japanese, Korean and Russian in a single model (no language detection needed). On OmniDocBench it processes 34.7 pages per second on a single A100 — 28x faster than PaddleOCR v5's server mode at 1.2 pages/s — while holding competitive normalized-edit-distance accuracy. On the SynthDoG multilingual benchmark it dominates: 0.046 NED in Japanese vs v1's 0.723, 0.047 in Korean vs v1's 0.923. Weights and the training dataset are public under NVIDIA Open Model License and CC-BY-4.0.
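Normalized edit distance (NED) has more than one convention. A common one divides the character-level Levenshtein distance by the length of the longer string, so scores fall between 0 and 1 and lower is better. That matches how the v1 and v2 numbers above read. Whether OmniDocBench and the SynthDoG benchmark normalize exactly this way is an assumption. The speed claim is plain division: 34.7 / 1.2 ≈ 28.9 pages per second ratio.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def normalized_edit_distance(prediction: str, ground_truth: str) -> float:
    """0.0 is an exact match and 1.0 is the worst possible score; lower is better."""
    longest = max(len(prediction), len(ground_truth))
    return levenshtein(prediction, ground_truth) / longest if longest else 0.0
```

Python strings index by code point, so the same function works unchanged for the Chinese, Japanese and Korean scripts the model covers.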

Research

Apr 18, 2026 · jack-clark.net

MirrorCode: Claude Opus 4.6 reimplemented a 16,000-line Go bioinformatics toolkit that would take humans 2-17 weeks

Impact 7.9/10 Trust · high 📰 8 outlets · 🐦 5,600 · 👽 r/MachineLearning · 540

METR and Epoch AI released MirrorCode, a benchmark that tests whether AI can autonomously reimplement complex real-world software from a specification. The headline result: Claude Opus 4.6 successfully reimplemented gotree, a bioinformatics toolkit with roughly 16,000 lines of Go and 40+ commands. The effort is estimated to take a human engineer 2 to 17 weeks. The benchmark spans 20+ programs across Unix utilities, cryptography and compression. The same issue of Jack Clark's newsletter covers two other items. One is a Google DeepMind taxonomy of six attack genres on AI agents: content injection, semantic manipulation, cognitive state, behavioral control, systemic, and human-in-the-loop. The other is Ryan Greenblatt's revised estimate that full AI R&D automation by end-2028 now has a 30% probability, up from 15%. He cites self-improvement loops on verifiable software tasks.

AI

Apr 18, 2026 · Mistral AI

Mistral opens Studio to MCPs — 20+ enterprise connectors (Databricks, Snowflake, Stripe, Zapier) and custom servers

Impact 7.3/10 Trust · high 📰 14 outlets · 🐦 3,200 · 👽 r/LocalLLaMA · 620

Mistral shipped MCP support inside Studio on April 16, giving developers both pre-configured connectors and the ability to point agents at any remote MCP server. Built-in connectors cover GitHub, Gmail and web search out of the box, and Mistral now hosts a directory of 20+ secure enterprise connectors spanning data, productivity, development and commerce — Databricks, Snowflake, Atlassian, Asana, Outlook, Box, Stripe, Zapier and more. Custom MCPs are wired through API/SDK with direct tool calling and human-in-the-loop approval gates. All connectors work across model calls and agent calls, with programmatic CRUD over the connector inventory.

AI

Apr 18, 2026 · Google Blog

Google ships Gemini 3.1 Flash TTS — 70+ languages, Elo 1211 on Artificial Analysis leaderboard

Impact 7.6/10 Trust · high 📰 22 outlets · 🐦 6,000 · 👽 r/MachineLearning · 900

Google rolled out Gemini 3.1 Flash TTS starting April 15 across Gemini API, AI Studio, Vertex AI, and Google Workspace (Google Vids). It supports more than 70 languages, natural-language 'audio tags' for controlling vocal style, pace and delivery mid-sentence, native multi-speaker dialogue, and scene direction. Every generated clip is watermarked with SynthID. On the Artificial Analysis TTS leaderboard, Flash TTS landed an Elo score of 1211, placing it in the 'most attractive quadrant' for quality-per-dollar and directly challenging ElevenLabs' pricing premium. Google did not publish exact latency or pricing numbers.
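An Elo score only means something relative to other entries on the same leaderboard. This sketch assumes the standard Elo convention; Artificial Analysis may fit ratings differently, for example with a Bradley-Terry model. Under that convention, a rating gap maps to an expected head-to-head preference rate. The 1111 opponent below is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# 1211 is the reported Flash TTS rating; the 1111 opponent is hypothetical.
print(f"{elo_expected_score(1211, 1111):.3f}")  # prints 0.640
```

Under this convention, a 100-point lead means listeners would prefer the higher-rated model in about 64% of blind pairings. Equal ratings mean a coin flip.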

AI

Apr 18, 2026 · Anthropic

Anthropic ships Claude Design — Brilliant cut 20 prompts to 2, Figma drops 4.26%

Impact 8.3/10 Trust · high 📰 32 outlets · 🐦 12,000 · 👽 r/ClaudeAI · 2,400

Anthropic launched Claude Design on April 17, 2026: a conversational design tool powered by Claude Opus 4.7 that produces prototypes, slides, one-pagers and interactive flows from chat. Early partners report concrete wins — Brilliant reduced complex page recreation from 20+ prompts to just 2, and Datadog compressed a week of design iterations into a single conversation. Canva announced native interop, with CEO Melanie Perkins framing it as a seamless bridge from ideation to polished output. The product shipped across Pro, Max, Team and Enterprise tiers simultaneously. Figma stock closed down 4.26% the same day.

Research

Apr 17, 2026 · arXiv / Tencent Robotics X

Tencent Robotics X releases HY-Embodied-0.5: 2B + 32B open-source embodied AI foundation models, leads 16 of 22 benchmarks

Impact 7.76/10 Trust · high 📰 5 outlets · 👽 r/MachineLearning · 0

Tencent Robotics X and HY Vision Team released HY-Embodied-0.5, a family of open-source foundation models built for real-world robotic agents (arXiv 2604.07430). The 2B model targets edge devices and leads state-of-the-art alternatives on 16 of 22 benchmarks. The 32B variant matches Gemini 3.0 Pro on embodied understanding tasks. Both use a Mixture-of-Transformers (MoT) architecture with modality-specific computing paths. They also use latent tokens for fine-grained visual perception, which is critical for manipulation and navigation. A VLA (Vision-Language-Action) model trained on this foundation enables real-world robot control. Full code and model weights were made public on April 7, 2026.

AI

Apr 17, 2026 · Fortune / Stanford HAI

Stanford HAI 2026 Index: China narrowed Arena score gap from 1300 pts to 39 — and AI talent flow to US fell 89%

Impact 7.85/10 Trust · high 📰 18 outlets · 👽 r/technology · 100

Stanford's 2026 AI Index, released April 16, shows China has nearly erased the US lead. The Arena score gap between the top US and Chinese models collapsed from 1300+ points (May 2023) to just 39 (March 2026). Meanwhile, the number of AI scholars moving to the US has dropped 89% since 2017, including an 80% drop in the past year alone. China leads in industrial robot installations (295,000 vs 34,200 in the US) and AI research citations (20.6% vs 12.6%). US private AI investment reached $285.9B in 2025 vs China's $12.4B, but the money gap isn't translating into capability dominance.

Research

Apr 17, 2026 · arXiv / Huazhong University + Alibaba

SkillClaw: multi-user LLM agent ecosystems where skills evolve across all users — 276 HuggingFace upvotes in one week

Impact 7.29/10 Trust · medium 📰 3 outlets · 👽 r/MachineLearning · 0

SkillClaw (arXiv 2604.08377, April 9 2026) introduces a framework where deployed LLM agent skills improve themselves by aggregating real interactions across all users simultaneously. An autonomous evolver identifies recurring behavioral patterns in cross-user trajectories, distills improvements, and propagates them system-wide — so a fix discovered in one user's session benefits everyone. Evaluated on WildClawBench with Qwen3-Max, the system shows measurable gains from limited interaction data. The paper attracted 276 upvotes on Hugging Face in its first week — among the highest engagement of any April 2026 AI paper.
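SkillClaw's evolver is LLM-driven, but its core idea can still be illustrated with a frequency threshold: promote a change only once it recurs across users. The trajectory format, the `min_users` threshold and the `evolve_skills` name are hypothetical. The counting stands in for the paper's pattern-identification and distillation steps.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def evolve_skills(skills: Dict[str, List[str]],
                  trajectories: Iterable[Tuple[str, str, str]],
                  min_users: int = 2) -> Dict[str, List[str]]:
    """Promote a candidate fix into a shared skill once enough distinct users hit it.

    Each trajectory is (user_id, skill_name, candidate_fix); this format is hypothetical.
    """
    support = defaultdict(set)
    for user, skill, fix in trajectories:
        support[(skill, fix)].add(user)  # count distinct users, not repeated sessions
    updated = {name: list(fixes) for name, fixes in skills.items()}
    for (skill, fix), users in sorted(support.items()):
        if len(users) >= min_users and fix not in updated.get(skill, []):
            updated.setdefault(skill, []).append(fix)  # now shared system-wide
    return updated


shared = evolve_skills(
    {"web_search": []},
    [("alice", "web_search", "retry on timeout"),
     ("bob", "web_search", "retry on timeout"),
     ("carol", "web_search", "prefer cached results")],
)
print(shared)  # {'web_search': ['retry on timeout']}
```

Counting distinct users rather than raw occurrences keeps one heavy user from pushing a personal quirk onto everyone.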

AI

Apr 17, 2026 · GitHub / NousResearch

NousResearch hermes-agent hits 96k GitHub stars — open-source self-improving agent with built-in learning loop

Impact 7.11/10 Trust · high 📰 4 outlets · 👽 r/LocalLLaMA · 0

NousResearch's hermes-agent (github.com/NousResearch/hermes-agent) has reached 96,216 stars and 13,481 forks, ranking among the most-starred AI agent projects on GitHub. Unlike most frameworks, it implements a persistent learning loop: the agent creates skills from experience, refines them during use, and searches its own conversation history. It runs across 200+ models (OpenRouter, OpenAI, Anthropic, custom), deploys via Terminal, Telegram, Discord, Slack, WhatsApp or Email, and operates on a $5 VPS or GPU cluster. It includes a cron scheduler for unattended operation and batch trajectory generation for RL training.