Vidoc reproduces Anthropic Mythos vulnerability-finding with GPT-5.4 + Opus 4.6 — 3/3 on FreeBSD and Botan
Vidoc Security Lab
Six researchers at Vidoc Security Lab published a reproduction study on April 14 showing that Anthropic's Mythos findings, positioned as a gated, security-critical capability, can be approximated with public frontier models and open-source tooling. Using GPT-5.4 and Claude Opus 4.6 driven by the opencode agent, they attempted reproductions across five codebases. Both models scored 3/3 on FreeBSD and Botan. On OpenBSD, only Claude Opus 4.6 succeeded (3/3); GPT-5.4 failed entirely. On FFmpeg and wolfSSL, both models produced partial results, identifying vulnerable code regions without cleanly reproducing the specific CVEs. The authors conclude the moat has already moved 'up the stack, from model access to validation, prioritization, and remediation.'
Tags: vidoc, anthropic, mythos, vulnerability, open-source
Why it matters
If public frontier models can already reproduce 60-100% of what Mythos claims, the White House's gated-distribution strategy loses most of its security rationale, and defenders should assume attackers already have equivalent offensive reach. The paper also reframes the AI-security debate: the bottleneck is validation and operationalization, not control over frontier access. Expect CISO procurement cycles in Q2 to shift budget from 'wait for gated tools' to 'buy the validation stack now,' and expect a follow-on Anthropic publication that tries to widen the capability gap with non-public benchmarks.
Impact scorecard: 8/10
Stakes: 9.0
Novelty: 8.5
Authority: 8.0
Coverage: 6.5
Concreteness: 9.0
Social: 8.0
FUD risk: 3.0
Coverage: 14 outlets · 2 tier-1
Vidoc Security, Hacker News, The Register, CSO Online, Risky Business
X / Twitter: 6,400 mentions @vidocsecurity · 2,800 likes
Reddit: 1,600 upvotes r/netsec
r/netsec, r/MachineLearning, r/singularity
Trust check: medium
Independent security lab, reproducible methodology, specific result counts per target. Published on vidocsecurity blog and discussed on HN front page. Not yet corroborated by a second replication team — treat magnitudes as directional until a third party confirms.
Kronos (AAAI 2026 accepted, arXiv 2508.02739) is the first open-source foundation model pre-trained on financial candlestick (K-line) sequences. A specialized tokenizer quantizes multi-dimensional OHLCV data into hierarchical discrete tokens, and a decoder-only autoregressive transformer is pre-trained on 12 billion K-line records from 45 global exchanges. Reported results against the leading time-series foundation model (TSFM) and the best non-pretrained baseline: 93% higher RankIC on price-series forecasting than the TSFM and 87% higher than the non-pretrained baseline; 9% lower MAE on volatility forecasting; and a 22% improvement in generative fidelity for synthetic K-line sequences. Model, weights, and demo are open on GitHub (shiyu-coder/Kronos), and the repo is currently GitHub-trending.
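For readers unfamiliar with the RankIC metric cited above, here is a minimal sketch of how it is commonly computed: as the Spearman rank correlation between predicted scores and realized returns across a set of assets for one period. The brief does not describe the exact evaluation protocol Kronos uses, so the function names (`average_ranks`, `pearson`, `rank_ic`) and the sample numbers are illustrative assumptions, not the paper's code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def rank_ic(predicted, realized):
    """Spearman rank correlation between predictions and realized returns."""
    if len(predicted) != len(realized) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(predicted), average_ranks(realized))


# Hypothetical one-period cross-section of four assets.
predicted_scores = [0.3, 0.1, 0.2, 0.4]
realized_returns = [0.02, -0.01, 0.03, 0.01]
print(round(rank_ic(predicted_scores, realized_returns), 4))
```

In practice the per-period value is usually averaged over many periods. Because only the ordering matters, the metric rewards ranking assets correctly even when the predicted magnitudes are off.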
Google Research published Simula in Transactions on Machine Learning Research (April 16, 2026): a framework that reframes synthetic data generation as mechanism design, using reasoning-driven construction rather than sample-level optimization. The team (Tim R. Davidson, Benoit Seguin, Enrico Bacis, Cesar Ilharco, Hamza Harkous) generated datasets of up to 512K data points for five benchmarks: cybersecurity (CTI-MCQ, CTI-RCM), legal reasoning (LEXam), math (GSM8k), and multilingual knowledge (Global MMLU). Results show 'better data scales better': a 10% accuracy gain on math reasoning using Gemini 2.5 Flash as teacher and Gemma-3 4B as student. The four-step recipe is global diversification → local diversification → complexification → quality checks. Complexification helped math but hurt legal reasoning, and the paper warns that mechanism design is domain-dependent.
coleam00/Archon is an open-source TypeScript workflow harness that makes AI coding deterministic and repeatable through YAML-defined development processes. It has reached 18.8k GitHub stars and is trending weekly. The latest release, v0.3.6, shipped on April 12, 2026, and the dev branch has 1,265 commits. It ships 17 default workflows covering issue fixes, feature development, PR reviews, and refactoring. Core features: isolated execution (each run gets its own git worktree for parallel, conflict-free processing), composable workflows (mix deterministic nodes like bash/tests/git with AI-powered steps like planning/code-gen/review), multi-platform support (CLI, Web UI, Slack, Telegram, Discord, GitHub webhooks), and human gates (interactive approval steps). It is MIT licensed and requires Bun, Claude Code, and the GitHub CLI.
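To make the idea of composable workflows with human gates concrete, here is a rough sketch in Python of a runner that mixes deterministic steps, an AI-style step, and an approval gate. This is not Archon's YAML schema or its API, neither of which is described here. Every name below (`Node`, `run_workflow`, the step functions, the context keys) is hypothetical, and the AI step is a stub.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, object]


@dataclass
class Node:
    name: str
    kind: str  # "deterministic", "ai", or "gate" (illustrative labels only)
    run: Callable[[Context], bool]  # returns True to let the workflow continue


def run_workflow(nodes: List[Node], ctx: Context) -> List[str]:
    """Run nodes in order and stop at the first one that returns False."""
    completed: List[str] = []
    for node in nodes:
        if not node.run(ctx):
            break
        completed.append(node.name)
    return completed


def plan(ctx: Context) -> bool:
    # Stand-in for an AI planning step; a real harness would call a model here.
    ctx["plan"] = "fix issue"
    return True


def run_tests(ctx: Context) -> bool:
    # Stand-in for a deterministic step such as running a test suite.
    return bool(ctx.get("tests_pass", False))


def human_gate(ctx: Context) -> bool:
    # Stand-in for an interactive approval step.
    return bool(ctx.get("approved", False))


def open_pr(ctx: Context) -> bool:
    # Stand-in for a deterministic step that would open a pull request.
    ctx["pr_opened"] = True
    return True


WORKFLOW = [
    Node("plan", "ai", plan),
    Node("tests", "deterministic", run_tests),
    Node("approve", "gate", human_gate),
    Node("open_pr", "deterministic", open_pr),
]

# Tests pass but approval is withheld, so the run halts at the gate.
print(run_workflow(WORKFLOW, {"tests_pass": True, "approved": False}))
```

Keeping every step behind the same small interface is what lets deterministic checks and model-driven steps be reordered freely, and a gate is simply a step whose outcome depends on a person.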