Hacker uses Claude and ChatGPT as assistant-in-the-loop to breach multiple government agencies
Reddit · r/technology
A threat-actor profile reported on r/technology and amplified across AI-security Twitter this weekend: an individual used Claude and ChatGPT as coding assistants to compose novel exploit chains against at least three US federal agencies. The attacker reportedly fed the LLMs the target environments' architecture, gleaned from open-source filings, had them generate bespoke phishing payloads and post-exploitation scripts, and iterated until bypasses worked. Anthropic and OpenAI have since rotated safety filters; Anthropic disclosed it had downgraded MCP cache TTL on March 6 specifically to shorten the window for adversarial prompt-cache poisoning. This sets the new baseline for "AI-assisted threat actor" reporting.
This is the canonical "AI made this attack trivial" narrative that security-budget conversations have been waiting for. Even if the technical details turn out to be overstated, the political impact is real: expect federal guidance on LLM usage in sensitive environments within 30 days, and a new wave of enterprise policies banning personal LLM accounts on work devices. Anthropic and OpenAI will face pressure to ship tighter abuse detection on code completion and multi-step planning in the next few releases.
Impact scorecard: 7.8/10
Stakes 9.5 · Novelty 8.5 · Authority 7.0 · Coverage 7.5 · Concreteness 7.0 · Social 8.5 · FUD risk 4.0
Coverage: 15 outlets · 2 tier-1 (Reddit r/technology, Hacker News, Wired, Ars Technica, SecurityWeek, Dark Reading)
The core claim (an LLM-assisted government breach) has multi-outlet coverage and credible security-Twitter amplification, but the details are still partly single-sourced and the attacker profile comes from one investigative thread. Wait for an official post-mortem from Anthropic or OpenAI before treating specific capability claims as verified. Moderate FUD risk: the "AI-assisted hacker" framing is politically charged.
@hardmaru (David Ha) flagged a paper adapting Sora-style video-diffusion architectures to build a learned world model of an actual Linux desktop. The model ingests 9,000 hours of screen recordings plus keyboard/mouse traces and learns to predict next-frame UI state conditioned on user input — effectively a probabilistic operating-system simulator. On a held-out eval of 50 common tasks (opening files, running commands, navigating web UIs), the model achieves 73% next-event accuracy at 2-second horizons and 41% at 30-second horizons, beating the prior SOTA (Meta AI Habitat-UI) by 18pp. Direct application: training agents in fully simulated computer environments without real-system rollouts, cutting RL data-collection costs by roughly 40x and eliminating the safety risk of letting agents touch production systems during training.
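The headline metric, next-event accuracy at a fixed horizon, can be sketched as below. The paper's exact scoring rule is not described in this item, so this is a minimal sketch assuming exact-match comparison of discretized UI events; the event tokens and the `next_event_accuracy` helper are illustrative, not from the paper.

```python
def next_event_accuracy(rollouts, ground_truth, horizon_steps):
    """Fraction of start points t where the model's predicted event
    `horizon_steps` ahead matches the ground-truth trace.
    Assumes rollouts[t] is the model's predicted event sequence from step t,
    and events are comparable tokens (e.g. 'click:save_btn')."""
    hits = 0
    total = 0
    for t in range(len(ground_truth) - horizon_steps):
        total += 1
        if rollouts[t][horizon_steps] == ground_truth[t + horizon_steps]:
            hits += 1
    return hits / total if total else 0.0

# Toy trace of UI events; rollouts here come from a perfect predictor.
truth = ["open_menu", "click_file", "type_name", "press_enter", "close_dialog"]
rollouts = [truth[t:t + 3] for t in range(len(truth))]
print(next_event_accuracy(rollouts, truth, horizon_steps=2))  # → 1.0
```

Evaluating at several `horizon_steps` values reproduces the short-vs-long-horizon split the paper reports (accuracy degrading as the horizon grows).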
EE Times deep-dive on AMD's ROCm 7.0 and whether it can finally dent NVIDIA's CUDA moat. AMD's MI400 (96 GB HBM4, 5.2 PFLOPS FP8) now runs PyTorch, vLLM, and SGLang out of the box, but reviewers testing MLPerf Inference v5.1 still see 1.6–2.2x gaps versus the H200 on representative LLM workloads, driven by kernel-library maturity rather than raw silicon. Breakthrough of the cycle: AMD has hired 600 CUDA-kernel engineers in 12 months and open-sourced HIPify tooling that auto-translates 83% of typical CUDA kernels. AMD claims Meta, Microsoft, and OpenAI are all now shipping production MI400 pods. NVIDIA's response: CUDA 13 with tensor-core autotuning targeting the same eval suite, launching in Q2.
Anthropic announced the advisor strategy on the Claude Platform: pair Opus 4.6 as a planning/critique advisor with Sonnet 4.6 or Haiku 4.5 as the executing model. The advisor inspects partial outputs, suggests corrections, and redirects the executor mid-generation. On SWE-bench Multilingual, Sonnet with an Opus advisor scores 2.7 percentage points higher than Sonnet alone, at roughly 1.3x the cost of Sonnet alone versus roughly 7x for running Opus end-to-end. Generally available today via the Claude Console and CLI; pricing is the existing Claude API rates for both models, with no advisor premium. Anthropic positions this as the first first-class multi-model inference primitive in any frontier-lab API: not just routing or cascading, but explicit advisor/executor roles with shared context.
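The advisor/executor control flow described above can be sketched as a draft-critique-revise loop. The announcement does not document the API surface, so this is a hedged sketch: `call_model` is a hypothetical stub returning canned text so the loop runs offline, and the model names are placeholders; a real deployment would back it with actual Anthropic Messages API calls.

```python
def call_model(model: str, prompt: str) -> str:
    """Hypothetical stub standing in for an LLM call (e.g. the Anthropic
    Messages API). Returns canned text so the control flow runs offline."""
    if "advisor" in model:
        # Toy critique rule: approve drafts that already use sorted().
        return "OK" if "sorted" in prompt else "Revise: handle the general case"
    return "def top_k(xs, k): return sorted(xs, reverse=True)[:k]"

def advisor_executor(task: str, executor: str, advisor: str, max_rounds: int = 3) -> str:
    """Advisor/executor loop: the cheaper executor drafts, the stronger
    advisor critiques the partial output, and the executor revises until
    the advisor approves or the round budget is exhausted."""
    draft = call_model(executor, task)
    for _ in range(max_rounds):
        verdict = call_model(advisor, f"Critique this draft: {draft}")
        if verdict == "OK":
            break
        draft = call_model(executor, f"{task}\nAdvisor feedback: {verdict}\nPrior draft: {draft}")
    return draft

result = advisor_executor(
    "Write top_k(xs, k) returning the k largest values",
    executor="sonnet-executor",  # placeholder name, not a real model ID
    advisor="opus-advisor",      # placeholder name, not a real model ID
)
print(result)  # → def top_k(xs, k): return sorted(xs, reverse=True)[:k]
```

The roughly 1.3x cost figure follows from this shape: most tokens are generated by the cheap executor, with the expensive advisor only reading drafts and emitting short critiques.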