The Agentic Engineer Weekly, Issue 04: The week we stopped trusting our agents

An LLM agent breached real infra in under an hour. OpenAI shipped Lockdown Mode. Microsoft divorced OpenAI with 7 models. Anthropic filed its S-1. Issue 04 of The Agentic Engineer Weekly.

Published on Jun 7, 2026 · 12 min read · 2363 words

The week we stopped trusting our agents

The week opened with Sysdig documenting what looks like the first real-world cyberattack driven by an LLM agent: autonomous post-exploitation that chained a Marimo CVE into an AWS database exfiltration in under an hour, with the model doing the reconnaissance and lateral movement a human operator normally would. It closed with OpenAI shipping Lockdown Mode, an opt-in setting that turns off web access, Agent Mode, connectors and file downloads, because a connected agent is an exfiltration channel by default. In between, Anthropic expanded Project Glasswing onto power, water and healthcare infrastructure, open-sourced a vulnerability-discovery harness, and published a 36-page guide on running Claude agents safely, while Meta confirmed that thousands of Instagram accounts were hacked through its own AI chatbot. The same loop you use to ship code is now demonstrably usable to breach infra, and every vendor spent the week shipping the admission. Trust is no longer a roadmap slide. It is a product category, and this issue is mostly about who shipped what into it.

The week in five bullets

Agent trust became a product category. Sysdig documented what looks like the first in-the-wild LLM-agent attack on June 1; by June 6 OpenAI had shipped Lockdown Mode, Anthropic had open-sourced a vuln-discovery harness and published a 36-page agent-security guide, and Meta had admitted its own chatbot was an account-takeover vector.
Microsoft used Build 2026 to unveil seven in-house MAI models, led by its first from-scratch reasoning model, and openly framed the family as independence from OpenAI at 10x better cost efficiency.
Anthropic confidentially filed a draft S-1 with the SEC at a reported $965B valuation, days after its $65B Series H, while two of its policy leads argued for a coordinated global pause on frontier systems.
The two big agent bets diverged: Anthropic published a six-pattern playbook for orchestrating subagents in Claude Code, while OpenAI pointed Codex at knowledge workers with hosted Sites and six role plugins. Non-developers are now 20 percent of Codex’s 5M weekly users.
The token bill kept coming due: Google is reportedly paying SpaceX $920M a month for compute, Uber capped employee AI spend at $1,500 a month, and Claude Code headless usage stops counting against plan limits from June 15.

Top of mind

Agent trust became a product category

The Sysdig finding is the one to sit with. An LLM agent, not a human with an LLM assistant, ran the post-exploitation chain that started from a Marimo CVE: reconnaissance, lateral movement, and exfiltration of an AWS database, all in under an hour. Security researchers have been war-gaming this scenario since 2024. Now it has a date and a victim.

The defensive side moved just as fast, and from three different directions. Anthropic expanded Project Glasswing to roughly 150 organizations across 15+ countries, pointing its unreleased Claude Mythos model at power, water and healthcare infrastructure; partners have already surfaced 10,000+ high and critical flaws, and Anthropic is committing $100M in Mythos credits plus $4M to OSS security groups. It also open-sourced a defending-code reference harness for AI-powered vulnerability discovery and published a year-in-review of AI-enabled cyber threats mapped to MITRE ATT&CK. OpenAI’s answer was Lockdown Mode: disable web access, Agent Mode, connectors, downloads and inline images to shrink the prompt-injection exfiltration surface. Note OpenAI’s own caveat: it does not stop injections from reaching the model, it only limits where stolen data can go.

The third direction is the most useful for builders: automated review as the trust gate. Meta published RADAR, its risk-aware auto-review pipeline, with numbers that should reframe the debate: 331K+ diffs landed, 25K+ per day, a revert rate one third that of human-reviewed code and a production-incident rate one fiftieth. Cursor 3.6’s Auto-review routes every Shell, MCP and Fetch call through an allowlist, a sandbox, or a classifier subagent. The pattern is layered: cheap automation for safe actions, a judgment model for the gray zone, humans only where it counts.
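
To make the layering concrete, here is a minimal Python sketch of a tool-call gate. It is not Cursor's or Meta's implementation. The names (ToolCall, Verdict, review, keyword_classifier), the allowlist entries and the 0.5 threshold are all assumptions made up for illustration, and the "classifier" is a keyword stand-in for what would be a model call in practice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"        # run directly, no further checks
    SANDBOX = "sandbox"    # run, but only inside an isolated environment
    ESCALATE = "escalate"  # stop and ask a human


@dataclass(frozen=True)
class ToolCall:
    tool: str    # e.g. "shell", "fetch", "mcp"
    target: str  # command line, URL, or MCP method


# Hypothetical policy data; a real deployment would load this from config.
ALLOWLIST = {
    ("shell", "git status"),
    ("shell", "ls"),
    ("fetch", "https://docs.python.org/3/"),
}
SANDBOXABLE_TOOLS = {"shell"}


def review(
    call: ToolCall,
    classifier: Callable[[ToolCall], float],
    risk_threshold: float = 0.5,
) -> Verdict:
    # Layer 1: cheap automation. Exact allowlist matches run directly.
    if (call.tool, call.target) in ALLOWLIST:
        return Verdict.ALLOW
    # Layer 2: a judgment model scores everything in the gray zone.
    if classifier(call) >= risk_threshold:
        return Verdict.ESCALATE
    # Layer 3: low-risk calls run only if the tool can be sandboxed;
    # anything else still goes to a human.
    if call.tool in SANDBOXABLE_TOOLS:
        return Verdict.SANDBOX
    return Verdict.ESCALATE


def keyword_classifier(call: ToolCall) -> float:
    # Stand-in for a classifier subagent: flags a few risky substrings.
    risky = ("curl", "rm -rf", "| sh")
    return 0.9 if any(marker in call.target for marker in risky) else 0.1
```

With this policy, review(ToolCall("shell", "git status"), keyword_classifier) is allowed outright, an unlisted but low-risk shell command is sent to the sandbox, and anything the classifier flags goes to a human. The design choice worth copying is the default: a call that matches no layer escalates instead of running.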

Why it matters: if you ship anything with tool access, the threat model just became table stakes. The mitigation menu is now concrete: allowlist, sandbox, classifier, and a harness you can fork instead of build.

Microsoft stops pretending it needs OpenAI

At Build 2026, Mustafa Suleyman’s team unveiled seven from-scratch MAI models: reasoning (MAI-Thinking-1, a sparse MoE with roughly 35B active parameters), coding (MAI-Code-1-Flash), image, voice and transcription, all trained on licensed data with no distillation. Microsoft claims MAI-Thinking-1 draws even with Claude Sonnet 4.6 in blind tests, and Suleyman says a McKinsey-tuned MAI beat GPT-5.5 on quality at roughly 10x better cost efficiency. Around the models: Microsoft IQ went GA as a shared context layer across Copilot, Foundry and Copilot Studio, the GitHub Copilot desktop app landed in preview, the Copilot SDK went GA, and Scout, an “OpenClaw-inspired” personal assistant, shipped to consumers.

Take the benchmark claims with salt until independent evals land. The strategic signal is not in dispute: Microsoft is building a full agentic stack it controls end to end, and the framing it chose for the launch was cost and independence, not capability.

Why it matters: the company that distributes more developer tooling than anyone on earth just told you the competitive axis is cost efficiency, not benchmark wins. Plan your model routing accordingly.

Anthropic files its S-1, then asks for a pause

Anthropic confidentially submitted a draft S-1 to the SEC on June 1, the first formal step toward an IPO at a reported $965B valuation, days after the $65B Series H that made it the most valuable startup in the world. Reported run-rate is north of $47B with a first profitable quarter claimed to be near. Timing puts it ahead of OpenAI, which is prepping its own filing. One wrinkle: the S&P 500 will not waive its profitability rule for either of them, so neither joins the index at listing.

The same week, Anthropic policy leads Jack Clark and Marina Favaro argued for a coordinated global pause on the most powerful systems, citing recursive self-improvement, while the company’s “When AI Builds Itself” piece quantified how much of its own development is already delegated to AI. Read the two together: the substance on self-improvement metrics is real, and the framing is also a negotiation from a lab racing to the public market with “safety as a moat.”

Why it matters: the lab behind most working agentic stacks is about to answer to quarterly numbers. Expect pricing, rate limits and roadmap to follow the income statement, not the research calendar.

The two agent bets diverge

Anthropic and OpenAI spent the week building in opposite directions, and the contrast is the story. Anthropic’s engineers published a dynamic-workflows playbook built around four failure modes of a single long context: agent laziness, self-preference bias, mid-run goal drift, and context unwind past roughly 500K tokens. The fix is scoped subagents, and the guide names six patterns: classify-and-act, fan-out-and-synthesize, adversarial verification, generate-and-filter, tournament, and loop-until-done. Claude Code also shipped /fork, independent failure for parallel tool calls, and machine-readable agent state via claude agents --json.
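
Of the six patterns, fan-out-and-synthesize is the easiest to sketch. The Python below is an illustration of the general shape, not code from Anthropic's playbook. The function names, the scope strings and the stub subagent are assumptions, and the stubs stand in for real model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def fan_out_and_synthesize(
    task: str,
    scopes: Sequence[str],
    run_subagent: Callable[[str, str], str],
    synthesize: Callable[[str, dict[str, str]], str],
    max_workers: int = 4,
) -> str:
    """Give each subagent one narrow scope, then merge the findings."""

    def run_one(scope: str) -> tuple[str, str]:
        # Failures stay local: one broken subagent is recorded as a
        # finding and does not take down the whole run.
        try:
            return scope, run_subagent(task, scope)
        except Exception as exc:
            return scope, f"FAILED: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so findings line up with scopes.
        findings = dict(pool.map(run_one, scopes))
    return synthesize(task, findings)


# Hypothetical stubs standing in for real subagent and synthesizer calls.
def stub_subagent(task: str, scope: str) -> str:
    if scope == "tests/":
        raise RuntimeError("timeout")
    return f"reviewed {scope}"


def stub_synthesize(task: str, findings: dict[str, str]) -> str:
    return "; ".join(f"{scope}: {result}" for scope, result in findings.items())


summary = fan_out_and_synthesize(
    "review the change", ["src/", "tests/"], stub_subagent, stub_synthesize
)
```

Each subagent sees only the task and its own scope, which is the point of the pattern: no single context has to hold everything, and the synthesizer works from short findings rather than raw transcripts.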

OpenAI went the other way: out of the IDE entirely. Codex got Sites, which publishes agent output as hosted interactive apps and dashboards, plus Annotations for bounded in-place edits and six role plugins wrapping 62 business apps for analysts, salespeople and bankers. OpenAI says Codex now has 5M+ weekly actives, with knowledge workers at roughly 20 percent and growing three times faster than developers. Cursor, meanwhile, shipped 3.7 with Canvas Design Mode, SDK custom tools, nested subagents, and a Context Usage Report that itemizes where your tokens go.

Why it matters: Anthropic is betting the agent is an engineering orchestrator; OpenAI is betting it is a general work surface. Both can be right, but your harness design inherits different primitives depending on which stack you build against.

The token bill, part two

Last issue’s lead story did not slow down; it got new line items. TechCrunch ran “the token bill comes due” on the scramble to control runaway AI spend. Google is reportedly paying SpaceX $920M a month for compute. Uber capped employee AI spend at $1,500 a month after burning through its annual budget in four months. Microsoft’s whole MAI launch was priced as a 10x-cheaper pitch. At the tooling layer, a Show HN filter called Lowfat claimed a 91.8 percent token cut by trimming command output before it reaches an agent’s context, and r/mcp documented GitHub’s MCP server burning roughly 17K tokens of tool metadata before the first real question.
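
The output-trimming idea is simple enough to sketch. The Python below is not Lowfat's implementation, which I have not inspected. It is a hypothetical filter (trim_output and its head/tail defaults are my own assumptions) that collapses repeated lines and keeps only the start and end of long command output, which is where the command echo and the final error usually sit.

```python
def trim_output(text: str, head: int = 20, tail: int = 20) -> str:
    """Shrink command output before it is appended to an agent's context."""
    # Collapse runs of identical consecutive lines into one annotated line.
    runs: list[list] = []
    for line in text.splitlines():
        if runs and runs[-1][0] == line:
            runs[-1][1] += 1
        else:
            runs.append([line, 1])
    rendered = [
        line if count == 1 else f"{line}  [repeated {count}x]"
        for line, count in runs
    ]
    # Short output passes through untouched.
    if len(rendered) <= head + tail:
        return "\n".join(rendered)
    # Long output keeps the first and last lines, with a marker in between
    # so the agent knows something was dropped and can ask for it.
    omitted = len(rendered) - head - tail
    marker = f"... [{omitted} lines omitted] ..."
    return "\n".join(rendered[:head] + [marker] + rendered[len(rendered) - tail:])
```

How much this saves depends entirely on the commands you run, so measure on your own logs before quoting a percentage. The explicit omission marker matters more than the exact cut: an agent that knows output was trimmed can re-run with a narrower command.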

One change lands in your favor: from June 15, Claude Agent SDK and claude -p usage stops counting against Claude plan limits. Headless pipelines stop eating interactive quota.

Why it matters: the market is pivoting from best model to cheapest acceptable model per task. Routing, caching and output filtering are now the differentiating engineering, not prompt cleverness.
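
"Cheapest acceptable model per task" reduces to a small selection rule once you have per-task quality scores from your own evals. A minimal Python sketch follows. The Model fields, the route function and the fallback behavior are assumptions for illustration, and any prices or scores you plug in are yours to supply.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    cost_per_mtok: float       # blended USD per 1M tokens, from your own bills
    quality: dict[str, float]  # task type -> score in [0, 1] from your own evals


def route(task_type: str, min_quality: float, models: list[Model]) -> Model:
    """Pick the cheapest model that clears the quality bar for this task type."""
    acceptable = [
        m for m in models if m.quality.get(task_type, 0.0) >= min_quality
    ]
    if not acceptable:
        # Nothing clears the bar: fall back to the best available model
        # rather than silently using a cheap one that is known to be weak.
        return max(models, key=lambda m: m.quality.get(task_type, 0.0))
    return min(acceptable, key=lambda m: m.cost_per_mtok)
```

The rule is only as good as the quality table behind it, so the real engineering is the eval set per task type, not the routing function.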

Agentic engineering and tooling

MCP’s stateless release candidate is locked for July 28: stateless core, Extensions, Tasks, MCP Apps. Remote servers can finally drop sticky sessions.
FastMCP shipped fastmcp-remote, a standalone bridge from stdio-only MCP hosts to HTTP servers with automatic OAuth.
Noma launched Agent Access Control to discover and govern agents and MCP servers enterprise-wide; MCP governance now runs formal SEP processes under the Linux Foundation.
GitHub Copilot SDK went GA with cloud and local sandboxes in preview, an agent-tasks REST API, and prompt scheduling in the CLI.
Jerry Liu (LlamaIndex) says the framework era is ending: value has moved from orchestration frameworks to context quality.
Microsoft open-sourced pg_durable, in-database durable execution for Postgres. Relevant if you persist agent or workflow state.
“Did Claude increase bugs in rsync?” is the week’s best skeptic read on agent-written patches.

Models

MiniMax M3 claims the first open-weight model combining frontier coding (SWE-Bench Pro 59.0), 1M context and native multimodality at 5 to 10 percent of frontier cost. Every benchmark is self-reported and the weights are not out yet. Claim, not result.
Gemma 4 12B: encoder-free multimodality with native audio, Apache 2.0, runs on 16GB and hits 120 tok/s quantized on a 12GB card.
Kimi K2.5 (Moonshot) was continual-trained on 15T tokens, a full pretrain-sized budget, and is now the most-used model on OpenRouter.
JetBrains open-sourced Mellum2, a 12B MoE (2.5B active, Apache 2.0) pitched as a fast focal model for sub-agents, explicitly not a Claude Code competitor.
Nvidia’s Nemotron 3 Ultra: a 550B open MoE with full recipes and data, the best US open-weights score on the Artificial Analysis index so far.
Cohere is seeding an unreleased coding model to r/LocalLLaMA, following Command A+ jumping from 3 to 25 percent on Terminal-Bench Hard.
Qwen3.7-Plus shipped strong GUI-agent numbers (ScreenSpot Pro 79.0) at $0.40/$1.60 per 1M tokens, but it is API-only: Qwen breaking its open-weights habit.

Chips and infra

Nvidia owned Computex: the RTX Spark superchip (Arm CPU + Blackwell GPU, 128GB unified memory, 300GB/s) runs 120B-parameter models and 1M-token contexts locally, shipping this fall in machines from Dell, HP, Lenovo and a Microsoft Surface Ultra. Vera datacenter CPUs hit full production with Anthropic and OpenAI as early customers.
Alphabet upsized its equity raise to a record $84.75B for AI capex; Berkshire took $10B. 2026 capex guidance: $180B to $190B.
Apollo and Blackstone are reportedly assembling a $36B debt package to finance Google TPUs for Anthropic. Frontier compute is now financed like a power plant.
SoftBank committed up to €75B for French data centers; AirTrunk committed $30B for 5GW in India.

Deals and money

Nvidia acquired Kumo AI for $400M+: predictions over structured enterprise data, a picks-and-shovels move past the GPU.
OpenRouter raised $113M led by CapitalG. The model-router layer is now a fundable category.
Cognition raised north of $1B at a $26B valuation; Ramp hit $44B; Supabase took $500M at $10.5B; Shield AI raised $1.5B at $12.7B.
Uber capped employee AI spending and Cyera is raising at an 80x ARR multiple. Both ends of the froth in one week.

Consumer AI

Apple approved Poke as the first AI agent on Messages for Business. The precedent matters more than the app.
Microsoft shipped Scout, an “always personal” assistant; HN’s blunt framing was that Microsoft wants users addicted to it.
Meta is reportedly developing an AI pendant, and DuckDuckGo made its no-AI search easier to reach as traffic booms.

Research worth knowing

Meta’s RADAR paper is the most concrete evidence yet that risk-scored auto-review beats human review on safety metrics, not just speed.
Anthropic’s “When AI Builds Itself” quantifies how much of its own development is delegated to AI. The recursive self-improvement data is the substance under the pause op-ed.
Latent Agents: post-training that internalizes multi-agent debate into a single model. If it holds, some of today’s orchestration becomes tomorrow’s fine-tune.
MiniMax’s MSA architecture claims roughly 20x lower per-token compute at 1M context. If the technical report survives scrutiny, long-context serving economics change.

Worth your scroll

Claude Code vs Codex vs Cursor, an honest comparison: Theo’s 38-minute head-to-head, well timed against this week’s releases.
“Ask HN: Why is the HN crowd so anti-AI?” drew 611 comments on codegen quality versus speed.
The GitHub MCP server can burn 17K tokens before your first question: the MCP tool-metadata bloat thread.
Anthropic’s claim that its engineers now ship 8x more code per quarter than in 2021 to 2025 made the rounds on X. Unverifiable, but directionally believable given the Workflows dogfooding.

What I’m watching next week

WWDC 2026 opens June 8: the long-delayed Siri revamp and Apple Intelligence updates are the headline expectations.
Claude Code’s billing change lands June 15: Agent SDK and claude -p usage stops counting against plan limits. Re-budget your headless pipelines.
MiniMax M3 weights are promised roughly mid-June. Independent SWE-Bench Pro numbers will settle whether the open-weight frontier-coder claim is real.
GPT-5.6 rumors point to mid-June through early July: Codex logs referenced the model string and ChatGPT is running A/B tests.

The Agentic Engineer Weekly is the Saturday companion to the daily morning AI briefing I write for myself. AI agents. Not the hype. Real workflows.

Watch the video episodes on YouTube at @agenticlife-amit. Follow me on X and LinkedIn. If a friend forwarded this, forward it to one engineer who would like it. If you want to talk back, find me on any of those.
