The Agentic Engineer Weekly, Issue 03: The week tokens became the news

Microsoft pulled Claude Code from its own engineers. GitHub Copilot moved to token billing and devs revolted. Anthropic crossed a $965B valuation. The cost-per-task era is here. Issue 03 of The Agentic Engineer Weekly.

Published on May 31, 2026 · 12 min read · 2,485 words

agentic-engineering · weekly-edit


The week tokens became the news

For a full year the framing was “agents are cheaper than humans.” This week that framing took its first sustained beating. Microsoft pulled most of its internal Claude Code licenses and pushed engineers to GitHub Copilot CLI. Uber’s COO said on the record that AI spend is getting harder to justify. GitHub Copilot switched to token-based billing and developers revolted. The Wall Street Journal reported corporate America is rationing AI. CNBC ran a segment literally titled “Tokens or Humans?” In the middle of all that, Anthropic closed a $65B Series H at a $965B valuation, passed OpenAI as the most valuable AI startup, and shipped Claude Opus 4.8 with dynamic workflows that fan out hundreds of parallel subagents. The contrast tells you everything: bills are going up, the lab winning is also the most expensive one, and the open-weight tier is closing the gap fast enough that “cheapest competent loop” is now an engineering metric.

The week in five bullets

The AI cost reckoning went mainstream. Microsoft canceled most internal Claude Code licenses (full Copilot CLI migration by June 30), GitHub Copilot moved to token billing and devs called it “a joke,” and the WSJ reported that companies are rationing AI.
Anthropic raised $65B at $965B, passed OpenAI as the most valuable AI startup, and shipped Opus 4.8 with dynamic workflows that orchestrate hundreds of background subagents from one Claude Code session.
The open-weight tier closed in. Qwen 3.6 35B-A3B ran at 125 tok/s on two RTX 4060 Ti cards. DeepSeek V4 crossed 80% on SWE-bench Verified. Gemma 4 went Apache 2.0. Cursor’s Composer 2.5 benchmarks near Opus at a fraction of the cost.
Agentic enterprise revenue stopped being a slide and became a line item. Salesforce: $1.2B Agentforce ARR up 205%. Cognition: $1B+ raise at $26B, $492M ARR. Robinhood let AI agents trade real stocks via MCP. Snowflake committed $6B to AWS.
“MCP is dead?” topped HN at 383 points the same week the ecosystem crossed 14,000 servers, governance moved under the Linux Foundation, and Anthropic shipped MCP tunnels plus self-hosted sandboxes.

Top of mind

The AI cost reckoning goes public, and Microsoft is the bellwether

This was the story under every other story this week, so it goes first. Per Fortune, The Verge, and Windows Central, Microsoft is canceling thousands of internal Claude Code licenses about six months after rolling them out, with full migration to GitHub Copilot CLI by June 30. The trigger is not capability; it is unit economics. Heavy internal users were running $500 to $2,000 per month in API spend, and one engineer on r/ClaudeAI reported burning 62M Opus tokens in a single day. Uber’s CTO said in April that the company torched its entire 2026 AI coding-tools budget in four months, and this week the COO went further: AI spend is “harder to justify” because there is no measurable link between the bill and shipped value. First time a major operator has said that out loud.

The same week, the cost narrative went mainstream. GitHub Copilot switched to token billing and TechCrunch quoted developer reaction as “what a joke.” The Wall Street Journal ran a piece on corporate America rationing AI as costs skyrocket. CNBC ran “Tokens or Humans? The new AI cost trade-off.” WION covered a company “hit by a massive AI cost surge from unchecked Claude usage.” Microsoft’s own internal research is being read as showing AI can cost more than the humans it assists. Amazon scrapped its internal AI-usage leaderboard after staff started gaming it.

Why it matters: cost-per-task is now a first-class engineering metric. The work shifts to caching discipline, cheaper workhorse models for hot paths, and budget gates on long-horizon runs. Anthropic read the room and shipped per-category /usage in Claude Code v2.1.149. Every serious harness will follow.
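A budget gate does not need to be elaborate. The sketch below is a hypothetical, provider-agnostic version: `cost_usd` converts token counts to dollars at the $5 / $25 per million Opus list prices quoted in this issue (assumed to be the input/output split, with no prompt-caching discounts), and `BudgetGate` is an illustrative helper, not part of any SDK, that halts a long-horizon run once its cap is crossed.

```python
from dataclasses import dataclass

# USD per million tokens. The $5 / $25 Opus figures are the list prices
# quoted in this issue; assumed here to be input / output, with no caching.
OPUS_INPUT_PER_M = 5.00
OPUS_OUTPUT_PER_M = 25.00


def cost_usd(input_tokens: int, output_tokens: int,
             input_per_m: float = OPUS_INPUT_PER_M,
             output_per_m: float = OPUS_OUTPUT_PER_M) -> float:
    """Dollar cost of a call at per-million-token prices (no cache discounts)."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


class BudgetExceeded(RuntimeError):
    """Raised when a run's cumulative spend passes its cap."""


@dataclass
class BudgetGate:
    """Hypothetical per-run gate: record each step's usage, stop once over the cap."""
    cap_usd: float
    spent_usd: float = 0.0

    def charge(self, input_tokens: int, output_tokens: int) -> float:
        self.spent_usd += cost_usd(input_tokens, output_tokens)
        if self.spent_usd > self.cap_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of a ${self.cap_usd:.2f} cap")
        return self.spent_usd


if __name__ == "__main__":
    # Bounds for 62M Opus tokens in a day: all input vs all output.
    low, high = cost_usd(62_000_000, 0), cost_usd(0, 62_000_000)
    print(f"${low:,.2f} to ${high:,.2f}")  # $310.00 to $1,550.00

    gate = BudgetGate(cap_usd=20.0)
    try:
        for step in range(100):             # stand-in for an agent loop
            gate.charge(200_000, 20_000)    # assumed per-step usage
    except BudgetExceeded as err:
        print(f"halted at step {step}: {err}")
```

As a sanity check on the r/ClaudeAI anecdote: without caching discounts, 62M Opus tokens lands somewhere between $310 (all input) and $1,550 (all output) for the day, which is why per-step accounting with a per-run cap catches problems that a monthly invoice only reports after the fact.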

Anthropic crosses $965B, ships Opus 4.8, and makes dynamic workflows a primitive

Anthropic closed a Series H of around $65B at a $965B post-money valuation, led by Altimeter, Dragoneer, Greenoaks, and Sequoia. Second $30B-plus raise of the year, openly framed as the IPO on-ramp, and the first time a private AI company has passed OpenAI by reported valuation. (A conflicting “$30B at $380B” figure circulated; treat the exact number as reported until the SEC filing lands. The position is not in dispute.)

Opus 4.8 lands at the same $5 / $25 per million input/output tokens as 4.7, with Anthropic’s headline of “roughly 4x less likely to miss a code flaw” and gains on Terminal-Bench 2.1, OSWorld-Verified, and Finance Agent v2. High-effort and extended thinking are on by default. The bigger builder primitive is dynamic workflows in Claude Code (research preview): one session orchestrates tens to hundreds of background subagents through, say, a 100K-line migration with per-module test-and-merge fanout. Anthropic’s Sid confirms the team has been daily-driving the pattern internally for months. Claude Code 2.1.154 through 2.1.157 make it concrete: Opus 4.8 as the default, /simplify auto-apply, .claude/skills plugins, and EnterWorktree mid-session. Skeptics: Claire Vo and PostHog’s James Hawkins both reported one-shot brilliance followed by drift on long runs. Verify before you trust.

Why it matters: your daily driver just got better at the fan-out-then-verify pattern your briefing and newsletter pipelines already use. The lab is now a near-trillion-dollar pre-IPO business. Good for runway and roadmap stability. Plan for the pricing pressure that usually follows an IPO track.
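The fan-out-then-verify shape is simple to sketch outside any particular harness. The code below is a generic asyncio illustration, not Claude Code’s dynamic-workflows implementation: `migrate` and `verify` are hypothetical callables standing in for “run a subagent on this module” and “run that module’s tests,” and only patches that pass verification are kept for merging.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical hooks: in a real harness these would launch a subagent
# (for example in its own worktree) and run that module's test suite.
Migrate = Callable[[str], Awaitable[str]]        # module -> patch
Verify = Callable[[str, str], Awaitable[bool]]   # (module, patch) -> tests pass?


async def fan_out_then_verify(
    modules: list[str], migrate: Migrate, verify: Verify, max_parallel: int = 8
) -> tuple[dict[str, str], list[str]]:
    """Migrate every module with at most `max_parallel` workers in flight,
    then split the results into patches to merge and modules that failed."""
    sem = asyncio.Semaphore(max_parallel)

    async def one(module: str) -> tuple[str, str, bool]:
        async with sem:
            patch = await migrate(module)
            return module, patch, await verify(module, patch)

    # gather preserves input order, so `failed` lists modules in input order.
    results = await asyncio.gather(*(one(m) for m in modules))
    merged = {m: patch for m, patch, ok in results if ok}
    failed = [m for m, _, ok in results if not ok]
    return merged, failed


if __name__ == "__main__":
    async def fake_migrate(module: str) -> str:
        await asyncio.sleep(0)                   # pretend to call a model
        return f"patch for {module}"

    async def fake_verify(module: str, patch: str) -> bool:
        return not module.startswith("legacy")   # pretend legacy tests fail

    merged, failed = asyncio.run(
        fan_out_then_verify(["api", "legacy_auth", "billing"], fake_migrate, fake_verify))
    print(sorted(merged), failed)  # ['api', 'billing'] ['legacy_auth']
```

Bounding concurrency with a semaphore keeps hundreds of modules from becoming hundreds of simultaneous model calls, and keeping the failed list explicit gives a human (or a second pass) a concrete place to catch the long-run drift the skeptics describe.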

The open-weight tier is closing fast, and Qwen 3.6 is eating local AI

While the frontier labs raised and shipped, the open-weight side spent the week showing what cheap-and-competent looks like. r/LocalLLaMA was wall-to-wall Qwen 3.6 35B-A3B (MoE with around 3B active params): the top thread documented 125 tokens per second on two RTX 4060 Ti cards in q4xl, and a second thread on Nvidia’s NVFP4 build, released on Hugging Face, pulled another 148 points. Gemma 4 (Apache 2.0) reportedly cleared 120M downloads in roughly two weeks, per Omar Sanseviero. Kimi K2.6 keeps showing up in agentic-coding comparisons; DeepSeek V4 crossed 80% on SWE-bench Verified. Liquid AI shipped LFM2.5-8B-A1B (8B MoE, ~1B active, 38T training tokens). A mystery “Hy3” model is topping OpenRouter and nobody is publicly claiming it.

In the closed-but-cheap tier, Cursor’s Composer 2.5 (Kimi-based, Cursor-only) is the headline. shadcn: “done before I even tab over to preview.” On CursorBench it scores close to the frontier at roughly $1 to $2 per task, against ~$11 for Opus and ~$4 for GPT-5.5 extra-high. bycloud’s breakdown of Kimi K2.5 is the underrated piece: Moonshot continually trained K2 at pretraining scale (15T tokens on top of the original 15T, versus DeepSeek V3.1’s sub-1T). “Continue training an old base” is now a credible, cheap path to a new frontier.

Why it matters: open-weights at this perf-per-dollar is the natural hedge against the cost reckoning. The experiment to run this week: route the easy 80% of tasks to a workhorse model (Composer 2.5, Gemini 3.5 Flash, or Qwen 3.6 35B-A3B locally) and reserve Opus 4.8 for the hard 20%.
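The 80/20 split is easy to price before you run it. In this sketch the per-task figures are the rough CursorBench numbers quoted above (about $1.50 as a midpoint for Composer 2.5, about $11 for Opus), and the escalation rule in `route` is a hypothetical heuristic; replace it with whatever difficulty signal your tasks actually expose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_task: float  # rough observed cost per task, not a per-token price


# Ballpark per-task costs from the CursorBench comparison above; swap in
# your own measurements before trusting any of the arithmetic.
WORKHORSE = Tier("workhorse (Composer 2.5, Gemini 3.5 Flash, local Qwen 3.6)", 1.50)
FRONTIER = Tier("Opus 4.8", 11.00)


def route(task: dict) -> Tier:
    """Hypothetical rule: escalate only big or previously failed tasks."""
    hard = task.get("files_touched", 1) > 5 or task.get("failed_attempts", 0) > 0
    return FRONTIER if hard else WORKHORSE


def blended_cost(hard_share: float) -> float:
    """Expected cost per task when `hard_share` of tasks go to the frontier tier."""
    return (1 - hard_share) * WORKHORSE.usd_per_task + hard_share * FRONTIER.usd_per_task


if __name__ == "__main__":
    print(route({"files_touched": 2}).name)
    print(route({"files_touched": 1, "failed_attempts": 1}).name)  # Opus 4.8
    print(f"blended: ${blended_cost(0.2):.2f} vs all-frontier ${blended_cost(1.0):.2f}")
```

At a 20% hard-task share the blended cost comes out around $3.40 per task versus $11 for all-Opus. What the experiment really tests is whether the workhorse’s misses on the easy 80% stay cheap to detect and escalate.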

Agentic enterprise revenue lands, and Robinhood is the canary

For two years “the agentic enterprise” has been a slide. This week it became a line on an earnings call. Salesforce reported Q1 FY27 with Agentforce ARR at $1.2B, up 205% year over year, 3.8B “Agentic Work Units” delivered, and the Agentforce-plus-Data 360 bundle approaching $3.4B ARR. The phrase “agentic work unit” is the tell: incumbents are racing to meter agents the way they once metered seats. Cognition (maker of Devin) raised $1B+ at roughly $26B post-money (more than double the $10.2B from eight months ago) with $492M annualized revenue, enterprise usage up 50% month over month for six straight months, and a customer list that includes Mercedes-Benz, NASA, Goldman Sachs, and Santander. Snowflake signed a $6B five-year AWS commitment skewed toward Graviton Arm CPUs and AI accelerators, with the CEO declaring “the era of the agentic enterprise.”

The more instructive item is Robinhood opening agentic trading to its 27M customers. You spin up a separate agent account with a dedicated wallet; agents can read your portfolio and place orders only against that pre-loaded balance, all through Robinhood’s own MCP service. Some trades require human approval. Google’s Agent Payments Protocol (AP2) lands the same payment-with-explicit-boundaries pattern at the spec layer.

Why it matters: MCP in production at consumer scale with real money on the line, far more useful as a design reference than the next protocol draft. The “dedicated wallet plus human approval gate” pattern is exactly the guardrail design every serious builder keeps re-inventing. Lift it directly.
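The “dedicated wallet plus human approval gate” guardrail reduces to two checks before any side effect. This is a hypothetical sketch of the pattern, not Robinhood’s MCP service or AP2: `AgentWallet`, `approval_threshold`, and the `ask_human` callback are illustrative names, and a real system would enforce these limits server-side rather than inside the agent’s own process.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentWallet:
    """Hypothetical guardrail: the agent spends only a pre-loaded balance,
    and any order above `approval_threshold` needs an explicit human yes."""
    balance: float
    approval_threshold: float
    ask_human: Callable[[str, float], bool]  # (description, amount) -> approved?
    ledger: list[tuple[str, int, float]] = field(default_factory=list)

    def place_order(self, symbol: str, qty: int, price: float) -> str:
        amount = qty * price
        # Check 1: hard ceiling, regardless of what the agent or a human says.
        if amount > self.balance:
            return "rejected: exceeds pre-loaded balance"
        # Check 2: large orders pause for a human decision.
        if amount > self.approval_threshold and not self.ask_human(
                f"buy {qty} {symbol} @ {price}", amount):
            return "rejected: human declined"
        self.balance -= amount
        self.ledger.append((symbol, qty, price))
        return "filled"


if __name__ == "__main__":
    wallet = AgentWallet(balance=1_000.0, approval_threshold=200.0,
                         ask_human=lambda desc, amt: False)  # stand-in approval UI
    print(wallet.place_order("ABC", 2, 50.0))    # filled
    print(wallet.place_order("ABC", 10, 50.0))   # rejected: human declined
    print(wallet.place_order("ABC", 100, 50.0))  # rejected: exceeds pre-loaded balance
```

The ordering is deliberate: the balance ceiling is checked first so that no approval prompt can authorize spending money the agent was never given.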

“MCP is dead?” tops HN while the ecosystem hits 14,000 servers

The provocateur post of the week was quandri.io’s “MCP is dead?” at 383 points on HN. The argument: as models get better at direct tool-calling and code execution, a dedicated protocol layer may be solving a shrinking problem. The counter-evidence was loud the same week. The MCP ecosystem crossed 14,000 servers, governance moved under the Linux Foundation’s AAIF, AWS shipped 60+ first-party servers (the AWS MCP Server hit GA with IAM guardrails and CloudTrail logging), and Anthropic released MCP tunnels (private servers reachable over one outbound encrypted connection, no inbound firewall rules) plus self-hosted sandboxes. The 2026-07-28 spec release candidate is locked: stateless core, Extensions, Tasks, MCP Apps, auth hardening, formal deprecation policy.

Why it matters: “dead vs standard” is the call you face on every new integration. Dedicated MCP server with discoverability and OAuth gating, or a tight purpose-built script the model code-executes? The right answer depends on whether the surface needs to be reusable across agents or a single private workflow.

Agentic engineering and tooling

Claude Code (May): Opus 4.8 default, dynamic workflows fanning out tens-to-hundreds of background agents, /simplify cleanup auto-apply, Agent view, /goal, /code-review --fix, EnterWorktree mid-session.
Cursor 3.6 Auto-review Run Mode: Shell, MCP, and Fetch calls are gated by an allowlist, a sandbox, and a classifier agent. Cursor’s own “beat the auto-review” mini-game makes the point that approving fast is hard.
TrapDoor (Socket): 34 malicious packages across npm, PyPI, and Crates.io. Invisible Unicode in CLAUDE.md and .cursorrules turns assistants into credential exfiltrators. Treat context files as executable trust boundaries. (via The Hacker News)
PromptArmor: Copilot Cowork can be made to exfiltrate files through a poisoned skill plus a Teams message sent with no approval gate. Relevant to anyone shipping skills with side effects.
Microsoft AI Engineering Coach (open source on GitHub, for VS Code): scores context health and flags where CLAUDE.md wastes tokens.
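Treating context files as a trust boundary can start with something as blunt as refusing invisible characters. The scanner below is a minimal sketch, not Socket’s detection logic: it flags every Unicode “format” character (category Cf), which covers zero-width spaces, bidirectional overrides, and the tag characters used to smuggle hidden text. Some legitimate text contains Cf characters (a soft hyphen, for example), so treat hits as prompts for review rather than proof of an attack.

```python
import sys
import unicodedata
from pathlib import Path


def find_hidden(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, description) for every Unicode format character.

    Category "Cf" includes zero-width spaces and joiners (U+200B..U+200D),
    bidi controls (U+202A..U+202E, U+2066..U+2069) and tag characters
    (U+E0001, U+E0020..U+E007F), none of which render visibly.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                desc = f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
                hits.append((lineno, col, desc))
    return hits


def main(paths: list[str]) -> int:
    """Scan each existing file; return 1 if anything suspicious is found."""
    found = False
    for p in map(Path, paths):
        if not p.is_file():
            continue
        for lineno, col, desc in find_hidden(p.read_text(encoding="utf-8")):
            print(f"{p}:{lineno}:{col}: {desc}")
            found = True
    return 1 if found else 0


if __name__ == "__main__":
    # Default targets are the context files named above; pass others as args.
    status = main(sys.argv[1:] or ["CLAUDE.md", ".cursorrules"])
    if status:
        sys.exit(status)  # non-zero exit fails a CI step or pre-commit hook
```

Wired into a pre-commit hook or CI step, this turns “someone should read CLAUDE.md carefully” into a check that fails loudly when a file contains characters no human reviewer can see.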

Models

Claude Opus 4.8: same price as 4.7, high-effort defaults, 4x less likely to miss a code flaw on Anthropic’s framing. Skeptics flag drift on long runs.
Cursor Composer 2.5 (Kimi-based, Cursor-only): near-frontier on CursorBench at a fraction of the cost.
Qwen 3.6 35B-A3B: 125 tok/s on dual 4060 Ti in q4xl, Nvidia NVFP4 build on Hugging Face. Qwen 3.7 Max (closed-weight): 1M context, $2.50/$7.50 per million, 35-hour autonomous run.
Kimi K2.6, DeepSeek V4 (>80% on SWE-bench Verified), Gemma 4 Apache 2.0 (~120M downloads in two weeks).
Liquid AI LFM2.5-8B-A1B, StepFun 3.7 Flash, Hy3 topping OpenRouter anonymously.
Hype hygiene: Mythos 1 still preview-only inside Glasswing; MiniMax M3 not released (M2.7 current).

Chips and infra

Nvidia Vera Rubin in full production: seven chips, partner products H2 2026, up to 10x lower inference cost vs Blackwell on Nvidia’s framing. FQ1'27 revenue $81.6B up 85%, $80B buyback. Huang says Nvidia “largely conceded” China to Huawei.
Custom ASICs outgrowing merchant GPUs for the first time in 2026: TrendForce 44.6% growth vs 16.1%. The “inference-led regime.”
SoftBank to invest up to €75B in French data centers, the week’s biggest capex headline.
Groq raising ~$650M for an inference neocloud. XCENA $135M at $570M on the memory-not-compute thesis.
IBM quantum foundry: first dedicated quantum chip foundry, $2B CHIPS Act, 300mm superconducting silicon.

Deals and money

Anthropic Series H ~$65B at $965B (above): largest round of the week, IPO on-ramp.
Cognition $1B+ at $26B: $492M ARR, 50% MoM enterprise growth six months running.
Salesforce Q1 FY27: $11.13B revenue, Agentforce ARR $1.2B up 205%, 3.8B Agentic Work Units.
Snowflake $6B five-year AWS commitment, Q1 beat at $1.39B. OpenRouter Series B at $113M.
Four deals in five days: Anthropic took Stainless, Mistral took Emmi AI, DeepMind licensed the Contextual AI team (~$80-90M acqui-hire structured to dodge merger review), and Asana acquired StackAI.
ElevenLabs $500M Series D at $11B, Shield AI $1.5B Series G at $12.7B, Modal Labs $355M.
Microsoft reportedly hunting an AI-startup acquisition to reduce its OpenAI dependence.

Consumer AI

Apple’s standalone Siri app (Bloomberg, pre-WWDC): chat history, doc/photo upload, a rebuilt model reportedly running Gemini under the hood. ChatGPT ~900M WAU; Apple ~2.5B devices.
Gemini 3.5 Flash GA at $1.50/$9 per million, 1M context. Gemini Omni any-modality. Gemini Spark 24/7 assistant. Nano Banana 2 / Pro GA via Gemini Enterprise.
ElevenLabs Music v2 switches genres mid-track; Stability AI shipped a music model the same week.
YouTube auto-labels AI-generated videos, no creator opt-in. Meta reportedly building an AI pendant.
Pope Leo XIV published “Magnifica Humanitas,” his first encyclical, on human dignity in the age of AI.

Research worth knowing

Anthropic “Measuring AI agent autonomy in practice”: coding ~50% of agentic calls, 80% of calls have at least one safeguard, 73% keep a human in the loop, only 0.8% of actions are irreversible.
DeepMind Co-Scientist: a multi-agent research partner.
Claude Mythos reportedly solves a 1946 Erdős unit-distance problem via parallel Claude Code instances. DeepMind agent solved 9 of 353 open Erdős problems for a few hundred dollars in compute.
AI-generated CUDA kernels silently break training and inference (r/MachineLearning): low-level kernels pass surface checks while corrupting results.
LLM agent constraint decay (arXiv 2605.06445): backend-dev agents progressively forget their own constraints over long runs.
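The CUDA-kernel item is a reminder that “it runs and the shapes match” is only a surface check. A common countermeasure is differential testing: run the generated implementation and a slow, trusted reference on many random inputs and compare values within a tolerance. The sketch below uses plain Python softmax functions as stand-ins for a kernel and its reference; `buggy_softmax` is a deliberately planted off-by-one, and the tolerance, input scale, and trial count are illustrative assumptions.

```python
import math
import random
from typing import Callable, Sequence

Softmax = Callable[[Sequence[float]], list[float]]


def reference_softmax(x: Sequence[float]) -> list[float]:
    """Slow, trusted reference (max-subtracted for numerical stability)."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def buggy_softmax(x: Sequence[float]) -> list[float]:
    """Planted bug: the normalizer skips the last element (off-by-one)."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps[:-1])
    return [e / total for e in exps]


def differential_check(candidate: Softmax, reference: Softmax, trials: int = 200,
                       size: int = 64, scale: float = 50.0,
                       rel_tol: float = 1e-6, seed: int = 0) -> list[int]:
    """Return the trial indices where candidate and reference disagree."""
    rng = random.Random(seed)
    bad = []
    for t in range(trials):
        x = [rng.uniform(-scale, scale) for _ in range(size)]
        got, want = candidate(x), reference(x)
        same_shape = len(got) == len(want)  # the "surface check"
        if not same_shape or not all(
                math.isclose(g, w, rel_tol=rel_tol, abs_tol=1e-12)
                for g, w in zip(got, want)):
            bad.append(t)
    return bad


if __name__ == "__main__":
    print("reference vs itself:", differential_check(reference_softmax, reference_softmax))
    bad = differential_check(buggy_softmax, reference_softmax)
    print(f"buggy: {len(bad)} of 200 trials flagged")
```

The planted bug always returns the right shape, and in this setup it stays within tolerance of the reference on most random inputs; only trials where the dropped element is large enough to matter get flagged. That is why one or two hand-picked test cases tend to miss this class of error while a few hundred randomized comparisons do not.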

Worth your scroll

Simon Willison: Anthropic and OpenAI have found product-market fit (588 pts HN).
Tech CEOs are apparently suffering from AI psychosis.
EY Canada cybersecurity report had most citations hallucinated. AI-slop-in-prod cautionary tale of the week.
Domain expertise has always been the real moat. Sober counter to “AI flattens everything.”
I ran 8 open-weight models as agents in a persistent MMO for 10 days. 93k-event dataset on long-running open agents.

What I’m watching next week

WWDC 2026 keynote (June 9). Bloomberg’s standalone ChatGPT-style Siri app leak (reportedly Gemini-backed) is the most-cited Apple AI story of the year. Watch for the reveal, the partnership terms, and on-device vs cloud framing.
Gemini CLI and Gemini Code Assist stop processing requests on June 18. Hard cutover for any pipeline still hitting those endpoints.
Microsoft’s internal Claude Code wind-down completes June 30. Does the forced Copilot CLI migration stick, or does it quietly reverse?
Cognition’s next product drop. $1B fresh capital, 50% MoM growth; whatever Devin ships next will define the bar for “expected agentic shipping volume.”
MCP spec RC (2026-07-28). Track SDK rollouts as the RC lands.

The Agentic Engineer Weekly is the Saturday companion to the daily morning AI briefing I write for myself. AI agents. Not the hype. Real workflows.

Watch the video episodes on YouTube at @agenticlife-amit. Follow me on X and LinkedIn. If a friend forwarded this, forward it to one engineer who would like it. If you want to talk back, find me on any of those.
