AI.news

The Sequence AI of the Week #871: Inside the Loop with Claude Opus 4.8

I am sure you are all surprised that we are covering Claude Opus 4.8 today ;) but I have been playing with it so much that it merits a post.

Opus 4.8 shipped on May 28, 2026. The headline contributions, in the order I’d rank them for anyone building agents: a roughly 4x reduction in how often the model leaves a flaw in its own code unremarked — the calibration/honesty story that defines this release; a fix for silently skipped tool calls, the bug class that quietly poisons long trajectories; better compaction recovery so long-horizon runs stop derailing after the history gets squeezed; dynamic workflows that let the model plan and fan out hundreds of parallel subagents for codebase-scale work; adaptive thinking that decides per-turn whether to reason at all; and a fast mode that runs ~2.5x faster at a tier that’s now ~3x cheaper than 4.7’s. Alignment results land near the (still-restricted) Mythos Preview. Same regular-mode pricing as its predecessor.

It’s tempting to file this under “minor release.” The version bump is a tenth of a point, the benchmark deltas are mostly incremental, and the cadence makes it easy to lose track: Opus 4.6 landed February 5, 4.7 on April 16 (ten weeks later), and 4.8 just six weeks after that. That’s a compression from a roughly quarterly rhythm toward something closer to monthly, and when point releases arrive that fast the instinct is to treat each one as a patch and skip the changelog.
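
As a quick sanity check on the cadence claim, the gaps between the three release dates quoted above can be computed directly. This is only a minimal sketch: the dates come from the text, while the `releases` dict and the `release_gaps` helper are illustrative names I made up for the arithmetic.

```python
from datetime import date

# Release dates as stated in the text above.
releases = {
    "Opus 4.6": date(2026, 2, 5),
    "Opus 4.7": date(2026, 4, 16),
    "Opus 4.8": date(2026, 5, 28),
}


def release_gaps(dates):
    """Return (earlier, later, days) for each pair of consecutive releases."""
    ordered = sorted(dates.items(), key=lambda item: item[1])
    return [
        (a_name, b_name, (b_date - a_date).days)
        for (a_name, a_date), (b_name, b_date) in zip(ordered, ordered[1:])
    ]


for earlier, later, days in release_gaps(releases):
    print(f"{earlier} -> {later}: {days} days ({days / 7:.0f} weeks)")
```

On these dates the gap shrinks from 70 days to 42, so "six weeks later" holds exactly, and the trend points toward monthly even if it is not there yet.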

That instinct is wrong here, because Opus 4.8 isn’t competing on the axis the version number implies. The benchmark table moved a little. What moved a lot is the reliability axis — the silent-failure rate, the tool discipline, the ability to hold a thread across a long run unattended. Those are the properties that gate whether you can actually leave an agent running, and they don’t show up on a capability leaderboard. The short cadence is also the tell: when you can ship calibration and reliability fixes every six weeks, the model stops being a thing you upgrade quarterly and becomes infrastructure you keep current. So let me give you the version I’d want if I were wiring this into a production agent loop at 2am.