We continue our series about alternatives to transformers.
In the AI of the week, we are going to deep dive into the new HRM small models, you need to know about those.
The opinion of the week we dive into the idea of recursive models as a new AI scaling law.
The last three weeks felt like a phase transition. Not the kind you measure in benchmark deltas — the kind where the substrate underneath frontier AI quietly rearranges itself, and the org charts and cap tables that fall out look nothing like what we had in April.
Google set the tone at I/O. Sundar called it the “agentic Gemini era,” and for once the marketing copy matched what shipped. Gemini Omni landed as the headline — an any-to-any generative model anchored on video, a real step forward in multimodal editing and world understanding — but the more consequential release was Gemini 3.5 Flash paired with Google Antigravity, their agent-first development platform. The pitch flipped from “AI that helps you write” to “agents that help you act.” Stack the new TPU 8i underneath, and Google now owns a vertically integrated agentic pipeline from silicon up through the IDE. It is the most coherent agent story any frontier lab has shipped this year.
Then, on May 19, Andrej Karpathy joined Anthropic. He’ll work on pretraining under Nick Joseph, and the official framing is that he’ll build a team “using Claude to accelerate pretraining research.” Read it carefully. An OpenAI co-founder is now constructing the loop that improves the next Claude using the current Claude. The self-improvement flywheel every lab has been sketching on whiteboards for two years just got the best person in the world to actually compile it. The signal-to-noise on this hire is unusually high — and Anthropic’s broader pattern of pulling CTOs of billion-dollar companies into individual-contributor research seats is the under-noticed sub-story.
That hire makes more sense when you read the compute side of the ledger. Two weeks earlier, on May 6, Anthropic announced full access to xAI’s Colossus 1 — more than 300 MW, roughly 220,000 H100, H200, and GB200 GPUs — to relieve the pressure that had been forcing Pro and Max rate caps. The price tag surfaced a fortnight later in SpaceX’s S-1: $1.25B per month through May 2029, roughly $45B total, with planned expansion to Colossus 2 and a stated interest in orbital compute. The competitor-as-supplier topology is new, and the unit economics of an inference workload now include “what megawatts can I lease from a launch company?”
The public markets were already pricing that shift. On May 14, Cerebras went public on Nasdaq, pricing at $185, opening at $350, and closing day one near a $95B market cap — the largest tech IPO since Uber, with an order book reportedly 20x oversubscribed. The headline is the pop; the interesting line items are inside the S-1. A $24.6B backlog. A multi-year contract with OpenAI worth more than $20B for 750 MW of inference capacity. A binding term sheet from AWS for CS-3 systems. Two UAE-linked customers account for roughly 86% of 2025 revenue, so the concentration risk is real, but the signal from public markets is unambiguous: specialized inference silicon, sold as long-dated capacity, now prices like infrastructure, not like a chip company. Cerebras was the dress rehearsal.
The main event is the IPO triple-stack behind it. SpaceX’s S-1 points toward a $1.75–2T listing. OpenAI is reportedly filing confidentially in days at $850B–$1T. Anthropic is targeting October at roughly $900B. Three of the largest tech IPOs in history, all gated on the same underlying compute substrate, all hitting public markets inside a six-month window. We are about to get a real-time, mark-to-market valuation function for the frontier itself.
The takeaway, for me, is that the frontier is no longer just a research artifact — it is a vertically integrated capital structure. Compute is a tradeable supply contract. Talent is liquid across labs. The moats are no longer “who has the best loss curve” but “who can fund $45B compute leases, recruit the people who can compress those leases, and keep a public-equity story coherent under quarterly scrutiny.”
The next eighteen months won’t be decided on a benchmark. They’ll be decided on a balance sheet.
AI Lab: Sapient Intelligence & MIT
Summary: This paper introduces HRM-Text, an efficient pretraining paradigm that replaces standard Transformers with a dual-timescale Hierarchical Recurrent Model (HRM) to drastically reduce the compute and data required for training large language models. By combining this architecture with a task-completion objective trained exclusively on instruction-response pairs, the model achieves competitive benchmark performance using up to 900x fewer tokens than contemporary foundation models.
AI Lab: Carnegie Mellon University
Summary: This study evaluates the practical effectiveness of AI-generated peer reviews by having 45 domain scientists manually grade 2,960 individual criticisms from 82 Nature-family papers. The findings reveal that while frontier AI models can surface highly significant and well-evidenced critiques, they often lack subfield-specific context and overlap heavily with one another, suggesting they are currently best utilized to augment rather than replace human reviewers.
AI Lab: NVIDIA
Summary: This technical report presents Nemotron-Labs-Diffusion, a language model trained with a joint objective that seamlessly unifies autoregressive (AR), diffusion, and self-speculation decoding capabilities within a single architecture. The authors demonstrate that AR and diffusion training are complementary, allowing the model to utilize self-speculation—where diffusion drafts and AR verifies—to achieve superior throughput and efficiency across various deployment scenarios without relying on multi-token prediction (MTP) methods.
AI Lab: Google & ETH Zürich
Summary: This paper introduces StitchVM, a lightweight model stitching framework that attaches a frozen diffusion backbone to a pretrained pixel-space reward model to efficiently create strong value models for noisy latents. By directly evaluating noisy latents rather than relying on costly Tweedie or Monte Carlo approximations, StitchVM significantly accelerates both inference-time and training-time diffusion alignment methods while maintaining or improving generation quality.
AI Lab: University of Illinois Urbana-Champaign & Meta
Summary: This paper introduces Spreadsheet-RL, an on-policy reinforcement learning framework designed to train specialized AI agents for complex spreadsheet workflows within a realistic Microsoft Excel environment. By combining an automated data collection pipeline with a structured tool harness, the framework significantly improves the ability of open-source models to execute multi-step spreadsheet tasks across general and domain-specific benchmarks.
Google has plenty of AI announcements at it I/O conferences including the amazing Gemini Omni.
Qwen open sourced its latest marquee model.
Hark raises $700M Series A at $6B valuation — Brett Adcock’s stealthy AI lab Hark closed a $700M Series A at a $6B post-money valuation led by Parkway Venture Capital, with Nvidia, AMD Ventures, Qualcomm Ventures, Salesforce Ventures and others, to build multimodal personal-AI models (this summer) and bespoke “universal interface” hardware to follow. →
NanoCo (NanoClaw) raises $12M seed, declines ~$20M buyout — The Cohen brothers’ security-focused, sandboxed OpenClaw alternative NanoClaw closed an oversubscribed $12M seed led by Valley Capital Partners (with Docker, Vercel, monday.com, Slow Ventures, and Hugging Face’s Clem Delangue) six weeks after launching the open-source project that went viral via Karpathy and Singapore’s foreign minister. → No clean primary source — TC has the exclusive interview; original TC link is the canonical version.
Andrej Karpathy joins Anthropic pre-training — Karpathy joined Anthropic to start a team using Claude to accelerate pre-training research under Nick Joseph, returning to frontier LLM R&D after Tesla, OpenAI, and Eureka Labs.
Anthropic acquires Stainless — Anthropic acquired Stainless, the SDK-generation startup whose tooling has powered every official Anthropic SDK and is also used by OpenAI, Google, Replicate, Runway and Cloudflare; Anthropic will wind down all hosted Stainless products, taking the SDK tooling exclusive (deal terms undisclosed; The Information had pegged the price north of $300M).
Ocean emerges from stealth with $28M for agentic email security — Israeli founders Shay Shwartz and Oran Moyal launched Ocean out of stealth with $28M total funding led by Lightspeed (with Picture, Cerca, and angels Assaf Rappaport, Yevgeny Dibrov, Nadir Izrael), positioning its multi-agent email investigation platform against AI-generated phishing.
Manus weighs $1B raise to unwind Meta takeover — Manus’s three Chinese co-founders (Xiao Hong, Ji Yichao, Zhang Tao) are exploring raising ~$1B from external investors at a valuation matching the >$2B Meta paid, possibly with personal capital, to comply with Beijing’s order to unwind the December deal — with a Chinese JV structure and Hong Kong IPO as the likely next steps.
SpaceX files publicly for Nasdaq IPO under SPCX — SpaceX publicly filed its S-1 on May 20 for what would be the largest IPO ever (~$75B target raise at ~$1.75T valuation), listing on Nasdaq as SPCX, with a super-voting structure to keep Musk in control despite multi-billion-dollar losses and a $41.3B accumulated deficit.
OpenAI preparing IPO filing in days or weeks — Per WSJ (Bloomberg confirming via its own source), OpenAI is working with Goldman Sachs and Morgan Stanley on a confidential S-1 draft, potentially filed as soon as this Friday and targeting a September public debut at a valuation north of $850B — moving fast after the Musk lawsuit was dismissed on statute-of-limitations grounds. → No clean primary source — both the WSJ original and Bloomberg’s confirmation rely on anonymous sources. Bloomberg link stands; the WSJ original is paywalled and not easily linkable.
Exa raises $250M Series C at $2.2B — Exa Labs raised a $250M Series C led by Andreessen Horowitz at a $2.2B valuation (more than triple its $700M valuation from last fall) to scale its agent-optimized web search API, train its next-gen retrieval models, and handle hundreds of thousands of searches per second across its 500B+ URL index.
