Hey Siri, meet AI

Hey folks,

There’s been a lot of chatter about loops on X recently, and it’s a topic I’ve been toying with. My interpretation of what Peter posted is:

Agents are loops: you give one a task, it looks at the task with the context it has, uses tools to gather more, and gives you an output when it thinks the task is done.
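Here’s a rough sketch of that inner loop in Python. Everything in it is made up for illustration: `fake_model` stands in for a real LLM call, and the `TOOLS` dict and the shape of the "decision" are my assumptions, not any particular framework’s API.

```python
from typing import Callable, Dict, List

# Hypothetical tools the agent can call to gather more context.
TOOLS: Dict[str, Callable[[str], str]] = {
    "read_file": lambda name: f"(contents of {name})",
}

def fake_model(task: str, context: List[str]) -> dict:
    """Stand-in for an LLM call: asks for one tool, then declares the task done."""
    if not context:
        return {"action": "tool", "tool": "read_file", "arg": "notes.txt"}
    return {
        "action": "done",
        "output": f"Finished '{task}' using {len(context)} piece(s) of context",
    }

def run_agent(task: str, model=fake_model, max_steps: int = 10) -> str:
    """Loop: look at the task, gather context with tools, stop when the model says done."""
    context: List[str] = []
    for _ in range(max_steps):
        decision = model(task, context)
        if decision["action"] == "done":
            return decision["output"]
        tool = TOOLS[decision["tool"]]
        context.append(tool(decision["arg"]))
    raise RuntimeError("step budget exhausted before the task was done")
```

The `max_steps` cap is just a safety net so a confused model can’t spin forever.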

But taking those parts and building them into a bigger system, so your agents can run more autonomously, for longer and on harder chunks of work, is what I think ‘designing the loop’ is talking about.

So you want to design a bigger task up front, like a plan.md file with a bunch of tasks, new features to implement, etc. Then you need a way for those tasks to be deemed ‘complete and verified’, i.e. does the report contain all 10 points from the plan, does the UI have all the features working correctly, do all the tests pass? And finally, the agent prompts itself to go back to the plan.md file and pick up the next task.
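A minimal sketch of that outer loop, assuming the plan is a Markdown checklist (`- [ ]` for open tasks, `- [x]` for done). The checklist format and the `do_task` and `verify` callables are hypothetical placeholders; in practice `do_task` would be an agent run, and `verify` would be your tests or a review step.

```python
import re
from typing import Callable, Optional

# Matches an unchecked Markdown checklist item, e.g. "- [ ] add tests".
UNCHECKED = re.compile(r"^- \[ \] (.+)$", re.MULTILINE)

def next_task(plan: str) -> Optional[str]:
    """Return the first task that isn't checked off yet, or None if all are done."""
    match = UNCHECKED.search(plan)
    return match.group(1) if match else None

def mark_done(plan: str, task: str) -> str:
    """Check off the first open occurrence of the task."""
    return plan.replace(f"- [ ] {task}", f"- [x] {task}", 1)

def run_plan(
    plan: str,
    do_task: Callable[[str], str],
    verify: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> str:
    """Pick up the next open task, do it, verify it, check it off, repeat."""
    while (task := next_task(plan)) is not None:
        for _ in range(max_attempts):
            result = do_task(task)
            if verify(task, result):
                plan = mark_done(plan, task)
                break
        else:
            # Every attempt failed verification: stop rather than loop forever.
            raise RuntimeError(f"could not verify: {task}")
    return plan
```

Calling `run_plan("- [ ] write intro\n- [ ] add tests\n", do_task=lambda t: f"did {t}", verify=lambda t, r: t in r)` returns the same plan with both boxes checked. The retry limit matters here: without it, a task that never passes verification would keep the loop running indefinitely.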

I’ve been toying with it for this reference manual - I’m making lots of interactive components, so I’ve tried designing all the component pieces first and then building a workflow to compose those together into the interactive.

This is also why a ton of people bring skills into their workflows: run the planning skill, then split tasks with a PRD skill, then a research skill for each feature, then a building skill, then a review skill, then a testing skill.

It’s all designing text instructions for an agent to follow, making sure it can access any tools it needs to do the tasks.

Further reading:

Loop Engineering by Addy Osmani
WTF is a loop by Matt Van Horn (although AI-sloppy)

Ben’s Bites is brought to you by Smallest AI

Smallest AI Voice Agents gives you production-ready infrastructure to run inbound & outbound campaigns at scale, powered by its in-house realtime latency STT & TTS and enterprise-grade telephony. Enterprises trust it to handle millions of minutes. See what it can do for yours - book a demo!

Apple finally has a dedicated AI product, Siri AI. Imagine ChatGPT from about a year ago, with great dictation, image analysis and some interaction with external apps like Messages and Maps. Not bad *if* it works. The new Siri AI uses a mix of local and cloud models (some based on Gemini), all under the AFM 3 model family. These models also power other “AI features” embedded inside apps. I’m keeping an eye out for the one that vibe-codes Safari extensions and Apple Shortcuts using plain English.
ChatGPT’s memory system runs a background process to save memories that you can see and edit. They are calling the latest iteration Dreaming v3, which has better recall, follows your long-term preferences more closely and corrects itself as time passes.
New blogpost from Anthropic claims that developers are writing 8x more code (with Claude’s help) than they were in 2025, and it is now helping train the next versions of Claude. Hence, they advocate for an “option” to pause AI development if the need arises.
OpenAI shared three goals for its next phase: build an automated AI researcher, accelerate the economy and give everyone on Earth a personal AGI. They’ve also filed a confidential S-1 while claiming no urgency for an IPO.
NotebookLM’s core chat is getting upgraded from the old RAG system to an agent-like system (Antigravity harness). Each notebook gets a cloud computer to run code for analysing the files that you’ve uploaded with the latest Gemini 3.5 models.
Your Oura ring scores sleep. Your Apple Watch tracks your heart. Workera Ambient does the same for your career: always-on, capability capture from the work that's already happening. Your data, your choice. Learn more from Workera's CEO and join the waitlist.*

Cursor’s Canvas lets users spin up internal apps, dashboards and reports that are shareable with others. It’s another entry in the “Claude Artifacts, but 2026” category of features that all the coding agents are shipping.
Can Claude become a chemist? What about powering agents for biology?
Firecrawl Workflows - installable skills for repeatable web tasks (like deep research, SEO audits and more).
Eloquent - Local transcription app from Google (uses Gemma).
FrontierCode - coding eval to test whether code is actually maintainable rather than just passing tests.
A guide to using /goal in Codex.
Fin Voice 2 - natural, fast and intelligent customer support over the phone.
Raindrop 2.0 - catches a production failure, hands it to your coding agent to fix, and turns it into an eval so it can't recur.
Cognition is guaranteeing up to $10M in credits if Devin underdelivers on an annual enterprise contract.
skills.sh by Vercel now has an API for querying its collection of 600k+ skills.
Upstash Agent Analytics - 3 lines of code to track AI/agent traffic to your website.
Spiral by Every - writing partner for humans and agents with stylometry, CLI, MCP/API, team styles.
Google is making its budget AI plan even cheaper ($7.99/mo to $4.99/mo) while offering 2x the storage space.

Amelia Wattenberger 🪷@Wattenberger

revived my old email app and it's so good? excited to really hone this into exactly what I need, as I use it

4:31 PM · Jun 8, 2026 · 4.12K Views

Philipp Schmid@_philschmid

Google Colab CLI and Skills are out. Full Colab runtimes from your terminal.
- GPU/TPU provisioning (colab --gpu A100)
- Remote script execution (colab exec)
- Interactive console/REPL access
- Built-in agent skill
Tell your agent "fine-tune Gemma 3 1B on this dataset" and it

6:59 AM · Jun 9, 2026 · 4.56K Views

Matt Pocock@mattpocockuk

I poured my 10 years of teaching experience into a skill. It's called /teach, and it can teach you anything. Here's how it taught me to solve a Rubik's cube:

4:35 PM · Jun 8, 2026 · 189K Views

Mikhail Parakhin@MParakhin

Have been extensively testing Claude Workflows this weekend, with the best model possible. Threw it at my whole code base, combing for bugs. 144 found and fixed! Geez... It is a large code base, for sure, but 144?!! Some are very impactful, some are downright embarrassing...

Mikhail Parakhin @MParakhin

I keep predicting software quality will improve. I keep being wrong. Models write better-than-average code, yet we use them to write more code - not better code (shoutout to the unmovable, always-on-top Claude Code download and install window).

10:55 PM · Jun 7, 2026 · 146K Views

Y Combinator@ycombinator

In the first episode of our new series Full Stack, @conductor_build CEO and co-founder @charlieholtz takes us into the details of how he sets up his workflow for coding and managing AI agents. 00:00 – Building Conductor With Conductor 01:05 – Managing a Team of Coding Agents

3:14 PM · Jun 4, 2026 · 151K Views

konstantinpaulus@konstipaulus

Introducing text-to-lottie: an open source skill and harness for generating production ready Lottie animations with codex/claude code. $ npx skills add diffusionstudio/lottie Prompts guide and repo in the comments.

3:49 PM · Jun 8, 2026 · 413K Views


Find me on X, LinkedIn, or YouTube
Read about me and Ben’s Bites
📷 thumbnail by @keshavatearth

* sponsors who make this newsletter possible :)
Wanna partner with us for the next quarter?
Email us at shanice@bensbites.com or k@bensbites.com