I Built an AI Workforce That Runs While I Walk, Talk and Sleep

Everyone’s running Claude Code in Cursor. I was too. I launched two products this year, RideReady and fizz, and nearly lost my mind vibe coding them into existence. You can only say “fix it, it doesn’t work” so many times before you start imagining a world where these things run while you sleep. So I built an AI workforce manager where I describe the outcome, deploy swarms overnight, and only get pulled back in when they’re stuck.

It has a self-improving loop. When it breaks, it learns why, and next time it breaks less. This is what it looks like when you stop being the cognitive bottleneck and start being the swarm operator.

The bottleneck has moved

Every generation of tooling promises to make building faster. And every generation reveals that speed was never really the constraint.

When GitHub Copilot arrived, developers celebrated writing code faster. When Cursor and Claude Code followed, non-developers like me could suddenly ship products from scratch. But here’s what nobody warned us about. When execution gets cheap, the bottleneck shifts upstream. It moves from “can you build it?” to “can you describe what’s worth building, and why, with enough clarity that something else can go and do it?”

I noticed this halfway through building fizz. My best results never came from the grind of fixing, tweaking, and hand-holding every AI decision. They came when I stopped supervising and started thinking. When I stepped back and got ruthlessly clear on what I actually wanted. The bottleneck was never my typing speed. It was my thinking. The industry has a name for this shift: the move from human-in-the-loop, where you supervise every step, to human-on-the-loop, where you set the intent and only get pulled in when it matters. But to me it felt simpler than that. I let go of the keyboard.

128 hours in the loop

I’m not a developer. I cut my teeth in marketing, ran a boutique data analytics consultancy, then co-founded AskRally before starting a venture studio. I’ve never attached my identity to writing code. That turned out to matter.

RideReady started as a personal itch. I’m not someone who cleans their bike after every ride, and I kept missing maintenance windows until something expensive failed. I wanted a system that would just tell me when a component needed attention. It hooks into Strava, tracks wear across every part on your bike, and sends you an alert before something costs you. fizz is stranger. Connect Strava, complete an activity, and it generates a dancing AI video of you in the right sportswear, in the right scene. Hike in Spain, and you’re hiking in Spain.

RideReady and fizz product screens

fizz took eight days of sixteen-hour sessions. Most of the core product came together in a couple of days, but I spent the rest of the time trying to hack-proof it. AI video generation is expensive, and I didn’t want someone exploiting the system and draining my account.

The loop looked like this. Prompt the LLM, run the code, find the bug, prompt again.

The vibe coding loop

I had the MVP clear in my head almost immediately. But I never documented it. I was too busy making things bug-proof and secure to think about anything beyond the next fix. Go-to-market, strategy, the next idea: none of it got touched.

That’s when I stopped asking “how do I code faster?” and started asking “what if I wasn’t in the editor at all?”

From vibe coder to swarm operator

Because I’m not a developer, I had no ego keeping me at the keyboard. If handing the whole process to agents worked, I’d happily do it. There were more valuable things I could spend my time on.

That’s the mindset behind Flow, my R&D experiment in what I’m calling outcome-first orchestration. The concept is simple. Instead of prompting an LLM line by line, I describe an outcome. Build this app, decode that content formula, research this market. I shape a vision: what are we trying to do, what does success look like, and what opinions do I have on how the work should be done? Then I deploy autonomous workers and let them run.

Flow outcome interface with intent and success criteria

An outcome is a structured brief with sections for intent, success criteria and context. I can talk into it, literally ramble about wanting the app optimised for virality, or wanting agents to build a custom scraper before tackling the main task. Flow detects skill and tool dependencies automatically and runs a two-phase implementation. First, it deploys workers to build the required capabilities (scrapers, analysers, whatever the main task needs). Then it deploys workers to execute the actual outcome.

Think of it as taking a generic group of agents and turning them into a specialised unit aimed at one specific goal. The vision, not the code, is what shapes the quality of the output.
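The post doesn’t show how Flow represents an outcome internally, so here is a minimal Python sketch of the idea, assuming an outcome is a simple record of intent, success criteria and context. The names `Outcome`, `detect_dependencies`, `run_outcome` and the `CAPABILITY_TRIGGERS` table are hypothetical, and the keyword lookup is a toy stand-in for whatever dependency detection Flow actually does. What it illustrates is the two-phase order: missing capabilities get built first, then the main outcome runs.

```python
from dataclasses import dataclass, field


@dataclass
class Outcome:
    """A structured brief: intent, success criteria and free-form context."""
    intent: str
    success_criteria: list[str]
    context: dict[str, str] = field(default_factory=dict)


# Hypothetical mapping from trigger words to capabilities a worker would need.
# Flow's real dependency detection isn't described; this lookup is a stand-in.
CAPABILITY_TRIGGERS = {
    "youtube": "scraper",
    "transcript": "transcript_extractor",
    "analyse": "analyser",
}


def detect_dependencies(outcome: Outcome) -> list[str]:
    """Return the capabilities the brief seems to need, in a stable order."""
    text = " ".join(
        [outcome.intent, *outcome.success_criteria, *outcome.context.values()]
    ).lower()
    found = {cap for word, cap in CAPABILITY_TRIGGERS.items() if word in text}
    return sorted(found)


def run_outcome(outcome: Outcome, existing_skills: set[str]) -> list[str]:
    """Plan a two-phase run: build missing capabilities, then execute."""
    log = []
    # Phase 1: deploy workers to build whatever the main task depends on.
    for capability in detect_dependencies(outcome):
        if capability not in existing_skills:
            log.append(f"build:{capability}")
            existing_skills.add(capability)  # built skills persist for later runs
    # Phase 2: deploy workers on the actual outcome, now that the tools exist.
    log.append(f"execute:{outcome.intent}")
    return log
```

With a brief about decoding a creator’s YouTube format and an existing `analyser` skill, `run_outcome` would plan `build:scraper`, then `build:transcript_extractor`, then the execution step. Because built skills are added to `existing_skills`, a later outcome needing the same tools skips phase one.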

The retro loop

The system breaks constantly. In the first few days of running projects through Flow, I was getting pulled in left, right and centre. Agents hit ambiguity, got stuck in loops, or just failed. Every time, the observability layer (I call it HOMЯ) would escalate and surface a decision, such as adding more context or splitting the work.

HOMЯ escalation with decision options

You can only take so much of that before you add a YOLO mode. So I did. YOLO mode puts another AI in the human-in-the-loop seat, letting it make decisions that would normally require me. This is the shift from human-in-the-loop to human-on-the-loop in practice, not as a theoretical framework, but as something I built because I needed to sleep.
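As a rough illustration of YOLO mode, here is a minimal sketch assuming an escalation is just a task, a reason and a list of options. `Escalation`, `resolve` and `ai_decider` are hypothetical names, and `ai_decider` simply picks the first option where a real system would call a model. The point is the routing: the same escalation goes to a human or to an AI stand-in depending on one flag, and every decision is logged either way.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Escalation:
    """A decision a stuck worker surfaces, with the options on offer."""
    task: str
    reason: str
    options: list[str]


Decider = Callable[[Escalation], str]


def ai_decider(esc: Escalation) -> str:
    """Hypothetical stand-in for a model call: picks the first option."""
    return esc.options[0]


def resolve(esc: Escalation, yolo: bool, ask_human: Decider,
            history: list[dict]) -> str:
    """Route an escalation to a human, or to an AI when YOLO mode is on."""
    decider = ai_decider if yolo else ask_human
    choice = decider(esc)
    if choice not in esc.options:
        raise ValueError(f"{choice!r} is not one of {esc.options}")
    # Log every decision so a later retrospective can look for patterns.
    history.append({
        "task": esc.task,
        "reason": esc.reason,
        "choice": choice,
        "by": "ai" if yolo else "human",
    })
    return choice
```

Keeping the log identical in both modes matters: whether a person or a model made the call, the record is there for the retro to mine.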

The real breakthrough, though, is what happens after. The U.S. Army developed a practice in the 1970s called the After Action Review, a structured debrief where soldiers analyse what was supposed to happen, what actually happened, and what to do differently next time. Peter Senge called it “one of the most successful organisational learning methods yet devised.” I built the AI equivalent.

After every completed outcome, I run an automated retrospective. It analyses every escalation, every moment HOMЯ pulled me in, every failure point, and surfaces ideas for how the system can improve itself. Some features it’s suggested are things I’d never have come up with, or at least not without spending hours combing through error logs for patterns. I approve the ones I like, and Flow creates a new outcome focused on implementing them.

Retro output with suggested improvements
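A retrospective along these lines could start from the escalation log. This minimal sketch assumes each log entry carries a `reason` field; `retrospective` and `approve` are hypothetical names, and counting repeated reasons is a simple stand-in for the model-driven analysis described above. Approved suggestions are bundled into a new outcome brief, mirroring the way Flow turns approved retro ideas into a new outcome.

```python
from collections import Counter


def retrospective(history: list[dict], min_count: int = 2) -> list[str]:
    """Turn recurring escalation reasons into improvement suggestions.

    A real retro would have a model read the logs; counting repeated
    reasons is a simple stand-in that surfaces the same kind of pattern.
    """
    counts = Counter(entry["reason"] for entry in history)
    return [
        f"Reduce escalations for '{reason}' (seen {n} times)"
        for reason, n in counts.most_common()
        if n >= min_count
    ]


def approve(suggestions: list[str], approved: set[int]) -> dict:
    """Bundle the approved suggestions (by index) into a new outcome brief."""
    chosen = [s for i, s in enumerate(suggestions) if i in approved]
    return {
        "intent": "Implement approved retro improvements",
        "success_criteria": chosen,
    }
```

Raising `min_count` makes the retro quieter, surfacing only the most persistent failure patterns.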

There’s a flywheel here. The more work I push through, the more it learns, the less it escalates, the less it needs me. I’ve committed to getting kicked in the teeth as much as possible over the next few months, feeding it every project I can, to see how fast the flywheel spins.

The retro flywheel

Beyond code

This isn’t just for building apps. I pointed the swarm at a content creator called Etymology Nerd and told it to download his recent YouTube content, extract transcripts, and analyse his format.

It built the scraper. It built the transcript extractor. It created skills for how to use those tools. Then it ran the analysis. The output was surprisingly good. No, it was great. It broke his core formula down into hook patterns, body structures, transition techniques and rhetorical devices. The kind of teardown that would have taken me days, distilled into something I could paste into a Claude project with the instruction “create video transcripts in this style.” I’ve since remixed it for my own purposes to produce daily videos.

Etymology Nerd content analysis output

Now every Sunday I generate a batch of transcripts, add them to my Google Calendar, and each morning I read from one and publish a video. His formula produces one video a week because of the research involved. I could do ten a day.

The walk that built a bike mechanic

Flow also works through the CLI and Telegram, which solved a problem that’s nagged me for years. I have my best ideas on walks. By the time I get home, make a note, and remember to act on it, the momentum is gone. My notes are messy. I forget to read them. The idea dies quietly in a list.

Now I talk into Telegram mid-walk. There’s a Claude agent between me and Flow that has a skill on how to pilot the whole system. It knows the CLI commands, how to structure outcomes, and how to prep and deploy runs. I ramble at it; it prepares the project and hits run. By the time I’m back, there’s something to look at.

A few days ago I tested this with something personal. I was out walking and threw it a task. I’m building a custom titanium bike. I’d sourced all the components but had no idea how to assemble them. Flow deployed agents to research everything a bike mechanic would need to know, then built an AI agent with that expertise. The agent interviewed me about every part I’d collected, how they should interact, what tolerances matter, what order to build in. Then it produced a component-by-component build guide for my exact setup.

Bike mechanic component guide

That whole chain (research, agent creation, interview, personalised guide) kicked off from a voice message on a walk. That’s when the thesis became real to me. The bottleneck isn’t building. It’s knowing what to describe and caring enough about the outcome to describe it well.

Let them run

Here’s the part that’s hard to say out loud.

Every one of these experiments required me to let go of something I was attached to. The vibe coding grind felt productive, even when it wasn’t. Being in the editor, watching the code scroll by, fixing things with my own prompts: that felt like work. Stepping away from the keyboard to go for a walk felt like skiving.

But the discomfort is the point. We’ve spent years building identities around the work-about-the-work, the debugging, the prompt-wrangling, the late nights in Cursor. Letting an AI dissolve a cognitive bottleneck you’ve attached your ego to is genuinely uncomfortable. It means admitting that your value isn’t in the execution. It’s in the vision, the taste, the messy human judgment about what’s worth building in the first place.

The U.S. Army learned this when it moved from top-down performance critiques to After Action Reviews. The change wasn’t better soldiers; it was leaders learning to let the team surface its own lessons. The breakthrough wasn’t control. It was trust.

I think the same thing is happening now with AI swarms. The builders who thrive won’t be the ones who code fastest. They’ll be the ones who can hold a clear vision, describe it with precision, and then do the hardest thing of all: walk away while the busy work happens.