You asked for proof I've built systems that run themselves — not AI that waits for a human to trigger each step. So instead of telling you, I had my system do it.
Walkthrough of the live run log — heartbeats, model routing, and the three self-corrections that happened with no human in the loop.
A workflow chain is a deterministic graph a human kicks off — last year's "AI as a copilot," good for a 20–30% productivity bump. A real autonomous system observes, decides, acts, evaluates its own output, repairs itself, and keeps going on its own clock. Your two videos draw exactly that line: an agent that codes its own missing tools into existence, and a company loop where the human is removed and a monitor fixes failures before you're back at your desk. That's the side I build on.
| What your videos describe | What I've actually built |
|---|---|
| Self-correction while humans sleep | This run + a Knowledge-Optimisation pipeline ran unattended overnight; one cycle went ~15 hours with zero supervision. |
| Agent identifies a gap → writes the tool → persists it | My environment self-authors reusable Claude skills (400+) that become permanent assets — ~50 new ones written in a single overnight cycle. |
| Tri-part memory (factual / behavioural / procedural) | A blended memory system: cross-machine vector store (900+ live entries), a behavioural layer loaded every turn, and the skills library. |
| Monitoring "agent on top" / heartbeat | A dispatcher on a 20-min heartbeat loop keeps long jobs on track and spawns sub-agents as needed. |
| Recursive knowledge synthesis | KO pipeline built a queryable knowledge graph — 2,115 edges, 167 cross-course links across 24 courses, in one run. |
| Multi-model routing for cost/fit | Routes across Claude, local Qwen/Gemma and DeepSeek/Kimi by task. This run failed over from a wedged 16GB box to a 48GB box across a Tailscale mesh mid-run. |
| Quality gate + human-in-loop on high-risk | An eval step accepts / auto-fixes / flags output; irreversible actions (send, deploy, delete) are hard-gated to approval. |
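
The last row of the table (an eval step plus a hard gate on irreversible actions) can be sketched roughly as follows. This is an illustration under assumptions, not the harness code: the `Verdict` enum, the `Action` type, the `evaluate` rules and the `may_run` helper are all hypothetical names, and the eval rules are toy placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    AUTO_FIX = "auto_fix"
    FLAG = "flag"


# Hypothetical set of action names treated as irreversible.
IRREVERSIBLE = {"send", "deploy", "delete"}


@dataclass
class Action:
    name: str
    payload: str


def evaluate(output: str) -> Verdict:
    """Toy eval step. The rules are placeholders, not a real scoring method."""
    if not output.strip():
        return Verdict.FLAG          # nothing usable: escalate
    if output != output.strip():
        return Verdict.AUTO_FIX      # trivially repairable (stray whitespace)
    return Verdict.ACCEPT


def may_run(action: Action, approved: bool = False) -> bool:
    """Irreversible actions run only with explicit approval; others run freely."""
    if action.name in IRREVERSIBLE:
        return approved
    return True
```

The point of the split is that the eval step can be as permissive as you like, because the approval gate is checked separately and cannot be bypassed by a good score.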
The first local model box was unreachable, so the system fell back automatically to another route.
Local inference hung; the system re-routed to a 48GB machine across the Tailscale mesh and carried on.
A cheaper model leaked foreign context into a result; the dispatcher detected and stripped it before use.
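
The first two self-corrections above follow an ordered-failover pattern. A minimal sketch, assuming each route is a plain callable that raises when its machine is unreachable or hung; `RouteError`, `run_with_failover` and the two stub boxes are hypothetical names for illustration, not the real dispatcher.

```python
from typing import Callable, List, Sequence, Tuple


class RouteError(Exception):
    """Raised by a route that is unreachable or hung (hypothetical)."""


Route = Tuple[str, Callable[[str], str]]


def run_with_failover(prompt: str, routes: Sequence[Route]) -> Tuple[str, str]:
    """Try each route in order and return (route_name, result) from the first success."""
    errors: List[str] = []
    for name, call in routes:
        try:
            return name, call(prompt)
        except RouteError as exc:
            errors.append(f"{name}: {exc}")  # record the failure, move to the next route
    raise RuntimeError("all routes failed: " + "; ".join(errors))


def box_16gb(prompt: str) -> str:
    # Stub standing in for the wedged machine.
    raise RouteError("inference hung")


def box_48gb(prompt: str) -> str:
    # Stub standing in for the healthy machine.
    return f"echo: {prompt}"


ROUTES = [("box_16gb", box_16gb), ("box_48gb", box_48gb)]
```

Calling `run_with_failover("hi", ROUTES)` skips the failing stub and returns the result from `box_48gb`. A real version would also need a timeout around each call so a hung machine surfaces as an error.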
My production autonomy runs in my harness today — not yet inside a client's repo with merge-request-and-deploy loops like your first video's self-editing agent. That's an integration job, not a capability gap: same loop, pointed at your codebase with the deploy gates you'd want. I'd rather flag that up front than have you find it on the call.
I'll screen-share the live run log and the knowledge-graph output, and we can talk about pointing the same loop at your stack.
Let's jump on the call →

A quick read of both, plus the non-obvious bits worth knowing and a few talking points for the call.
TL;DR. An internal "ops agent", built on a coding-CLI harness, that writes its own missing tools. When it hits something it can't do, it invokes a coding sub-agent, builds the tool, and that tool persists into every future session (a toolset of 45+ CLIs grown over time).
Factual (read-only DB/codebase snapshots via cron) · Behavioural (an instructions.md loaded every turn) · Procedural (its self-authored CLIs).
Non-technical staff give natural-language feedback; the agent rewrites its own instructions.md — no devs in the loop.
Schedules its own recurring jobs (e.g. uptime checks) unprompted, and reads the production codebase to learn billing logic.
TL;DR. Move from a "Roman Legion" org (humans as information conduits) to recursive, self-improving loops. Last year's AI was a copilot (a 20–30% bump, human triggers everything); the real shift is removing the human — a monitoring agent spots a failure and ships the fix overnight, so the system's already better by the time you're back.
Sensor → Policy → Tool → Quality Gate → Learning. A clean, reusable shape for any autonomous system.
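
The five-stage shape can be sketched as a small loop. This is one possible reading of the framework, not code from the video: every function name here is hypothetical, and each stage is reduced to a toy placeholder so the control flow is visible.

```python
from typing import Dict, List, Optional


def sense(queue: List[str]) -> Optional[str]:
    """Sensor: pull the next observed event, if any."""
    return queue.pop(0) if queue else None


def decide(event: str, lessons: Dict[str, str]) -> str:
    """Policy: pick a tool, preferring a choice learned earlier."""
    return lessons.get(event, "default_tool")


def act(tool: str, event: str) -> str:
    """Tool: placeholder execution that just labels the result."""
    return f"{tool} handled {event}"


def passes_gate(result: str) -> bool:
    """Quality gate: toy check that the result is non-empty."""
    return bool(result)


def run_loop(queue: List[str], lessons: Dict[str, str]) -> List[str]:
    """Drain the queue once, running every event through all five stages."""
    log: List[str] = []
    while (event := sense(queue)) is not None:
        tool = decide(event, lessons)
        result = act(tool, event)
        if passes_gate(result):
            lessons[event] = tool  # Learning: remember what worked
            log.append(result)
    return log
```

For example, `run_loop(["alert", "ticket"], {"alert": "restart_tool"})` uses the learned tool for `alert`, falls back to the default for `ticket`, and records that choice in `lessons` for next time.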
Every email, Slack message and recorded meeting captured as data → a queryable "company brain."
Agents triage customer input, decide what to build, write code, open merge requests and deploy — overnight.
Internal tools and dashboards are treated as disposable — generated for a single task and thrown away, then regenerated when models improve. That's a real mindset shift from "build and maintain forever." Worth adopting in your own language.
Video 2's core claim is structural, not technical: hierarchies exist because humans move information slowly. Once an AI layer holds the whole company's context, the chart collapses. That reframes "AI adoption" as an operating-model question — squarely your consulting territory.
Both videos gloss the genuinely hard part: deciding what the agent may do unsupervised vs. what needs sign-off, and logging it. That's a governance/trust problem, not a coding one — and it's exactly where an experienced consultant earns their fee.
"Sensor / Policy / Tool / Quality Gate / Learning" and "autonomous loop vs. copilot" are their framing. Using it back signals you've done the homework and speak the same language.
Ask where on the autonomy dial he wants to sit. The "AI CPO deploying overnight" end is aggressive — most agencies want a human gate on deploys. Pin down his risk appetite early; it scopes the whole build.
Probe the Sensor layer. What's already captured as data (Slack, email, meeting recordings, telemetry)? That dictates how fast you can stand up the "company brain."
Position the gap as the engagement. Wiring the same loops you run into his repo with proper deploy gates is the actual paid work — frame your existing systems as the proven pattern, his stack as the integration.
A goal was set, then the system ran it hands-off. Every heartbeat, model-routing decision and self-correction below happened without a human triggering the next step.
Read the email thread → pull both video transcripts → understand them → write up how they map to systems already built → produce the deliverables. Dispatcher + heartbeat loop, models routed by cost and fit.
| Job | Routed to | Why |
|---|---|---|
| Transcript extraction | Gemini (non-Claude) | Free, fast — keep the expensive model for reasoning. |
| Framework mapping & analysis | Claude Opus | The one job worth paying for. |
| Coordination / sub-agents | Claude Opus (dispatcher) | Chief-of-staff role. |
| Local open-source inference | Qwen on the machine mesh | Used as failover (see heartbeat 3). |
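
The routing table above amounts to a lookup from job type to model. A minimal sketch, assuming string labels for jobs and models; the keys, the labels and the fallback default are illustrative assumptions, not real API model identifiers or the dispatcher's actual config.

```python
# Hypothetical job labels mapped to hypothetical model labels,
# mirroring the rows of the routing table.
ROUTING = {
    "transcript_extraction": "gemini",
    "framework_mapping": "claude-opus",
    "coordination": "claude-opus",
    "local_inference": "qwen",
}


def route(job: str, default: str = "claude-opus") -> str:
    """Return the model label for a job; unknown jobs use the default (an assumption)."""
    return ROUTING.get(job, default)
```

Keeping the table as data rather than branching logic means a routing change is a one-line edit and the full policy stays readable at a glance.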
Ran start to finish hands-off: 3 heartbeats, 3 self-corrections, work routed across three different models by cost and fit. The run produced its own proof.