Saturday, 9 May 2026
I'm Teaching AI to Edit Video Like Me (Here's How)
I've spent ten years making videos. Corporate, documentary, farm content, brand films. Thousands of editing decisions across 162 timelines in DaVinci Resolve.
Every one of those decisions — where to cut, how long to hold, when to go wide vs tight, where to drop b-roll — that's a pattern. My pattern. And I wanted to know: can AI learn it?
The problem nobody talks about
Most AI video tools focus on generation. Text-to-video. Image-to-video. They want to make something from nothing.
But that's not my problem. I have too much footage, not too little.
I run a farm with cameras everywhere. Signal groups full of daily content. 20,916 videos and 109,595 photos across four drives. 20 terabytes. 395 hours of footage.
The bottleneck isn't shooting — it's editing. And specifically, it's the time between "footage exists" and "reel is ready to post."
What I built
Here's the stack: a local AI video pipeline running entirely on a Mac Mini with 64GB of RAM.
- SQLite clip index — every video catalogued with ffprobe metadata (duration, resolution, codec); a sketch of this step follows the list
- Qwen 2.5-VL 7B — local vision model that tags what's in each clip. Farm, food, people, landscape, tech. Runs on Apple Silicon, costs nothing
- Nomic Embed v1.5 — semantic embeddings so I can search clips by meaning, not filename
- MLX Whisper large-v3 — transcribes any talking-head content locally
- Remotion — programmatic video composition in React. A-roll, b-roll, pacing — all in code
- DaVinci Resolve MCP — Claude Code talks directly to Resolve via the scripting API
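To make the indexing step concrete, here's roughly what it looks like. This is a minimal sketch, not my production code: it assumes ffprobe is on your PATH, and the database name, table layout, and footage/ folder are illustrative.

```python
import json
import sqlite3
import subprocess
from pathlib import Path

def probe(path: Path) -> dict:
    """Ask ffprobe for duration, resolution, and codec as JSON."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", str(path)],
        capture_output=True, text=True, check=True,
    ).stdout
    info = json.loads(out)
    video = next(s for s in info["streams"] if s["codec_type"] == "video")
    return {
        "path": str(path),
        "duration": float(info["format"]["duration"]),
        "width": video["width"],
        "height": video["height"],
        "codec": video["codec_name"],
    }

db = sqlite3.connect("clips.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS clips "
    "(path TEXT PRIMARY KEY, duration REAL, width INT, height INT, codec TEXT)"
)
for f in Path("footage").rglob("*.mp4"):
    db.execute(
        "INSERT OR REPLACE INTO clips VALUES "
        "(:path, :duration, :width, :height, :codec)",
        probe(f),
    )
db.commit()
```

Once every clip is a row, everything downstream (tagging, search, clustering) is just a query away.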
Here's how footage flows through it:
- New footage lands (Signal group, AirDrop, local folder)
- Vision model tags every clip automatically
- System clusters related clips by topic — one topic = one reel (clustering sketch after this list)
- Remotion assembles a draft with AI-matched b-roll, pacing, and cuts
- Draft renders, lands in Google Drive, and fires off a Telegram notification
- I polish in Resolve if needed
- System diffs my corrections and learns
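The clustering step is the one that turns a pile of clips into reel candidates. Here's a sketch of one way to do it, assuming you've already stored a Nomic embedding per clip; the greedy approach and the 0.75 threshold are illustrative starting points, not my tuned values.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_clips(clips: list[tuple[str, np.ndarray]], threshold: float = 0.75):
    """Greedy topic clustering: a clip joins the most similar existing
    cluster if it clears the threshold, otherwise it starts a new one."""
    clusters: list[dict] = []  # each: {"members": [paths], "centroid": vector}
    for path, emb in clips:
        best = max(clusters, key=lambda c: cosine(c["centroid"], emb), default=None)
        if best is not None and cosine(best["centroid"], emb) >= threshold:
            best["members"].append(path)
            # Incremental running mean keeps the centroid current.
            n = len(best["members"])
            best["centroid"] = best["centroid"] + (emb - best["centroid"]) / n
        else:
            clusters.append({"members": [path], "centroid": emb.copy()})
    return clusters
```

Each resulting cluster becomes one Remotion composition: one topic, one reel.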
The style training experiment
Here's where it gets interesting. I parsed my entire Resolve project — 10 years of work, 7,925 clips, 162 timelines.
What came out was something I'm calling Edit DNA:
- Median shot length: 3.67 seconds. I cut fast.
- Source usage: 11.4%. I use tiny fractions of each clip. The rest gets thrown away.
- 6.2% of my clips use zoom. 7.1% use pans. I'm restrained with camera movement in the edit.
- Brand content median: 2.32 seconds per shot. Even faster than my overall average.
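Those numbers come straight out of Resolve's Python scripting API. Here's a stripped-down sketch of the walk (run with Resolve open and its DaVinciResolveScript module importable); my real version tracks more per item, but the median falls out of exactly this loop.

```python
import statistics
import DaVinciResolveScript as dvr

resolve = dvr.scriptapp("Resolve")
project = resolve.GetProjectManager().GetCurrentProject()

shot_lengths = []
for i in range(1, project.GetTimelineCount() + 1):
    timeline = project.GetTimelineByIndex(i)
    fps = float(timeline.GetSetting("timelineFrameRate"))
    for track in range(1, timeline.GetTrackCount("video") + 1):
        for item in timeline.GetItemListInTrack("video", track) or []:
            shot_lengths.append(item.GetDuration() / fps)  # frames -> seconds

print(f"{len(shot_lengths)} cuts, median shot length "
      f"{statistics.median(shot_lengths):.2f}s")
```

Source usage is the same walk, just comparing each timeline item's duration against the full length of its media pool clip.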
I then ran a first batch of AI edits. Created 4 timelines, reviewed them side by side with my own work, and captured every correction as a structured rule:
- No portrait clips in landscape timelines
- Minimum 1.7 seconds per clip — anything shorter feels like a glitch
- Fewer clips is better. My AI was over-cutting
- Narrative flow matters — don't randomise the order
- Footage over 60fps should be slowed down. Mute the slowed audio
- Establishing shot first. Always
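For a sense of what "structured rule" means in practice, here's roughly how those corrections live in the system. The field names and trigger expressions are illustrative, not a fixed schema:

```python
import json

rules = [
    {
        "trigger": "clip.orientation == 'portrait' and timeline.orientation == 'landscape'",
        "decision": "exclude clip",
        "reasoning": "Portrait clips read as mistakes in a landscape timeline",
        "confidence": 0.95,
    },
    {
        "trigger": "clip.duration_s < 1.7",
        "decision": "drop clip or extend its cut",
        "reasoning": "Anything shorter feels like a glitch",
        "confidence": 0.9,
    },
    {
        "trigger": "clip.fps > 60",
        "decision": "slow down, mute the slowed audio",
        "reasoning": "High frame rates were shot for slow motion",
        "confidence": 0.85,
    },
]

# The whole rule set rides along in the assembly prompt as plain JSON.
prompt_block = json.dumps(rules, indent=2)
```

Plain JSON is a deliberate choice: the same rules are readable by me, diffable in git, and easy to drop into a prompt.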
The narrate-train loop
This is the part I'm most excited about. I screen-record myself editing in Resolve with my mic on, talking through every decision out loud.
"I'm cutting here because the energy drops." "This clip goes first because it establishes the location." "I'm removing this one — it's a duplicate angle of what we already have."
The recording gets transcribed via Whisper, then an LLM extracts structured rules: trigger, decision, reasoning, confidence. Those rules feed directly into the AI's matching prompts.
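Mechanically, that's one local transcription call plus one extraction prompt. A sketch, assuming the mlx-whisper package; the file name and prompt wording are illustrative:

```python
import mlx_whisper

# Transcribe the narrated editing session locally on Apple Silicon.
result = mlx_whisper.transcribe(
    "edit_session.mov",
    path_or_hf_repo="mlx-community/whisper-large-v3-mlx",
)
transcript = result["text"]

# Ask an LLM for rules in the same trigger/decision/reasoning/confidence
# shape the pipeline already uses.
EXTRACTION_PROMPT = (
    "From this editing narration, extract every decision as a JSON list of "
    '{"trigger", "decision", "reasoning", "confidence"} objects:\n\n'
)
prompt = EXTRACTION_PROMPT + transcript
# Send `prompt` to your LLM of choice; the JSON that comes back gets
# appended to the rule set that feeds the matching prompts.
```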
It's a feedback loop. AI drafts → I correct → corrections become rules → next draft is better.
What this means for content creation
Right now, footage sits on drives for weeks before it becomes a post. The farm team shoots incredible stuff every day, but turning raw clips into a polished 30-second reel takes 30-60 minutes of skilled editing time. Time I don't have when I'm also running five other projects.
With this system, the goal is: footage lands, AI drafts a reel within minutes, I review and approve or tweak. The AI handles the assembly; I handle the taste.
That's the real promise here. Not replacing the editor — giving the editor leverage.
What's next
The system works end-to-end for montage-style reels. The quality gap is closing with every correction cycle. But there's a lot still to build:
- RAID media organization — 130,000 files across multiple drives need the same tag-and-cluster treatment
- Resolve MCP feedback loop — automated diffing between the AI draft and my polished version
- Audio and colour — no music, no grading yet. These are the next quality multipliers
- Product packaging — I'm building toward a standalone digital product called "Edit DNA" that extracts your editing style from Resolve and automates your pipeline
Stay human,
Billy
Want more like this? Every Monday I send a short letter about building with AI — real projects, real plumbing, real results.
Get the Monday letter →