Saturday, 9 May 2026
I'm Teaching AI to Edit Video Like Me (Here's How)
I've spent ten years making videos. Corporate, documentary, farm content, brand films. Thousands of editing decisions across 162 timelines in DaVinci Resolve.
Every one of those decisions — where to cut, how long to hold, when to go wide vs tight, where to drop b-roll — that's a pattern. My pattern. And I wanted to know: can AI learn it?
The problem nobody talks about
Most AI video tools focus on generation. Text-to-video. Image-to-video. They want to make something from nothing.
But that's not my problem. I have too much footage, not too little.
I run a farm with cameras everywhere. Signal groups full of daily content. 20,916 videos and 109,595 photos across four drives. 20 terabytes. 395 hours of footage.
The bottleneck isn't shooting — it's editing. And specifically, it's the time between "footage exists" and "reel is ready to post."
What I built
Here's the stack: a local AI video pipeline running entirely on a Mac Mini with 64GB of RAM.
- SQLite clip index — every video catalogued with ffprobe metadata (duration, resolution, codec); a sketch of this step follows the list
- Qwen 2.5-VL 7B — local vision model that tags what's in each clip. Farm, food, people, landscape, tech. Runs on Apple Silicon, costs nothing
- Nomic Embed v1.5 — semantic embeddings so I can search clips by meaning, not filename
- MLX Whisper large-v3 — transcribes any talking-head content locally
- Remotion — programmatic video composition in React. A-roll, b-roll, pacing — all in code
- DaVinci Resolve MCP — Claude Code talks directly to Resolve via the scripting API
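To make the indexing step concrete, here's roughly what it looks like. This is a minimal sketch, not my production code: it assumes ffprobe is on your PATH, and the database name, table layout, and footage/ folder are illustrative.

```python
import json
import sqlite3
import subprocess
from pathlib import Path

def probe(path: Path) -> dict:
    """Ask ffprobe for duration, resolution, and codec as JSON."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", str(path)],
        capture_output=True, text=True, check=True,
    ).stdout
    info = json.loads(out)
    video = next(s for s in info["streams"] if s["codec_type"] == "video")
    return {
        "path": str(path),
        "duration": float(info["format"]["duration"]),
        "width": video["width"],
        "height": video["height"],
        "codec": video["codec_name"],
    }

db = sqlite3.connect("clips.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS clips "
    "(path TEXT PRIMARY KEY, duration REAL, width INT, height INT, codec TEXT)"
)
for f in Path("footage").rglob("*.mp4"):
    db.execute(
        "INSERT OR REPLACE INTO clips VALUES "
        "(:path, :duration, :width, :height, :codec)",
        probe(f),
    )
db.commit()
```

Once every clip is a row, everything downstream (tagging, search, clustering) is just a query away.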
Here's how footage flows through it:
- New footage lands (Signal group, AirDrop, local folder)
- Vision model tags every clip automatically
- System clusters related clips by topic — one topic = one reel (clustering sketch after this list)
- Remotion assembles a draft with AI-matched b-roll, pacing, and cuts
- Draft renders, lands in Google Drive, and fires off a Telegram notification
- I polish in Resolve if needed
- System diffs my corrections and learns
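The clustering step is the one that turns a pile of clips into reel candidates. Here's a sketch of one way to do it, assuming you've already stored a Nomic embedding per clip; the greedy approach and the 0.75 threshold are illustrative starting points, not my tuned values.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_clips(clips: list[tuple[str, np.ndarray]], threshold: float = 0.75):
    """Greedy topic clustering: a clip joins the most similar existing
    cluster if it clears the threshold, otherwise it starts a new one."""
    clusters: list[dict] = []  # each: {"members": [paths], "centroid": vector}
    for path, emb in clips:
        best = max(clusters, key=lambda c: cosine(c["centroid"], emb), default=None)
        if best is not None and cosine(best["centroid"], emb) >= threshold:
            best["members"].append(path)
            # Incremental running mean keeps the centroid current.
            n = len(best["members"])
            best["centroid"] = best["centroid"] + (emb - best["centroid"]) / n
        else:
            clusters.append({"members": [path], "centroid": emb.copy()})
    return clusters
```

Each resulting cluster becomes one Remotion composition: one topic, one reel.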
The style training experiment
Here's where it gets interesting. I parsed my entire Resolve project — 10 years of work, 7,925 clips, 162 timelines.
What came out was something I'm calling Edit DNA:
- Median shot length: 3.67 seconds. I cut fast.
- Source usage: 11.4%. I use tiny fractions of each clip. The rest gets thrown away.
- 6.2% of my clips use zoom. 7.1% use pans. I'm restrained with camera movement in the edit.
- Brand content median: 2.32 seconds per shot. Even faster than my overall average.
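Those numbers come straight out of Resolve's Python scripting API. Here's a stripped-down sketch of the walk (run with Resolve open and its DaVinciResolveScript module importable); my real version tracks more per item, but the median falls out of exactly this loop.

```python
import statistics
import DaVinciResolveScript as dvr

resolve = dvr.scriptapp("Resolve")
project = resolve.GetProjectManager().GetCurrentProject()

shot_lengths = []
for i in range(1, project.GetTimelineCount() + 1):
    timeline = project.GetTimelineByIndex(i)
    fps = float(timeline.GetSetting("timelineFrameRate"))
    for track in range(1, timeline.GetTrackCount("video") + 1):
        for item in timeline.GetItemListInTrack("video", track) or []:
            shot_lengths.append(item.GetDuration() / fps)  # frames -> seconds

print(f"{len(shot_lengths)} cuts, median shot length "
      f"{statistics.median(shot_lengths):.2f}s")
```

Source usage is the same walk, just comparing each timeline item's duration against the full length of its media pool clip.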
I then ran a first batch of AI edits. Created 4 timelines, reviewed them side by side with my own work, and captured every correction as a structured rule:
- No portrait clips in landscape timelines
- Minimum 1.7 seconds per clip — anything shorter feels like a glitch
- Fewer clips is better. My AI was over-cutting
- Narrative flow matters — don't randomise the order
- Footage over 60fps should be slowed down. Mute the slowed audio
- Establishing shot first. Always
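For a sense of what "structured rule" means in practice, here's roughly how those corrections live in the system. The field names and trigger expressions are illustrative, not a fixed schema:

```python
import json

rules = [
    {
        "trigger": "clip.orientation == 'portrait' and timeline.orientation == 'landscape'",
        "decision": "exclude clip",
        "reasoning": "Portrait clips read as mistakes in a landscape timeline",
        "confidence": 0.95,
    },
    {
        "trigger": "clip.duration_s < 1.7",
        "decision": "drop clip or extend its cut",
        "reasoning": "Anything shorter feels like a glitch",
        "confidence": 0.9,
    },
    {
        "trigger": "clip.fps > 60",
        "decision": "slow down, mute the slowed audio",
        "reasoning": "High frame rates were shot for slow motion",
        "confidence": 0.85,
    },
]

# The whole rule set rides along in the assembly prompt as plain JSON.
prompt_block = json.dumps(rules, indent=2)
```

Plain JSON is a deliberate choice: the same rules are readable by me, diffable in git, and easy to drop into a prompt.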
The narrate-train loop
This is the part I'm most excited about. I screen-record myself editing in Resolve with my mic on, talking through every decision out loud.
"I'm cutting here because the energy drops." "This clip goes first because it establishes the location." "I'm removing this one — it's a duplicate angle of what we already have."
The recording gets transcribed via Whisper, then an LLM extracts structured rules: trigger, decision, reasoning, confidence. Those rules feed directly into the AI's matching prompts.
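Mechanically, that's one local transcription call plus one extraction prompt. A sketch, assuming the mlx-whisper package; the file name and prompt wording are illustrative:

```python
import mlx_whisper

# Transcribe the narrated editing session locally on Apple Silicon.
result = mlx_whisper.transcribe(
    "edit_session.mov",
    path_or_hf_repo="mlx-community/whisper-large-v3-mlx",
)
transcript = result["text"]

# Ask an LLM for rules in the same trigger/decision/reasoning/confidence
# shape the pipeline already uses.
EXTRACTION_PROMPT = (
    "From this editing narration, extract every decision as a JSON list of "
    '{"trigger", "decision", "reasoning", "confidence"} objects:\n\n'
)
prompt = EXTRACTION_PROMPT + transcript
# Send `prompt` to your LLM of choice; the JSON that comes back gets
# appended to the rule set that feeds the matching prompts.
```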
It's a feedback loop. AI drafts → I correct → corrections become rules → next draft is better.
What this means for content creation
Right now, footage sits on drives for weeks before it becomes a post. The farm team shoots incredible stuff every day, but turning raw clips into a polished 30-second reel takes 30-60 minutes of skilled editing time. Time I don't have when I'm also running five other projects.
With this system, the goal is: footage lands, AI drafts a reel within minutes, I review and approve or tweak. The AI handles the assembly; I handle the taste.
That's the real promise here. Not replacing the editor — giving the editor leverage.
What's next
The system works end-to-end for montage-style reels. The quality gap is closing with every correction cycle. But there's a lot still to build:
- RAID media organization — 130,000 files across multiple drives need the same tag-and-cluster treatment
- Resolve MCP feedback loop — automated diffing between the AI draft and my polished version
- Audio and colour — no music, no grading yet. These are the next quality multipliers
- Product packaging — I'm building toward a standalone digital product called "Edit DNA" that extracts your editing style from Resolve and automates your pipeline
Stay human,
Billy
Want more like this? Every Monday I send a short letter about building with AI — real projects, real plumbing, real results.
Get the Monday letter →