What It Do: 6 Parameters That Separate Cinematic AI From Total Slop

with Jason Katz · Kindling Solutions

June 1, 2026 · 00:34:10 · Littleton, CO

Show Notes

Everyone thinks the magic is in the prompt. It is not. The magic is in everything you build around the prompt.

That is the thread running through this build-in-public session, where the conversation goes deep on what it actually takes to make AI video that looks like a real person, sounds like a real person, and does not collapse into that plastic, uncanny mush we have all learned to scroll past. The answer is not a better sentence typed into a box. It is a system. Jason Katz, founder of Kindling Solutions in Littleton, Colorado, spent nine months in trial and error to arrive at the exact chain of models, references, and approvals that turns a single character into a living, transforming short film.

Along the way you get the unglamorous truths nobody puts in a launch video: the order you stitch voice and visuals in matters more than the tools you use, your characters have to be locked before they are useful, and the cheapest thing in the entire pipeline is the thing that holds it all together. A polished founder-led explainer came out of this workflow for around $300 in roughly a week. A traditional agency had quoted tens of thousands.

There is also a quieter story underneath the tooling — about why a founder builds anything as a system in the first place, the pull between shipping at scale and being present in your own life, and the strange new normal where your face and your voice can be reproduced from a phone full of selfies. The recommended posture throughout is transparency: synthetic clips for reach and recognition, human presence for trust.

Frameworks from This Episode

The Six Parameter Prompt Protocol

A scene prompt is not a wish. It is an assembled spec built from six fixed inputs. Leave out any one of them and the output looks generic.

  • Subject: who or what is in the frame.
  • Action: what is happening — specific, not vague.
  • Camera: the move or shot type, anchored to a real reference example rather than a descriptor.
  • Look and style: the visual treatment and filter.
  • Lighting and color: mood and palette, tied to brand where relevant.
  • Things not to do: the negative space that keeps the model from inventing what you did not want.
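
One way to treat the six parameters as a spec rather than a sentence is to hold them in a small structure that refuses to render until every slot is filled. The sketch below is illustrative only: the class name, field names, and output layout are assumptions, not the format used in the episode or required by any video model.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ScenePrompt:
    """Six fixed inputs for one scene. Names here are hypothetical."""
    subject: str
    action: str
    camera: str
    look_and_style: str
    lighting_and_color: str
    things_not_to_do: str

    def missing(self) -> list[str]:
        """Return the names of any parameters left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Assemble the spec as labeled lines; refuse to render if incomplete."""
        gaps = self.missing()
        if gaps:
            raise ValueError(f"missing parameters: {', '.join(gaps)}")
        return "\n".join(
            f"{f.name.replace('_', ' ').capitalize()}: {getattr(self, f.name).strip()}"
            for f in fields(self)
        )
```

Calling `render()` on a prompt with an empty `camera` field raises an error that names the gap, which surfaces the missing input before any generation credits are spent.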

Anchor Frames (The Glue Method)

Still images are cheap and fast. Video generation is expensive and slow. Anchor first, animate second — and the video model bridges between your stills instead of inventing new backgrounds.

  • Generate high-quality still images for the start, optional middle, and end of every scene using GPT Image 2.
  • Feed those stills to the video model as anchors so it has fixed endpoints to connect.
  • Reuse anchors as a tiled storyboard canvas before a single second of video is rendered.
  • Images cost cents. Final video generation costs dollars and takes time. The economics favor anchoring heavily.
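
The anchor-first ordering can be sketched as a simple gate: a scene is not eligible for video rendering until its start and end stills exist, and all stills can be laid out as a storyboard beforehand. This is a minimal sketch under assumptions; the `Scene` structure, the file paths, and the helper names are hypothetical and do not correspond to any tool's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scene:
    name: str
    start_anchor: Optional[str] = None   # path to the opening still
    end_anchor: Optional[str] = None     # path to the closing still
    middle_anchor: Optional[str] = None  # optional midpoint still

    def anchors(self) -> list[str]:
        """Stills in playback order, skipping the optional middle if absent."""
        ordered = [self.start_anchor, self.middle_anchor, self.end_anchor]
        return [a for a in ordered if a]

    def ready_to_render(self) -> bool:
        """A scene is renderable only once both endpoints exist."""
        return bool(self.start_anchor and self.end_anchor)


def storyboard(scenes: list[Scene]) -> list[str]:
    """Flatten every anchor into one ordered list for a tiled review canvas."""
    return [a for s in scenes for a in s.anchors()]


def blocked(scenes: list[Scene]) -> list[str]:
    """Names of scenes that still lack a start or end anchor."""
    return [s.name for s in scenes if not s.ready_to_render()]
```

Checking `blocked(scenes)` before any render keeps the expensive video step from running on a scene whose endpoints the model would otherwise have to invent.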

The Locked Cast System

A character that regenerates differently in every scene is useless. Lock the likeness once, reuse it everywhere — like a video game avatar that belongs to your brand.

  • Train a likeness in Higgsfield Soul with 20 to 30 varied, well-lit photos.
  • Once trained, the character is locked and reusable across scenes without drifting.
  • Use Soul HEX to bake brand colors in — feed it a photoreal reference in your brand hues, not a flat swatch.
  • A towel, a table, a shadow in your palette gives the model real-world color context to match.

The Voice Sequence Rule

Build video first and layer voice after, and the lips fight the audio. Reverse the order and the problem disappears.

  • Generate the voice track with ElevenLabs professional voice first, before any final video render.
  • Layer voice into the pipeline before the video generation step, not after.
  • Render video last so the lip movement tracks the audio rather than fighting it.
  • This single sequencing change eliminates the most common uncanny valley artifact in AI video.
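
The sequencing rule above amounts to a dependency order, which can be written down and checked rather than remembered. The sketch below uses Python's standard-library `graphlib` to derive a valid order. The step names are hypothetical labels for stages in the workflow, not calls into ElevenLabs, Higgsfield, or any other tool.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps that must finish before it starts.
# Step names are hypothetical labels, not tool or API names.
PIPELINE = {
    "script": set(),
    "voice_track": {"script"},
    "anchor_frames": {"script"},
    "video_render": {"voice_track", "anchor_frames"},
    "final_edit": {"video_render"},
}


def build_order(steps: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(steps).static_order())


def voice_precedes_video(order: list[str]) -> bool:
    """Check the voice-first rule on a proposed order."""
    return order.index("voice_track") < order.index("video_render")
```

Because `video_render` lists `voice_track` as a prerequisite, any order that `build_order` returns places the voice ahead of the render, and `voice_precedes_video` can flag a hand-written plan that gets it backwards.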

Systematize or It Does Not Ship

A one-off production is a hobby. A system is a business. Build every capability so it can be taught, handed off, and eventually sold as its own offer.

  • Build every capability as a repeatable system, not a one-off with undocumented steps.
  • Future-proof each system so it can become its own product or service offering later.
  • Bring in help only once the system is teachable and the revenue justifies the hire.
  • Running four to five parallel Claude Code instances coordinating ten to twenty agents is only possible with documented systems underneath.

Subsidize the Show

Use an autonomous synthetic pipeline for reach and recognition. Keep the human core of your brand clearly human. Be transparent that AI is involved — the goal is scale, not deception.

  • Build an autonomous pipeline to produce a daily 60-second tech rundown in your own voice and likeness.
  • Let synthetic clips feed discovery while the authentic human moments stay human.
  • Transparency is the posture: synthetic for reach, real for trust.
  • Your face and voice becoming recognizable at scale is the point — as long as you are honest about the mechanism.

Key Terms

Claude Code: Anthropic's agentic coding tool, used here to orchestrate parallel work across multiple projects and connect to other services via MCP.
MCP (Model Context Protocol): The connection layer that lets a tool like Claude drive another application directly — for example, controlling Higgsfield from a Claude Code session.
Higgsfield Soul / Soul ID: A Higgsfield feature that trains a consistent, reusable character likeness from a set of 20 to 30 photos. Once trained, the character can be used across scenes without regenerating a new face each time.
Soul HEX: The Higgsfield feature for locking brand colors into AI video generations. Feed it a photoreal reference in your brand hues rather than a flat hex code for best results.
GPT Image 2: OpenAI's image model (branded Images 2.0), used in this workflow for likeness-preserving anchor frames that give the video model fixed visual endpoints to bridge between.
Nano Banana: Informal name for Google's Gemini image model, referenced as a strong alternative to GPT Image 2 for generating anchor frames.
Seedance: ByteDance's video model, referenced in this workflow for building motion reference clips before committing to final video generation.
ElevenLabs professional voice: The higher-tier ElevenLabs voice cloning option that produces a clone close enough to the original to be hard to distinguish in short-form content.
Anchor frame: A fixed still image used to lock the visual start, middle, or end of a scene. The video model bridges between anchor frames rather than inventing its own transitions.
Context engineering: The practice of structuring memory, history, and references so an AI system behaves consistently across sessions — the reason the same prompt can produce radically different results without it.
AI slop: Low-effort, generic AI output that signals low quality to the viewer. The opposite of what the six-parameter system is designed to produce.
ICP: Ideal Customer Profile — the specific person the product or content is built to serve.

Tools from This Episode

Kindling Solutions

Custom AI systems and agent teams for founder-led businesses — one spark becomes your system and infinite scale.

Higgsfield

AI video platform with character training (Soul ID) and brand color locking (Soul HEX) for consistent cinematic generation.

ElevenLabs

Professional voice cloning platform used to generate a voice-first audio track before the video render.

Descript

Audio and video editing tool used to stitch the final pipeline output together.

Blazel

Referenced in the workflow as a production tool.

Q&A

Who is Jason Katz?

Jason Katz is the founder of Kindling Solutions, based in Littleton, Colorado. He builds custom AI systems and agent teams for founder-led businesses, with a specialty in AI video production pipelines developed over nine months of hands-on trial and error.

Why does prompting alone fail to produce good AI video?

Because a single prompt cannot hold continuity, voice, or style across multiple shots. Quality comes from a workflow that strings several models together with locked characters, anchor frames, and voice-first sequencing, not from one clever sentence.

What are the six parameters every scene prompt needs?

Subject (who or what is in frame), action (what is happening), camera (move or shot type tied to a real reference), look and style (visual treatment), lighting and color (mood and palette), and things not to do (the negative space that keeps the model honest).

How do you keep an AI character consistent across scenes?

Train a locked likeness in Higgsfield Soul with 20 to 30 varied, well-lit photos. Then anchor every scene with still images of that character so the video model preserves identity instead of regenerating a new face. Use Soul HEX with a photoreal color reference rather than a flat swatch to lock brand colors.

How do you sync a cloned voice to AI video without the lips looking wrong?

Sequence the voice into the pipeline first and render the video last. Generating the voice track with ElevenLabs professional voice first, then feeding it to the video model, means the lip movement tracks the audio rather than fighting it.

How much does a founder-led AI explainer video actually cost?

In this workflow, a polished explainer was produced for around $300 in roughly a week. A traditional agency had quoted tens of thousands for equivalent output.

Should AI-generated video content try to hide that it is AI?

No. The recommended posture is transparency — use synthetic clips for reach and recognition while keeping the authentic human core of your brand clearly human. The goal is scale and familiarity, not deception.

How many AI coding instances can one founder realistically run in parallel?

In this workflow, one founder runs four to five parallel Claude Code instances coordinating ten to twenty agents across separate projects. That is possible only with documented, repeatable systems underneath.
