<?xml version="1.0" encoding="utf-8"?>
<oembed>
  <version>1.0</version>
  <type>rich</type>
  <provider_name>Libsyn</provider_name>
  <provider_url>https://www.libsyn.com</provider_url>
  <height>90</height>
  <width>600</width>
  <title>MLA 027 AI Video End-to-End Workflow</title>
  <description>How to maintain character consistency, style consistency, etc. in an AI video. Prosumers can use Google Veo 3's "High-Quality Chaining" for fast social media content. Indie filmmakers can achieve narrative consistency by combining Midjourney V7 for style, Kling for lip-synced dialogue, and Runway Gen-4 for camera control, while professional studios gain full control with a layered ComfyUI pipeline that outputs multi-layer EXR files for standard VFX compositing.

Links
- Notes and resources at ocdevel.com/mlg/mla-27
- Try a walking desk - stay healthy &amp; sharp while you learn &amp; code
- Generate a podcast - use my voice to listen to any AI-generated content you want

AI Audio Tool Selection
- Music: Use Suno for complete songs or Udio for high-quality components for professional editing.
- Sound Effects: Use ElevenLabs' SFX for integrated podcast production or SFX Engine for large, licensed asset libraries for games and film.
- Voice: ElevenLabs gives the most realistic voice output. Murf.ai offers an all-in-one studio for marketing, and Play.ht has a low-latency API for developers.
- Open-Source TTS: For local use, StyleTTS 2 generates human-level speech, Coqui's XTTS-v2 is best for voice cloning from minimal input, and Piper TTS is a fast, CPU-friendly option.

I. Prosumer Workflow: Viral Video
Goal: Rapidly produce branded, short-form video for social media. This method bypasses Veo 3's weaker native "Extend" feature.

Toolchain
- Image Concept: GPT-4o (API: GPT-Image-1) for its strong prompt adherence, text rendering, and conversational refinement.
- Video Generation: Google Veo 3 for high single-shot quality and integrated ambient audio.
- Soundtrack: Udio for creating unique, "viral-style" music.
- Assembly: CapCut for its standard short-form editing features.

Workflow
1. Create Character Sheet (GPT-4o): Generate a primary character image with a detailed "locking" prompt, then use conversational follow-ups to create variations (poses, expressions) for visual consistency. (A scripted version of this step is sketched after this list.)
2. Generate Video (Veo 3): Use "High-Quality Chaining":
   - Clip 1: Generate an 8-second clip from a character-sheet image.
   - Extract Final Frame: Save the last frame of Clip 1 (see the frame-extraction sketch below).
   - Clip 2: Use the extracted frame as the image input for the next clip, with a "this then that" prompt to continue the action.
   - Repeat as needed.
3. Create Music (Udio): Use Manual Mode with structured prompts ([Genre: ...], [Mood: ...]) to generate and extend a music track.
4. Final Edit (CapCut): Assemble the clips, layer the Udio track over Veo's ambient audio, add text, and use "Auto Captions." Export in 9:16.
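The character-sheet step can be scripted against the API model named above. A minimal sketch using the OpenAI Python SDK, assuming an OPENAI_API_KEY in the environment; the prompt wording and output filename are illustrative placeholders, not from the episode:

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical "locking" prompt: pin every attribute you want kept stable
LOCKING_PROMPT = (
    "Character sheet: a cheerful robot barista with a brushed-copper body, "
    "teal apron, and big round eyes. Neutral standing pose, full body, "
    "plain white background, soft studio lighting."
)

resp = client.images.generate(model="gpt-image-1", prompt=LOCKING_PROMPT, size="1024x1024")

# gpt-image-1 returns the image as base64-encoded data
with open("character_sheet_01.png", "wb") as f:
    f.write(base64.b64decode(resp.data[0].b64_json))
```

Pose and expression variations can then be requested the same way, or by passing the saved sheet back through the images edit endpoint.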
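The "Extract Final Frame" step of the chaining loop is equally scriptable. A minimal sketch with OpenCV (pip install opencv-python); because some codecs report only an approximate frame count, it falls back to scanning the whole clip. Filenames are placeholders:

```python
import cv2

def extract_last_frame(video_path: str, image_path: str) -> None:
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise RuntimeError(f"could not open {video_path}")
    last = None
    count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if count > 0:
        # Seek straight to the end when the container reports a frame count
        cap.set(cv2.CAP_PROP_POS_FRAMES, count - 1)
        ok, frame = cap.read()
        if ok:
            last = frame
    if last is None:
        # Some codecs misreport counts; rewind and scan to the true end
        cap.set(cv2.CAP_PROP_POS_FRAMES, 0)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            last = frame
    cap.release()
    if last is None:
        raise RuntimeError(f"no frames decoded from {video_path}")
    cv2.imwrite(image_path, last)

# Seed the next clip in the chain with the final frame of the previous one
extract_last_frame("clip_01.mp4", "clip_01_last_frame.png")
```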
II. Indie Filmmaker Workflow: Narrative Shorts
Goal: Create cinematic short films with consistent characters and a storytelling focus, using a hybrid of specialized tools.

Toolchain
- Visual Foundation: Midjourney V7 to establish character and style with the --cref and --sref parameters.
- Dialogue Scenes: Kling for its superior lip-sync and character realism.
- B-Roll/Action: Runway Gen-4 for its Director Mode camera controls and Multi-Motion Brush.
- Voice Generation: ElevenLabs for emotive, high-fidelity voices.
- Edit &amp; Color: DaVinci Resolve for its integrated edit, color, and VFX suite and favorable cost model.

Workflow
1. Create Visual Foundation (Midjourney V7): Generate a "hero" character image. Use its URL with --cref --cw 100 to create consistent character poses, and with --sref to replicate the visual style in other shots. Assemble a reference set.
2. Create Dialogue Scenes (ElevenLabs -&gt; Kling):
   - Generate the dialogue track in ElevenLabs and download the audio.
   - In Kling, generate a video of the character from a reference image with their mouth closed.
   - Use Kling's "Lip Sync" feature to apply the ElevenLabs audio to the neutral video for a tight match.
3. Create B-Roll (Runway Gen-4): Use reference images from Midjourney. Apply precise camera moves with Director Mode, or add localized, layered motion to static scenes with the Multi-Motion Brush.
4. Assemble &amp; Grade (DaVinci Resolve): Edit clips and audio on the Edit page. On the Color page, use node-based tools to match shots from Kling and Runway, then apply a final creative look.

III. Professional Studio Workflow: Full Control
Goal: Achieve absolute pixel-level control, actor likeness, and integration into standard VFX pipelines using an open-source, modular approach.

Toolchain
- Core Engine: ComfyUI with Stable Diffusion models (e.g., SD3, FLUX).
- VFX Compositing: DaVinci Resolve (Fusion page) for node-based, multi-layer EXR compositing.

Control Stack &amp; Workflow
1. Train Character LoRA: Train a custom LoRA in ComfyUI on a 15-30 image dataset of the actor to ensure true likeness.
2. Build ComfyUI Node Graph: Construct a generation pipeline in this order (a headless-queueing sketch follows this section):
   - Loaders: Load the base model, the custom character LoRA, and the text prompts (with the LoRA trigger word).
   - ControlNet Stack: Chain multiple ControlNets to define structure (e.g., OpenPose for the skeleton, a Depth map for 3D layout).
   - IPAdapter-FaceID: Use the Plus v2 model as a final reinforcement layer to lock facial identity before animation.
   - AnimateDiff: Apply deterministic camera motion using Motion LoRAs (e.g., v2_lora_PanLeft.ckpt).
   - KSampler -&gt; VAE Decode: Generate the image sequence.
3. Export Multi-Layer EXR: Use a node like mrv2SaveEXRImage to save the output as an EXR sequence (.exr). Configure for a professional pipeline: 32-bit float, linear color space, and PIZ/ZIP lossless compression. This preserves render passes (diffuse, specular, mattes) in a single file (see the EXR-writing sketch below).
4. Composite in Fusion: In DaVinci Resolve, import the EXR sequence. Use Fusion's node graph to access the individual layers, allowing separate adjustments to elements like color, highlights, and masks before integrating the AI asset into a final shot with a background plate.
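Once the node graph is built, it does not have to be run from the UI: a running ComfyUI instance accepts workflows over HTTP on its /prompt endpoint (default port 8188), using a graph exported via "Save (API Format)". A minimal sketch; the workflow filename is a placeholder:

```python
import json
import urllib.request

# "character_pipeline_api.json" is a placeholder for the node graph above,
# exported from ComfyUI with "Save (API Format)"
with open("character_pipeline_api.json") as f:
    workflow = json.load(f)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # the response includes a prompt_id for tracking the job
```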
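And for a concrete picture of the export settings above (32-bit float, PIZ lossless compression, render passes as named channels in one file), here is a minimal sketch using the OpenEXR Python bindings (pip install OpenEXR); the pass names, resolution, and noise data are stand-ins for real render output:

```python
import Imath
import OpenEXR
import numpy as np

WIDTH, HEIGHT = 1920, 1080
FLOAT = Imath.Channel(Imath.PixelType(Imath.PixelType.FLOAT))

# One float32 plane per render pass; random noise stands in for real passes
passes = {
    name: np.random.rand(HEIGHT, WIDTH).astype(np.float32)
    for name in ("diffuse.R", "diffuse.G", "diffuse.B",
                 "specular.R", "specular.G", "specular.B",
                 "matte.character")
}

header = OpenEXR.Header(WIDTH, HEIGHT)
header["channels"] = {name: FLOAT for name in passes}  # 32-bit float channels
header["compression"] = Imath.Compression(Imath.Compression.PIZ_COMPRESSION)

# Write all passes into a single frame of the sequence
out = OpenEXR.OutputFile("shot010.0001.exr", header)
out.writePixels({name: data.tobytes() for name, data in passes.items()})
out.close()
```
  </description>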
  <author_name>Machine Learning Guide</author_name>
  <author_url>https://ocdevel.com/mlg</author_url>
  <html>&lt;iframe title="Libsyn Player" style="border: none" src="//html5-player.libsyn.com/embed/episode/id/37396195/height/90/theme/custom/thumbnail/yes/direction/forward/render-playlist/no/custom-color/88AA3C/" height="90" width="600" scrolling="no" allowfullscreen webkitallowfullscreen mozallowfullscreen oallowfullscreen msallowfullscreen&gt;&lt;/iframe&gt;</html>
  <thumbnail_url>https://assets.libsyn.com/secure/item/37396195</thumbnail_url>
</oembed>
