{"version":1,"type":"rich","provider_name":"Libsyn","provider_url":"https:\/\/www.libsyn.com","height":90,"width":600,"title":"MLA 027 AI Video End-to-End Workflow","description":"How to maintain character consistency, style consistency, etc in an AI video. Prosumers can use Google Veo 3\u2019s &quot;High-Quality Chaining&quot; for fast social media content. Indie filmmakers can achieve narrative consistency by combining Midjourney V7 for style, Kling for lip-synced dialogue, and Runway Gen-4 for camera control, while professional studios gain full control with a layered ComfyUI pipeline to output multi-layer EXR files for standard VFX compositing.  Links  Notes and resources at&amp;nbsp;ocdevel.com\/mlg\/mla-27  Try a walking desk&amp;nbsp;- stay healthy &amp;amp; sharp while you learn &amp;amp; code  Generate a podcast - use my voice to listen to any AI generated content you want  AI Audio Tool Selection  Music:&amp;nbsp;Use&amp;nbsp;Suno&amp;nbsp;for complete songs or&amp;nbsp;Udio&amp;nbsp;for high-quality components for professional editing. Sound Effects:&amp;nbsp;Use&amp;nbsp;ElevenLabs' SFX&amp;nbsp;for integrated podcast production or&amp;nbsp;SFX Engine&amp;nbsp;for large, licensed asset libraries for games and film. Voice:&amp;nbsp;ElevenLabs&amp;nbsp;gives the most realistic voice output.&amp;nbsp;Murf.ai&amp;nbsp;offers an all-in-one studio for marketing, and&amp;nbsp;Play.ht&amp;nbsp;has a low-latency API for developers. Open-Source TTS:&amp;nbsp;For local use,&amp;nbsp;StyleTTS 2&amp;nbsp;generates human-level speech,&amp;nbsp;Coqui's XTTS-v2&amp;nbsp;is best for voice cloning from minimal input, and&amp;nbsp;Piper TTS&amp;nbsp;is a fast, CPU-friendly option.  I. Prosumer Workflow: Viral Video Goal:&amp;nbsp;Rapidly produce branded, short-form video for social media. This method bypasses Veo 3's weaker native &quot;Extend&quot; feature.  
Toolchain

- Image concept: GPT-4o (API: GPT-Image-1) for its strong prompt adherence, text rendering, and conversational refinement.
- Video generation: Google Veo 3 for high single-shot quality and integrated ambient audio.
- Soundtrack: Udio for creating unique, "viral-style" music.
- Assembly: CapCut for its standard short-form editing features.

Workflow

1. Create a character sheet (GPT-4o): Generate a primary character image with a detailed "locking" prompt, then use conversational follow-ups to create variations (poses, expressions) for visual consistency.
2. Generate video (Veo 3) with "High-Quality Chaining":
   - Clip 1: Generate an 8-second clip from a character-sheet image.
   - Extract the final frame: Save the last frame of Clip 1.
   - Clip 2: Use the extracted frame as the image input for the next clip, with a "this then that" prompt to continue the action. Repeat as needed.
3. Create music (Udio): Use Manual Mode with structured prompts ([Genre: ...], [Mood: ...]) to generate and extend a music track.
4. Final edit (CapCut): Assemble clips, layer the Udio track over Veo's ambient audio, add text, and use "Auto Captions." Export in 9:16.

II. Indie Filmmaker Workflow: Narrative Shorts

Goal: Create cinematic short films with consistent characters and a storytelling focus, using a hybrid of specialized tools.

Toolchain

- Visual foundation: Midjourney V7 to establish character and style with the --cref and --sref parameters.
- Dialogue scenes: Kling for its superior lip sync and character realism.
- B-roll/action: Runway Gen-4 for its Director Mode camera controls and Multi-Motion Brush.
- Voice generation: ElevenLabs for emotive, high-fidelity voices.
- Edit & color: DaVinci Resolve for its integrated edit, color, and VFX suite and favorable cost model.
Workflow

1. Create the visual foundation (Midjourney V7): Generate a "hero" character image. Use its URL with --cref --cw 100 to create consistent character poses, and with --sref to replicate the visual style in other shots. Assemble a reference set.
2. Create dialogue scenes (ElevenLabs -> Kling):
   - Generate the dialogue track in ElevenLabs and download the audio.
   - In Kling, generate a video of the character from a reference image with their mouth closed.
   - Use Kling's "Lip Sync" feature to apply the ElevenLabs audio to the neutral video for a perfect match.
3. Create B-roll (Runway Gen-4): Use reference images from Midjourney. Apply precise camera moves with Director Mode, or add localized, layered motion to static scenes with the Multi-Motion Brush.
4. Assemble & grade (DaVinci Resolve): Edit clips and audio on the Edit page. On the Color page, use node-based tools to match shots from Kling and Runway, then apply a final creative look.

III. Professional Studio Workflow: Full Control

Goal: Achieve absolute pixel-level control, actor likeness, and integration into standard VFX pipelines using an open-source, modular approach.

Toolchain

- Core engine: ComfyUI with Stable Diffusion models (e.g., SD3, FLUX).
- VFX compositing: DaVinci Resolve (Fusion page) for node-based, multi-layer EXR compositing.

Control Stack & Workflow

1. Train a character LoRA: Train a custom LoRA on a 15-30 image dataset of the actor in ComfyUI to ensure true likeness.
2. Build the ComfyUI node graph, in this order:
   - Loaders: Load the base model, the custom character LoRA, and the text prompts (with the LoRA trigger word).
   - ControlNet stack: Chain multiple ControlNets to define structure (e.g., OpenPose for the skeleton, a depth map for 3D layout).
   - IPAdapter-FaceID: Use the Plus v2 model as a final reinforcement layer to lock facial identity before animation.
   - AnimateDiff: Apply deterministic camera motion using Motion LoRAs (e.g., v2_lora_PanLeft.ckpt).
   - KSampler -> VAE Decode: Generate the image sequence.
3. Export multi-layer EXR: Use a node such as mrv2SaveEXRImage to save the output as an EXR sequence (.exr). Configure it for a professional pipeline: 32-bit float, linear color space, and PIZ/ZIP lossless compression. This preserves render passes (diffuse, specular, mattes) in a single file.
4. Composite in Fusion: In DaVinci Resolve, import the EXR sequence. Use Fusion's node graph to access the individual layers, allowing separate adjustments to elements such as color, highlights, and masks before integrating the AI asset into a final shot with a background plate.

Author: Machine Learning Guide (ocdevel.com/mlg)
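A note on the "linear color space" requirement in the EXR export step: EXR stores scene-linear values, so any sRGB-encoded frames must have the display transfer function removed before writing. The export node normally handles this, but for reference, the per-channel math is the standard inverse sRGB transfer function:

```python
def srgb_to_linear(c: float) -> float:
    """Invert the sRGB transfer function for one channel value in [0, 1].

    EXR pipelines expect scene-linear data, typically stored as 32-bit float.
    """
    if c <= 0.04045:
        return c / 12.92                      # linear segment near black
    return ((c + 0.055) / 1.055) ** 2.4       # power-law segment
```

For example, sRGB 0.5 maps to roughly 0.214 in linear light, which is why grading linear EXRs looks very different from grading display-referred footage.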
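Why the multi-layer passes matter for the Fusion step: the beauty image is an additive combination of its render passes, so a compositor can rebalance one pass (say, the specular highlights) without re-rendering. A toy illustration, with plain Python lists standing in for per-pixel image buffers; a real Fusion comp operates on the EXR channels directly:

```python
from typing import List


def rebuild_beauty(diffuse: List[float], specular: List[float],
                   specular_gain: float = 1.0) -> List[float]:
    """Additively recombine render passes, with an adjustable specular gain.

    Toy stand-in for per-pixel compositing of separate EXR layers.
    """
    return [d + specular_gain * s for d, s in zip(diffuse, specular)]
```

Setting specular_gain below 1.0 tames highlights on the AI asset before it is merged with the background plate; with mattes stored as additional layers, the same idea extends to masked, per-region adjustments.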