
The Last Stand Against Ovi
All clips were created by Ovi, using only text or text+image as inputs. Videos were downscaled to 480p to save space. Please turn on the sound when watching.
Generating videos with high-quality audio that perfectly matches character identity, gender, emotions, pauses, and context
Achieving precise lip synchronization without explicit face bounding boxes, through purely data-driven learning
Extending naturally to realistic multi-speaker and multi-turn conversations, making complex dialogue scenarios possible
Creating synchronized background music and sound effects that match visual actions
We are excited to release our full pre-trained model weights and inference code to accelerate video+audio generation in the open-source community.
The reference images are sourced from the public domain or generated by AI models, and are intended solely to demonstrate the capabilities of this research. If you have any concerns, please contact us (weiminwang@character.ai) and we will remove them.