Annoyances I'm learning about
Now that I've been working with AI video creation for a month, I'm coming to grips with some significant problem areas that the rest of you may be suffering from as well. Comments are welcome.

1) I can't pin down exactly why, but scene changes often feel abrupt, even jarring. When the camera cuts to a different shot, the result looks awkward and amateurish when I watch it. I wonder whether there are established rules for camera changes that would reduce this problem. (A crude post-production workaround is sketched at the end of this post.)

2) OpenArt's Consistent Character 2.0 does an excellent job of creating consistent characters in still images, which can then serve as video start frames or lip-sync sources. However, most or all image-to-video models are not as good at keeping the character consistent: the facial structure can drift enough to be noticeable, and sometimes even the clothes change!

3) Dzine's lip sync, which seems to be the only option when there are several characters in the scene, does a fabulous job with the lips themselves, but it often makes bizarre changes to the character or the scene. It has turned a blond character's hair black, and it has given a clean-shaven man a beard.

4) Text-to-speech apps such as ElevenLabs change the voice's tone and inflection every time you generate. That's useful in one sense, since it lets you pick the best of several tries. But there is no obvious way to keep the timbre consistent across scenes: I can generate a speech in one scene, then use the same character with the same emotion directives in the next scene, and the voice changes enough that it sounds like a different character. (A partial workaround is sketched in the second example below.)

These inconsistencies make the finished product look and sound amateurish. A character who is supposed to be consistent can drift in both appearance and speech, sometimes enough to be mildly jarring. It's frustrating to be so close but not quite there yet.
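On #1, one blunt post-production option is to replace a hard cut with a short cross-dissolve. Below is a minimal sketch using ffmpeg's xfade and acrossfade filters, run from Python. Assumptions: ffmpeg 4.3 or newer is on PATH, scene1.mp4 and scene2.mp4 are placeholder clip names, the timing values are made up, and both clips share resolution and frame rate (xfade requires it). This softens the transition; it doesn't fix the underlying shot grammar.

```python
# Sketch: softening an abrupt cut with a short cross-dissolve via ffmpeg.
# scene1.mp4 / scene2.mp4 and the timings below are placeholders.
# FADE_AT must be (first clip's duration - FADE_DUR) so the dissolve
# starts at the tail of the first clip.
import subprocess

FADE_DUR = 0.5    # seconds of overlap between the two shots
FADE_AT = 7.5     # start of the dissolve within scene1 (duration - FADE_DUR)

subprocess.run(
    [
        "ffmpeg", "-y", "-i", "scene1.mp4", "-i", "scene2.mp4",
        "-filter_complex",
        # xfade dissolves the video streams; acrossfade blends the audio.
        f"[0:v][1:v]xfade=transition=fade:duration={FADE_DUR}:offset={FADE_AT}[v];"
        f"[0:a][1:a]acrossfade=d={FADE_DUR}[a]",
        "-map", "[v]", "-map", "[a]", "joined.mp4",
    ],
    check=True,
)
```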
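On #4, the ElevenLabs web app doesn't expose much, but the REST API documents per-request voice_settings, and, as I read the current docs (verify this against your plan and model), a seed field for best-effort deterministic sampling and a previous_text field that conditions delivery on the prior line. Here is a minimal sketch, assuming an API key in an ELEVEN_API_KEY environment variable and a placeholder voice ID:

```python
# Sketch: pinning ElevenLabs generation parameters across scenes.
# Assumptions: a valid key in ELEVEN_API_KEY, a voice_id you've already
# chosen, and the documented /v1/text-to-speech endpoint. The "seed" and
# "previous_text" fields are my reading of the current API docs --
# confirm they exist for your model before relying on them.
import os
import requests

API_KEY = os.environ["ELEVEN_API_KEY"]   # your ElevenLabs API key
VOICE_ID = "YOUR_VOICE_ID"               # the character's fixed voice

def speak(text: str, out_path: str, previous_text: str = "") -> None:
    """Generate one line of dialogue with locked-down voice settings."""
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": API_KEY},
        json={
            "text": text,
            "model_id": "eleven_multilingual_v2",
            # Higher stability trades expressiveness for consistency.
            "voice_settings": {"stability": 0.8, "similarity_boost": 0.9},
            # Best-effort determinism: same seed + same text -> similar take.
            "seed": 4242,
            # Conditioning on the prior line helps tone carry across scenes.
            "previous_text": previous_text,
        },
        timeout=120,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)   # response body is the audio (MP3 by default)

# Scene 1 and scene 2 use the same voice, settings, and seed, and scene 2
# is conditioned on scene 1's line so the delivery stays closer in timbre.
speak("We can't stay here past nightfall.", "scene1_line.mp3")
speak("Then we leave at dusk.", "scene2_line.mp3",
      previous_text="We can't stay here past nightfall.")
```

Note the trade-off: raising stability fights the tone drift between takes, but it also removes some of the variation that makes generating several tries worthwhile.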