I built an Opus Clip-style video editor today. The picture shows the current interface. I call it up with a video editing skill, /video-edit, and it pops up.
For this video I had it convert the video from 9:16 to square and add captions. The flow is: get a timestamped transcript of the video, load it into the editor, take out what you don't want, then send it back to cc and have it rebuild the video from the revised transcript.
It is a little rough in spots, but the whole thing can be done in a few minutes. And if you want, you can ask cc to do the editing too.
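For anyone curious what the "rebuild from the revised transcript" step could look like, here is a rough sketch, not the actual skill. It assumes the transcript is a list of segments with start/end times and a keep flag (the `Segment` class and both function names are made up for illustration), and it assumes ffmpeg does the cutting and a centered square crop. It only builds the command; it does not run it, and captions are left out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript line (hypothetical shape, not the editor's real format)."""
    start: float  # seconds
    end: float    # seconds
    text: str
    keep: bool = True


def keep_ranges(segments, gap=0.05):
    """Merge kept segments into (start, end) ranges.

    Kept segments that sit within `gap` seconds of each other are joined
    so the output has fewer cuts.
    """
    ranges = []
    for seg in sorted(segments, key=lambda s: s.start):
        if not seg.keep:
            continue
        if ranges and seg.start - ranges[-1][1] <= gap:
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], seg.end))
        else:
            ranges.append((seg.start, seg.end))
    return ranges


def ffmpeg_args(src, dst, ranges):
    """Build an ffmpeg argument list that keeps only `ranges` and crops to a square.

    Assumption: a portrait source, so crop=iw:iw gives a centered
    width-by-width square.
    """
    expr = "+".join(f"between(t,{a},{b})" for a, b in ranges)
    vf = f"select='{expr}',setpts=N/FRAME_RATE/TB,crop=iw:iw"
    af = f"aselect='{expr}',asetpts=N/SR/TB"
    return ["ffmpeg", "-i", src, "-vf", vf, "-af", af, dst]


if __name__ == "__main__":
    transcript = [
        Segment(0.0, 2.0, "So I lost my server"),
        Segment(2.0, 4.0, "um, anyway", keep=False),  # removed in the editor
        Segment(4.0, 6.0, "one day into the trip"),
        Segment(6.0, 8.0, "and rebuilt it on the beach"),
    ]
    print(ffmpeg_args("in.mp4", "out.mp4", keep_ranges(transcript)))
```

The list returned by `ffmpeg_args` can be handed to `subprocess.run` as-is. Merging neighboring kept segments first keeps the filter expression short when most of the transcript survives the edit.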
The video talks about how I lost my server one day into my trip to Jamaica and how I rebuilt it on the beach (yay).
Honestly, if you think it, you can do it.