I got ChatGPT to write a Python script that takes a video you upload and then:
1. Transcribes the audio
2. Extracts the image frames (at 1 fps)
3. Describes each image frame
4. Analyses both the audio transcript and the image frames
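For anyone curious how the 1 fps sampling and the final "analyse both" step could fit together, here is a rough sketch. It is not the code from the Colab file; the names `frame_indices_at_1fps`, `Segment`, and `build_timeline` are made up for illustration, and it assumes the transcription step gives you timestamped segments and the image step gives you one description per sampled frame. The actual transcription and image-description models are left out.

```python
from dataclasses import dataclass
from typing import List


def frame_indices_at_1fps(native_fps: float, total_frames: int) -> List[int]:
    """Pick the index of the frame closest to each whole second of video."""
    duration_s = total_frames / native_fps
    indices = []
    second = 0
    while second < duration_s:
        # Clamp so rounding never points past the last frame.
        indices.append(min(round(second * native_fps), total_frames - 1))
        second += 1
    return indices


@dataclass
class Segment:
    """One piece of transcript with start/end times in seconds (assumed shape)."""
    start: float
    end: float
    text: str


def build_timeline(segments: List[Segment], frame_descriptions: List[str]) -> str:
    """Pair each 1 fps frame description with whatever was being said at that second."""
    lines = []
    for second, description in enumerate(frame_descriptions):
        spoken = " ".join(s.text for s in segments if s.start <= second < s.end)
        lines.append(
            f"[{second}s] frame: {description} | audio: {spoken or '(silence)'}"
        )
    return "\n".join(lines)
```

The idea is that the combined timeline text is what you would hand to the language model for the final analysis, so it sees what was on screen and what was said at roughly the same moment.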
Nothing revolutionary, but considering that I have 0 coding experience, I was pretty happy with the end result 😛
You can find the Colab file here. You can run it after making a copy of it. Hopefully some of you will find it helpful. Let me know what you think!