POC - YouTube Video Transcript Knowledge Base Chatbot
The goal of this project is to develop a system that processes YouTube video transcripts, creates embeddings for smaller document chunks, and lets users interact with the content through a chat interface. Users will be able to ask questions and receive relevant content along with context from the original video, including the video URL and the relevant timestamps.

This post will be updated as I work through the concepts and start completing tasks.

Update #1: To add a little more clarity, I am architecting this as a set of components that are added to a pipeline. The pipeline will run periodically to check for new videos and fetch their transcripts. I also need to add a component that records the ID of the last video processed and uses that ID to check whether newer videos exist.

# Components:

## get_youtube_transcript
#### Inputs:
``youtube URL``
#### Outputs:
``transcript``
``message``

## clean_transcript
#### Inputs:
``transcript``
#### Outputs:
``cleaned transcript``
``message``

## save_cleaned_transcript
#### Inputs:
``cleaned transcript``
#### Outputs:
``message``

## chunk_transcript
#### Inputs:
``cleaned transcript``
#### Outputs:
``chunks``
``message``

## create_embeddings
#### Inputs:
``chunks``
#### Outputs:
``embeddings``
``message``

## save_embeddings
#### Inputs:
``embeddings``
#### Outputs:
``message``
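
To make the component contract concrete, here is a minimal Python sketch of three of the pieces described above: `clean_transcript`, `chunk_transcript`, and the "last processed video ID" check. Everything beyond the component names is an assumption for illustration: the `Segment` and `Chunk` shapes, the `max_chars` parameter, and the helpers `new_video_ids` and `timestamp_url` are hypothetical, and the cleaning rules (dropping bracketed cues, collapsing whitespace) are just one plausible choice. Fetching transcripts, creating embeddings, and saving are left out because they depend on libraries and storage I have not settled on yet.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """One transcript line: text plus start time in seconds (assumed shape)."""
    text: str
    start: float


@dataclass
class Chunk:
    """A group of segments; keeps the first start time so answers can cite a timestamp."""
    text: str
    start: float


def clean_transcript(transcript: List[Segment]) -> Tuple[List[Segment], str]:
    """Drop bracketed cues such as [Music], collapse whitespace, skip empty lines."""
    cleaned = []
    for seg in transcript:
        text = re.sub(r"\[[^\]]*\]", " ", seg.text)
        text = re.sub(r"\s+", " ", text).strip()
        if text:
            cleaned.append(Segment(text, seg.start))
    return cleaned, f"cleaned {len(cleaned)} of {len(transcript)} segments"


def chunk_transcript(cleaned: List[Segment], max_chars: int = 200) -> Tuple[List[Chunk], str]:
    """Greedily pack consecutive segments into chunks of roughly max_chars characters.

    A single segment longer than max_chars is kept whole rather than split.
    """
    chunks: List[Chunk] = []
    buf: List[Segment] = []
    for seg in cleaned:
        candidate = " ".join(s.text for s in buf + [seg])
        if buf and len(candidate) > max_chars:
            # Adding this segment would overflow, so close the current chunk first.
            chunks.append(Chunk(" ".join(s.text for s in buf), buf[0].start))
            buf = []
        buf.append(seg)
    if buf:
        chunks.append(Chunk(" ".join(s.text for s in buf), buf[0].start))
    return chunks, f"created {len(chunks)} chunks"


def new_video_ids(video_ids_newest_first: List[str], last_processed_id: Optional[str]) -> List[str]:
    """Return the IDs that appear before last_processed_id in a newest-first list.

    If the ID is None or not found, every video is treated as new.
    """
    if last_processed_id in video_ids_newest_first:
        return video_ids_newest_first[: video_ids_newest_first.index(last_processed_id)]
    return list(video_ids_newest_first)


def timestamp_url(video_id: str, start: float) -> str:
    """Build a watch URL that opens the video at the chunk's start time."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start)}s"
```

Each component returns its output together with a `message`, matching the inputs and outputs listed above, so a pipeline runner can log the message and pass the output to the next step. Keeping `start` on every chunk is what later allows the chatbot to return a link to the relevant moment in the video.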