Chat with PDFs
Hi all!
I tried to write a chatbot for asking questions about an uploaded PDF. If the PDF is too long (e.g. 70 pages) I get this error:
BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 16385 tokens. However, your messages resulted in 28870 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
The error arises after I have created the FAISS DB, when I ask the first question.
I am using Streamlit and LangChain.
Here is my function for chunking:
-------
from PyPDF2 import PdfReader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS


def get_pdf_text(pdf):
    pdf_reader = PdfReader(pdf)
    text = ""
    for page in pdf_reader.pages:
        text += page.extract_text()
    return text


def get_text_chunks(text):
    # Note: the separator must be "\n", not "/n" -- "/n" never occurs in the
    # text, so nothing is split and the whole PDF becomes one oversized chunk.
    text_splitter = CharacterTextSplitter(
        separator="\n", chunk_size=1000, chunk_overlap=200
    )
    chunks = text_splitter.split_text(text)
    return chunks


def create_db(text, embedding=OpenAIEmbeddings()):
    # Convert the document chunks to embeddings and save them to the vector store
    vectordb1 = FAISS.from_texts(text, embedding)
    return vectordb1
-----------
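For context on why the separator matters so much: a `CharacterTextSplitter` splits on the separator first and only then packs pieces into chunks, so a separator that never appears in the text yields a single giant chunk. Below is a rough, library-free sketch of that packing logic (`simple_split` is my own illustration, not LangChain code):

```python
# Illustrative stand-in for CharacterTextSplitter (not the real library code):
# split on the separator, then greedily pack pieces into chunks under a limit.
def simple_split(text, separator="\n", chunk_size=1000):
    pieces = text.split(separator)
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # pieces longer than chunk_size stay as oversized chunks,
            # much as the real splitter only warns about them
            current = piece
    if current:
        chunks.append(current)
    return chunks


doc = "\n".join("line %d of a long document" % i for i in range(500))
print(len(simple_split(doc, separator="\n", chunk_size=1000)))  # many small chunks
print(len(simple_split(doc, separator="/n", chunk_size=1000)))  # prints 1: one giant chunk
```

With `"/n"` the splitter finds nothing to split on, and that single huge chunk is what later blows past the model's context window.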
Here is my retrieval chain:
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory


def get_conversation_chain(vectorstore):
    llm = ChatOpenAI()
    # llm = HuggingFaceHub(repo_id="google/flan-t5-xxl",
    #                      model_kwargs={"temperature": 0.5, "max_length": 512})
    memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True)
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(),
        memory=memory
    )
    return conversation_chain
How can I reduce the number of tokens?
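One lever is capping how much the retriever returns: LangChain's `as_retriever` accepts `search_kwargs` (e.g. `vectorstore.as_retriever(search_kwargs={"k": 3})`), so only the top-k chunks plus the chat history go into the prompt. Below is a rough, library-free sketch of the underlying token-budget idea; `select_chunks_for_budget` and the ~4-characters-per-token estimate are my own illustrative assumptions, not LangChain APIs:

```python
CONTEXT_LIMIT = 16385  # gpt-3.5-turbo context window, per the error message
CHARS_PER_TOKEN = 4    # rough rule of thumb for English text


def estimate_tokens(text):
    # Crude character-based estimate; a tokenizer such as tiktoken gives exact counts.
    return len(text) // CHARS_PER_TOKEN + 1


def select_chunks_for_budget(chunks, budget_tokens):
    # Hypothetical helper: keep retrieved chunks (assumed already ranked by
    # relevance) until the running token estimate would exceed the budget.
    selected, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected


retrieved = ["x" * 4000] * 10  # ten chunks of roughly 1000 tokens each
kept = select_chunks_for_budget(retrieved, budget_tokens=3000)
print(len(kept))  # prints 2: only the chunks that fit the budget
```

The same budgeting applies to the chat history: `ConversationBufferMemory` keeps the full transcript, so long conversations also eat into the limit.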
Elena Guzzon
Data Alchemy
skool.com/data-alchemy