The community member is asking why they hit the model's maximum context length when using LangChain for QA over a Pinecone vector store, but not when querying the same vector store with GPT Index. The replies explain that LlamaIndex (formerly GPT Index) keeps every call to the language model within the model's maximum context length by breaking long inputs into multiple chunks and refining the answer across those chunks. The community members are building a QA bot where users can choose between several options, including LangChain and LlamaIndex; they hit token-limit issues with the LangChain calls, so they use LlamaIndex for search-and-find tasks where they don't need to worry about summarization or handling larger chunks.
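To make the summary concrete, here is a minimal sketch of the "split into chunks and refine" pattern described above. It is not LlamaIndex's actual implementation; `call_llm`, `num_tokens`, and the token budgets are hypothetical placeholders for a real LLM call and tokenizer.

```python
# Illustrative sketch of "create and refine": each LLM call sees only one chunk
# plus the running answer, so no single prompt exceeds the context window.

MAX_CONTEXT_TOKENS = 4096    # model's context window (assumed)
RESERVED_FOR_OUTPUT = 256    # tokens kept free for the completion (assumed)

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an OpenAI completion request)."""
    raise NotImplementedError

def num_tokens(text: str) -> int:
    """Rough token estimate; a real implementation would use a tokenizer."""
    return len(text.split())

def split_into_chunks(text: str, budget: int) -> list:
    """Greedily pack words into chunks that fit the per-call token budget."""
    chunks, current = [], []
    for word in text.split():
        current.append(word)
        if num_tokens(" ".join(current)) >= budget:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

def refine_answer(question: str, retrieved_text: str) -> str:
    """Answer the question over long retrieved text, one chunk per LLM call."""
    budget = MAX_CONTEXT_TOKENS - RESERVED_FOR_OUTPUT - num_tokens(question) - 100
    answer = None
    for chunk in split_into_chunks(retrieved_text, budget):
        if answer is None:
            prompt = f"Context:\n{chunk}\n\nQuestion: {question}\nAnswer:"
        else:
            prompt = (
                f"Existing answer: {answer}\n\nNew context:\n{chunk}\n\n"
                f"Refine the existing answer to the question: {question}"
            )
        answer = call_llm(prompt)
    return answer
```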
Guys, a question: why do I immediately hit the model's maximum context length when I use LangChain for QA over a Pinecone vector store, but not when I use gpt_index to query the same vector store? What kind of magic does gpt_index use? ahahah
@Logan M thanks a lot for such clarity. We are building a QA bot where users can select GPT-3.5, LlamaIndex Pinecone query, LlamaIndex direct vector query, or LangChain vector DB chain direct query.
We end up hitting the token limit on the LangChain calls.
We are using LangChain for the chatbot and agent, and LlamaIndex for search-and-find, since there we don't need to worry about summarization or handling bigger chunks.
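For reference, here is a minimal sketch of one likely cause of the token-limit issue on the LangChain side, assuming the classic `RetrievalQA`/Pinecone API of that era: the default `chain_type="stuff"` packs every retrieved chunk into a single prompt, which can overflow the context window, while `"refine"` processes one chunk per call (closer to LlamaIndex's default behavior). The index name and credentials are placeholders.

```python
import pinecone
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Pinecone

# Connect to an existing Pinecone index (placeholder credentials and name).
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
vectorstore = Pinecone.from_existing_index("my-index", OpenAIEmbeddings())

llm = OpenAI(temperature=0)

# "stuff": all retrieved chunks in one prompt -- fast, but can exceed the limit.
stuff_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever()
)

# "refine": one chunk per call, carrying the answer forward between calls.
refine_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="refine", retriever=vectorstore.as_retriever()
)

answer = refine_qa.run("What does the document say about X?")
```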