Guys, a question: why do I immediately hit the model's maximum context length when I use LangChain for QA over a Pinecone vectorstore, but not when I query the same vector store with gpt_index? What kind of magic does gpt_index use? ahahah
@Logan M thanks a lot for such clarity. We are building a QA bot where users can choose between gpt-3.5, a LlamaIndex Pinecone query, a LlamaIndex direct vector query, and a LangChain vector DB chain direct query.
We keep hitting the token limit on the LangChain calls.
We are using LangChain for the chatbot and agent, and LlamaIndex for search and retrieval, since that way we don't have to worry about summarization or handling bigger chunks ourselves.
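For anyone hitting the same token limit on the LangChain side, here's a minimal sketch, assuming the classic `RetrievalQA` API (langchain 0.0.x) and an existing Pinecone index; the index name, credentials, and question are placeholders. It caps how many chunks the retriever returns and uses `map_reduce` so each chunk gets its own call, instead of the default `stuff` chain that concatenates every retrieved chunk into a single prompt:
```python
# A minimal sketch, assuming classic langchain (0.0.x) and an existing
# Pinecone index -- "my-index", the credentials, and the question are
# placeholders, not values from this thread.
import pinecone
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Pinecone

pinecone.init(api_key="...", environment="...")  # your Pinecone credentials

vectorstore = Pinecone.from_existing_index(
    index_name="my-index",
    embedding=OpenAIEmbeddings(),
)

qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    # "map_reduce" answers over each chunk separately and then combines
    # the partial answers, instead of the default "stuff" chain that packs
    # everything into one prompt (which is what blows the context window).
    chain_type="map_reduce",
    # also cap how many chunks the retriever pulls back per query
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)

print(qa.run("What does the document say about X?"))
```
That's also roughly the "magic" in gpt_index: by default its response synthesizer refines an answer over the retrieved chunks across multiple LLM calls sized to fit the context window, rather than sending one giant prompt.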