The community member is asking why they hit the model's maximum context length when using LangChain for QA over a Pinecone vector store, but not when querying the same vector store with GPT Index. The replies explain that LlamaIndex (formerly GPT Index) keeps every call to the language model within the model's maximum context length by breaking long inputs into multiple chunks and refining the answer across those chunks. The community members are building a QA bot where users can choose between several options, including LangChain and LlamaIndex; they hit token-limit issues with the LangChain calls, so they use LlamaIndex for search-and-find tasks where they don't need to worry about summarization or handling larger chunks.
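To make the summary concrete, here is a minimal sketch of the "split into chunks and refine" pattern described above. It is not LlamaIndex's actual implementation; `call_llm`, `num_tokens`, and the token budgets are hypothetical placeholders for a real LLM call and tokenizer.

```python
# Illustrative sketch of "create and refine": each LLM call sees only one chunk
# plus the running answer, so no single prompt exceeds the context window.

MAX_CONTEXT_TOKENS = 4096    # model's context window (assumed)
RESERVED_FOR_OUTPUT = 256    # tokens kept free for the completion (assumed)

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an OpenAI completion request)."""
    raise NotImplementedError

def num_tokens(text: str) -> int:
    """Rough token estimate; a real implementation would use a tokenizer."""
    return len(text.split())

def split_into_chunks(text: str, budget: int) -> list:
    """Greedily pack words into chunks that fit the per-call token budget."""
    chunks, current = [], []
    for word in text.split():
        current.append(word)
        if num_tokens(" ".join(current)) >= budget:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

def refine_answer(question: str, retrieved_text: str) -> str:
    """Answer the question over long retrieved text, one chunk per LLM call."""
    budget = MAX_CONTEXT_TOKENS - RESERVED_FOR_OUTPUT - num_tokens(question) - 100
    answer = None
    for chunk in split_into_chunks(retrieved_text, budget):
        if answer is None:
            prompt = f"Context:\n{chunk}\n\nQuestion: {question}\nAnswer:"
        else:
            prompt = (
                f"Existing answer: {answer}\n\nNew context:\n{chunk}\n\n"
                f"Refine the existing answer to the question: {question}"
            )
        answer = call_llm(prompt)
    return answer
```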
Guys, a question: why do I immediately hit the model's maximum context length when I use LangChain for QA over a Pinecone vector store, but not when I use gpt_index to query the same vector store? What kind of magic does gpt_index use? ahahah
@Logan M thanks a lot for such clarity. We are building a QA bot where users can select GPT-3.5, LlamaIndex Pinecone query, LlamaIndex direct vector query, or LangChain vector DB chain direct query.
We end up hitting the token limit on the LangChain calls.
We are using LangChain for the chatbot and agent, and LlamaIndex for search-and-find, since there we don't need to worry about summarization or handling bigger chunks.
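For reference, here is a minimal sketch of one likely cause of the token-limit issue on the LangChain side, assuming the classic `RetrievalQA`/Pinecone API of that era: the default `chain_type="stuff"` packs every retrieved chunk into a single prompt, which can overflow the context window, while `"refine"` processes one chunk per call (closer to LlamaIndex's default behavior). The index name and credentials are placeholders.

```python
import pinecone
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Pinecone

# Connect to an existing Pinecone index (placeholder credentials and name).
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
vectorstore = Pinecone.from_existing_index("my-index", OpenAIEmbeddings())

llm = OpenAI(temperature=0)

# "stuff": all retrieved chunks in one prompt -- fast, but can exceed the limit.
stuff_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever()
)

# "refine": one chunk per call, carrying the answer forward between calls.
refine_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="refine", retriever=vectorstore.as_retriever()
)

answer = refine_qa.run("What does the document say about X?")
```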