@jerryjliu0 I pinged you over on the mlops slack channel to ask for help. there's no rush here, just looking for advice.
i tried several different index types, so either i'm doing something wrong, the data should be structured better, or the models I'm using (all local ones) aren't actually good enough for this.
what are the pain points that you're facing?
well, the query responses are very poor
so I don't know if it's constructing a bad context, the context plus prompt isn't good, or the model isn't doing a good job of using that information
one piece of general advice if you're using GPTSimpleVectorIndex is to set the chunk size to something smaller (the default is ~4000 tokens; try chunk_size_limit=512), and then set similarity_top_k during the query to something higher than 1.
index = GPTSimpleVectorIndex(docs, ..., chunk_size_limit=512)
index.query(..., similarity_top_k=4)
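in case it's useful, here's roughly the whole flow with those two settings (a sketch assuming the older llama_index API used in the snippets above; the "data" folder is just a placeholder path):

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# load documents from a local folder
docs = SimpleDirectoryReader("data").load_data()

# smaller chunks -> more, finer-grained embeddings in the index
index = GPTSimpleVectorIndex(docs, chunk_size_limit=512)

# pull back several chunks per query instead of the default single one
response = index.query("who wrote LangChain?", similarity_top_k=4)
print(response)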
what llm model are you using?
Starting query: who wrote LangChain?
[query] Total LLM token usage: 256 tokens
[query] Total embedding token usage: 0 tokens
None
response = index.query("Notebook : A notebook walking")
> Starting query: Notebook : A notebook walking
> [query] Total LLM token usage: 261 tokens
> [query] Total embedding token usage: 0 tokens
>>> print(response)
A notebook.
python3 -m manifest.api.app --model_type huggingface --model_name_or_path google/flan-t5-xl --fp16 --device 0
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
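roughly how i'm wiring the local model + embeddings into the index (just a sketch; the Manifest/langchain glue and the host/port are assumptions and may differ by version):

from llama_index import GPTSimpleVectorIndex, LLMPredictor, LangchainEmbedding
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms.manifest import ManifestWrapper
from manifest import Manifest

# point a Manifest client at the local flan-t5-xl server started above
# (connection URL is an assumption -- use wherever manifest.api.app is listening)
manifest = Manifest(client_name="huggingface", client_connection="http://127.0.0.1:5000")
llm_predictor = LLMPredictor(llm=ManifestWrapper(client=manifest))

# local sentence-transformers embeddings instead of the OpenAI default
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

# docs loaded the same way as before
index = GPTSimpleVectorIndex(docs, llm_predictor=llm_predictor, embed_model=embed_model)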
i've mostly tested with davinci tbh
ah. so then, i should try with those settings and reproduce
yeah see if that works, if not lemme know!
k. thanks. trying to do things locally (free) first
it does work much better with openAI embeddings and LLM.
I will see if it works OK with openAI LLM and the huggingface embeddings. a lot of the spend is probably the embedding part
the embeddings are reasonably cheap (though of course if you have a lot of data it'll add up). query-time cost is only the LLM call
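rough back-of-envelope (the per-1k prices below are placeholders, not quotes -- plug in whatever the current rates are):

total_doc_tokens = 2_000_000        # tokens across all docs, embedded once at index time
embed_price_per_1k = 0.0004         # assumed ada-style embedding rate, $/1k tokens
llm_price_per_1k = 0.02             # assumed davinci-style completion rate, $/1k tokens

one_time_index_cost = total_doc_tokens / 1000 * embed_price_per_1k
per_query_llm_cost = (512 * 4 + 256) / 1000 * llm_price_per_1k   # ~top_k chunks of context + answer
print(one_time_index_cost, per_query_llm_cost)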
Hi Jerry, I think chunk_size_limit is not documented for the GPTSimpleVectorIndex
Is there any better documentation for chunk_size_limit available, e.g. whether it's the total length of context sent to the LLM or the max size of each chunk embedded in the index? 4000 as the default sounds like it's the total context length, but maybe it's just 4000 because with similarity_top_k=1 only one chunk gets sent to the LLM.
yeah, by default we just "stuff" as much text into each chunk as can fit into the total prompt limit, which in the case of davinci is 4000
isn't there some kind of tradeoff here? smaller chunks mean more targeted doc retrieval and better context in the final prompt (unless they're too small)? And larger chunks mean fewer embed calls (cheaper) but also larger context chunks and potentially less precise indexing?
yeah there's def a tradeoff! but it's more like smaller chunks = cheaper/faster, but you lose more context per chunk
smaller chunks mean cheaper/faster on the query side. it's more expensive on the embed and index side.
actually, no: it's the same number of tokens, so openAI would cost the same for indexing
cohere would be more expensive
mm yeah you're right in that it's more expensive in terms of storing more embeddings + computing embeddings at query time
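concretely, for the same corpus the tradeoff looks something like this (illustrative numbers only):

corpus_tokens = 400_000

# big chunks, one chunk per query (roughly the defaults)
big_chunk_vectors = corpus_tokens // 4000     # ~100 vectors stored
big_context_per_query = 4000 * 1              # ~4000 tokens of context sent to the LLM

# smaller chunks, higher top_k
small_chunk_vectors = corpus_tokens // 512    # ~781 vectors stored (more storage + retrieval work)
small_context_per_query = 512 * 4             # ~2048 tokens of context sent to the LLM (cheaper/faster)

print(big_chunk_vectors, big_context_per_query, small_chunk_vectors, small_context_per_query)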
using local embeddings, thus far, looks to perform about as well as the ada ones. I'm using "sentence-transformers/all-mpnet-base-v1" since it has a longer max embed length than the default and a better sentence-similarity score.
it makes sense that the completion part requires a much better LLM.
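to swap that model in via langchain (a sketch; model_name is the standard HuggingFaceEmbeddings kwarg, everything else as before):

from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding

# use the mpnet model mentioned above instead of the langchain default
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v1")
)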
also interestingly, the cohere LLM appears to be much worse than the openAI one
Interesting. so the retrieval side can use simpler/free embeddings, but the actual generation part needs a much better LLM
were you able to resolve this? I've also been having issues with the retrieval step returning poor context
how you structure your docs seems to matter for retrieval. i'm breaking things into sections/paragraphs and getting pretty good results for similarity_top_k=4.
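roughly what that looks like (a sketch; `sections` is assumed to be a list of strings you've already split out of each doc):

from llama_index import Document, GPTSimpleVectorIndex

# one Document per section/paragraph instead of one big blob per file
docs = [Document(text) for text in sections]
index = GPTSimpleVectorIndex(docs, chunk_size_limit=512)
response = index.query("who wrote LangChain?", similarity_top_k=4)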