The community member is trying to use a custom large language model (LLM) hosted on their own cloud instead of OpenAI's model, while keeping a Pinecone database for their embeddings. They are running into an issue where querying the index only retrieves one embedding with minimal data.
The comments point out that, by default, the query retrieves only the single closest matching text node, and suggest setting similarity_top_k to a higher value (e.g., 4) to retrieve more relevant results, as sketched below. Using response_mode="compact" can also help speed up response times.
There is no explicitly marked answer, but the community members provide suggestions to address the issue.
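A rough sketch of what those suggestions could look like, assuming the older LlamaIndex API where the index is queried directly (an already-built `index` is assumed; exact parameter names may differ across versions):

```python
# Sketch of the suggested query settings against an existing index.
response = index.query(
    "your question here",
    similarity_top_k=4,       # retrieve the 4 closest nodes instead of only 1
    response_mode="compact",  # pack more retrieved context into each LLM call
)
print(response)
```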
Hey all, I think I have a conceptual blocker. What I want to do is not use OpenAI as my LLM, but instead use a model hosted on my own cloud that is pretty decent. When I use a local index, everything seems hunky dory.
What I am trying to do: I have a Pinecone DB with all my embeddings. My plan is to keep using OpenAI embeddings, but switch out the model for my custom LLM via a custom LLM class, and then query it the same way I did when OpenAI was my LLM predictor.
What happens: When I now query with GPTPineconeIndex, it only sifts through one embedding with minimal data. Is that normal?
My understanding: LlamaIndex retrieves my embeddings, finds the embedding that matches the query most closely, sends the query question along with that embedding's context to the LLM, and the LLM then generates an answer and sends it back.
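For reference, a rough sketch of the setup described above, assuming a LlamaIndex version where GPTPineconeIndex and ServiceContext coexist and the custom model is wrapped as a LangChain-style LLM. The MyCloudLLM class, endpoint URL, and response field name are hypothetical, and the exact index constructor varies between versions:

```python
from typing import List, Optional

import requests
from langchain.llms.base import LLM
from llama_index import GPTPineconeIndex, LLMPredictor, ServiceContext


class MyCloudLLM(LLM):
    # Hypothetical endpoint for the self-hosted model.
    endpoint: str = "https://my-cloud-host.example.com/generate"

    @property
    def _llm_type(self) -> str:
        return "my-cloud-llm"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Send the prompt (query + retrieved context) to the hosted model
        # and return its completion as plain text.
        resp = requests.post(self.endpoint, json={"prompt": prompt}, timeout=60)
        resp.raise_for_status()
        return resp.json()["text"]


# Swap only the completion model; the default (OpenAI) embedding model is kept
# so the query embedding matches the embeddings already stored in Pinecone.
llm_predictor = LLMPredictor(llm=MyCloudLLM())
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# pinecone_index is an existing pinecone.Index that already holds the embeddings.
index = GPTPineconeIndex([], pinecone_index=pinecone_index, service_context=service_context)

# Retrieve several nodes instead of the default single closest match, then let
# the custom LLM synthesize an answer from the retrieved context.
response = index.query(
    "What does the document say about X?",
    similarity_top_k=4,
    response_mode="compact",
)
print(response)
```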