Querying

Hey all, I think I have a conceptual blocker. What I want to do is not use OpenAI as my LLM, but instead use a model hosted on my own cloud that is pretty decent. When I use a local index, I think everything is hunky dory.

What I am trying to do: I have a Pinecone DB with all my embeddings. My plan is to keep using OpenAI embeddings but swap the model out for my custom LLM via a custom LLM class, and then query in the same way I did when OpenAI was my LLM predictor.
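
Roughly, the wiring for that looks like the sketch below. This is a minimal sketch assuming the older llama_index API that exposes LLMPredictor and GPTPineconeIndex; the class name, endpoint URL, and response field are hypothetical placeholders for the self-hosted model.

from typing import List, Optional

import requests
from langchain.llms.base import LLM
from llama_index import LLMPredictor

class MyHostedLLM(LLM):
    # Thin wrapper that forwards prompts to a self-hosted model endpoint (hypothetical URL).
    endpoint_url: str = "https://my-model.example.com/generate"

    @property
    def _llm_type(self) -> str:
        return "my_hosted_llm"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # The prompt already contains the query plus the retrieved context.
        resp = requests.post(self.endpoint_url, json={"prompt": prompt})
        resp.raise_for_status()
        return resp.json()["text"]  # assumes the endpoint returns {"text": "..."}

# Answer synthesis goes through the custom LLM; embeddings can stay on OpenAI.
llm_predictor = LLMPredictor(llm=MyHostedLLM())
# Pass llm_predictor (or a ServiceContext wrapping it, depending on your version)
# when constructing GPTPineconeIndex so queries use the hosted model.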

What happens: When I now try to query with GPTPineconeIndex, it only sifts through one embedding with minimal data. Is that normal?

My understanding: LlamaIndex retrieves my embeddings, finds the embedding closest to the query, sends the query question plus that embedding's context to the LLM, and the LLM then generates an answer and sends it back.
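
That mental model matches the usual retrieve-then-synthesize flow. A rough, framework-agnostic sketch of it (the function and argument names here are illustrative, not llama_index internals):

import numpy as np

def answer(query, embed, doc_vectors, doc_texts, llm, top_k=1):
    # 1. Embed the query with the same embedding model used for the documents.
    q = np.asarray(embed(query))
    # 2. Cosine similarity between the query vector and every stored document vector.
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    # 3. Keep the top_k closest chunks (top_k defaults to 1, hence "one embedding").
    best = np.argsort(sims)[::-1][:top_k]
    context = "\n\n".join(doc_texts[i] for i in best)
    # 4. Send the question plus retrieved context to the LLM to synthesize an answer.
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)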

Please let me know where I am off base
5 comments
That sounds about right. But by default the query only fetches the single closest matching text node and sends it to the LLM.

Did you set similarity_top_k in the query?
@Logan M I didn’t! Or I don’t think so
Right! So in your query, once you have a lot of documents, you'll probably want to set something like

index.query(..., similarity_top_k=4, response_mode="compact")

That response mode just helps speed up response times; I pretty much always use it once I increase the top k.
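
For context on why that helps (a hedged sketch; the question string below is made up): with the default response mode the index makes roughly one LLM call per retrieved chunk, so raising similarity_top_k also multiplies LLM calls, while "compact" packs as many retrieved chunks as fit into each prompt.

# Default refine-style synthesis: up to ~4 LLM calls for top_k=4
response = index.query("What does the doc say about pricing?", similarity_top_k=4)

# "compact": stuffs the retrieved chunks into as few prompts as possible
response = index.query(
    "What does the doc say about pricing?",
    similarity_top_k=4,
    response_mode="compact",
)
print(response)
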
Oh let me try