The community member is trying to use a custom large language model (LLM) hosted on their own cloud instead of OpenAI's model, while keeping a Pinecone database for their embeddings. They are running into an issue where querying the index only retrieves one embedding with minimal data.
The comments point out that, by default, the query retrieves only the single closest matching text node, and suggest setting similarity_top_k to a higher value (e.g., 4) to retrieve more relevant results, as sketched below. Using response_mode="compact" can also help speed up response times.
There is no explicitly marked answer, but the community members provide suggestions to address the issue.
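A rough sketch of what those suggestions could look like, assuming the older LlamaIndex API where the index is queried directly (an already-built `index` is assumed; exact parameter names may differ across versions):

```python
# Sketch of the suggested query settings against an existing index.
response = index.query(
    "your question here",
    similarity_top_k=4,       # retrieve the 4 closest nodes instead of only 1
    response_mode="compact",  # pack more retrieved context into each LLM call
)
print(response)
```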
Hey all, I think I have a conceptual blocker. What I want to do is not use OpenAI as my LLM, but instead use a model hosted on my own cloud that is pretty decent. When I use a local index, everything seems hunky dory.
What I am trying to do: I have a Pinecone DB with all my embeddings. My plan is to keep using OpenAI embeddings, but switch out the model for my custom LLM via a custom LLM class, and then query it the same way I did when OpenAI was my LLM predictor.
What happens: When I now query with GPTPineconeIndex, it only sifts through one embedding with minimal data. Is that normal?
My understanding: LlamaIndex retrieves my embeddings, finds the embedding that matches the query most closely, sends the query question along with that embedding's context to the LLM, and the LLM then generates an answer and sends it back.
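For reference, a rough sketch of the setup described above, assuming a LlamaIndex version where GPTPineconeIndex and ServiceContext coexist and the custom model is wrapped as a LangChain-style LLM. The MyCloudLLM class, endpoint URL, and response field name are hypothetical, and the exact index constructor varies between versions:

```python
from typing import List, Optional

import requests
from langchain.llms.base import LLM
from llama_index import GPTPineconeIndex, LLMPredictor, ServiceContext


class MyCloudLLM(LLM):
    # Hypothetical endpoint for the self-hosted model.
    endpoint: str = "https://my-cloud-host.example.com/generate"

    @property
    def _llm_type(self) -> str:
        return "my-cloud-llm"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Send the prompt (query + retrieved context) to the hosted model
        # and return its completion as plain text.
        resp = requests.post(self.endpoint, json={"prompt": prompt}, timeout=60)
        resp.raise_for_status()
        return resp.json()["text"]


# Swap only the completion model; the default (OpenAI) embedding model is kept
# so the query embedding matches the embeddings already stored in Pinecone.
llm_predictor = LLMPredictor(llm=MyCloudLLM())
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# pinecone_index is an existing pinecone.Index that already holds the embeddings.
index = GPTPineconeIndex([], pinecone_index=pinecone_index, service_context=service_context)

# Retrieve several nodes instead of the default single closest match, then let
# the custom LLM synthesize an answer from the retrieved context.
response = index.query(
    "What does the document say about X?",
    similarity_top_k=4,
    response_mode="compact",
)
print(response)
```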