Hey all, I think I have a conceptual blocker. What I want to do is not use OpenAI as my LLM, but instead use a fairly decent model hosted on my own cloud. When I use a local index, everything seems hunky dory.
What I am trying to do: I have a Pinecone DB with all my embeddings. My plan is to keep using OpenAI embeddings, but swap out the model for my own LLM via a custom LLM class, and then query the index the same way I have been when OpenAI was my LLM predictor.
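Roughly what my setup looks like, simplified (the endpoint URL, class names, and variable names are placeholders, and I'm on one of the llama_index versions that still has GPTPineconeIndex, so the exact ServiceContext wiring might differ a bit on other versions):

```python
from typing import Any, List, Mapping, Optional

import requests
from langchain.llms.base import LLM
from llama_index import GPTPineconeIndex, LLMPredictor, ServiceContext


class MyCloudLLM(LLM):
    """Thin wrapper that forwards prompts to my self-hosted model's HTTP endpoint."""

    endpoint: str = "https://my-model.example.com/generate"  # placeholder URL

    @property
    def _llm_type(self) -> str:
        return "my-cloud-llm"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Send the fully-formed LlamaIndex prompt to my hosted model and return its text.
        resp = requests.post(self.endpoint, json={"prompt": prompt})
        resp.raise_for_status()
        return resp.json()["text"]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"endpoint": self.endpoint}


# Use my custom LLM for response synthesis; embeddings still default to OpenAI.
llm_predictor = LLMPredictor(llm=MyCloudLLM())
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

index = GPTPineconeIndex.from_documents(
    documents,                      # my docs, whose embeddings live in Pinecone
    pinecone_index=pinecone_index,  # existing pinecone.Index handle
    service_context=service_context,
)
```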
What happens: When I now query with GPTPineconeIndex, it only sifts through one embedding with minimal data. Is that normal?
My understanding: LlamaIndex retrieves my embeddings, finds the embedding closest to the query, sends the query question plus that embedding's context to the LLM, and the LLM then creates an answer and sends it back.
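For reference, here's roughly how I'm calling it (the question text is just an example; I believe similarity_top_k is the knob that controls how many matches get pulled from Pinecone, but I might have the parameter details or defaults wrong):

```python
# If I understand the flow right, this should pull the top-k matching chunks
# from Pinecone and hand them plus the question to my custom LLM for the answer.
response = index.query(
    "What does the onboarding doc say about VPN access?",  # made-up example question
    similarity_top_k=3,  # ask for more than one chunk of context
)
print(response)
# I think source_nodes shows which chunks/embeddings were actually retrieved.
print(response.source_nodes)
```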