Wondering if someone can help me out with the following. I'm using a VectorIndex and querying with the tree_summarize response mode. I'd like to query the index with A, but then use a custom prompt that contains context B and a set of instructions C. In other words, the query with A would only be used to retrieve the appropriate nodes; when generating the response, context B would be passed in, along with C, a custom set of instructions that could be different on each call. A itself isn't needed at the generation step, only for retrieval. Any ideas? Thanks!
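Roughly what I'm picturing, in case it helps, is splitting retrieval from synthesis. This is just a sketch: I'm assuming the retriever and response synthesizer can be used separately like this, and the import paths and parameter names may be off for your version.

```python
# Sketch only -- assumes legacy-style llama_index imports.
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.response_synthesizers import get_response_synthesizer

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_a = "..."            # A: used only to pick the right nodes
context_b = "..."          # B: context passed at generation time
instructions_c = "..."     # C: per-call instructions

# Step 1: retrieve nodes with A only.
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve(query_a)

# Step 2: synthesize over those nodes, driving the LLM with B and C instead of A.
synthesizer = get_response_synthesizer(response_mode="tree_summarize")
response = synthesizer.synthesize(
    f"{context_b}\n\nInstructions: {instructions_c}",
    nodes=nodes,
)
print(response)
```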
I have a local model working using LlamaCPP. A couple of questions: I'm assuming I should rebuild my index with the new LLM? The documentation mentions installing sentence-transformers if you want to use local embeddings. If I switch back to OpenAI later, will it still use local embeddings?
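For reference, this is roughly how I've set it up. I'm assuming the embedding model is configured separately from the LLM via the service context, so swapping the LLM back to OpenAI wouldn't by itself change which embeddings are used (happy to be corrected).

```python
# Sketch of my setup -- model path is just an example.
from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms import LlamaCPP

# Local LLM for response generation.
llm = LlamaCPP(model_path="./models/llama-2-7b-chat.gguf")

# "local" uses a sentence-transformers model for embeddings,
# independently of which LLM is set.
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```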
Suppose I have a vector index with top_k = 5 and I query it. I then run the same query, but with the tree_summarize response mode. What exactly is different in that call? Does it summarize each node/chunk individually and merge the results at the end?
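These are the two calls I'm comparing, assuming response_mode and similarity_top_k are accepted as keyword arguments here:

```python
# Same retrieval settings, different response synthesis strategies.
default_engine = index.as_query_engine(similarity_top_k=5)  # default mode
tree_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="tree_summarize",
)

r1 = default_engine.query("What are the main findings?")
r2 = tree_engine.query("What are the main findings?")
```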