Query

Wondering if someone can help me out with the following. I'm using a VectorIndex and querying with the tree_summarize response mode. I'd like to query the index using A to retrieve the appropriate nodes, but when generating the response, pass context B along with C, a custom set of instructions that could be different with each call. I don't need A during response generation; I only need it to get the appropriate nodes. Any ideas? Thanks!
So basically you want one string to be used for retrieval and another for actually writing the response with the LLM?

You could separate out the retrieval and synthesis steps (see the sketch after the snippet below). But even easier is probably using a query bundle

Plain Text
from llama_index import QueryBundle

query_engine.query(QueryBundle("LLM string", custom_embedding_strs=["retrieval string"]))
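(For reference: a minimal sketch of the "separate out the retrieval and synthesis steps" approach mentioned above, assuming the as_retriever/get_response_synthesizer APIs from this llama_index version; the strings and similarity_top_k value are illustrative.)

Plain Text
from llama_index import get_response_synthesizer

# Step 1: retrieve nodes using only string A
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("retrieval string A")

# Step 2: synthesize the answer from those nodes using a different string (context B + instructions C)
synthesizer = get_response_synthesizer(response_mode="tree_summarize")
response = synthesizer.synthesize("context B plus instructions C", nodes=nodes)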
Yup, that's it. Thanks! I'll check out the QueryBundle functionality. Haven't looked at that yet.
It's very hidden tbh πŸ˜…
oh, lol 🙂
@Logan M I was going to go w/ a custom synthesizer like we did for my app 😛
I first tried a custom text_qa_template when querying a vector index in tree_summarize mode, but it seems to get ignored. Not sure why yet. I'll have to look at that more closely.
I'm using the example found here, where it creates a custom text_qa_template. I'm doing the same and passing it, but when looking at the event pairs with the LlamaDebugHandler, I can see it's not using what I pass in ...

https://github.com/jerryjliu/llama_index/blob/3c427fc727eb3127cd2bcd4931f41b72c3d13ea9/docs/examples/customization/prompts/chat_prompts.ipynb#L48
But my example is using a response mode of tree_summarize.
Ahh, I figured out the issue with my code after debugging through llama_index's code (in factory.py). It appears that if I use tree_summarize, I need to pass summary_template, not text_qa_template. Let me give that a try and see if it works. I'd be curious to know any tips/tricks or special setup you might use for debugging? Thanks!

Plain Text
elif response_mode == ResponseMode.TREE_SUMMARIZE:
    return TreeSummarize(
        service_context=service_context,
        summary_template=summary_template,
        streaming=streaming,
        use_async=use_async,
        verbose=verbose,
    )
Plain Text
import logging
import sys

logging.basicConfig(format='%(asctime)s - %(message)s',
                    datefmt='%d-%b-%y %H:%M:%S', stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
is a nice way to debug
perhaps the docs below should be updated for tree_summarize 🙂 But I'm guessing things are changing so quickly that it's hard to keep up.

https://gpt-index.readthedocs.io/en/stable/core_modules/query_modules/query_engine/response_modes.html#response-modes
actually, I think there is a bug ...
@bSharpCyclist yea actually for tree summarize, there's a specific prompt template now
"summary_template=..."
this code below doesn't pass summary_template to get_response_synthesizer ...

Plain Text
response_synthesizer = response_synthesizer or get_response_synthesizer(
    service_context=service_context,
    text_qa_template=text_qa_template,
    refine_template=refine_template,
    simple_template=simple_template,
    response_mode=response_mode,
    use_async=use_async,
    streaming=streaming,
)
That's the default
I tried creating my own and passing it via summary_template, but it doesn't get passed along later in retriever_query_engine.py. I'm going to hack locally and see if it changes anything ...
Or maybe I should just create my own response_synthesizer ... sorry, I'll stop bugging you guys ...
no worries! It should be getting passed tbh 🤔 Not 100% sure what your overall setup looks like at the moment though
I do wonder if this should be supported. I think you could do it by updating a few lines of code.

Plain Text
query_engine = index.as_query_engine(response_mode="tree_summarize", 
                                     service_context=service_context,
                                     summary_template=custom_chat_template)


I got around it by doing this

Plain Text
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",
    summary_template=custom_chat_template,
    service_context=service_context,
)

query_engine = index.as_query_engine(response_synthesizer=response_synthesizer)


Now I can see my custom template being used when doing this ...

Plain Text
event_pairs = llama_debug.get_llm_inputs_outputs()
print(event_pairs[2][0])
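(For context, the llama_debug handler used above comes from wiring a LlamaDebugHandler into the service context. A minimal sketch, assuming the callbacks API from this llama_index version:)

Plain Text
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, LlamaDebugHandler

# Collect LLM input/output events so the actual prompts can be inspected after a query
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([llama_debug])
)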
Sorry, there was a mistake above. The first line should have been

Plain Text
query_engine = index.as_query_engine(response_mode="tree_summarize")

I was trying to use it without a response synthesizer first.