I have a RAG app that I'd like to expose

I have a RAG app, and I'd like to expose an API endpoint that just sends back the enriched prompt. I'm assuming that by the time the query_engine has been created, the prompt template has had the context_str and query_str pulled from the index and substituted into the template. Is there any way to get the completed prompt as a string from the query_engine?
hmm, I think the only way to get the prompt (filled in) is to set up a callback handler to catch the LLM events
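
For reference, here's a minimal sketch of that callback approach using the built-in LlamaDebugHandler. This assumes the legacy ServiceContext-era API and that documents has already been loaded; the exact payload keys can vary between versions:

Plain Text
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.callbacks.schema import EventPayload

# Record all events, including LLM calls and their payloads
llama_debug = LlamaDebugHandler()
callback_manager = CallbackManager([llama_debug])
service_context = ServiceContext.from_defaults(callback_manager=callback_manager)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")

# Each pair is (start_event, end_event) for one LLM call
start_event, end_event = llama_debug.get_llm_inputs_outputs()[0]
# Completion-style LLMs typically put the filled-in prompt under
# EventPayload.PROMPT; chat-style LLMs use EventPayload.MESSAGES instead
filled_prompt = start_event.payload.get(EventPayload.PROMPT)
print(filled_prompt)
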
Thanks! I'll check the docs for how to set up the callback. It should be similar in complexity, since it just has to send the text back in response to a POST request in FastAPI πŸ™‚
@Logan M One question: this example seems to also print the completion. I don't need it to actually call the LLM, just to fill in the prompt template with the retrieved information. Is it any easier to just get the prompt from the query_engine object?
Hm. The easiest solution is probably setting a MockLLM in the service context
Ahh, interesting approach! Just to confirm, then: the prompt template is only filled in when the query function is called, correct? The query_engine doesn't fill it in first, right?

So I need to make a MockLLM that returns nothing, and use a callback to return the filled-in prompt?
Yea! We actually have a mock LLM built in. It's activated when you set the llm to None:

Plain Text
service_context = ServiceContext.from_defaults(llm=None)  # llm=None activates the built-in MockLLM
index = VectorStoreIndex.from_documents(documents, service_context=service_context)


Technically, the prompts are formatted inside the response synthesizer, which is inside the query engine πŸ‘
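
Putting the pieces together, here's a rough end-to-end sketch of the original goal: a FastAPI endpoint that returns the enriched prompt without a real completion. The route and request model names are made up for illustration, and note that LlamaDebugHandler accumulates events globally, so a production service would want per-request isolation:

Plain Text
from fastapi import FastAPI
from pydantic import BaseModel

from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.callbacks.schema import EventPayload

# llm=None swaps in the built-in MockLLM, so no real completion happens
llama_debug = LlamaDebugHandler(print_trace_on_end=False)
service_context = ServiceContext.from_defaults(
    llm=None,
    callback_manager=CallbackManager([llama_debug]),
)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()

app = FastAPI()

class PromptRequest(BaseModel):  # hypothetical request model
    query: str

@app.post("/prompt")  # hypothetical route
def get_filled_prompt(req: PromptRequest) -> dict:
    # Running the query formats the template inside the response synthesizer;
    # the MockLLM makes the "completion" step a no-op
    query_engine.query(req.query)
    start_event, _ = llama_debug.get_llm_inputs_outputs()[-1]
    return {"prompt": start_event.payload.get(EventPayload.PROMPT)}
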