qa_prompt = PromptTemplate(

At a glance

The post describes a RAGStringQueryEngine class for querying a knowledge base. Its custom_query method retrieves relevant nodes, concatenates their content, and calls an LLM to generate a response. A commenter points out that the response synthesizer passed to the engine is never used, and suggests calling llm.stream_complete instead of llm.complete to enable streaming. The commenter also shows how to wrap the streaming result in the expected response object.

Joey

qa_prompt = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

class RAGStringQueryEngine(CustomQueryEngine):
    """RAG String Query Engine."""

    retriever: BaseRetriever
    response_synthesizer: BaseSynthesizer
    llm: OpenAI
    qa_prompt: PromptTemplate

    def custom_query(self, query_str: str):
        nodes = self.retriever.retrieve(query_str)

        context_str = "\n\n".join([n.node.get_content() for n in nodes])
        response = self.llm.complete(
            qa_prompt.format(context_str=context_str, query_str=query_str)
        )

        return str(response)

# configure retriever

retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,
)

# configure response synthesizer

response_synthesizer = get_response_synthesizer(
    streaming=True,
    response_mode="tree_summarize",
)
llm = OpenAI(model="gpt-3.5-turbo")

# assemble query engine

query_engine = RAGStringQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    llm=llm,
    qa_prompt=qa_prompt,
)

# query

start_time = time.perf_counter()
streaming_response = query_engine.query('''''')
elapsed_time = time.perf_counter() - start_time

print(f"{elapsed_time:0.3f}s")

# Or iterate over the tokens as they arrive

for text in streaming_response.response_gen:
    print(text, end="")

I am unable to stream the response here.

3 comments

Logan M

You aren't using the response synthesizer in RAGStringQueryEngine; you are just calling llm.complete, and custom_query returns a plain string, so there is nothing to stream.

Logan M

You should probably call llm.stream_complete and wrap the result in the expected response object.

Logan M

Plain Text

from llama_index.response.schema import StreamingResponse

response = llm.stream_complete(...)

# source nodes and metadata optional
response_obj = StreamingResponse(response, source_nodes=[], metadata={})
return response_obj

Or, if you actually want to use the response synthesizer you passed in:

Plain Text

return self.response_synthesizer.synthesize(query_str, nodes)
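
To make the stream_complete suggestion a bit more concrete, here is a minimal, library-free sketch of the wrapping pattern. Everything in it is a hypothetical stand-in so that it runs on its own: FakeChunk, FakeStreamingResponse, fake_stream_complete and token_gen are not LlamaIndex names. It also assumes that each streamed chunk exposes the newly generated text as a delta attribute and that the response object wants a generator of plain strings; check both against the version you have installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Generator, Iterable, List


@dataclass
class FakeChunk:
    """Hypothetical stand-in for one streamed completion chunk."""
    text: str   # cumulative text generated so far
    delta: str  # only the newly generated piece


@dataclass
class FakeStreamingResponse:
    """Hypothetical stand-in for the response object; only mimics response_gen."""
    response_gen: Generator[str, None, None]
    source_nodes: List[Any] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


def fake_stream_complete(prompt: str) -> Generator[FakeChunk, None, None]:
    """Hypothetical stand-in for llm.stream_complete: yields chunks lazily."""
    text = ""
    for piece in ["Streaming ", "works ", "now."]:
        text += piece
        yield FakeChunk(text=text, delta=piece)


def token_gen(chunks: Iterable[FakeChunk]) -> Generator[str, None, None]:
    """Turn a stream of chunks into a stream of plain-text deltas."""
    for chunk in chunks:
        yield chunk.delta


def custom_query(query_str: str) -> FakeStreamingResponse:
    """Sketch of a custom_query body that returns a streamable object, not a str."""
    prompt = f"Query: {query_str}\nAnswer: "
    chunks = fake_stream_complete(prompt)
    return FakeStreamingResponse(response_gen=token_gen(chunks))


streaming_response = custom_query("example question")
tokens = list(streaming_response.response_gen)
print("".join(tokens))
```

The point of the sketch is that custom_query returns an object carrying a generator instead of str(response), so the for loop over response_gen in the original post has something to iterate. In the real engine, the stand-ins would be replaced by self.llm.stream_complete and StreamingResponse.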
