Kevboi
Hello everyone,

I'm attempting to stream responses using the following code:

import logging

# LiteLLM lives in the llama-index-llms-litellm package
from llama_index.llms.litellm import LiteLLM

def generate_response(retrieved_nodes, query_str, qa_prompt):
    logging.info('Generating response stream...')
    llm = LiteLLM("gpt-3.5-turbo-16k")
    # Join the retrieved node contents into a single context block
    context_str = "\n\n".join([r.get_content() for r in retrieved_nodes])
    fmt_qa_prompt = qa_prompt.format(
        context_str=context_str,
        query_str=query_str
    )
    response = llm.stream_complete(fmt_qa_prompt)
    logging.info('Successfully generated response stream')
    # Print each streamed delta as it arrives
    for r in response:
        print(r.delta, end="")

However, with this setup the output appears line by line rather than as it's generated. How can I modify it to stream word by word? I couldn't find relevant information in the documentation.
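
One idea I've been considering (just a sketch, not something from the docs): since each r.delta can contain more than one token, and print without flush=True can hold output in stdout's line buffer until a newline, I could buffer the deltas myself, split on spaces, and flush each word as soon as it's complete. Something like:

def stream_word_by_word(response):
    # Sketch: buffer streamed deltas and emit one word at a time.
    # Assumes `response` is the generator from llm.stream_complete(),
    # whose chunks expose a `.delta` string, as in the snippet above.
    buffer = ""
    for chunk in response:
        buffer += chunk.delta or ""  # delta may be None on some chunks
        # Flush every complete word; keep any trailing partial word buffered
        while " " in buffer:
            word, buffer = buffer.split(" ", 1)
            print(word, end=" ", flush=True)  # flush so words appear immediately
    if buffer:
        # Print whatever remains once the stream ends
        print(buffer, flush=True)

This only splits on spaces, so words ending in a newline would wait until the next space; splitting on any whitespace would need a regex instead. Is there a more idiomatic way to do this?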