I'm attempting to stream responses using the following code:
import logging

from llama_index.llms import LiteLLM


def generate_response(retrieved_nodes, query_str, qa_prompt):
    logging.info('Generating response stream...')
    llm = LiteLLM(model="gpt-3.5-turbo-16k")
    # Join the retrieved node contents into a single context string
    # and fill in the QA prompt template.
    context_str = "\n\n".join([r.get_content() for r in retrieved_nodes])
    fmt_qa_prompt = qa_prompt.format(
        context_str=context_str, query_str=query_str
    )
    response = llm.stream_complete(fmt_qa_prompt)
    logging.info('Successfully generated response stream')
    # Print each streamed chunk's delta as it arrives.
    for r in response:
        print(r.delta, end="")
However, this setup prints the response line by line, with each delta arriving as a larger chunk. How can I modify it to stream the output word by word? I couldn't find relevant information in the documentation.
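To make the goal concrete, here is a minimal sketch of the behavior I'm after. It buffers the streamed deltas and prints whole words as they complete, assuming a chunk's delta can begin or end mid-word; the helper name stream_words is just illustrative, not something from the library:

def stream_words(response):
    # Hypothetical post-processing of the generator returned by
    # llm.stream_complete(): regroup the deltas into whole words.
    buffer = ""
    for chunk in response:
        buffer += chunk.delta or ""
        # Emit every completed word; keep the trailing fragment buffered.
        while " " in buffer:
            word, buffer = buffer.split(" ", 1)
            print(word, end=" ", flush=True)
    if buffer:
        # Flush whatever remains once the stream ends.
        print(buffer, flush=True)

Is buffering like this the intended approach, or does the API expose word-level streaming directly?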