Hello everyone,

I'm attempting to stream responses using the following code:

import logging

from llama_index.llms import LiteLLM


def generate_response(retrieved_nodes, query_str, qa_prompt):
    logging.info('Generating response stream...')
    llm = LiteLLM("gpt-3.5-turbo-16k")
    # Join the retrieved node contents into a single context string
    context_str = "\n\n".join([r.get_content() for r in retrieved_nodes])
    fmt_qa_prompt = qa_prompt.format(
        context_str=context_str,
        query_str=query_str
    )
    response = llm.stream_complete(fmt_qa_prompt)
    logging.info('Successfully generated response stream')
    for r in response:
        print(r.delta, end="")

However, the current setup streams the response line by line. How can I modify it to stream word by word?
I couldn't find relevant information in the documentation.
7 comments
I think this might be an issue with LiteLLM? If each delta is a full sentence, I'm not really sure 🤔
Does it work fine with the normal OpenAI LLM?
Hello Logan,
The result is the same with the normal OpenAI LLM.
Well, that sounds a little weird. It streams fine for me 🤔
That’s weird. Any chance I can see your code?
Plain Text
from llama_index.llms import OpenAI, ChatMessage

llm = OpenAI()

resp = llm.stream_chat([ChatMessage(role="user", content="Tell me a story about a rainy day")])

for r in resp:
    print(r.delta, end="", flush=True)
I think you need the flush=True
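For anyone who lands here later: when stdout is attached to a terminal it is line-buffered, so with end="" the printed deltas only show up once a newline arrives in the stream, which makes the output appear line by line even though each delta is a small chunk. A minimal sketch of the fix applied to the original LiteLLM setup from the question:

Plain Text
from llama_index.llms import LiteLLM

llm = LiteLLM("gpt-3.5-turbo-16k")

resp = llm.stream_complete("Tell me a story about a rainy day")

for r in resp:
    # flush=True pushes each delta to the terminal immediately,
    # instead of waiting for the stdout line buffer to fill
    print(r.delta, end="", flush=True)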