Hello everyone,

I'm attempting to stream responses using the following code:

import logging

from llama_index.llms import LiteLLM


def generate_response(retrieved_nodes, query_str, qa_prompt):
    logging.info('Generating response stream...')
    llm = LiteLLM("gpt-3.5-turbo-16k")
    # Join the retrieved node contents into a single context string
    context_str = "\n\n".join([r.get_content() for r in retrieved_nodes])
    fmt_qa_prompt = qa_prompt.format(
        context_str=context_str,
        query_str=query_str
    )
    response = llm.stream_complete(fmt_qa_prompt)
    logging.info('Successfully generated response stream')
    for r in response:
        print(r.delta, end="")

However, the current setup streams the response line by line. How can I modify it to stream word by word?
I couldn't find relevant information in the documentation.
7 comments
I think this might be an issue with LiteLLM? If each delta is a full sentence, I'm not really sure 🤔
Does it work fine with the normal OpenAI LLM?
Hello Logan,
The result is the same with the normal OpenAI LLM.
Well, that sounds a little weird. It streams fine for me 🤔
That’s weird. Any chance I can see your code?
Plain Text
from llama_index.llms import OpenAI, ChatMessage

llm = OpenAI()

resp = llm.stream_chat([ChatMessage(role="user", content="Tell me a story about a rainy day")])

for r in resp:
    print(r.delta, end="", flush=True)
I think you need the flush=True
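For anyone who lands here later: when stdout is attached to a terminal it is line-buffered, so with end="" the printed deltas only show up once a newline arrives in the stream, which makes the output appear line by line even though each delta is a small chunk. A minimal sketch of the fix applied to the original LiteLLM setup from the question:

Plain Text
from llama_index.llms import LiteLLM

llm = LiteLLM("gpt-3.5-turbo-16k")

resp = llm.stream_complete("Tell me a story about a rainy day")

for r in resp:
    # flush=True pushes each delta to the terminal immediately,
    # instead of waiting for the stdout line buffer to fill
    print(r.delta, end="", flush=True)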