Hey there,

I hope you're doing well. I wanted to touch base about a snag I've hit while working on our app.

I'm building an application that uses the streaming feature, but I've run into a significant hurdle: I can't get streaming to work as expected.

I've used streaming in other applications on similar versions, where it worked flawlessly, but in this particular application I can't get it to function.

For context, I'm using a RouterRetriever, a response synthesizer, and a RetrieverQueryEngine. I've set the relevant attributes for streaming (streaming=True), but it hasn't produced the desired results. I've tried the implementation in both Google Colab and Hugging Face, and I hit the same issue in both.

llama-index==0.10.20

Below is how I have implemented the function in my code:

from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

response_synthesizer = get_response_synthesizer(text_qa_template=qa_prompt, streaming=True)

query_engine = RetrieverQueryEngine.from_args(
    retriever=custom_retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[rerank],
    streaming=True,
)

respuesta = query_engine.query(pregunta)
respuesta.print_response_stream()  # This doesn't work

I have also attempted to implement it this way in Hugging Face:

def responder(pregunta):
    respuesta = query_engine.query(pregunta)

    partial_message = ""
    for text in respuesta.response_gen:
        partial_message += text
        yield partial_message  # and it still doesn't work
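
For completeness, the generator above gets wired into the app roughly like this (the gr.Interface wiring here is just illustrative of my setup, not my exact code):

import gradio as gr

# Gradio streams generator functions: each yielded partial_message
# replaces the displayed output, producing the typing effect.
demo = gr.Interface(fn=responder, inputs="text", outputs="text")
demo.queue().launch()  # the queue is what enables streaming of yielded values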

I would greatly appreciate any guidance or advice on how to resolve this, and I'm happy to provide any additional information you might need.

Appreciate your help!
5 comments
I think this is a bug introduced in this specific version. Try updating a bit: pip install -U llama-index-core
Thank you for the suggestion. I tried updating llama-index-core with pip install -U llama-index-core, but the version stays the same. What I have installed is already the latest available, 0.10.20.post2.
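
For reference, this is how I'm double-checking which build is actually installed (assuming the core package exposes __version__, as recent releases do):

import llama_index.core

print(llama_index.core.__version__)  # prints 0.10.20.post2 for me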

It's worth mentioning that I have a similar application running on 'legacy', and streaming works perfectly fine there. This time I tried to modernize by using the latest versions and incorporating the 'Router'. The pipeline itself seems to work: the nodes are retrieved correctly and the LLM generates the response. The only part that isn't functioning is the streaming.
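
In case it helps narrow things down, here's the sanity check I ran to confirm the engine is actually handing back a streaming response object (the StreamingResponse import path is my guess from the 0.10 core package layout):

from llama_index.core.base.response.schema import StreamingResponse

respuesta = query_engine.query(pregunta)
print(type(respuesta))                           # should be StreamingResponse...
print(isinstance(respuesta, StreamingResponse))  # ...when streaming=True is set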

Do you have any other suggestions to address this issue?
I know the exact bug here, trust me. The replacement for callbacks is consuming the stream
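
To illustrate the failure mode in plain Python (not the actual llama-index internals): if an internal handler drains the token generator before you do, your loop sees an already-exhausted stream.

def token_gen():
    yield from ["Hello", ", ", "world"]

gen = token_gen()
drained = list(gen)  # an internal handler consumes the stream first
for tok in gen:      # the caller then iterates the same generator...
    print(tok)       # ...and this never runs; the stream is already empty
print(drained)       # ['Hello', ', ', 'world']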

I think .post3 didn't get published
Try updating again, I just published it (it might take a sec to show up on PyPI)
Wow, I tried again today following your recommendation and it worked! Thank you very much for your help.