Streaming issue with simultaneous requests

I was checking this project and found this issue.

When I asked two questions simultaneously, I found that the response was shared between the two requests.
Here I have added a screenshot of that.

Is this a LlamaIndex problem or is there any problem in implementing event streaming in this code?
How can I solve the response event sharing issue between two requests?
Attachment: streaming-issue.png
20 comments
Probably an issue with however this was implemented

It's totally possible to stream multiple requests 🀷
This project is a little old -- this isn't how I would implement streaming these days πŸ‘€
This is more recent from Rohan, using workflows, which is how I would do this

https://github.com/rsrohan99/llamaindex-workflow-streaming-tutorial

If you haven't used workflows yet, they are super cool. Docs here
https://docs.llamaindex.ai/en/stable/module_guides/workflow/
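For reference, a minimal sketch of the workflow streaming pattern those links describe, assuming the current llama-index-core Workflow API; the workflow name, event class, and fields here are illustrative, not taken from Rohan's tutorial:

```python
import asyncio

from llama_index.core.workflow import (
    Context,
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)


class ProgressEvent(Event):
    msg: str


class EchoFlow(Workflow):
    @step
    async def answer(self, ctx: Context, ev: StartEvent) -> StopEvent:
        # write_event_to_stream pushes events onto this run's own stream,
        # so concurrent runs do not share a global queue.
        ctx.write_event_to_stream(ProgressEvent(msg=f"working on: {ev.query}"))
        return StopEvent(result=f"answered: {ev.query}")


async def main() -> None:
    handler = EchoFlow(timeout=60).run(query="hello")
    async for ev in handler.stream_events():  # only this run's events
        if isinstance(ev, ProgressEvent):
            print(ev.msg)
    print(await handler)  # final StopEvent result


if __name__ == "__main__":
    asyncio.run(main())
```

Because every run() call gets its own handler and stream, two simultaneous requests each see only their own events.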
@Logan M thanks for your feedback.

For now, it would be great if I could solve that issue without workflows. I have tried with this doc, but it didn't work for me because BaseEventHandler is thread-locked.
In that case, do you have any suggestions or doc that I can go through?
Thanks.
The instrumentation event handler would be the safe way to do it -- it's thread- and async-safe
What was the exact issue?
Not sure what the issue with thread locking was πŸ€”
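For reference, a minimal sketch of the instrumentation event handler pattern mentioned above, assuming the llama_index.core.instrumentation API; the handler name is illustrative:

```python
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.core.instrumentation.events import BaseEvent


class PrintingEventHandler(BaseEventHandler):
    @classmethod
    def class_name(cls) -> str:
        return "PrintingEventHandler"

    def handle(self, event: BaseEvent, **kwargs) -> None:
        # Called for every instrumentation event LlamaIndex fires.
        print("event:", event.class_name())


# Register once, at startup, on the root dispatcher.
root_dispatcher = get_dispatcher()
root_dispatcher.add_event_handler(PrintingEventHandler())
```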
First I used Rohan's code, which did not work for two simultaneous requests because it globally shares the queue data.

Then I changed the code a bit. I now initialize the CustomEventHandler object inside the request function, create a new queue for each user request, and collect the events from that per-request CustomEventHandler object.
But a new issue appeared: the LlamaIndex internal event list never gets cleared, so I keep receiving the old events along with the new ones until the server restarts.
In the attached screenshot you will notice that each event is emitted 5 times, because I asked 5 questions without restarting the server. Asking another question repeats each event 6 times. That is my issue.

I have added the updated code file based on Rohan's implementation.
Attachment: Screenshot_from_2024-10-20_01-40-20.png
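The duplication described above is what you would expect if a new handler is added to the shared root dispatcher on every request and never removed; a rough sketch of that pattern (the names are illustrative, not taken from the attached file):

```python
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.core.instrumentation.events import BaseEvent

dispatcher = get_dispatcher()


class CustomEventHandler(BaseEventHandler):
    @classmethod
    def class_name(cls) -> str:
        return "CustomEventHandler"

    def handle(self, event: BaseEvent, **kwargs) -> None:
        print("event:", event.class_name())


async def handle_request(question: str) -> None:
    # A fresh handler per request avoids sharing one global queue, but
    # add_event_handler only appends to the dispatcher's handler list and
    # nothing removes the earlier handlers. After 5 requests the dispatcher
    # holds 5 handlers, so every event is delivered 5 times.
    dispatcher.add_event_handler(CustomEventHandler())
    print("handlers registered so far:", len(dispatcher.event_handlers))
```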
Also updating the package to 0.11.14 didn't help.
Hmm. Tbh it's probably going to be 100% less work to use a workflow lol
I might rewrite it later today if I get some free time
@nayan32biswas ^ converted to use workflows
Was a fun little exercise
Thank you @Logan M πŸ₯°
Hi @Logan M, it seems like there is an issue with workflow event streaming when we try to use it with ContextChatEngine.
This is the error:
Attachment: Screenshot_2024-10-20_at_4.31.37_PM.png
That workflow itself IS a context chat engine fyi πŸ˜…

Anyways, yea since you aren't using an LLM directly, the streaming changes slightly

Should be

```python
response = await chat_engine.astream_chat(...)
async for delta in response.async_response_gen():
    ...
```

Where now you are iterating over purely the new chunks/deltas that are being streamed in
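Put together, a minimal sketch of that pattern in a streaming endpoint; FastAPI, the /chat route, the request model, and the pre-built chat_engine are illustrative assumptions, not details from the project:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

# Assumed to be a ContextChatEngine (or the workflow equivalent) built
# elsewhere at startup.
chat_engine = ...


class ChatRequest(BaseModel):
    message: str


@app.post("/chat")
async def chat(req: ChatRequest) -> StreamingResponse:
    # astream_chat returns a streaming chat response object whose
    # async_response_gen() yields only the new text deltas.
    response = await chat_engine.astream_chat(req.message)

    async def gen():
        async for delta in response.async_response_gen():
            yield delta

    return StreamingResponse(gen(), media_type="text/plain")
```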
Thank you @Logan M