
Updated 2 years ago

Streaming

At a glance
I am running into problems when trying to work with a streamed query response. Everything works correctly when using the following code with the Flask development server:

Plain Text
response = query_engine.query(question)
full_answer = ""
for token in response.response_gen:
    full_answer += token              # accumulate the full answer server-side
    emit("answer", {"token": token})  # push each token to the client as it arrives


However, when using gunicorn with an eventlet or gevent worker (I need to use one of those because I want websockets to stream the response to the client), the code hangs at the for loop and no iteration of the loop is executed. I assume the code needs to be written differently to work with the gunicorn workers. Does anybody have any experience with this?
8 comments
I haven't used gunicorn, but I know internally we used fastapi for streaming and it worked fine πŸ˜…
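Something along these lines (a minimal sketch, not our exact code - the data directory, index setup, and route are illustrative):

Python
# Minimal sketch: stream llama-index tokens over HTTP with FastAPI.
# The data directory, index setup, and endpoint path are illustrative.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(streaming=True)

app = FastAPI()

@app.get("/query")
def query(question: str):
    response = query_engine.query(question)
    # response_gen is a plain generator of token strings; StreamingResponse
    # iterates it and flushes each chunk to the client as it is produced.
    return StreamingResponse(response.response_gen, media_type="text/plain")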
to make sure I am not doing anything wrong, I just tried streaming directly with the OpenAI library in the same environment (as per https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb) and it worked
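for reference, the cookbook pattern boils down to roughly this (pre-1.0 openai package, as in that notebook):

Python
# Rough sketch of the cookbook's streaming pattern (pre-1.0 openai package).
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,  # ask the API to send chunks as they are generated
)
for chunk in response:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:  # first/last chunks may carry no content
        print(delta["content"], end="", flush=True)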
hm... I am pretty new to Python, so perhaps using gunicorn + flask + flask-socketio was not the best choice 😄 will try fastapi
I wonder if it's related to the threading in gunicorn conflicting with something in llama-index? hmm
I think so too - the main async gunicorn workers, eventlet and gevent, both rely on monkey patching to make everything work, and from some reading I did it's all very fragile... fastapi looks like a much better option, so I will try rewriting our app in it to see if it helps (hopefully it will, since it worked for you :D)
haha let me know how it goes! I used to use flask a lot too, fastapi is pretty similar but better built/supported, hope it works well πŸ™ πŸ‘
so, just to confirm - with fastapi everything works as expected, thanks for the tip!
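for anyone finding this later, the websocket version looks roughly like this in fastapi (illustrative sketch - the route and payload shape are assumptions, and query_engine is a streaming query engine built as in the earlier snippet):

Python
# Sketch of the websocket variant; route and payload shape are illustrative.
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws")
async def ws_query(websocket: WebSocket):
    await websocket.accept()
    question = await websocket.receive_text()
    response = query_engine.query(question)  # streaming engine, built as above
    for token in response.response_gen:
        # Mirrors the flask-socketio emit("answer", {"token": token}) payload.
        # Note: response_gen is a sync generator, so it blocks the event loop
        # between tokens; fine for a sketch, worth offloading in production.
        await websocket.send_json({"token": token})
    await websocket.close()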
ayyy nice! πŸ’ͺ