Streaming

Yes I've tried that and it doesn't work. Here's the code:

Plain Text
from flask import request, stream_with_context  # app, index, and service_context are defined elsewhere

@app.route("/query", methods=["GET"])
def query_index():
    global index
    query_text = request.args.get("text", None)
    if query_text is None:
        return "No text found, please include a ?text=blah parameter in the URL", 400
    query_engine = index.as_query_engine(service_context=service_context, streaming=True)

    def response_stream(response):
        def generate():
            for text in response:
                yield text
        return generate

    return stream_with_context(response_stream(query_engine.query(query_text).response_gen))
Am I supposed to be using stream_with_context or no?
That's supposed to stream it properly
Where do I put it?
[Attachment: image.png]
Inside the Response()?
return Response(stream_with_context(response_stream(message)))
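Roughly like this, as a sketch (assuming the generator comes from query_engine.query(...).response_gen like in the snippet above):

Plain Text
from flask import Response, request, stream_with_context

@app.route("/query", methods=["GET"])
def query_index():
    query_text = request.args.get("text", None)
    if query_text is None:
        return "No text found, please include a ?text=blah parameter in the URL", 400
    query_engine = index.as_query_engine(service_context=service_context, streaming=True)
    streaming_response = query_engine.query(query_text)

    def generate():
        # yield each token as it arrives instead of waiting for the full answer
        for text in streaming_response.response_gen:
            yield text

    # wrap the generator in Response so Flask streams it while keeping the request context alive
    return Response(stream_with_context(generate()), mimetype="text/plain")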
Even doing that it still doesn't return word by word
You know what, I just realized
it's not doing that for me either anymore
I wonder if it was an update?
I just moved everything from SageMaker to an EC2 instance
It was working fine in SageMaker
Wait, I think I broke mine because of the nginx server
I had to set up nginx in order to get the Flask server to run
Ah gotcha, does it work now?
I probably have to fix it in nginx
So it is streaming,
but first it waits for the whole response from Flask and then streams the whole thing, which is pointless
OK, I fixed it
It's working now 🙂
It streams word by word
How did you fix it? Mine is still waiting until the whole response is done
Mine was a problem with nginx
proxy_buffering off;
By default nginx buffers the whole response before forwarding it
I just turned that off and it worked
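If you can't touch the nginx config, there's also a per-response way to do the same thing from Flask: nginx honors an X-Accel-Buffering: no response header and skips buffering just that response. A sketch, building on the Response(...) endpoint above:

Plain Text
# alternative to proxy_buffering off; replaces the return statement in the endpoint
resp = Response(stream_with_context(generate()), mimetype="text/plain")
resp.headers["X-Accel-Buffering"] = "no"  # tells nginx not to buffer this response
return resp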
Ahh I see, so yours is in deployment? I'm not too familiar with nginx yet
There was no other way to use it lol
I had to do this,
or else the moment I log out of SSH it shuts off lol
I still need to figure out the CSS problem, which I'm pretty sure is what I'm having, or the JS problem
That makes sense. I'm still so confused why mine is just returning the whole thing and not word by word
You have a website too?
Yes but currently I'm just trying to get the backend to work
I'm using next.js for the frontend
For the website you need JS and to set up a loop
You need to have a stream reader
But curl is waiting until the whole response is done, it doesn't return it word by word
This is what it looks like
The same thing happens when I just do it in the terminal with curl
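Worth noting that curl buffers its output by default, so curl -N (--no-buffer) gives a fairer test; alternatively a small Python client can show whether chunks actually arrive incrementally. A sketch, where the localhost URL and ?text= value are just placeholders for the endpoint above:

Plain Text
import requests

# stream=True keeps the connection open so chunks can be read as they arrive
with requests.get(
    "http://localhost:5000/query",                      # placeholder URL for the Flask endpoint
    params={"text": "what did the author do growing up?"},
    stream=True,
) as r:
    for chunk in r.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)                # should print incrementally if streaming works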
Wait, what are you using?
OpenAI gpt-3.5-turbo
I'm using a custom LLM
For OpenAI, do you have streaming=True?
Ahh I see. And yes I do
Yeah, there's something you need to do for that to work
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", streaming=True))
It's different than a custom model
I thought the LLMPredictor is model agnostic?
So wouldn't the configuration still be the same
@Logan M would love to get your thoughts here if possible. Is there any extra configuration I have to set up to get streaming to work if I'm using gpt-3.5-turbo?
I'm not using LLMPredictor
I'm using HuggingFaceLLMPredictor
Assuming you are on a newer version of llama index, streaming works with gpt-3.5

I've tested it and had it working locally 👍

No extra setup besides setting streaming=True in the ChatOpenAI class and as_query_engine call
Do you happen to have the code? To my knowledge, I've configured everything correctly
One sec, let me see if I have something
Yeah, I have seen it
If you google streaming for LlamaIndex there are so many examples
Plain Text
>>> from llama_index import SimpleDirectoryReader, GPTVectorStoreIndex, ServiceContext, LLMPredictor
>>> documents = SimpleDirectoryReader("./paul_graham").load_data()
>>> from langchain.chat_models import ChatOpenAI
>>> llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0, streaming=True))
>>> service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
>>> index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
>>> response = index.as_query_engine(streaming=True).query("what did the author do growing up?")
>>> response.print_response_stream()
The author wrote short stories and tried programming on an IBM 1401 in 9th grade.
That worked just now
I've tried the ones on the website and even those don't work when I do response.print_response_stream() for example
I'll try running this, thanks!
So weird, for me it doesn't work.. it just waits for the full response
What llama index version do you have?
version 0.6.8
Sheesh lol and it doesn't work even as a standalone script?
oh wait I lied, it's also not working for me... I thought I just looked away and it printed fast because it was short lol
Not even text-davinci-003 is streaming... I swear this worked just the other day. It's like it's streaming entire chunks. Just now it printed 3 paragraphs for a response, but waited in between each paragraph
It's never us, always something else
@Logan M OMGGG
I FIXEDDD the streamingggg
it's structuringggg
IT WAS CSS after all
Were you able to get it to work?
Yeah, it works now haha. I think it was because I was running in a Python shell rather than a script. Pushed a change yesterday to help with that too, when using print_response_stream()
Just ran it now and it streams fine 👀
Just tested it with the new update and it works too, thank you!

For my frontend app I want to return bullet points and list the specific sources (timestamps in a doc) of where it found the info. Originally I was planning on using StructuredLLMPredictor to achieve this and return a JSON object. However, you can't stream with StructuredLLMPredictor.

For this kind of use case do you still suggest returning a JSON object or am I able to retrieve the specific sources (timestamps within specific docs) another way?
Wooo glad it works now! 🫡💪

Is the info you need available from response.source_nodes? It can maybe be parsed from that list?
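Something along these lines might work for pulling the timestamps out without going through StructuredLLMPredictor; attribute names like extra_info depend on the version and on what metadata was attached at load time, so treat this as a sketch:

Plain Text
# stream the answer first, then walk the source nodes for their metadata
streaming_response = index.as_query_engine(streaming=True).query("what did the author do growing up?")
streaming_response.print_response_stream()

for source in streaming_response.source_nodes:
    node = source.node
    print("score:", source.score)
    print("metadata:", getattr(node, "extra_info", None))  # e.g. timestamps/doc ids, if stored at ingest time
    print("snippet:", node.get_text()[:200])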
Niceee, glad to see it works for you now!
I'm testing that out now... however, ChatGPT just keeps throwing "Not enough context is provided to answer this question" or "As an AI language model I cannot determine..." or some variation of that. I provided a lot of context though; to fix this, is it just prompt engineering?
Smells like gpt-3.5 to me lol

Is the answer actually in response.source_nodes? If so, I suspect it's the refine prompt causing issues
Ok just got the prompt to work, unfortunately it doesn't list out all the sources of where it found things. If I'm generating JSON to get info like timestamps then do you suggest using StructuredLLMPredictor?