
Updated 5 months ago

Hello,

At a glance

The community member is asking if there is a way to make a streaming response with Streamlit and LlamaIndex, instead of having the user wait with the "Thinking..." spinner. They mention finding the streaming=True option, but are unsure if it can be used with Streamlit.

In the comments, another community member provides an example of how to implement a streaming response using a chat engine. The example shows how to use the stream_chat() method to generate a streaming response, and how to display the response incrementally in a Streamlit app. The comment suggests that a similar approach can be used with a query engine, by setting streaming=True when creating the query engine.

Hello,
is there any way to get a streaming response with Streamlit + LlamaIndex instead of making the user wait on the "Thinking..." spinner?

I couldn't find anything except the streaming=True option, and I'm not sure whether that can be used with Streamlit.
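```python
import streamlit as st
from llama_index.core.llms import ChatMessage

# Assumes st.session_state.chat_engine was created earlier in the app,
# e.g. st.session_state.chat_engine = index.as_chat_engine()
if "messages" not in st.session_state:
    st.session_state.messages = []

# Streamlit reruns the whole script on every input, so redraw earlier turns
for m in st.session_state.messages:
    with st.chat_message(m["role"]):
        st.markdown(m["content"])

if prompt := st.chat_input("What is up?"):
    # Convert earlier turns to ChatMessage objects *before* adding the new
    # prompt, so the prompt isn't sent twice (once in history, once as input)
    chat_history = [
        ChatMessage(role=m["role"], content=m["content"])
        for m in st.session_state.messages
    ]

    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""

        # Use 'stream_chat' so tokens can be shown as they are generated
        streaming_response = st.session_state.chat_engine.stream_chat(
            prompt, chat_history=chat_history
        )

        # Overwrite the placeholder with the text received so far
        for token in streaming_response.response_gen:
            full_response += token
            message_placeholder.markdown(full_response)

        st.session_state.messages.append(
            {"role": "assistant", "content": full_response}
        )
```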


Here's an example with a chat engine. A query engine works the same way, except you'd create it with query_engine = index.as_query_engine(streaming=True, ...) and call .query(); the streaming response it returns exposes the same response_gen iterator.
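Because the chat engine's stream_chat() response and a streaming query engine's .query() response both expose response_gen, the accumulate-and-redraw loop can be pulled into a small helper that works with either. This is a minimal sketch: render_stream is a hypothetical name (not part of LlamaIndex or Streamlit), and it accepts any callable that displays text, which in the app would be a placeholder's .markdown method.

```python
from typing import Callable, Iterable


def render_stream(tokens: Iterable[str], render: Callable[[str], object]) -> str:
    """Accumulate streamed tokens, redrawing the text so far after each one.

    render_stream is a hypothetical helper. In a Streamlit app, `tokens` would be
    streaming_response.response_gen and `render` would be placeholder.markdown,
    where placeholder = st.empty().
    """
    full_response = ""
    for token in tokens:
        full_response += token
        render(full_response)  # replace the displayed text with the running total
    return full_response


if __name__ == "__main__":
    # Stand-in for an LLM token stream, so the sketch runs without an API key
    fake_tokens = iter(["Stream", "ing ", "works"])
    frames: list[str] = []
    result = render_stream(fake_tokens, frames.append)
    print(frames)  # each intermediate frame the UI would show
    print(result)  # the final, complete response
```

In the query-engine case, the call would look roughly like full_response = render_stream(query_engine.query(prompt).response_gen, message_placeholder.markdown). Newer Streamlit releases also provide st.write_stream(), which consumes a generator, renders it incrementally, and returns the full text, if you'd rather not manage the placeholder yourself.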