Hello,

is there any way to get a streaming response with Streamlit + LlamaIndex, instead of making the user wait on the "Thinking..." spinner?

I haven't found anything except the streaming=True option, and I'm not sure whether that can be used with Streamlit.
1 comment
Python
import streamlit as st
from llama_index.core.llms import ChatMessage

# Assumes st.session_state.messages (a list of dicts) and
# st.session_state.chat_engine were initialized earlier in the script.
if prompt := st.chat_input("What is up?"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""

        # Convert the stored messages to the format the chat engine expects,
        # excluding the prompt just appended (it is passed separately below,
        # and would otherwise appear in the history twice)
        chat_history = [
            ChatMessage(role=m["role"], content=m["content"])
            for m in st.session_state.messages[:-1]
        ]

        # stream_chat returns a streaming response whose response_gen
        # yields tokens as they are generated
        streaming_response = st.session_state.chat_engine.stream_chat(
            prompt, chat_history=chat_history
        )

        # Render the partial response as each token arrives instead of
        # blocking behind a spinner
        for token in streaming_response.response_gen:
            full_response += token
            message_placeholder.markdown(full_response)

        st.session_state.messages.append(
            {"role": "assistant", "content": full_response}
        )
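
The snippet assumes st.session_state.chat_engine and st.session_state.messages already exist. A minimal sketch of that one-time setup, where the "data" directory and the default engine settings are assumptions rather than part of the original answer:

Python
import streamlit as st
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# One-time setup, cached in session state so it survives Streamlit reruns.
# The "data" directory and default engine settings are placeholders.
if "chat_engine" not in st.session_state:
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    st.session_state.chat_engine = index.as_chat_engine()
if "messages" not in st.session_state:
    st.session_state.messages = []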


The example above uses a chat engine. It's similar with a query engine, except you'd create it with query_engine = index.as_query_engine(streaming=True, ...) and call .query(); the streaming response exposes the same response_gen iterator.
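
For reference, a minimal sketch of that query-engine variant, reusing the prompt and index from the snippets above (any engine settings beyond streaming=True are omitted):

Python
with st.chat_message("assistant"):
    message_placeholder = st.empty()
    full_response = ""

    # streaming=True makes .query() return a streaming response
    query_engine = index.as_query_engine(streaming=True)
    streaming_response = query_engine.query(prompt)

    # Same token iterator as in the chat engine example
    for token in streaming_response.response_gen:
        full_response += token
        message_placeholder.markdown(full_response)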