
I have a pretty simple use case where I'm trying to stream back results with FastAPI. When I log the token over the response iterator, I see each token being logged in the console, but I'm not seeing the streamed results. Anyone see an issue I'm missing?
Python
import logging

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from llama_index.core import Settings
from pydantic import BaseModel

logger = logging.getLogger(__name__)
app = FastAPI()


async def response_streamer(response):
    for token in response:
        logger.info(token)
        yield f"{token}"


class ChatInput(BaseModel):
    query_text: str


@app.post("/chat")
async def query_index(chat_input: ChatInput):
    global index

    chat_engine = index.as_chat_engine(
        chat_mode="condense_question",
        verbose=True,
        llm=Settings.llm,
    )

    streaming_response = chat_engine.stream_chat(chat_input.query_text)
    return StreamingResponse(
        response_streamer(streaming_response.response_gen),
        media_type="text/event-stream",
        status_code=200,
    )
3 comments
If it helps, the StreamingResponse is from FastAPI. I know there's one for LlamaIndex too, but StackOverflow suggested I should use the one from FastAPI.
I tried that. It didn't cause Postman to produce an event per token.
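One possible cause, not confirmed in the thread: with `media_type="text/event-stream"`, clients that speak Server-Sent Events (including Postman's event view) expect each event framed as a `data: <payload>` line followed by a blank line, and only dispatch an event once the blank-line terminator arrives. Yielding bare tokens can therefore look like nothing is streaming even though the server logs each token. A minimal sketch of an SSE-framing streamer; `sse_streamer` is a hypothetical name, not from the original code:

```python
def sse_streamer(response_gen):
    """Wrap raw tokens in Server-Sent Events framing (hypothetical helper).

    An SSE client buffers incoming bytes and only surfaces an event after
    it sees the blank-line terminator, so unframed tokens may never render.
    """
    for token in response_gen:
        # Each event: a "data:" field plus an empty line as the terminator.
        yield f"data: {token}\n\n"


# Example framing over a toy token stream:
events = list(sse_streamer(iter(["Hel", "lo"])))
# events == ["data: Hel\n\n", "data: lo\n\n"]
```

In the posted endpoint this would replace `response_streamer` inside the `StreamingResponse(...)` call. Separately, a reverse proxy that buffers responses (e.g. nginx without buffering disabled) can also hide token-by-token delivery even when the framing is correct.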