Anyone have any information about the

At a glance

The community members are discussing the performance differences between Flask and FastAPI. They are considering switching from Flask to FastAPI, but have found that FastAPI seems to be slower for a basic request using the chat engine. The community members are using Gunicorn to run the app and are unsure if they are using async methods correctly in FastAPI. They have provided a sample FastAPI app using astream_chat and async_response_gen, but have encountered issues with the streaming not returning anything. The community members have also tried running a single request, but are still seeing the performance difference. One community member suggests that the main advantage of FastAPI is its typed interface validation and automatic swagger docs, rather than performance. Another community member has tried the sample app and found it to be fast, but has encountered a RuntimeError: Event loop is closed error after a few messages. They have resolved this issue by using nest_asyncio.apply(), which is recommended when setting the loop type to asyncio.

Mike

Anyone have any information about the speed differences between Flask and FastAPI? We were thinking about switching from Flask to FastAPI but it seems to be quite a bit slower. A basic request where I use the chat engine seems to be almost twice as slow...

21 comments

Mike

We're using Gunicorn to run the app.

Logan M

Are you using async in FastAPI? Everything in FastAPI should be using async methods when possible.
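
To illustrate why async methods matter here, the sketch below uses only the standard library, with no FastAPI involved. The two handlers are hypothetical stand-ins for endpoints: one makes a blocking call inside `async def`, the other awaits. The difference only shows up when several requests overlap, so it would not by itself explain a slowdown on a single request.

```python
import asyncio
import time


async def blocking_handler(delay: float) -> None:
    # time.sleep blocks the whole event loop, like a sync call inside `async def`.
    time.sleep(delay)


async def async_handler(delay: float) -> None:
    # asyncio.sleep yields control so other coroutines can make progress.
    await asyncio.sleep(delay)


async def run_concurrently(handler, n: int, delay: float) -> float:
    """Run n copies of handler at once and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(n)))
    return time.perf_counter() - start


def compare(n: int = 5, delay: float = 0.1):
    """Return (blocking_seconds, non_blocking_seconds) for n overlapping calls."""
    blocking = asyncio.run(run_concurrently(blocking_handler, n, delay))
    non_blocking = asyncio.run(run_concurrently(async_handler, n, delay))
    return blocking, non_blocking
```

Calling `compare()` should show the blocking variant taking roughly `n * delay` seconds, while the awaiting variant finishes in about `delay` seconds.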

Mike

I believe we are, and if we were to do just one request, it shouldn't matter right?

kevingoed

Bit more background here, when we used astream_chat and async_response_gen the streaming didn't return anything at all.

Mike

We're seeing this even when just a single request is sent by 1 person at a time.
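
One way to narrow down where the time goes on a single request is to measure time to first chunk separately from total time. The helper below is a minimal sketch: `time_stream` and `fake_stream` are hypothetical names, and `fake_stream` is only a stand-in for a streamed chat response, not the real chat engine.

```python
import time
from typing import Iterable, Tuple


def time_stream(chunks: Iterable[str]) -> Tuple[float, float, str]:
    """Consume a stream; return (seconds to first chunk, total seconds, full text)."""
    start = time.perf_counter()
    first = None
    parts = []
    for chunk in chunks:
        if first is None:
            first = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    # If the stream was empty, report the total time as time to first chunk.
    return (first if first is not None else total), total, "".join(parts)


def fake_stream(n: int = 3, delay: float = 0.05):
    # Hypothetical stand-in for a streamed chat response.
    for i in range(n):
        time.sleep(delay)
        yield f"token{i} "
```

Running the same measurement against both the Flask and FastAPI versions would show whether the gap is in the wait before the first token or spread across the whole response.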

Logan M

I haven't compared performance between FastAPI and Flask myself 🤔 But imo the main advantage of FastAPI is typed interface validation and automatic Swagger docs.

Logan M

Also for reference, here's a small sample app with astream_chat (this is probably not a good architecture, but just trying to show how it works)

Logan M

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

from llama_index import SimpleDirectoryReader, VectorStoreIndex


# Build the index once at import time so every request reuses it.
documents = SimpleDirectoryReader("./docs/examples/data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents)
agent = index.as_chat_engine()

app = FastAPI()
# Attach the chat engine to the app so handlers can reach it.
app.agent = agent


@app.get("/chat/")
async def chat(message: str) -> StreamingResponse:
    # Await the async streaming call instead of using the sync stream_chat.
    response = await app.agent.astream_chat(message)

    # Async generator that yields the response text as it is produced.
    response_gen = response.async_response_gen()

    return StreamingResponse(response_gen, media_type="text/plain")


@app.get("/")
async def root():
    return {"message": "Hello World"}


if __name__ == "__main__":
    import uvicorn

    # Run with the standard asyncio event loop.
    uvicorn.run(app, loop="asyncio", host="127.0.0.1", port=8000)

Logan M

curl -N "http://127.0.0.1:8000/chat/?message=Hello"

Logan M

(tbh it feels very fast to me, but this is also a pretty simple setup)

kevingoed

@Logan M will try this after dinner 🙂 stay tuned.

kevingoed

@Logan M just tried it and it seems to be a bit faster but now after 3 messages I run into RuntimeError: Event loop is closed errors 😄

Logan M

Hmmm usually the event loop closes because of some error earlier on... it's like a swallower of errors 😅

Logan M

I wonder if it was a request timeout maybe?
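
For reference, the error message itself is what asyncio raises when something tries to run work on a loop that has already been closed. The sketch below reproduces it with the standard library only. `ping` and `run_on_closed_loop` are hypothetical names, and this shows the mechanism, not necessarily what closed the loop in the app above.

```python
import asyncio


async def ping() -> str:
    return "pong"


def run_on_closed_loop() -> str:
    """Return the error message raised when a closed loop is reused."""
    loop = asyncio.new_event_loop()
    loop.run_until_complete(ping())  # works while the loop is still open
    loop.close()

    coro = ping()
    try:
        loop.run_until_complete(coro)  # the loop is closed, so this raises
    except RuntimeError as exc:
        return str(exc)
    finally:
        coro.close()  # avoid a "coroutine was never awaited" warning
    return "no error"
```

`run_on_closed_loop()` returns `"Event loop is closed"`. If an earlier failure (a timeout, for example) tore down the loop, later requests that still hold a reference to it could plausibly hit the same message.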