The post asks how to make a streaming response arrive in the front-end in real time. The comments offer two suggestions:
1. One community member suggests looking at the example from https://github.com/run-llama/sec-insights, which includes both a front-end and a back-end implementation and streams its responses using server-sent events (see the SSE sketch below).
2. Another community member suggests using server-sent events or websockets: open a websocket on the back-end with FastAPI and send each delta produced by the LLM's .stream_response() generator (see the websocket sketch below).
There is no explicitly marked answer in the comments.
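For reference, here is a minimal sketch of the server-sent-events approach with FastAPI, in the spirit of the sec-insights example but not taken from it. The `stream_response()` function is a hypothetical stand-in for whatever streaming generator the LLM exposes; the endpoint path and query parameter are also illustrative assumptions.

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


def stream_response(prompt: str):
    # Stand-in for the LLM's streaming generator; yields text deltas
    # as they are produced.
    for token in ["Hello", ", ", "world", "!"]:
        yield token


@app.get("/chat/stream")
async def chat_stream(prompt: str):
    def event_stream():
        # Wrap each delta in the SSE wire format ("data: ...\n\n") so the
        # browser's EventSource can consume it as it arrives.
        for delta in stream_response(prompt):
            yield f"data: {delta}\n\n"
        yield "data: [DONE]\n\n"  # simple end-of-stream marker

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```

On the front-end, an `EventSource("/chat/stream?prompt=...")` listener can append each `data:` payload to the page as it arrives.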
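The websocket variant is sketched below, again with `stream_response()` as an assumed stand-in for the LLM's streaming generator (the thread calls it `.stream_response()`); the route name is illustrative. The idea is simply to forward each delta over the socket as soon as the generator yields it.

```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


def stream_response(prompt: str):
    # Stand-in for the LLM's streaming generator; yields text deltas.
    for token in ["Hello", ", ", "world", "!"]:
        yield token


@app.websocket("/ws/chat")
async def chat_ws(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            prompt = await websocket.receive_text()
            # Send each delta to the front-end as soon as it is generated.
            # (With a real, blocking LLM generator you would iterate it in a
            # thread to avoid stalling the event loop.)
            for delta in stream_response(prompt):
                await websocket.send_text(delta)
            await websocket.send_text("[DONE]")  # end-of-stream marker
    except WebSocketDisconnect:
        pass
```

The front-end opens a `WebSocket` to `/ws/chat`, sends the prompt, and appends each incoming message to the displayed response until it sees the end-of-stream marker.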