The community member is using a custom LLM (Large Language Model) class and wants to return a streaming response. Their custom LLM is hosted behind an API endpoint that they need to call using Python's requests library, and they want to return a generator for streaming from a Flask endpoint. They are looking for examples or documentation on how to achieve this.
In the comments, another community member suggests that the original poster will need to implement a custom LLM class, and provides a link to the relevant documentation.
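One way such a custom LLM class could be wired up is sketched below: a wrapper whose streaming method calls the hosted model with `stream=True` and yields chunks as they arrive. The class name, the `stream_complete` method, the endpoint URL, and the payload shape are all illustrative assumptions, not taken from the linked documentation:

```python
import requests
from typing import Iterator

class HostedLLM:
    """Illustrative wrapper around a hosted model; not tied to any
    specific framework's custom-LLM base class."""

    def __init__(self, api_url: str):
        # api_url is a placeholder for the hosted model's endpoint.
        self.api_url = api_url

    def stream_complete(self, prompt: str) -> Iterator[str]:
        # stream=True tells requests not to buffer the whole body,
        # so chunks can be yielded as the model produces them.
        with requests.post(
            self.api_url,
            json={"prompt": prompt},
            stream=True,
            timeout=60,
        ) as resp:
            resp.raise_for_status()
            for line in resp.iter_lines(decode_unicode=True):
                if line:
                    yield line
```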
Hi Team, I'm using the custom LLM class and want to send out a streaming response. Basically, we have a custom LLM that is hosted, and I have an API endpoint that I need to call (using requests in Python) and return a generator response for streaming. It's a Flask endpoint. Do we have any examples/documentation for this? Thanks.
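On the Flask side, the generator can be handed to a `Response` so chunks are forwarded to the client as they arrive. A minimal sketch, assuming the `HostedLLM` wrapper from the earlier snippet, a plain-text stream, and a placeholder endpoint URL:

```python
from flask import Flask, Response, request, stream_with_context

app = Flask(__name__)

# HostedLLM is the illustrative wrapper sketched above;
# the URL is a placeholder for the real hosted endpoint.
llm = HostedLLM("https://example.com/v1/generate")

@app.route("/stream", methods=["POST"])
def stream():
    prompt = request.json.get("prompt", "")
    # stream_with_context keeps the request context alive while
    # the client consumes the generator.
    return Response(
        stream_with_context(llm.stream_complete(prompt)),
        mimetype="text/plain",
    )
```

Passing a generator to `Response` is standard Flask streaming; `stream_with_context` matters only if the generator touches the request context after the view function returns, as it does here via `request.json` captured beforehand.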