You seem to be active in both ecosystems (LangChain and Llama Index). I'm trying to stream output from an agent that uses ChatOpenAI as its LLM and send it back as a stream from FastAPI. There's no guide on how to do this. I've found some trails on GitHub for doing it with Streamlit, but none for FastAPI.
https://github.com/hwchase17/chat-langchain/issues/39
I suppose most people are building with the FastAPI + LangChain + Next.js stack. Have you come across anything similar? I'm having to dig into the LangChain codebase, so any pointers on streaming would be helpful.
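For what it's worth, here's a minimal sketch of the pattern I've seen suggested (assuming the 0.0.x-era LangChain API this thread is about): AsyncIteratorCallbackHandler pushes tokens into an async iterator as they arrive, and FastAPI's StreamingResponse consumes it. The /stream route and token_generator names are my own, not from any official guide:

    import asyncio

    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse
    from langchain.callbacks import AsyncIteratorCallbackHandler
    from langchain.chat_models import ChatOpenAI
    from langchain.schema import HumanMessage

    app = FastAPI()

    @app.get("/stream")
    async def stream(q: str):
        # One handler per request so concurrent streams don't interleave.
        callback = AsyncIteratorCallbackHandler()
        llm = ChatOpenAI(
            model="gpt-3.5-turbo",
            streaming=True,
            callbacks=[callback],
            temperature=0,
        )

        async def token_generator():
            # Kick off generation in the background; tokens show up on the
            # callback's async iterator as the model produces them.
            task = asyncio.create_task(llm.agenerate([[HumanMessage(content=q)]]))
            async for token in callback.aiter():
                yield token
            await task  # surface any exception from the LLM call

        return StreamingResponse(token_generator(), media_type="text/plain")

For an agent rather than a bare LLM, you'd presumably swap llm.agenerate(...) for something like agent.arun(q) inside the task, which is where filtering for the final answer (below) becomes relevant.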
Not sure; it's also throwing an error with this setup:
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout_final_only import FinalStreamingStdOutCallbackHandler

llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    streaming=True,
    callbacks=[FinalStreamingStdOutCallbackHandler()],
    temperature=0,
)
which raises:

raise OutputParserException(f"Could not parse LLM output: `{text}`")
It seems like I need to pass in answer_prefix_tokens, but there's literally no detailed guide on how to deal with streaming.
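Note that the OutputParserException itself comes from the agent's output parser, not the streaming handler: the model's reply didn't match the Action:/Final Answer: format the agent expects. The handler's parameter is answer_prefix_tokens, and as I understand it the default is already ["Final", "Answer", ":"], so you'd only override it if your prompt uses a different final-answer prefix. A hedged sketch:

    from langchain.callbacks.streaming_stdout_final_only import (
        FinalStreamingStdOutCallbackHandler,
    )

    # The token split below is an assumption -- it has to match how the model
    # actually tokenizes whatever prefix your agent emits before its answer.
    handler = FinalStreamingStdOutCallbackHandler(
        answer_prefix_tokens=["Final", "Answer", ":"]
    )

If the prefix tokens don't line up with the model's tokenization, the handler never detects the final answer and nothing gets streamed to stdout.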