Does anyone know how I can get the last token of stream_chat()?
E.g. I am sending the results of stream_chat() as JSON over FastAPI to a React frontend, and would like to have an is_done flag. However, when I check the is_done flag on the StreamingChatResponse, it is set to True when it is NOT the last token, i.e. while I am still iterating over and sending the response. I'm guessing this is because of the lag between the time Ollama finishes its response and sets the flag, and the time I actually check the flag.
Is there any way I can check for the last token/response generated?
Code extracts as follows:
import asyncio
import json

async def astreamer(response, model_used):
    try:
        for i in response.response_gen:
            # this already prints "IS DONE!" while tokens are still being yielded
            if response._is_done:
                print("IS DONE!")
            else:
                print("IS NOT DONE!")
            yield json.dumps(i)
            create_json_response()
            await asyncio.sleep(.1)
    except asyncio.CancelledError as e:
        print('cancelled')
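For what it's worth, the workaround I'm considering is to stop polling the flag mid-stream and instead send an explicit is_done payload once the generator is exhausted, since (I assume) response_gen only stops after the final token. A rough sketch of what I mean — the token/is_done payload shape and the trailing newline are just my own convention, not anything from LlamaIndex:

import asyncio
import json

async def astreamer_done_flag(response, model_used):
    # sketch: rely on response_gen ending instead of checking response._is_done
    try:
        for token in response.response_gen:
            # every intermediate chunk goes out with is_done=False;
            # newline-delimited so the frontend can split the payloads
            yield json.dumps({"token": token, "is_done": False}) + "\n"
            await asyncio.sleep(.1)
        # the loop only exits after the last token, so signal completion here
        yield json.dumps({"token": "", "is_done": True}) + "\n"
    except asyncio.CancelledError:
        print('cancelled')

That avoids racing the flag, but I'd still prefer to read it off the response object directly if there is a supported way.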
@app.post("/chat")
async def chat(request:Request):
...
response = chat_engine_dict["engine"].stream_chat(query)
return StreamingResponse(astreamer(response,model_used=model_used),media_type="text/event-stream")
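On the React side I would then just stop once a payload with is_done=true shows up. For testing from Python, something like this is what I have in mind (httpx and the {"query": ...} body are just placeholders for however the /chat endpoint actually reads the query):

import asyncio
import json

import httpx

async def consume_chat(query: str):
    # stand-in for the frontend: read the stream until the is_done payload arrives
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("POST", "http://localhost:8000/chat", json={"query": query}) as resp:
            async for line in resp.aiter_lines():
                if not line:
                    continue
                payload = json.loads(line)
                print(payload.get("token", ""), end="", flush=True)
                if payload.get("is_done"):
                    break

asyncio.run(consume_chat("hello"))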