Chat engine issue

So in my llama-index FastAPI app I have built an API which returns the StreamingResponse type from FastAPI. I am doing:

Plain Text
response = llm.stream_complete(prompt)
return StreamingResponse(response)

This returns the error: AttributeError: 'CompletionResponse' object has no attribute 'encode'.
However, returning the stream from a query engine works completely fine. For example, this code runs without issue:

Plain Text
response_stream = query_engine.query(query_text)
return StreamingResponse(response_stream.response_gen)
(I have used streaming=True when creating the response_synthesizer object)
Going through the llama-index code I realised llm.stream_complete returns a CompletionResponseGen generator, which is defined as

Plain Text
CompletionResponseGen = Generator[CompletionResponse, None, None]

whereas response_synthesizer.synthesize() (which streams via llm_predictor.stream()) returns a TokenGen generator, defined as

Plain Text
TokenGen = Generator[str, None, None]

Hence the encoding issue: StreamingResponse calls .encode() on every non-bytes chunk it receives, which works for the plain str tokens of a TokenGen but fails on CompletionResponse objects.
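Concretely, iterating the two generators yields different item types. A rough sketch (assuming resp.delta holds the new text for each chunk and resp.text the accumulated text, as in llama-index's CompletionResponse):

Plain Text
# CompletionResponseGen: yields CompletionResponse objects, not strings
for resp in llm.stream_complete(prompt):
    print(resp.delta)

# TokenGen: yields plain str chunks, which StreamingResponse can encode
for token in response_stream.response_gen:
    print(token)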
How do I fix this? Is this a llama-index limitation? Should I change the implementation itself and not use llm.stream_complete()? If not, which query engine should I use? I am not creating an index here; I am just running a query against a text!
2 comments
@Logan M waiting for your input. Please help! 🙂
You can modify the CompletionResponseGen to be a token generator:

Plain Text
def token_gen(completion_response_gen):
    # yield only the incremental text from each CompletionResponse
    for resp in completion_response_gen:
        yield resp.delta
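In the FastAPI endpoint, the wrapper can then be passed to StreamingResponse. A minimal sketch (the route path, the prompt parameter, and the already-constructed llm are assumptions for illustration):

Plain Text
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()  # assumes llm has been constructed elsewhere

@app.post("/complete")
def complete(prompt: str):
    # wrap the CompletionResponseGen so StreamingResponse receives str chunks
    token_stream = token_gen(llm.stream_complete(prompt))
    return StreamingResponse(token_stream, media_type="text/plain")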