Chat engine issue

At a glance
The community member is building a FastAPI app with the llama-index library that returns a StreamingResponse. Passing the result of llm.stream_complete(prompt) to StreamingResponse raises an AttributeError because the yielded CompletionResponse objects have no encode attribute, while streaming from a query engine via response_stream = query_engine.query(query_text) works fine. They traced the difference to the generator types: llm.stream_complete returns a CompletionResponseGen (a generator of CompletionResponse objects), whereas the response synthesizer's streaming path (llm_predictor.stream()) returns a TokenGen (a generator of str), which StreamingResponse can encode. A community member suggested adapting the CompletionResponseGen into a token generator by yielding the delta attribute of each CompletionResponse. There is no explicitly marked answer in the comments.
So in my llama-index FastAPI app I have built an API which returns FastAPI's StreamingResponse type. I am doing:
response = llm.stream_complete(prompt)
return StreamingResponse(response)
This returns the error: AttributeError: 'CompletionResponse' object has no attribute 'encode'.
However, streaming from a query engine works completely fine. For example, this code runs without issues:
response_stream = query_engine.query(query_text)
return StreamingResponse(response_stream.response_gen)
(I have used streaming=True when creating the response_synthesizer object)
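
For context, that streaming setup looks roughly like this (a sketch assuming the llama_index.core imports; exact module paths and arguments vary across llama-index versions):

Python
from llama_index.core import VectorStoreIndex, Document, get_response_synthesizer

# streaming=True makes the synthesizer emit a TokenGen (generator of str)
response_synthesizer = get_response_synthesizer(streaming=True)
index = VectorStoreIndex.from_documents([Document(text="some text")])
query_engine = index.as_query_engine(response_synthesizer=response_synthesizer)

response_stream = query_engine.query(query_text)
# response_stream.response_gen yields plain str tokens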
Going through the llama-index code I realised llm.stream_complete returns a CompletionResponseGen generator, which is defined as
CompletionResponseGen = Generator[CompletionResponse, None, None]

and the response synthesizer's streaming path (response_synthesizer.synthesize() via llm_predictor.stream()) returns a TokenGen generator, which is defined as TokenGen = Generator[str, None, None].

Hence the encoding issue.
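
That matters because Starlette's StreamingResponse tries to encode every chunk it pulls from the iterator, roughly like this (a simplified sketch, not the actual Starlette source):

Python
async for chunk in body_iterator:
    if not isinstance(chunk, bytes):
        # Works for str tokens; raises AttributeError for a CompletionResponse
        chunk = chunk.encode("utf-8")
    await send(chunk)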
How do I fix this? Is this a llama-index limitation? Should I change the implementation itself and not use llm.stream_complete()? If not, which query engine should I use? I am not creating an index here - I am just running a query against a text!!
2 comments
@Logan M waiting for your input. Please help! 🙂
You can modify the CompletionResponseGen to be a token generator:

Python
def token_gen(completion_response_gen):
    # Yield only the incremental text (delta) from each CompletionResponse
    for resp in completion_response_gen:
        yield resp.delta
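
Wiring that into the FastAPI endpoint would then look roughly like this (a sketch; the route path, media_type, and the pre-configured llm object are assumptions):

Python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/complete")
def complete(prompt: str):
    # stream_complete yields CompletionResponse objects; token_gen
    # converts them to plain str tokens that StreamingResponse can encode
    response = llm.stream_complete(prompt)  # llm configured elsewhere
    return StreamingResponse(token_gen(response), media_type="text/plain")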