The community member is building a FastAPI app that uses the llama-index library to return a StreamingResponse. Passing the result of llm.stream_complete(prompt) to StreamingResponse fails, because each item it yields is a CompletionResponse object, which has no encode attribute. However, streaming from a query engine with response_stream = query_engine.query(query_text) and returning response_stream.response_gen works fine.
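The community member has realized that llm.stream_complete returns a CompletionResponseGen generator, which yields CompletionResponse objects, while response_synthesizer.synthesize()[llm_predictor.stream()] returns a TokenGen generator, which yields plain strings. StreamingResponse can encode strings but not CompletionResponse objects, and this difference in generator types is causing the encoding issue.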
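The mismatch can be reproduced without FastAPI or llama-index. The following is a minimal sketch, assuming that StreamingResponse calls .encode() on any chunk that is not already bytes (which is what the error message suggests). The CompletionResponse class and the encode_chunks helper here are hypothetical stand-ins written for illustration, not the libraries' real code.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class CompletionResponse:
    """Hypothetical stand-in for llama-index's CompletionResponse."""
    text: str
    delta: Optional[str] = None


def encode_chunks(chunks: Iterable) -> Iterator[bytes]:
    """Assumed approximation of the encode step applied to each streamed chunk."""
    for chunk in chunks:
        if not isinstance(chunk, bytes):
            # Works for str, fails for any object without an encode method.
            chunk = chunk.encode("utf-8")
        yield chunk


# A generator of strings (like TokenGen) encodes without problems.
encoded = list(encode_chunks(["Hel", "lo"]))

# A generator of CompletionResponse objects (like CompletionResponseGen) fails.
try:
    list(encode_chunks([CompletionResponse(text="Hel", delta="Hel")]))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

In this sketch the string chunks are encoded normally, while the CompletionResponse chunk raises an AttributeError about the missing encode attribute, matching the error reported in the post.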
A community member has suggested turning the CompletionResponseGen into a token generator by wrapping it in a token_gen function that yields the delta attribute of each CompletionResponse in the generator.
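The suggested wrapper could look roughly like the sketch below. To keep it runnable on its own, CompletionResponse and fake_stream_complete are hypothetical stand-ins for the llama-index class and for llm.stream_complete; only the token_gen name comes from the suggestion. The fallback to an empty string is an assumption to guard against a chunk whose delta is None.

```python
from dataclasses import dataclass
from typing import Generator, Iterable, Optional


@dataclass
class CompletionResponse:
    """Hypothetical stand-in for llama-index's CompletionResponse."""
    text: str                    # full text accumulated so far
    delta: Optional[str] = None  # newly generated piece for this chunk


def fake_stream_complete(prompt: str) -> Generator[CompletionResponse, None, None]:
    """Hypothetical stand-in for llm.stream_complete(prompt)."""
    text = ""
    for piece in ["Hello", ", ", "world"]:
        text += piece
        yield CompletionResponse(text=text, delta=piece)


def token_gen(
    completion_gen: Iterable[CompletionResponse],
) -> Generator[str, None, None]:
    """Convert a stream of CompletionResponse objects into a stream of strings."""
    for response in completion_gen:
        # Yield only the new piece; fall back to "" if delta is missing.
        yield response.delta or ""


tokens = list(token_gen(fake_stream_complete("hi")))
```

In the app from the question, the equivalent call would presumably be return StreamingResponse(token_gen(llm.stream_complete(prompt))), so that StreamingResponse receives plain strings it can encode.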
There is no explicitly marked answer in the comments.
In my llama-index FastAPI app I have built an API endpoint that returns FastAPI's StreamingResponse type. I am doing:

response = llm.stream_complete(prompt)
return StreamingResponse(response)

This raises the error: AttributeError: 'CompletionResponse' object has no attribute 'encode'

However, streaming from a query engine works completely fine. For example, this code runs without problems:

response_stream = query_engine.query(query_text)
return StreamingResponse(response_stream.response_gen)

(I used streaming=True when creating the response_synthesizer object.)

Going through the llama-index code, I realised that llm.stream_complete returns a CompletionResponseGen generator, which is defined as CompletionResponseGen = Generator[CompletionResponse, None, None], whereas response_synthesizer.synthesize()[llm_predictor.stream()] returns a TokenGen generator, which is defined as TokenGen = Generator[str, None, None]. Hence the encoding issue.

How do I fix this? Is this a llama-index limitation? Should I change the implementation itself and not use llm.stream_complete()? If not, which query engine should I use? I am not creating an index here; I am just running a query against a text!!