In my llama-index FastAPI app I have built an endpoint that returns FastAPI's `StreamingResponse`. I am doing:

```python
response = llm.stream_complete(prompt)
return StreamingResponse(response)
```

This raises:

```
AttributeError: 'CompletionResponse' object has no attribute 'encode'
```

However, streaming from a query engine works completely fine. For example, this code runs without issue (I set `streaming=True` when creating the `response_synthesizer` object):

```python
response_stream = query_engine.query(query_text)
return StreamingResponse(response_stream.response_gen)
```

Going through the llama-index code I realised that `llm.stream_complete()` returns a `CompletionResponseGen` generator, initialised as:

```python
CompletionResponseGen = Generator[CompletionResponse, None, None]
```
whereas `response_synthesizer.synthesize()` (via `llm_predictor.stream()`) returns a `TokenGen` generator, initialised as:

```python
TokenGen = Generator[str, None, None]
```
Hence the encoding issue: Starlette's `StreamingResponse` calls `.encode()` on each chunk that isn't already `bytes`, so it expects the iterator to yield `str` (or `bytes`), but `llm.stream_complete()` yields `CompletionResponse` objects. How do I fix this? Is this a llama-index limitation? Should I change the implementation itself and not use `llm.stream_complete()`? If not, which query engine should I use? I am not creating an index here; I am just running a query against a text.
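The only fix I can currently think of is to wrap the generator myself and yield plain strings. Here is a minimal sketch of that idea (the `/complete` route and `token_gen` helper are made up for illustration, and I'm assuming each streamed `CompletionResponse` exposes the newest token via its `.delta` attribute):

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
llm = ...  # the same llama-index LLM instance used above

@app.post("/complete")
def complete(prompt: str):
    def token_gen():
        # stream_complete yields CompletionResponse objects; each one carries
        # the newest token in .delta (and the cumulative text in .text), so
        # yield the plain string instead of the object itself
        for chunk in llm.stream_complete(prompt):
            yield chunk.delta or ""

    return StreamingResponse(token_gen(), media_type="text/plain")
```

Is wrapping the generator like this the right approach, or is there a built-in adapter I am missing?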