hi trying to get a CustomQueryEngine

At a glance

The community member is trying to create a CustomQueryEngine and make its response streaming-capable. They tried wrapping the result in StreamingResponse but hit a TypeError when integrating with Chainlit. The comments point to stream_completion_response_to_tokens from the LLM Predictor, but that approach raised an AttributeError because the plain generator it returns lacks the response_gen attribute the Chainlit integration expects. The open question is how to convert the result of llm.stream_complete() into a StreamingResponse that works with Chainlit.

hi! trying to get a CustomQueryEngine going, following https://gpt-index.readthedocs.io/en/latest/examples/query_engine/custom_query_engine.html

How do I make the response streaming-capable? Does the following look right?
Plain Text
  
    def custom_query(self, query_str: str):
        logger.info(f"Triggering custom engine for query: {query_str}")
        response_gen = self.llm.stream_complete(
            qa_prompt
        )

        response = StreamingResponse(response_gen)
        return response


However, this creates the following error upstream (in the Chainlit integration):
Plain Text
await response_message.stream_token(token=token)
TypeError: can only concatenate str (not "CompletionResponse") to str


Any help appreciated! thanks
what is StreamingResponse?
Plain Text
def stream_completion_response_to_tokens(
    completion_response_gen: CompletionResponseGen,
) -> TokenGen:
    """Convert a stream completion response to a stream of tokens."""

    def gen() -> TokenGen:
        for response in completion_response_gen:
            yield response.delta or ""

    return gen()


is how it's done in the LLM Predictor
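
For example, you might apply it to the streaming result like this (a rough sketch, untested; the import path is my assumption for v0.8.x):
Plain Text
# import path is an assumption for llama_index v0.8.x
from llama_index.llm_predictor.utils import stream_completion_response_to_tokens

# inside custom_query(): convert the CompletionResponseGen
# into a generator of plain string tokens
token_gen = stream_completion_response_to_tokens(
    self.llm.stream_complete(qa_prompt)
)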
Thanks for taking a look!

I did try the above approach, but got
Plain Text
AttributeError: 'generator' object has no attribute 'response_gen'


StreamingResponse is defined in https://github.com/run-llama/llama_index/blob/v0.8.44/llama_index/response/schema.py#L85

It has a response_gen attribute that the Chainlit code needs for streaming tokens to the UI. So I'm looking for how to convert the result of llm.stream_complete() into a StreamingResponse.
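
Putting the pieces together, a minimal sketch of that conversion might look like the following (untested; it inlines the token generator from the LLM Predictor snippet above, and assumes qa_prompt is already a fully formatted prompt string):
Plain Text
    # assumes: from llama_index.response.schema import StreamingResponse
    def custom_query(self, query_str: str):
        logger.info(f"Triggering custom engine for query: {query_str}")
        # stream_complete() yields CompletionResponse objects, not plain strings
        completion_gen = self.llm.stream_complete(qa_prompt)

        # convert each CompletionResponse into its string delta, mirroring
        # stream_completion_response_to_tokens above
        def token_gen():
            for response in completion_gen:
                yield response.delta or ""

        # wrap the token generator so the returned object exposes the
        # response_gen attribute that Chainlit iterates
        return StreamingResponse(response_gen=token_gen())

Calling stream_completion_response_to_tokens directly and passing its result as response_gen should be equivalent; inlining the generator just avoids depending on the helper's import path.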