hi trying to get a CustomQueryEngine

At a glance

The community member is trying to create a CustomQueryEngine and make its response streaming-capable. They tried wrapping the result in StreamingResponse but hit a TypeError when integrating with Chainlit. The comments point to stream_completion_response_to_tokens from the LLM Predictor, but that approach raised an AttributeError because the plain generator it returns lacks the response_gen attribute the Chainlit integration expects. The open question is how to convert the result of llm.stream_complete() into a StreamingResponse that works with Chainlit.

hi! trying to get a CustomQueryEngine going, following https://gpt-index.readthedocs.io/en/latest/examples/query_engine/custom_query_engine.html

How do I make the response streaming-capable? Does the following look right?
Plain Text
  
    def custom_query(self, query_str: str):
        logger.info(f"Triggering custom engine for query: {query_str}")
        response_gen = self.llm.stream_complete(
            qa_prompt
        )

        response = StreamingResponse(response_gen)
        return response


However, this creates the following error upstream (in the Chainlit integration):
Plain Text
await response_message.stream_token(token=token)
TypeError: can only concatenate str (not "CompletionResponse") to str


Any help appreciated! thanks
what is StreamingResponse?
Plain Text
def stream_completion_response_to_tokens(
    completion_response_gen: CompletionResponseGen,
) -> TokenGen:
    """Convert a stream completion response to a stream of tokens."""

    def gen() -> TokenGen:
        for response in completion_response_gen:
            yield response.delta or ""

    return gen()


is how it's done in the LLM Predictor
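
For example, you might apply it to the streaming result like this (a rough sketch, untested; the import path is my assumption for v0.8.x):
Plain Text
# import path is an assumption for llama_index v0.8.x
from llama_index.llm_predictor.utils import stream_completion_response_to_tokens

# inside custom_query(): convert the CompletionResponseGen
# into a generator of plain string tokens
token_gen = stream_completion_response_to_tokens(
    self.llm.stream_complete(qa_prompt)
)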
Thanks for taking a look!

I did try the above approach, but got
Plain Text
AttributeError: 'generator' object has no attribute 'response_gen'


StreamingResponse is defined in https://github.com/run-llama/llama_index/blob/v0.8.44/llama_index/response/schema.py#L85

It has a response_gen attribute that the Chainlit code needs for streaming tokens to the UI. So I'm looking for how to convert the result of llm.stream_complete() into a StreamingResponse.
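
Putting the pieces together, a minimal sketch of that conversion might look like the following (untested; it inlines the token generator from the LLM Predictor snippet above, and assumes qa_prompt is already a fully formatted prompt string):
Plain Text
    # assumes: from llama_index.response.schema import StreamingResponse
    def custom_query(self, query_str: str):
        logger.info(f"Triggering custom engine for query: {query_str}")
        # stream_complete() yields CompletionResponse objects, not plain strings
        completion_gen = self.llm.stream_complete(qa_prompt)

        # convert each CompletionResponse into its string delta, mirroring
        # stream_completion_response_to_tokens above
        def token_gen():
            for response in completion_gen:
                yield response.delta or ""

        # wrap the token generator so the returned object exposes the
        # response_gen attribute that Chainlit iterates
        return StreamingResponse(response_gen=token_gen())

Calling stream_completion_response_to_tokens directly and passing its result as response_gen should be equivalent; inlining the generator just avoids depending on the helper's import path.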