How do I set the output_cls and similarity_top_k with the retry query engine?

Plain Text
# This is what I want, but output_cls and similarity_top_k are not accepted as args
base_query_engine = index.as_query_engine(llm=llm, filters=filters)

query_engine_presentation_content = RetryQueryEngine(
    query_engine=base_query_engine,
    output_cls=PresentationContentListV1,
    similarity_top_k=10,
)
query_engine_presentation_outline = RetryQueryEngine(
    query_engine=base_query_engine,
    output_cls=PresentationOutlineV1,
    similarity_top_k=10,
)
13 comments
I think you'd set all that in base_query_engine?
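For example, something like this (a rough sketch, not tested against your setup; RelevancyEvaluator is just one evaluator RetryQueryEngine can use, and the imports assume the llama_index.core layout):

Plain Text
from llama_index.core.evaluation import RelevancyEvaluator
from llama_index.core.query_engine import RetryQueryEngine

# retriever/synthesizer settings go on the base engine, not on the retry wrapper
base_query_engine = index.as_query_engine(
    llm=llm,
    filters=filters,
    similarity_top_k=10,
    output_cls=PresentationContentListV1,
)

# RetryQueryEngine only wraps an existing engine and takes an evaluator
query_engine_presentation_content = RetryQueryEngine(
    query_engine=base_query_engine,
    evaluator=RelevancyEvaluator(llm=llm),
)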
Is there no way to share common properties in one (base query engine in this case) and create different variables that hold the properties that should differ?
I'm not sure what you mean?

RetryQueryEngine is just a wrapper on top of an existing query engine

(And a query engine is just a wrapper on top of a retriever and response synthesizer, both of which have settings that change depending on the type of index, retriever, and synthesizer)
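If it helps to see where each setting lives, here's a minimal sketch of composing the same thing explicitly (names like PresentationOutlineV1 come from your snippet; the imports assume llama_index.core):

Plain Text
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

# similarity_top_k and filters are retriever-level settings
retriever = index.as_retriever(similarity_top_k=10, filters=filters)

# output_cls and the llm are response-synthesizer-level settings
synthesizer = get_response_synthesizer(llm=llm, output_cls=PresentationOutlineV1)

query_engine = RetrieverQueryEngine(retriever=retriever, response_synthesizer=synthesizer)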
In my scenario I want two different query engines because the output class differs (the rest of the args are the same, as you can see).

I am looking for a way to structure my code so I can re-use the common args and not have to define them twice (the llm, for example, which is the same for the two).
Ideally I would have something like this:

Plain Text
base = QueryEngine(...shared_args)

specific1 = QueryEngine(base_query_engine=base, ...specific_args1)
specific2 = QueryEngine(base_query_engine=base, ...specific_args2)

Is a pattern like this possible?
you could just put it in a dict

Plain Text
shared_args = {"similarity_top_k": 4, "filters": filters}

specific1 = QueryEngine(..., **shared_args)
specific2 = QueryEngine(..., **shared_args)
Thanks, good to know, not a Python dev 😛
haha no worries!
Just got this error:

worker-1 | [2024-04-18 15:55:25,266: INFO/ForkPoolWorker-7] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 400 Bad Request"
worker-1 | [2024-04-18 15:55:25,267: WARNING/ForkPoolWorker-7] Retrying llama_index.llms.openai.base.OpenAI._chat in 7.560941276281184 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 16385 tokens. However, your messages resulted in 16441 tokens (16176 in the messages, 265 in the functions). Please reduce the length of the messages or functions.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}.


I thought the query engine automatically chunks API calls so this can't happen. What is going wrong here? :/

Plain Text
index = initialize_index(model)

base_args = {"llm": get_llm(model_name=model), "filters": get_document_filters(uuid)}

outline_query_engine = index.as_query_engine(
    output_cls=PresentationOutlineV1, similarity_top_k=15, **base_args,
)

outline = outline_query_engine.query(outline_query_str).response.dict()
Seems like a small token-counting issue (it's just barely over, too)
To change this, I might... artificially lower the context window size. Except the OpenAI class doesn't let you do this (without a small PR), so you can only modify it in the global settings
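Roughly like this (a sketch, assuming the global Settings object exposes a context_window setting; the exact number is just an example margin):

Plain Text
from llama_index.core import Settings

# Assumption: the prompt helper reads this global value when packing
# retrieved chunks into the LLM call, so leaving headroom for the ~265
# function-schema tokens keeps the request under the 16385-token limit.
Settings.context_window = 16000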
Good idea. Do you think this is an issue with the LlamaIndex implementation itself kind of ignoring the token count of the user query?
It's not ignoring it, it's that token counting can get very tricky (especially when using the output_cls option)
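For a rough sense of that overhead, you can measure the function schema yourself (an illustration only; it assumes pydantic v1's .schema() and tiktoken, and the count is approximate since OpenAI serializes functions in its own format):

Plain Text
import json
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

# The structured-output class is sent as a function/tool schema, and those
# tokens count against the context window on top of the messages.
schema_json = json.dumps(PresentationOutlineV1.schema())  # .model_json_schema() on pydantic v2
schema_tokens = len(enc.encode(schema_json))
print(f"~{schema_tokens} tokens of function-schema overhead")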