
Updated 4 months ago

It makes 5 llm calls total or 5 llm

It makes 5 LLM calls total, or 5 LLM calls to llama_index? What do your settings/indexes look like?
@Logan M I think it's 5-7 LLM calls to OpenAI.
In the logs for my latest agent_chain.run(), I see 7 lines like this:
```
DEBUG:urllib3.connectionpool:https://api.openai.com:443 "POST /v1/chat/completions HTTP/1.1" 200 None
```
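To count the calls for a single run, the DEBUG output can be captured and the chat-completion POST lines counted. A minimal sketch, assuming the log text has been saved to a string; `count_chat_completion_calls` is a hypothetical helper, not part of any library:

```python
import re

# Matches urllib3 DEBUG lines for successful POSTs to the chat completions endpoint.
CHAT_CALL = re.compile(r'"POST /v1/chat/completions HTTP/1\.1" 200\b')


def count_chat_completion_calls(log_text: str) -> int:
    """Return how many chat-completion requests in `log_text` got an HTTP 200."""
    return sum(1 for line in log_text.splitlines() if CHAT_CALL.search(line))


if __name__ == "__main__":
    line = ('DEBUG:urllib3.connectionpool:https://api.openai.com:443 '
            '"POST /v1/chat/completions HTTP/1.1" 200 None')
    log = "\n".join([line] * 7 + ["DEBUG:openai:some unrelated message"])
    print(count_chat_completion_calls(log))  # 7
```

Embedding requests go to a different endpoint (`/v1/embeddings`), so they are not counted by this pattern.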


This agent_chain.run() consists of 7 LLM calls to OpenAI, made up of:
  • 1 QA prompt
  • 6 refine prompts
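The 1 + 6 split matches a create-and-refine pattern: the first piece of context is answered with the QA prompt, and each further piece triggers a refine call that also passes in the answer so far. A simplified sketch with a stub LLM; the function name and prompt strings are hypothetical and are not llama_index's internals:

```python
from typing import Callable, List, Tuple


def create_and_refine(chunks: List[str], query: str,
                      llm: Callable[[str], str]) -> Tuple[str, List[str]]:
    """Answer `query` over `chunks` and record which prompt type each call used."""
    answer = ""
    call_kinds: List[str] = []
    for i, chunk in enumerate(chunks):
        if i == 0:
            # First chunk: plain question-answering prompt.
            prompt = f"Context:\n{chunk}\nQuestion: {query}\nAnswer:"
            call_kinds.append("qa")
        else:
            # Every later chunk: refine the existing answer with the new context.
            prompt = (f"Question: {query}\nExisting answer: {answer}\n"
                      f"New context:\n{chunk}\nRefined answer:")
            call_kinds.append("refine")
        answer = llm(prompt)  # one LLM call per chunk
    return answer, call_kinds


if __name__ == "__main__":
    def stub_llm(prompt: str) -> str:
        # Stand-in for the OpenAI chat call; just reports the prompt length.
        return f"answer built from a {len(prompt)}-char prompt"

    _, kinds = create_and_refine([f"chunk {n}" for n in range(7)], "What is X?", stub_llm)
    print(kinds.count("qa"), kinds.count("refine"))  # 1 6
```

In this model, N pieces of context always cost N calls, so 7 calls implies the retrieved text was handled as 7 pieces.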
Here are my settings/indexes:
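```python
# Imports assume the llama_index 0.5.x-era API used in this thread.
from langchain.chat_models import ChatOpenAI
from llama_index import GPTSimpleVectorIndex, LLMPredictor, ServiceContext
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.langchain_helpers.agents import IndexToolConfig, LlamaToolkit

# qa_prompt and refine_prompt are custom prompt templates defined elsewhere.
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(
    embed_model=OpenAIEmbedding(), llm_predictor=LLMPredictor(llm=llm)
)
# Load a previously built index from disk.
index = GPTSimpleVectorIndex.load_from_disk("index.json", service_context=service_context)
index_configs = [
    IndexToolConfig(
        index=index,
        name="Vector Index",
        description="No description",
        # Query-time settings: retrieve the 3 most similar chunks, use custom prompts.
        index_query_kwargs={
            "similarity_top_k": 3,
            "text_qa_template": qa_prompt,
            "refine_template": refine_prompt,
        },
        # Have the agent return the tool's output directly.
        tool_kwargs={"return_direct": True},
    )
]
toolkit = LlamaToolkit(
    index_configs=index_configs,
)
```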
Found your previous comment on another thread about adding max iterations and early stopping parameters. Do you have an example of how they can be used?
https://discord.com/channels/1059199217496772688/1093833050854543380/1093915395372630046
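So, at a minimum it will make 3 LLM calls (similarity_top_k is 3, so one call per retrieved chunk).

However, it looks like the prompt + query + context (+ previous answer) doesn't fit in the max input size, so it breaks the context into smaller chunks, and each extra chunk means another call.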
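Roughly, each retrieved chunk has to share the model's input window with the prompt template, the query and (for refine calls) the previous answer; if a chunk does not fit in what is left, it is split and each piece costs another call. A back-of-the-envelope sketch: the token counts, the 1500-token overhead and the 4096-token window are all assumed for illustration, and this ignores how llama_index's prompt helper actually splits text:

```python
import math
from typing import List


def estimate_llm_calls(chunk_tokens: List[int], overhead_tokens: int,
                       max_input_tokens: int) -> int:
    """Estimate calls when each retrieved chunk is split to fit beside the prompt.

    overhead_tokens stands for the prompt template, the query, the previous answer
    and any room reserved for the model's output (all hypothetical here).
    """
    available = max_input_tokens - overhead_tokens
    if available <= 0:
        raise ValueError("prompt overhead leaves no room for context")
    # A chunk that fits costs one call; an oversized one costs one call per piece.
    return sum(math.ceil(t / available) for t in chunk_tokens)


if __name__ == "__main__":
    # Three retrieved chunks (similarity_top_k=3) with hypothetical token counts.
    print(estimate_llm_calls([1000, 1000, 1000], 1500, 4096))  # 3
    print(estimate_llm_calls([3000, 3000, 3000], 1500, 4096))  # 6
```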

Try setting chunk_size_limit=3000 in the service context.

If you shrink the chunk size even further (which may change the answers/embeddings you get), you can also set "response_mode": "compact" in the query kwargs, which will stuff as much text as possible into each LLM call rather than making one call per top-k chunk.
I was just spitballing those features haha, but I may have something implemented in the coming days 😀
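The idea behind "compact" can be sketched as packing: instead of one call per retrieved chunk, consecutive chunks are grouped so each call carries as much text as fits. A simplified greedy sketch that packs whole chunks against a hypothetical per-call budget; llama_index's own implementation may combine and re-split the text differently:

```python
from typing import List


def pack_chunks(chunk_tokens: List[int], available_tokens: int) -> List[List[int]]:
    """Greedily group consecutive chunks so each group fits in one LLM call."""
    groups: List[List[int]] = []
    current: List[int] = []
    used = 0
    for t in chunk_tokens:
        if t > available_tokens:
            raise ValueError("a single chunk is larger than the available window")
        if current and used + t > available_tokens:
            # Current group is full: close it and start a new call.
            groups.append(current)
            current, used = [], 0
        current.append(t)
        used += t
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    available = 2500  # hypothetical room left for context in one call
    print(len(pack_chunks([500, 500, 500], available)))     # 1
    print(len(pack_chunks([1000, 1000, 1000], available)))  # 2
```

With small chunks, all three top-k results can share one call in this model, versus three calls when each chunk gets its own.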
Decreasing to chunk_size_limit=500 for service_context and using "response_mode": "compact" still results in 7 LLM calls.
Also tried several chunk_size_limit values between 500 and 3000; no luck there.

```python
index_configs = [
    IndexToolConfig(
        index=index,
        name="Vector Index",
        description="No description",
        index_query_kwargs={
            "similarity_top_k": 3,
            "text_qa_template": qa_prompt,
            "refine_template": refine_prompt,
            "response_mode": "compact",
        },
        tool_kwargs={"return_direct": True},
    )
]
```

Is it possible that the index needs to be reconstructed with a service_context that has a lower chunk_size_limit?
Oh yes! It does need to be reconstructed, whoops 😅
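The rebuild matters because documents are split into chunks when the index is built, and those chunks are what get embedded and saved to index.json; changing chunk_size_limit on the service context afterwards does not change the chunks already stored. A toy sketch of that idea; `ToyIndex` is hypothetical, ignores embeddings, and the 3000/500 sizes are illustrative:

```python
from typing import List


class ToyIndex:
    """Hypothetical stand-in for a saved vector index: chunks are fixed at build time."""

    def __init__(self, tokens: List[str], chunk_size: int):
        # Splitting happens once, here; the stored chunks never change afterwards.
        self.chunks = [tokens[i:i + chunk_size]
                       for i in range(0, len(tokens), chunk_size)]

    def retrieve(self, top_k: int) -> List[List[str]]:
        # A real index ranks chunks by embedding similarity; the toy takes the first top_k.
        return self.chunks[:top_k]


if __name__ == "__main__":
    doc = [f"tok{i}" for i in range(9000)]
    old = ToyIndex(doc, chunk_size=3000)  # built with large chunks
    new = ToyIndex(doc, chunk_size=500)   # rebuilt with a smaller chunk size
    print([len(c) for c in old.retrieve(3)])  # [3000, 3000, 3000]
    print([len(c) for c in new.retrieve(3)])  # [500, 500, 500]
```

Retrieving with top_k=3 from the rebuilt toy index returns three 500-token chunks, whatever setting is used at query time.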
Ohhh... what value of chunk_size_limit do you suggest the index be rebuilt using?
If index is reconstructed with chunk_size_limit=500, does it mean the IndexToolConfig using "similarity_top_k": 3 will retrieve 3 chunks of size 500?
Yes, that's what it means 👍 I would suggest maybe a bit bigger, like 1000. Sometimes small chunks make the answer harder to find.

But it may take some experimenting with your data 😅
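When experimenting, it can help to put the trade-off in numbers: with similarity_top_k=3, larger chunks give the model more retrieved text per query but need more calls once they stop fitting together. A rough sketch, assuming compact-style packing of whole chunks and a hypothetical budget of 3000 context tokens per call; `compact_calls` is an illustrative helper, not a library function:

```python
import math


def compact_calls(top_k: int, chunk_size: int, available_tokens: int) -> int:
    """Calls needed if top_k full-size chunks are packed, whole, into each window."""
    per_call = available_tokens // chunk_size
    if per_call == 0:
        raise ValueError("chunk_size is larger than the available window")
    return math.ceil(top_k / per_call)


if __name__ == "__main__":
    top_k, available = 3, 3000  # available is a hypothetical per-call budget
    for chunk_size in (500, 1000, 1500, 3000):
        # chunk size, tokens retrieved, estimated calls
        print(chunk_size, top_k * chunk_size, compact_calls(top_k, chunk_size, available))
```

In this model, 1000 still fits all three chunks in one call while retrieving twice the text of 500; the real numbers depend on the prompts, the model and the data.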