Hello there, I am currently facing a roadblock with the OpenAI LLM output token limit. What I'm doing now is retrieving line items from tables in documents like invoices as structured output. The format of the structured output is a list of dictionaries: each dictionary is one line item, and within the dictionary the keys are the table headers and the values are the row values (small example below).
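For context, this is roughly what I expect back (the headers and values here are made up; the real ones come from whatever table is in the invoice):

```python
# Illustrative only: one dictionary per line item, keys = table headers, values = row values.
line_items = [
    {"Description": "Widget A", "Quantity": "2", "Unit Price": "10.00", "Amount": "20.00"},
    {"Description": "Widget B", "Quantity": "5", "Unit Price": "3.50", "Amount": "17.50"},
]
```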
The roadblock comes when I'm retrieving from a very large table with a single query call to the LLM (I'm using a recursive query engine). For example, if the table has 50 line items, the token limit is reached by around the 30th line item. In LlamaIndex, is there currently any way for the query engine to "stash" the first response and send another API call (however many times needed) until all line items are retrieved? Or do I have to build a for loop myself that passes the first response back into the prompt for the second query so it retrieves the remaining line items (roughly like the sketch below)?
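This is the kind of manual loop I have in mind if LlamaIndex doesn't support this out of the box. It's only a sketch: `query_engine` is whatever query engine I already have, and the prompt wording, the JSON parsing, and the stop condition are all my own assumptions, not an existing LlamaIndex feature.

```python
import json


def extract_all_line_items(query_engine, max_calls=5):
    """Repeatedly query until the model reports no remaining line items.

    Assumptions: the model returns a JSON list of objects, and feeding back the
    items already extracted is enough for it to continue from where it stopped.
    """
    all_items = []
    for _ in range(max_calls):
        if all_items:
            prompt = (
                "You already returned these line items:\n"
                + json.dumps(all_items)
                + "\nReturn ONLY the remaining line items from the table as a JSON "
                  "list of objects (keys = table headers, values = row values). "
                  "Return [] if there are none left."
            )
        else:
            prompt = (
                "Extract the line items from the table as a JSON list of objects, "
                "where keys are the table headers and values are the row values."
            )

        response = query_engine.query(prompt)

        try:
            new_items = json.loads(str(response))
        except json.JSONDecodeError:
            break  # response wasn't valid JSON; stop rather than loop forever

        if not new_items:
            break  # model says there is nothing left to extract
        all_items.extend(new_items)

    return all_items
```

If something like this is the intended pattern, I'd also like to know whether there is a built-in way to handle it instead of hand-rolling the continuation prompt.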