I spoke to the promptlayer author, and he's not totally sure either 😛 From my debugging I'm pretty sure this needs to be fixed within the llama_index integration
I could give it a shot, but that's the thing - llama_index is making calls of type CHUNKED, and I don't know enough about its internals to understand where they're coming from - afaik I'm just making simple LLM calls with the pseudocode I just showed
The response synthesizer by default is "compact" -- this means it combines all retrieved nodes into one chunk, and then splits again, so that each LLM input is as big as possible (this reduces the overall number of LLM calls)
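If you want to see or change that behavior, here's a minimal sketch using `as_query_engine` (assuming a recent llama_index version -- import paths differ across versions, and the "data" directory and query string are placeholders):

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load and index your documents ("data" is a placeholder directory)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# "compact" is the default: retrieved nodes are merged and re-split so each
# LLM input is as large as possible, minimizing the number of LLM calls.
# Switching to "refine" would instead make one LLM call per retrieved node.
query_engine = index.as_query_engine(response_mode="compact")

response = query_engine.query("your question here")
print(response)
```

So even though your code looks like one simple LLM call, the synthesizer may split the combined context and fire several calls under the hood -- which is likely where the CHUNKED call type you're seeing comes from.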