If the combined length of all texts is less than the model's context window, TreeSummarize doesn't seem to perform tree-based summarization; it just stacks all the texts together and makes a single call.
What if I want to force it to summarize each text separately and perform actual tree-based summarization?
For example, I have a document of ~20k tokens, and I declare a function that first splits the text into chunks of ~4k tokens.
So I end up with a list of 5 chunks (20k / 4k).
Yet my tree summarizer performs the summarization in a single API call (basically sending everything at once). Why is it not performing tree summarization?
In other words, as soon as the combined length of all texts in the list fits in the model's context window, it seems to skip the tree step entirely and just make one API call.
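For reference, here is a minimal sketch of my setup (assuming a recent llama-index version; the file name, query string, and chunk sizes are illustrative):

```python
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.response_synthesizers import TreeSummarize

# Placeholder for the ~20k-token document.
document_text = open("report.txt").read()

# Split into ~4k-token chunks -> a list of roughly 5 strings.
splitter = SentenceSplitter(chunk_size=4096, chunk_overlap=0)
chunks = splitter.split_text(document_text)

# verbose=True prints how many chunks remain after repacking; in my case it
# reports a single chunk, i.e. everything was merged into one LLM call.
summarizer = TreeSummarize(verbose=True)
summary = summarizer.get_response("Summarize the document.", chunks)
```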
ngl I think that's less of an issue with newer LLMs these days 👀 If you want, you could try using Refine to sequentially build up a rolling summary, and see how that looks
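Roughly like this, reusing the chunks from your example (a sketch, not tested against your setup; the query string is just an example):

```python
from llama_index.core.response_synthesizers import Refine

# The ~4k-token chunks from the question above.
chunks = ["chunk 1 text ...", "chunk 2 text ...", "chunk 3 text ..."]

# Refine walks the chunks sequentially: the first chunk produces an initial
# summary, and each later chunk refines it -- one LLM call per chunk, so the
# five chunks above would give five calls.
refiner = Refine(verbose=True)
rolling_summary = refiner.get_response("Summarize the document.", chunks)
```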
One question: did you try to trace the OpenAI call usage with an open-source tool like Langfuse? I think the current TreeSummarize callback manager is not working, as I can't track down any OpenAI call the TreeSummarize object made; they also have one or two "# TODO"s in the newest version of the code.
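For reference, wiring it in would look roughly like this (a sketch based on Langfuse's documented LlamaIndex callback integration; the API keys are assumed to be set in the environment):

```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from langfuse.llama_index import LlamaIndexCallbackHandler

# Picks up LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST from env.
langfuse_handler = LlamaIndexCallbackHandler()
Settings.callback_manager = CallbackManager([langfuse_handler])

# Synthesizers created after this should report their LLM calls as traces.
# If TreeSummarize ignores the global manager (the suspicion above), it may
# be worth passing it explicitly: TreeSummarize(callback_manager=...).
```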
@vietphuon no, I did not trace it. But when I call summarizer.summarize I see some stdout output, something like "1 chunk found". So I assume there was only one API call, but I can't guarantee it.
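A lighter-weight way to confirm the call count, without an external tracer, might be llama-index's built-in TokenCountingHandler (a sketch; the tokenizer choice is an assumption):

```python
import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.core.response_synthesizers import TreeSummarize

# Count tokens per LLM event; the model name here is just an example.
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
summarizer = TreeSummarize(callback_manager=CallbackManager([token_counter]))
summarizer.get_response("Summarize the document.", chunks)  # chunks as above

# One entry per LLM call, so this should print 1 if everything really was
# packed into a single request.
print(len(token_counter.llm_token_counts))
```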