I'm trying to create summaries based off

At a glance

I'm trying to create summaries from a 500-page report, and the documentation is a little lacking for this use case.

10 comments

aandrei

Hey, can you elaborate on what complex summarizations means in this context? I guess 500 pages is quite long to fit into a single context window.

aandrei

Have you considered doing a per-section summarization first, and then using those section summaries to build a top-level summary?

bben25635

So that's exactly what I was thinking. It's a series of company call transcripts. I was going to chunk the transcripts so that each chunk is conceptually self-contained and fits within the context window, then summarise each chunk. Then I'd do a summary of the summaries for the final report, essentially a map-reduce. Just wondering if there is a better way.
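A minimal, library-agnostic sketch of that map-reduce flow, for illustration only. It assumes paragraphs are separated by blank lines and uses word count as a rough stand-in for tokens. The names `chunk_paragraphs`, `map_reduce_summary`, and the `summarize` callable (whatever function sends text to your LLM and returns a summary) are hypothetical, not part of any library.

```python
from typing import Callable, List


def chunk_paragraphs(text: str, max_words: int) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        # Close the current chunk if adding this paragraph would exceed the budget.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def map_reduce_summary(
    text: str, summarize: Callable[[str], str], max_words: int = 1500
) -> str:
    """Summarize each chunk (map), then summarize the joined summaries (reduce)."""
    partials = [summarize(chunk) for chunk in chunk_paragraphs(text, max_words)]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    return summarize("\n\n".join(partials))
```

Splitting on paragraph boundaries rather than fixed character offsets is one simple way to keep chunks conceptually self-contained; for transcripts, splitting on speaker turns or agenda sections may work better.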

aandrei

Okay, that makes sense. A few more clarifying questions: are you just trying to create summary documents and that's it, or are you also trying to do some QA over the summary documents?

aandrei

For example, if you build QA over your documents, then you'd just need a structured knowledge base:

1. Build your vector store with metadata that corresponds to each call transcript
2. Retrieve based on that metadata
3. In your query, ask the LLM to summarize the call transcript
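The steps above can be sketched without committing to a particular vector store. In this illustration a plain Python list stands in for the store, and `Chunk`, `filter_by_metadata`, `build_summary_prompt`, and the `call_id` metadata key are all hypothetical names; a real setup would use the metadata filtering offered by your vector store or retriever.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    """A piece of transcript text plus the metadata it was indexed with."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def filter_by_metadata(chunks: List[Chunk], **wanted: str) -> List[Chunk]:
    """Return the chunks whose metadata matches every key/value pair in wanted."""
    return [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in wanted.items())
    ]


def build_summary_prompt(chunks: List[Chunk]) -> str:
    """Join the retrieved chunks into one summarization prompt for the LLM."""
    body = "\n\n".join(c.text for c in chunks)
    return "Summarize the following call transcript:\n\n" + body
```

For example, `filter_by_metadata(store, call_id="2023-Q1")` would select only the chunks tagged with that call, and the resulting prompt can then be sent to the LLM.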

aandrei

To get the top-level summary, you can:

1. Extract summaries of the more granular call transcripts
2. Index these summary documents using a SummaryIndex, which will use all the documents
3. Query the LLM to produce a final summary

Though I wonder if you'll be able to fit all the summaries within the context window. If not, then you should look at node postprocessors to compress the prompt. LongLLMLingua can be used for this.
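As an alternative to prompt compression (this is a different technique from LongLLMLingua, not a substitute implementation of it), the summaries can be collapsed hierarchically until they fit. The sketch below is an assumption-laden illustration: word count approximates tokens, and `fits`, `collapse_summaries`, `budget_words`, `group_size`, and the `summarize` callable are hypothetical names.

```python
from typing import Callable, List


def fits(texts: List[str], budget_words: int) -> bool:
    """Rough check: do the texts together stay within the word budget?"""
    return sum(len(t.split()) for t in texts) <= budget_words


def collapse_summaries(
    summaries: List[str],
    summarize: Callable[[str], str],
    budget_words: int,
    group_size: int = 4,
) -> str:
    """Summarize groups of summaries until the set fits, then summarize once more."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each round shrinks the list")
    current = list(summaries)
    # Each round replaces every group of summaries with a single, shorter summary.
    while len(current) > 1 and not fits(current, budget_words):
        current = [
            summarize("\n\n".join(current[i:i + group_size]))
            for i in range(0, len(current), group_size)
        ]
    if not current:
        return ""
    # Final pass: one call over whatever now fits in the budget.
    return summarize("\n\n".join(current))
```

Each extra round loses some detail, so it is worth checking whether the summaries fit in a single pass before collapsing.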

bben25635

Ok this is really helpful, thank you!

I will be doing QA over the documents as well, but I feel that's much more in line with standard RAG techniques, and I'm pretty confident I can get it working. What I'm finding more challenging is building a system that can extract common themes, figures, and a general overview from a 500-page set of transcripts. It should be as if an intern went through and condensed the document for the user.

aandrei

Yeah, it sounds like you're currently more concerned with prompt engineering to get a sufficient summary rather than retrieval. In this case, I'd be working with LLMs alone and modifying the prompt to get a good enough summary.
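One possible starting point for that prompt, as a sketch only. The wording, the three output sections (taken from the themes, figures, and overview mentioned above), and the names `SUMMARY_PROMPT` and `build_prompt` are assumptions to iterate on, not a recommended or tested template.

```python
# Hypothetical prompt template; adjust the sections and wording to your transcripts.
SUMMARY_PROMPT = (
    "You are condensing company call transcripts for a reader who has not seen them.\n"
    "From the excerpt below, produce:\n"
    "1. Common themes: recurring topics, each in one sentence.\n"
    "2. Figures: every number quoted, with its unit and what it measures.\n"
    "3. Overview: a short paragraph a new reader could rely on.\n"
    "Use only information stated in the excerpt.\n\n"
    "Excerpt:\n{excerpt}"
)


def build_prompt(excerpt: str) -> str:
    """Fill the template with one transcript excerpt (or one set of partial summaries)."""
    return SUMMARY_PROMPT.format(excerpt=excerpt.strip())
```

Keeping the same three sections at both the per-chunk and the final step makes the partial summaries easier to merge.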

bben25635

Makes sense, thanks for your help. You guys have built a great tool.

aandrei

Cool, my pleasure -- thanks for the feedback!
