The community member is interested in generating summaries from large documents (20-30+ pages) that follow specific guidelines, such as including content that answers guidelineA, guidelineB, and guidelineC. They propose using a RAG (Retrieval-Augmented Generation) system to achieve this, by indexing the documents into chunks and querying them using the guidelines as queries, then extracting the relevant content using a custom prompt.
In the comments, other community members suggest first trying a basic summarization approach using tools like LlamaIndex, or indexing the documents and using the DocSummary feature. The community member acknowledges these suggestions and says they will need to look more closely at the summarization feature to see whether it can take guidelines as input to generate the desired summary.
Hi everyone, I would like to generate summaries from "big" documents (20, 30+ pages). The complexity is that the summaries should follow specific guidelines (not in terms of format but of content). For instance, the summary should include content answering guidelineA, guidelineB and guidelineC. Could I use a RAG system to generate such a summary? My idea is to index my big documents into chunks, then query them using guidelineA, guidelineB and guidelineC as queries, and then extract the content for each query using a custom prompt: "according to this context, answer the request: {guidelineX}". Do you think it is a good approach? The tweak is to use the mainstream Q&A pipeline of a RAG system, but with requests instead of questions.
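To make the idea concrete, here is a minimal sketch of the pipeline described above, using only the Python standard library. It is an illustration under assumptions, not a reference implementation: the word-overlap retriever stands in for a real embedding index, `llm` is a hypothetical callable (prompt in, text out) that you would replace with your actual model client, and all function names and the `chunk_size` and `top_k` parameters are invented for this example.

```python
import math
import re
from collections import Counter

# Prompt wording taken from the post; the "Context:" layout is an assumption.
PROMPT_TEMPLATE = (
    "According to this context, answer the request: {guideline}\n\n"
    "Context:\n{context}"
)


def chunk_text(text, chunk_size=50):
    """Split text into chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def _vector(text):
    """Bag-of-words term counts (a toy stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(chunks, query, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = _vector(query)
    ranked = sorted(chunks, key=lambda c: _cosine(_vector(c), q), reverse=True)
    return ranked[:top_k]


def build_prompts(document, guidelines, chunk_size=50, top_k=2):
    """Build one prompt per guideline from its retrieved chunks."""
    chunks = chunk_text(document, chunk_size)
    prompts = {}
    for guideline in guidelines:
        context = "\n---\n".join(retrieve(chunks, guideline, top_k))
        prompts[guideline] = PROMPT_TEMPLATE.format(
            guideline=guideline, context=context
        )
    return prompts


def summarize(document, guidelines, llm, **kwargs):
    """Answer each guideline with llm and join the answers into one summary."""
    prompts = build_prompts(document, guidelines, **kwargs)
    return "\n\n".join(f"{g}\n{llm(p)}" for g, p in prompts.items())
```

Usage would look like `summarize(document_text, ["guidelineA", "guidelineB", "guidelineC"], llm=my_model_call)`, where `my_model_call` is whatever function sends a prompt to your model. One thing to watch with this design: only the `top_k` chunks per guideline reach the model, so content relevant to a guideline but ranked lower is silently left out of the summary.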