I am using LlamaIndex to summarize a list index with a bunch of different final prompts. The text I pass in might be 60,000 tokens long and the only thing I'm changing is the prompt. "Generate three titles from the summary." "Generate a blog post from the summary of the text".
Is there any way to save money instead of having to parse all of the documents over and over again for the different types of summaries, or, maybe store embeddings instead?
I think one idea might be to use a single prompt to generate a generic summary. Then from there, use your custom prompts to generate your different outputs from that common generic summary
This way, you only read all 60,000 tokens once; from there you re-use the generic summary to build the different outputs
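A minimal sketch of the "summarize once, re-use" pattern. `llm_call` here is a hypothetical stand-in for whatever LLM client you actually use; the point is just that the full document is only read once, and every custom prompt afterwards sees only the short summary:

```python
# Sketch of the summarize-once, re-use pattern.
# `llm_call` is a hypothetical placeholder for a real LLM client call.

def llm_call(prompt: str, text: str) -> str:
    """Stand-in for a real LLM request -- replace with your client."""
    return f"[{prompt}] -> output derived from {len(text)} chars of input"

def summarize_once_reuse(document: str, prompts: list[str]) -> dict[str, str]:
    # One expensive pass over the full (e.g. 60,000-token) document
    generic_summary = llm_call(
        "Summarize the text. Be sure to include key details.", document
    )
    # Cheap passes: each custom prompt only ever sees the short summary
    return {p: llm_call(p, generic_summary) for p in prompts}

outputs = summarize_once_reuse(
    "lorem ipsum " * 5000,  # pretend this is the long document
    [
        "Generate three titles from the summary.",
        "Generate a blog post from the summary.",
    ],
)
```

With a real client, the big input cost is paid once in the first call; every later prompt is priced on the summary's length instead.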
This would work really well actually with tree_summarize for the generic summary, and then a pydantic program for getting structured outputs out of the summary
Yea like it would just be "Summarize the text. Be sure to include key details."
Once you have that summary, you could pass it through other simple API calls or pydantic programs to re-structure and re-write the summary however you need
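For the structured-output step, here's a dependency-free sketch of the shape you'd get back. I'm using a plain dataclass and a trivial regex parser in place of a real LlamaIndex pydantic program (which would have the model emit the structured object directly); `TitleSet` and `parse_titles` are illustrative names, not library API:

```python
# Sketch only: a dataclass + parser standing in for a LlamaIndex
# pydantic program. In real usage the LLM would fill the fields itself.
from dataclasses import dataclass
import re

@dataclass
class TitleSet:
    titles: list[str]

def parse_titles(llm_response: str) -> TitleSet:
    # Pull "1. ...", "2) ..." style lines out of a numbered-list
    # response from the "generate three titles" prompt.
    lines = re.findall(r"^\s*\d+[.)]\s*(.+?)\s*$", llm_response, flags=re.M)
    return TitleSet(titles=lines)

result = parse_titles("1. First title\n2. Second title\n3. Third title")
# result.titles -> ["First title", "Second title", "Third title"]
```

Either way, each "re-shape the summary" call is cheap, because it operates on the short generic summary rather than the original 60,000 tokens.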