I am trying to implement a recommendation from
https://gpt-index.readthedocs.io/en/latest/end_to_end_tutorials/dev_practices/production_rag.html to "decouple chunks used for retrieval from chunks used for synthesis" by:
1.) generating a summary for each node
2.) storing an embedding of the summary along with the original text it corresponds to
3.) using the summary embedding during the retrieval step
4.) using the original text during the synthesis step (see the sketch after this list)
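Concretely, the flow I'm after looks roughly like this (purely an illustrative sketch, not real llama-index code; `llm`, `embed`, `vector_db`, and `chunks` are hypothetical placeholders):

```python
# Illustrative sketch only -- llm, embed, vector_db and chunks are hypothetical placeholders.

def ingest(chunks, llm, embed, vector_db):
    for chunk in chunks:
        summary = llm.summarize(chunk.text)      # 1.) generate a summary for each node
        vector_db.add(                           # 2.) store the summary embedding + summary + original text
            id=chunk.id,
            embedding=embed(summary),
            payload={"summary": summary, "original_text": chunk.text},
        )

def answer(question, llm, embed, vector_db, top_k=5):
    hits = vector_db.search(embed(question), top_k=top_k)                 # 3.) retrieve by summary embedding
    context = "\n\n".join(hit.payload["original_text"] for hit in hits)   # 4.) synthesize over the original text
    return llm.complete(f"Context:\n{context}\n\nQuestion: {question}")
```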
I was hoping that DocumentSummaryIndex, as recommended in the linked documentation, would be the simplest way to do this; however, I noticed that this index persists the summaries and their embeddings into the docstore, not the vector store (in my case, MongoDB). I'm concerned about how this performs in production scenarios with tens of thousands of chunks, so I'd like to find a solution with llama-index where the embeddings are stored in a vector database.
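For context, a typical DocumentSummaryIndex setup looks roughly like this (just a sketch; the data path, MongoDB URI and database name are made up, and everything else is left at the defaults):

```python
from llama_index import DocumentSummaryIndex, SimpleDirectoryReader, StorageContext
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.storage.index_store import MongoIndexStore

MONGO_URI = "mongodb://localhost:27017"  # placeholder URI

# Docstore / index store backed by MongoDB -- as far as I can tell, this is
# where the per-document summaries (and their embeddings) get persisted,
# rather than in a dedicated vector store.
storage_context = StorageContext.from_defaults(
    docstore=MongoDocumentStore.from_uri(uri=MONGO_URI, db_name="llama_index"),
    index_store=MongoIndexStore.from_uri(uri=MONGO_URI, db_name="llama_index"),
)

documents = SimpleDirectoryReader("./data").load_data()  # placeholder path

# Summaries are generated with the LLM at build time, one per document.
index = DocumentSummaryIndex.from_documents(
    documents,
    storage_context=storage_context,
)
```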
So far, what I've come up with is to use a plain old VectorStoreIndex with a custom subclass of BaseEmbedding: when embedding a node, I would call the LLM to generate a summary and then store the embedding of that summary instead of the embedding of the node itself (sketch below). This feels hackish to me; can somebody think of a better approach? Ideally, I'm looking for something that would also preserve the summaries in their textual form, not only as embeddings.
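Here is roughly what that subclass would look like (just a sketch against the legacy `llama_index` API; the prompt, the OpenAI model choices, and the `_summarize` helper are placeholders):

```python
from typing import List

from llama_index.embeddings import OpenAIEmbedding
from llama_index.embeddings.base import BaseEmbedding
from llama_index.llms import OpenAI

# Module-level helpers to keep the sketch simple; a real implementation
# would wire these into the class properly.
_llm = OpenAI(model="gpt-3.5-turbo")
_inner_embed = OpenAIEmbedding()


def _summarize(text: str) -> str:
    # Placeholder: one LLM call that produces a short summary of the chunk.
    return _llm.complete(f"Summarize the following text in 2-3 sentences:\n\n{text}").text


class SummaryEmbedding(BaseEmbedding):
    """Embeds an LLM-generated summary of each chunk instead of the chunk itself."""

    def _get_query_embedding(self, query: str) -> List[float]:
        # Queries are embedded as-is; only stored node text goes through summarization.
        return _inner_embed.get_query_embedding(query)

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return _inner_embed.get_query_embedding(query)

    def _get_text_embedding(self, text: str) -> List[float]:
        # Summarize first, then embed the summary. The original node text is what
        # gets stored and later sent to the LLM at synthesis time, but the summary
        # text itself is discarded after this call.
        return _inner_embed.get_text_embedding(_summarize(text))
```

I'd then wire this in via `ServiceContext.from_defaults(embed_model=SummaryEmbedding())` and build a regular `VectorStoreIndex` (backed by my vector store) on top.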