@Logan M Hey Logan, thanks for your work on RAPTOR. I have two questions as I try to deploy this for my product: (1) How would I save RAPTOR into a hosted vector DB like Pinecone? Do I basically just load the vector DB and pass it to the raptor pack, i.e. pack = RaptorPack(..., vector_store=pinecone_vector_store)? (2) How do I add more documents to an already existing RAPTOR pack? Do I simply load the vector store and then fill the documents parameter with more documents?
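For context, here's roughly the setup I had in mind. I'm assuming the standard PineconeVectorStore integration, and the index name / API key are just placeholders:

```python
from pinecone import Pinecone
from llama_index.core import SimpleDirectoryReader
from llama_index.packs.raptor import RaptorPack
from llama_index.vector_stores.pinecone import PineconeVectorStore

# connect to an existing Pinecone index ("raptor-demo" is just a placeholder)
pc = Pinecone(api_key="...")
pinecone_vector_store = PineconeVectorStore(pinecone_index=pc.Index("raptor-demo"))

documents = SimpleDirectoryReader("./data").load_data()

# build the RAPTOR tree (clusters + summaries) and persist it into Pinecone
pack = RaptorPack(documents, vector_store=pinecone_vector_store)
```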
Insertion isn't really implemented yet, actually. The idea would be to insert into the bottom-level cluster that best matches the new chunk, and then manually re-trigger re-clustering/re-summarizing after X insertions.
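To be clear, none of this exists in the pack today. A very rough sketch of the idea, with made-up helper names, would look something like:

```python
# Pseudocode only -- insertion is not implemented in the RAPTOR pack.
# find_best_leaf_cluster, add_chunk_to_cluster and recluster_and_resummarize
# are hypothetical helpers, just to illustrate the flow.

INSERTS_BEFORE_REBUILD = 100  # the "X insertions" threshold
inserts_since_rebuild = 0

def insert_chunk(chunk, vector_store):
    global inserts_since_rebuild
    # 1. find the bottom-level (leaf) cluster that best matches the new chunk
    cluster_id = find_best_leaf_cluster(chunk, vector_store)
    # 2. attach the chunk to that cluster and store its embedding
    add_chunk_to_cluster(chunk, cluster_id, vector_store)
    inserts_since_rebuild += 1
    # 3. after enough insertions, manually re-trigger clustering/summarization
    if inserts_since_rebuild >= INSERTS_BEFORE_REBUILD:
        recluster_and_resummarize(vector_store)
        inserts_since_rebuild = 0
```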
My insertions would be entirely new docs, so we don’t need to re-cluster or re-summarize previously saved data. Is there a way to set this up? Perhaps re-run the RAPTOR pack with the new data in the documents parameter?
Pretty sure pack = RaptorPack(documents=new_docs, vector_store=pinecone_vector_store) will run clustering/summarization on those documents only. It would be disconnected from any existing documents in the store.
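Roughly like this, if you point a fresh pack at the same Pinecone index (names/keys are placeholders, and I'm assuming the standard PineconeVectorStore integration):

```python
from pinecone import Pinecone
from llama_index.core import SimpleDirectoryReader
from llama_index.packs.raptor import RaptorPack
from llama_index.vector_stores.pinecone import PineconeVectorStore

pc = Pinecone(api_key="...")
pinecone_vector_store = PineconeVectorStore(pinecone_index=pc.Index("raptor-demo"))

new_docs = SimpleDirectoryReader("./new_data").load_data()

# Clusters/summarizes only new_docs and writes that new tree into the same
# Pinecone index. It does not merge with or link to trees built from
# previously ingested documents.
pack = RaptorPack(new_docs, vector_store=pinecone_vector_store)

# Retrieval still searches everything stored in the index.
nodes = pack.run("some question about the new docs")
```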
You would need to re-summarize/re-cluster at some point if you insert enough, because your data will essentially start to drift (i.e. the cluster summaries won't be representative anymore).
@Logan M thanks for the explanation. So essentially I could add more documents to my RAPTOR index saved in Pinecone by running that command, as long as each batch of documents I add is topically distinct enough from what's already there?