@Logan M Hey Logan, thanks for your work on RAPTOR. I have two questions as I try to deploy this for my product: (1) How would I save RAPTOR into a hosted vector DB like Pinecone? Do I basically just load the vector DB and pass it to the raptor pack, i.e. pack = RaptorPack(..., vector_store=pinecone_vector_store)? (2) How do I add more documents to an already existing RAPTOR pack? Do I simply load the vector store and then fill the documents parameter with more documents?
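For context, here's roughly the setup I had in mind. I'm assuming the standard PineconeVectorStore integration, and the index name / API key are just placeholders:

```python
from pinecone import Pinecone
from llama_index.core import SimpleDirectoryReader
from llama_index.packs.raptor import RaptorPack
from llama_index.vector_stores.pinecone import PineconeVectorStore

# connect to an existing Pinecone index ("raptor-demo" is just a placeholder)
pc = Pinecone(api_key="...")
pinecone_vector_store = PineconeVectorStore(pinecone_index=pc.Index("raptor-demo"))

documents = SimpleDirectoryReader("./data").load_data()

# build the RAPTOR tree (clusters + summaries) and persist it into Pinecone
pack = RaptorPack(documents, vector_store=pinecone_vector_store)
```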
Insertion isn't really implemented yet, actually. The idea would be to insert into the bottom-level cluster that best matches the new chunk, and then manually re-trigger re-clustering/re-summarizing after X insertions.
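To be clear, none of this exists in the pack today. A very rough sketch of the idea, with made-up helper names, would look something like:

```python
# Pseudocode only -- insertion is not implemented in the RAPTOR pack.
# find_best_leaf_cluster, add_chunk_to_cluster and recluster_and_resummarize
# are hypothetical helpers, just to illustrate the flow.

INSERTS_BEFORE_REBUILD = 100  # the "X insertions" threshold
inserts_since_rebuild = 0

def insert_chunk(chunk, vector_store):
    global inserts_since_rebuild
    # 1. find the bottom-level (leaf) cluster that best matches the new chunk
    cluster_id = find_best_leaf_cluster(chunk, vector_store)
    # 2. attach the chunk to that cluster and store its embedding
    add_chunk_to_cluster(chunk, cluster_id, vector_store)
    inserts_since_rebuild += 1
    # 3. after enough insertions, manually re-trigger clustering/summarization
    if inserts_since_rebuild >= INSERTS_BEFORE_REBUILD:
        recluster_and_resummarize(vector_store)
        inserts_since_rebuild = 0
```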
My insertions would be entirely new docs, so we don’t need to re-cluster or re-summarize previously saved data. Is there a way to set this up? Perhaps re-run the RAPTOR pack with the new data in the documents parameter?
Pretty sure pack = RaptorPack(documents=new_docs, vector_store=pinecone_vector_store) will run clustering/summarization on those documents only. It would be disconnected from any existing documents in the store.
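Roughly like this, if you point a fresh pack at the same Pinecone index (names/keys are placeholders, and I'm assuming the standard PineconeVectorStore integration):

```python
from pinecone import Pinecone
from llama_index.core import SimpleDirectoryReader
from llama_index.packs.raptor import RaptorPack
from llama_index.vector_stores.pinecone import PineconeVectorStore

pc = Pinecone(api_key="...")
pinecone_vector_store = PineconeVectorStore(pinecone_index=pc.Index("raptor-demo"))

new_docs = SimpleDirectoryReader("./new_data").load_data()

# Clusters/summarizes only new_docs and writes that new tree into the same
# Pinecone index. It does not merge with or link to trees built from
# previously ingested documents.
pack = RaptorPack(new_docs, vector_store=pinecone_vector_store)

# Retrieval still searches everything stored in the index.
nodes = pack.run("some question about the new docs")
```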
You would need to re-summarize/re-cluster at some point if you insert enough, because your data will essentially start to drift (i.e. the cluster summaries won't be representative anymore).
@Logan M thanks for the explanation. So essentially I could add more documents to my RAPTOR index saved in Pinecone by running that command, as long as each batch of documents I add is topically distinct enough from what's already there?