
This may be a dumb question....

Is ColBERTv2 able to use vector stores for storage, or do you need to use the data structure it provides and load it into RAM?

I obviously know very little about ColBERT
ColBERT actually has its own storage format. We're kind of stuck with what they implemented unless we wanted to reimplement the entire algorithm (I think, anyway; I haven't looked too deeply into that).

However, traditional vector DBs store a single dense vector per document, while ColBERT stores many per-token vectors (and SPLADE-style models produce sparse vectors). The concepts are a little different. A few vector DBs support sparse vectors alongside dense ones (Qdrant and Pinecone, for example).
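For reference, a rough sketch of ColBERT's own indexing path, assuming the colbert-ai package and the colbert-ir/colbertv2.0 checkpoint (the collection and query here are placeholders):

```python
from colbert import Indexer, Searcher
from colbert.infra import Run, RunConfig, ColBERTConfig

# Tiny placeholder collection; ColBERT normally takes a TSV path or a list of passages.
collection = [
    "ColBERT keeps one compressed vector per token of each passage.",
    "Single-vector dense retrievers keep one embedding per passage.",
]

with Run().context(RunConfig(nranks=1, experiment="notes")):
    # nbits controls the residual compression of the per-token vectors on disk.
    config = ColBERTConfig(nbits=2)
    indexer = Indexer(checkpoint="colbert-ir/colbertv2.0", config=config)
    indexer.index(name="notes.index", collection=collection)

    # Search against the index ColBERT just wrote in its own on-disk format.
    searcher = Searcher(index="notes.index", collection=collection)
    passage_ids, ranks, scores = searcher.search("how does colbert store vectors?", k=2)
```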
So would it be possible to store both the SPLADE and the ColBERT vectors as two named vectors in Qdrant?
ColBERT looks like a better use of time/money than fine-tuning a dense model like FlagEmbedding on my dataset. But I don't want a headache...
Yeah, it should be able to do that. We recently added an example of using SPLADE with Qdrant.
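A sketch of what the named-vectors setup might look like with the qdrant-client package: one dense vector and one SPLADE sparse vector per point (collection name, vector names, and sizes are made up, sparse vectors need a reasonably recent Qdrant, and ColBERT's per-token vectors would additionally need multivector support or flattening):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# One collection holding a named dense vector and a named sparse (SPLADE) vector.
client.create_collection(
    collection_name="docs",
    vectors_config={
        "dense": models.VectorParams(size=768, distance=models.Distance.COSINE),
    },
    sparse_vectors_config={
        "splade": models.SparseVectorParams(),
    },
)

# Upsert a point carrying both representations side by side.
client.upsert(
    collection_name="docs",
    points=[
        models.PointStruct(
            id=1,
            payload={"text": "example passage"},
            vector={
                "dense": [0.1] * 768,  # placeholder dense embedding
                "splade": models.SparseVector(indices=[17, 4094], values=[0.8, 0.3]),
            },
        )
    ],
)
```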
Did you look into RAGatouille? It is a full pipeline for ColBERT/v2 and the other sparse BERT-style models. They have some connector code for LangChain, an integration with the Vespa DB, and a link to the LlamaIndex codebase in their README (lots of updates on the author's Twitter): https://github.com/bclavie/RAGatouille
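A minimal RAGatouille sketch for comparison, assuming the ragatouille package (index name, documents, and query are placeholders):

```python
from ragatouille import RAGPretrainedModel

# Load the pretrained ColBERTv2 checkpoint through RAGatouille's wrapper.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Build a ColBERT index on disk from a small placeholder collection.
RAG.index(
    collection=["ColBERT keeps one compressed vector per token of each passage."],
    index_name="notes",
)

# Query the index with late-interaction (MaxSim) scoring.
results = RAG.search(query="how does colbert store its index?", k=1)
```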
@jimmy6dof thanks, I will dive deeper into it. I have an existing Qdrant cluster with 100M+ vectors, so I need something that works with what I already have.