ColBERT actually has its own storage format. We're kind of stuck with what they implemented unless we want to reimplement the whole algorithm (I think, anyway; I haven't looked too deeply into it).
However, traditional vector dbs store a single dense vector per document, while ColBERT produces a bag of per-token vectors and scores them with late interaction (MaxSim), so the two models are quite different. Separately, a few vector dbs support sparse vectors alongside dense ones (Qdrant and Pinecone).
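To make the distinction concrete, here's a minimal numpy sketch (not ColBERT's actual code, and the dimensions are made up): a traditional retriever scores one vector pair with one dot product, while late interaction lets every query token pick its best-matching document token and sums those maxima.

```python
import numpy as np

rng = np.random.default_rng(0)

# Traditional dense retrieval: one vector per query/document, one dot product.
q_vec = rng.standard_normal(128)
d_vec = rng.standard_normal(128)
dense_score = q_vec @ d_vec

# ColBERT-style late interaction: a matrix of per-token vectors on each side.
# (Real ColBERT normalizes its embeddings; omitted here for brevity.)
q_toks = rng.standard_normal((8, 128))    # 8 query tokens
d_toks = rng.standard_normal((300, 128))  # 300 document tokens

# MaxSim: each query token takes the max similarity over all document
# tokens, and the per-token maxima are summed into the document score.
sim = q_toks @ d_toks.T                   # (8, 300) token-level similarities
colbert_score = sim.max(axis=1).sum()
```

That per-token document matrix is also why ColBERT needs its own storage format: you're indexing hundreds of vectors per document instead of one.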
Did you look into RAGatouille? It's a full pipeline for ColBERT/ColBERTv2 and similar late-interaction retrievers. They have connector code for LangChain and an integration with Vespa, and their README also links to the LlamaIndex codebase (lots of updates on the author's Twitter): https://github.com/bclavie/RAGatouille
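If it helps, here's a rough sketch of what using it looks like, based on my reading of the README (method names, parameters, and the checkpoint name are taken from there; treat this as an outline rather than tested code):

```python
from ragatouille import RAGPretrainedModel

# Load a pretrained ColBERTv2 checkpoint.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Build a ColBERT index over a small document collection.
RAG.index(
    collection=[
        "ColBERT stores per-token embeddings for each document.",
        "Traditional dense retrievers store one vector per document.",
    ],
    index_name="demo_index",
)

# Late-interaction search against the index.
results = RAG.search(query="how does ColBERT store documents?", k=3)
```

The library handles the ColBERT index format for you, so you don't have to touch the upstream storage code directly.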
@jimmy6dof thanks, I'll dive deeper into it. I have an existing Qdrant cluster with 100M+ vectors, so I need something that works with what I already have.