
Updated 3 months ago

Vector store

I'm experimenting with a RAG solution for indexing and retrieving code snippets. The concept involves indexing a git repository and updating the index on new changes, using a local vector store that is initialized when the repository is cloned. This implementation is quite experimental at the moment, and I'm focusing on benchmarking the chunking solution against the SWE-bench benchmark dataset (current results here: https://github.com/aorwall/moatless-tools/blob/main/benchmark/reports/princeton-nlp-SWE-bench-devin/README.md).

As I just want to store vectors on disk, I tried the SimpleVectorStore but found it quite slow, and setting up a fully-featured vector store seemed to introduce unnecessary overhead. So I ended up implementing a kind of hybrid of SimpleVectorStore and FAISSVectorStore (https://github.com/aorwall/moatless-tools/blob/main/moatless/store/simple_faiss.py). But I guess there are better solutions to this?
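To make the requirement concrete, here is a minimal sketch of the kind of store being asked for: vectors keyed by node ID, persisted to a single file on disk, with deletion supported. This is not the linked simple_faiss.py implementation. The class name `TinyDiskVectorStore` and its methods are hypothetical, and the search is a brute-force cosine scan in plain Python standing in for a FAISS index, so it illustrates the data structure rather than the speed.

```python
import json
import math
from pathlib import Path


class TinyDiskVectorStore:
    """Hypothetical on-disk store mapping node_id -> embedding, with deletion."""

    def __init__(self, path):
        self.path = Path(path)
        self.vectors = {}
        # Reload previously persisted vectors, if any.
        if self.path.exists():
            self.vectors = json.loads(self.path.read_text())

    def add(self, node_id, embedding):
        # Adding an existing ID overwrites it, so an upsert is just add().
        self.vectors[node_id] = [float(x) for x in embedding]

    def delete(self, node_id):
        # Deleting an unknown ID is a no-op.
        self.vectors.pop(node_id, None)

    def persist(self):
        self.path.write_text(json.dumps(self.vectors))

    def query(self, embedding, top_k=3):
        # Brute-force cosine similarity over every stored vector.
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(cosine(embedding, vec), node_id) for node_id, vec in self.vectors.items()]
        scored.sort(reverse=True)
        return [node_id for _, node_id in scored[:top_k]]
```

Keeping the ID-to-vector mapping outside the index is what makes deletion straightforward here; a faster variant would keep the same mapping and delegate only the nearest-neighbour search to an index library.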
4 comments
We actually already have a faiss integration.

But tbh, setting up any other vector store isn't too bad.

Tbh if you are dealing with a lot of data, you probably do want a dedicated server for your vector db
I'm using an IngestionPipeline with DocstoreStrategy.UPSERTS_AND_DELETE to be able to remove old nodes from the DocStore and VectorStore. The Faiss VectorStore implementation doesn't support deletions, which was the reason I built my hybrid solution. I tried the Chroma VectorStore but got issues with deletions there as well. I guess next up to try will be Qdrant then 😅
what is the issue with chroma? 🤔
When the ingestion pipeline requested to delete nodes, the Chroma store logged warnings like "Add of existing embedding ID: 1", and the vectors were never deleted. But I only ran it as an in-memory store with persistence, so I'm not sure how stable it is.
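For readers unfamiliar with the upsert-and-delete strategy discussed in this thread, the bookkeeping can be sketched roughly as follows. This is a plain-Python illustration of the idea, not LlamaIndex's actual code: `content_hash` and `plan_sync` are hypothetical helpers, and the assumption is that the docstore keeps one content hash per document ID from the previous run.

```python
import hashlib


def content_hash(text):
    # Stable fingerprint of a document's content.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(stored_hashes, incoming_docs):
    """Return (to_add, to_update, to_delete) lists of document IDs.

    stored_hashes: {doc_id: hash} recorded on the previous run.
    incoming_docs: {doc_id: text} for the current repository state.
    """
    to_add, to_update = [], []
    for doc_id, text in incoming_docs.items():
        if doc_id not in stored_hashes:
            to_add.append(doc_id)      # new document
        elif stored_hashes[doc_id] != content_hash(text):
            to_update.append(doc_id)   # content changed: delete old nodes, re-insert
    # Anything stored earlier but missing from the input must be deleted.
    to_delete = [doc_id for doc_id in stored_hashes if doc_id not in incoming_docs]
    return to_add, to_update, to_delete
```

Both the update and the delete branches need the vector store to remove existing vectors by ID, which is why a store without working deletion breaks this strategy.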