I'm experimenting with a RAG solution for indexing and retrieving code snippets. The idea is to index a git repository and update the index as new changes come in, using a local vector store that is initialized when the repository is cloned. The implementation is still quite experimental, and for now I'm focusing on benchmarking the chunking approach against the SWE-bench dataset (current results here:
https://github.com/aorwall/moatless-tools/blob/main/benchmark/reports/princeton-nlp-SWE-bench-devin/README.md).
Since I only want to store vectors on disk, I tried the SimpleVectorStore but found it quite slow, and setting up a fully featured vector store seemed to introduce unnecessary overhead. So I ended up implementing a hybrid of SimpleVectorStore and FAISSVectorStore (
https://github.com/aorwall/moatless-tools/blob/main/moatless/store/simple_faiss.py). But I guess there are better solutions to this?
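To make the question more concrete, here is a minimal sketch of the kind of store I mean: all vectors in one flat matrix, a row-to-node-id mapping next to it, and both persisted as plain files. This is not the linked `simple_faiss.py`. It assumes NumPy brute-force search as a stand-in for the FAISS index, and the class name `FlatDiskVectorStore`, the file names and the method signatures are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


class FlatDiskVectorStore:
    """Hypothetical sketch: one float32 matrix plus a row -> node id list."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.node_ids = []

    def add(self, node_id, embedding):
        # Copy and L2-normalize so a dot product equals cosine similarity.
        vec = np.array(embedding, dtype=np.float32).reshape(1, self.dim)
        vec /= np.linalg.norm(vec) or 1.0
        self.vectors = np.vstack([self.vectors, vec])
        self.node_ids.append(node_id)

    def delete(self, node_id):
        # Drop every row for this id, e.g. when a file changes in a new commit.
        mask = np.array([nid != node_id for nid in self.node_ids], dtype=bool)
        self.vectors = self.vectors[mask]
        self.node_ids = [nid for nid in self.node_ids if nid != node_id]

    def query(self, embedding, top_k=3):
        q = np.array(embedding, dtype=np.float32)
        q /= np.linalg.norm(q) or 1.0
        scores = self.vectors @ q  # one matrix-vector product, no Python loop
        order = np.argsort(-scores)[:top_k]
        return [(self.node_ids[i], float(scores[i])) for i in order]

    def persist(self, directory):
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        np.save(directory / "vectors.npy", self.vectors)
        (directory / "node_ids.json").write_text(json.dumps(self.node_ids))

    @classmethod
    def load(cls, directory):
        directory = Path(directory)
        vectors = np.load(directory / "vectors.npy")
        store = cls(vectors.shape[1])
        store.vectors = vectors
        store.node_ids = json.loads((directory / "node_ids.json").read_text())
        return store


if __name__ == "__main__":
    store = FlatDiskVectorStore(dim=3)
    store.add("a.py::foo", [1.0, 0.0, 0.0])
    store.add("b.py::bar", [0.0, 1.0, 0.0])
    with tempfile.TemporaryDirectory() as tmp:
        store.persist(tmp)
        reloaded = FlatDiskVectorStore.load(tmp)
    print(reloaded.query([1.0, 0.0, 0.0], top_k=1))
```

The search here is exact and linear in the number of chunks, which is probably fine for a single repository. Swapping the matrix for a FAISS index would keep the same add, delete, query and persist shape.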