
Updated last year

Datasets

I'm wondering if there are any metrics or benchmarks on how the size of a vector DB affects the likelihood of retrieving the right documents.
Tbh it feels like it will be different for every set of data.

I would generate a retrieval benchmark dataset with LlamaIndex, then keep track of when retrieval accuracy starts to suffer as you add more data.
At least that's one approach.
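A minimal, self-contained sketch of that approach follows. It rests on some assumptions. Synthetic random unit vectors stand in for real embeddings. A brute-force in-memory cosine search stands in for the vector DB. Hit rate@k (the fraction of queries whose expected document appears in the top k results) serves as the accuracy metric. The function names (`benchmark`, `hit_rate`, `top_k`) are hypothetical helpers, not LlamaIndex APIs. With real data you would replace the synthetic targets and queries with LlamaIndex-generated question/context pairs and query your actual retriever instead of `top_k`.

```python
import heapq
import math
import random


def random_unit(dim, rng):
    """Random vector on the unit sphere (hypothetical stand-in for a doc embedding)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query, corpus, k):
    """Brute-force search; every vector is unit length, so dot product == cosine."""
    best = heapq.nlargest(
        k, corpus.items(), key=lambda item: sum(q * d for q, d in zip(query, item[1]))
    )
    return [doc_id for doc_id, _ in best]


def hit_rate(queries, corpus, k):
    """Fraction of queries whose expected doc id shows up in the top-k results."""
    hits = sum(1 for vec, expected in queries if expected in top_k(vec, corpus, k))
    return hits / len(queries)


def benchmark(sizes, dim=32, n_queries=50, k=5, noise=0.3, seed=0):
    """Hit rate@k as the index grows by adding random distractor documents."""
    if min(sizes) < n_queries:
        raise ValueError("every size must be >= n_queries (the benchmark docs)")
    rng = random.Random(seed)
    # Synthetic benchmark: each query is a noisy copy of exactly one target doc.
    targets = {f"target-{i}": random_unit(dim, rng) for i in range(n_queries)}
    queries = [
        (normalize([x + rng.gauss(0.0, noise) for x in vec]), doc_id)
        for doc_id, vec in targets.items()
    ]
    corpus = dict(targets)
    results = {}
    for size in sorted(sizes):
        # Grow the same index with distractors, so each size is a superset of the last.
        while len(corpus) < size:
            corpus[f"distractor-{len(corpus)}"] = random_unit(dim, rng)
        results[size] = hit_rate(queries, corpus, k)
    return results


if __name__ == "__main__":
    for size, rate in benchmark([50, 200, 1000, 2000]).items():
        print(f"{size:>5} docs -> hit rate@5 = {rate:.2f}")
```

Because each larger index contains the smaller one, a target's rank can only stay the same or get worse as distractors are added, so the curve shows where accuracy starts to drop. The absolute numbers from random vectors say nothing about any real dataset or vector DB, which is exactly why you would rerun this with your own data.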
Yeah, it's super difficult to do general benchmarks for this stuff; the results can vary quite a lot between datasets. The vector DB provider will also have an effect, etc.