
Updated last year

Datasets


I'm wondering if there are any metrics or benchmarks on how the size of a vector DB affects the likelihood of retrieving the relevant documents.

3 comments

Logan M

Tbh it feels like it will be different for every set of data.

I would generate a retrieval benchmark dataset with LlamaIndex and keep track of when retrieval accuracy starts to suffer as you add more data.

Logan M

At least that's one approach
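A minimal sketch of the approach described above, in plain Python and under several assumptions: random vectors stand in for real embeddings, each benchmark query is a noisy copy of the document it should retrieve (a stand-in for LlamaIndex-generated question/context pairs), retrieval is brute-force cosine top-k rather than a real vector DB, and the metric is hit rate@k. All function names here are hypothetical and are not LlamaIndex APIs.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def top_k(query, corpus, k):
    """Brute-force retrieval: ids of the k vectors most similar to the query."""
    ranked = sorted(corpus.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


def hit_rate(pairs, corpus, k):
    """Fraction of (query, relevant_id) pairs whose relevant doc lands in the top k."""
    hits = sum(1 for query, relevant_id in pairs if relevant_id in top_k(query, corpus, k))
    return hits / len(pairs)


def random_vector(rng, dim):
    """Hypothetical stand-in for a real embedding."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def noisy_copy(rng, vec, noise):
    """Hypothetical benchmark query: the target doc's vector plus Gaussian noise."""
    return [x + rng.gauss(0.0, noise) for x in vec]


def run_benchmark(sizes, n_queries=50, dim=32, noise=1.0, k=5, seed=0):
    """Grow the corpus with distractor docs and record hit rate@k at each size."""
    rng = random.Random(seed)
    corpus = {}
    pairs = []
    # Target documents that the benchmark queries are generated from.
    for i in range(n_queries):
        doc_id = f"target-{i}"
        corpus[doc_id] = random_vector(rng, dim)
        pairs.append((noisy_copy(rng, corpus[doc_id], noise), doc_id))
    results = []
    next_id = 0
    for size in sorted(sizes):
        # Pad the corpus with unrelated distractor documents up to `size`.
        while len(corpus) < size:
            corpus[f"distractor-{next_id}"] = random_vector(rng, dim)
            next_id += 1
        results.append((len(corpus), hit_rate(pairs, corpus, k)))
    return results


if __name__ == "__main__":
    for size, rate in run_benchmark([100, 500, 2000]):
        print(f"{size:>5} docs  hit@5 = {rate:.2f}")
```

Because this sketch only ever adds unrelated distractors, the hit rate can only stay flat or drop as the corpus grows. With real data, new documents may be near-duplicates of or genuinely relevant to existing queries, which is one reason the curve is so dataset-specific. Swapping in real embeddings, LLM-generated questions, and your actual vector DB's retriever would be the next step.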

Teemu

Yeah, it's super difficult to do general benchmarks for this stuff; the results can vary quite a lot between datasets. The vector DB provider will also have an effect, among other things.
