
Updated last year


We are using LlamaIndex to search PDF documents, with Milvus as the vector database. When the index was small, the results were very good. Now the index holds 300K documents and the results are poor. How can we improve result quality?
Putting 300K documents into a vector DB and using simple top-k retrieval is not going to cut it.

You likely need hybrid search (I don't think Milvus supports this), or you need to organize your data into several indexes (by topic? by category?) and put a router, a router retriever, or a sub-question query engine on top of them.
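Hybrid search usually means running a keyword search (e.g. BM25) and a vector search side by side, then merging the two ranked lists. Below is a minimal sketch of one common merge step, reciprocal rank fusion (RRF), in plain Python. It assumes you already get two ranked lists of document IDs from your own keyword and vector retrievers; the document IDs are made up, and `k=60` is the conventional RRF constant, not a LlamaIndex or Milvus setting.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of doc IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. A higher fused score means more relevant.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword (BM25) retriever and a vector retriever.
keyword_hits = ["doc_7", "doc_2", "doc_9"]
vector_hits = ["doc_2", "doc_4", "doc_7"]

print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['doc_2', 'doc_7', 'doc_4', 'doc_9']
```

RRF only looks at ranks, so it does not need BM25 scores and cosine similarities to be on the same scale, which is why it is a popular default for combining the two.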

Sentence window retrieval might also help, if the data is easily split into sentences.
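The idea behind sentence window retrieval is to match the query against single sentences (precise matches), then hand the LLM the matched sentence plus its neighbours (enough context). LlamaIndex ships a `SentenceWindowNodeParser` for this; the plain-Python sketch below only illustrates the idea. The naive regex sentence splitter and the word-overlap score are stand-ins assumed for the example, not how real embeddings work.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def _tokens(text):
    """Lowercase word set, used by the toy similarity below."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(query, sentence):
    """Stand-in for embedding similarity: number of shared lowercase words."""
    return len(_tokens(query) & _tokens(sentence))


def sentence_window_retrieve(text, query, window=1):
    """Match the query against single sentences, then return the best
    sentence together with `window` neighbouring sentences on each side."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    best = max(range(len(sentences)), key=lambda i: overlap_score(query, sentences[i]))
    start = max(0, best - window)
    end = min(len(sentences), best + window + 1)
    return " ".join(sentences[start:end])


text = (
    "Milvus stores the embeddings. Each PDF page is split into sentences. "
    "Only single sentences are embedded for matching. "
    "The surrounding sentences are added back before the LLM sees them. "
    "This keeps matches precise but context rich."
)
print(sentence_window_retrieve(text, "which sentences are embedded", window=1))
```

With `window=1` this returns the third sentence plus the one before and after it; with `window=0` it returns only the matched sentence.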

Of course, all of this requires re-embedding and rebuilding the indexes 😅 Planning for indexes of this size is key.
@Logan M thanks. I will try multiple indexes and a router retriever. What is the recommended size for a vector database collection?
It's hard to say tbh; it really depends on your data, I think.
The better you can sort things into categories, the better, I think.
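To make the "split by category, then route" suggestion concrete, here is a toy sketch in plain Python. Everything in it is hypothetical: `CategoryIndex`, `route`, the category names, and the documents are invented for illustration, and the word-overlap scoring stands in for both embeddings and the LLM-based selector that LlamaIndex's router components normally use to pick an index from each index's description.

```python
import re
from dataclasses import dataclass, field


def tokens(text):
    """Lowercase word set used for the toy scoring below."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class CategoryIndex:
    """Hypothetical per-category index: a description plus its documents."""
    name: str
    description: str
    docs: list = field(default_factory=list)

    def search(self, query, top_k=2):
        """Toy top-k search inside this category (word overlap, not embeddings)."""
        q = tokens(query)
        ranked = sorted(self.docs, key=lambda d: len(q & tokens(d)), reverse=True)
        return ranked[:top_k]


def route(query, indexes):
    """Pick the index whose description shares the most words with the query."""
    q = tokens(query)
    return max(indexes, key=lambda ix: len(q & tokens(ix.description)))


indexes = [
    CategoryIndex("finance", "invoices billing payments tax",
                  ["Invoice terms are net 30.", "Tax forms are due in April."]),
    CategoryIndex("hr", "employees leave vacation hiring policy",
                  ["Vacation requests need manager approval.",
                   "Hiring starts with a phone screen."]),
]

query = "how do I request vacation leave"
chosen = route(query, indexes)
print(chosen.name, chosen.search(query, top_k=1))
# hr ['Vacation requests need manager approval.']
```

The point of the design is that each search only runs over one smaller, more homogeneous collection, so the top-k results are less likely to be crowded out by loosely related documents from other categories.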