We are using llama index to search the


We are using LlamaIndex to search PDF documents, with Milvus as the vector database. When the index was small, it gave very good results. Now the index has 300K documents and the results are not good. How can we improve the quality of the results?

4 comments

Logan M

Putting 300k documents into a vector DB and using simple top-k retrieval is not going to cut it.

You likely need hybrid search (I don't think Milvus supports this), or you need to organize your data into several indexes (by topic? category?) and use a router, retriever router, or sub-question engine on top of the data.
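For anyone wondering what hybrid search boils down to: you run a vector query and a keyword query separately, then merge the two ranked lists. Below is a minimal plain-Python sketch of one common way to merge them, reciprocal rank fusion. It is not a LlamaIndex or Milvus API; the function name, the doc ids, and the constant `k=60` are assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranked list.

    Each document scores 1 / (k + rank) per list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists, best match first.
vector_hits = ["doc_a", "doc_b", "doc_c"]   # from embedding similarity
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # from keyword/BM25-style search

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents found by both retrievers (`doc_a`, `doc_c`) end up ahead of those found by only one, which is the effect you want when pure embedding similarity gets noisy at 300K documents.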

Sentence window retrieval might help as well, if the data is easily split by sentence.
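To make the sentence window idea concrete: each sentence is embedded on its own, so matching is precise, but the text handed to the LLM is that sentence plus its neighbors. A rough plain-Python sketch follows. It is not the library's implementation; the regex splitter, the function name, and the `window_size` parameter are assumptions for illustration.

```python
import re


def build_sentence_windows(text, window_size=1):
    """Split text into sentences and attach a surrounding window to each.

    "sentence" is what you would embed and search against;
    "window" is what you would pass to the LLM once that sentence matches.
    """
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append({"sentence": sentence, "window": " ".join(sentences[start:end])})
    return nodes


text = (
    "Milvus stores vectors. Retrieval returns the top matches. "
    "Context matters for the answer."
)
nodes = build_sentence_windows(text, window_size=1)
for node in nodes:
    print(node["sentence"], "->", node["window"])
```

PDF text often has messy sentence boundaries, so a real pipeline would likely need a more robust splitter than this regex.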

Of course, all of this requires re-embedding/re-building indexes 😅 Planning for indexes of this size is key.

Nithin

@Logan M thanks. Will try multiple indexes and a retriever router. What is the recommended size for a vector database collection?

Logan M

It's hard to say tbh, it really depends on your data I think

Logan M

The better you can sort things into categories, the better I think
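As a toy illustration of why categories help: once documents are split into per-category indexes, a router only has to pick which index to search, and top-k then runs over a much smaller, more relevant pool. The sketch below uses keyword overlap as a stand-in for the router. In practice a router would more likely use an LLM or embeddings to choose, and the category names, keyword sets, and `route_query` helper here are all hypothetical.

```python
def route_query(query, category_keywords):
    """Pick the category whose keyword set overlaps the query the most.

    Returns None when nothing overlaps, meaning the caller should
    fall back to searching every index.
    """
    # Lowercase and strip trailing punctuation so "limits?" matches "limits".
    query_terms = {term.strip(".,?!") for term in query.lower().split()}
    best_category, best_overlap = None, 0
    for category, keywords in category_keywords.items():
        overlap = len(query_terms & keywords)
        if overlap > best_overlap:
            best_category, best_overlap = category, overlap
    return best_category


# Hypothetical categories, each backed by its own smaller index.
category_keywords = {
    "finance": {"invoice", "tax", "budget"},
    "legal": {"contract", "clause", "liability"},
}

chosen = route_query("Which clause covers liability limits?", category_keywords)
print(chosen)
```

The cleaner the category boundaries, the more often the router picks the right index, which is the point being made above.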
