We are using LlamaIndex to search PDF documents, with Milvus as the vector database. When the index was small, the results were very good. Now that the index holds 300K documents, the results are poor. How can we improve the result quality?
Putting 300K documents into a vector DB and relying on simple top-k retrieval is not going to cut it: at that scale, many loosely related chunks end up competing for the same few top-k slots.
You likely need hybrid search, i.e. combining keyword and vector search (I'm not sure Milvus supports this natively; check the docs for your version), or you need to organize your data into several indexes (by topic? by category?) and put a router query engine, a router retriever, or a sub-question query engine on top of them.
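To make the hybrid and routing ideas concrete, here is a minimal pure-Python sketch. It is not the LlamaIndex or Milvus API. `rrf_fuse` implements Reciprocal Rank Fusion (RRF), one common way to merge a keyword ranking (e.g. from BM25) with a vector ranking; `route` is a hypothetical keyword-based router standing in for an LLM-based router that picks which sub-index to search. The category names, keyword sets, and doc IDs are made up for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion over several ranked lists of doc IDs.

    Each doc scores sum(1 / (k + rank)) across the lists it appears in,
    with 1-based ranks. Docs missing from a list simply get nothing from it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def route(query: str, category_keywords: dict[str, set[str]], default: str) -> str:
    """Hypothetical router: pick the sub-index whose keywords overlap the query most."""
    words = set(query.lower().split())
    best, best_hits = default, 0
    for category, keywords in category_keywords.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = category, hits
    return best  # falls back to `default` when no keyword matches


if __name__ == "__main__":
    categories = {
        "contracts": {"clause", "termination", "liability"},
        "manuals": {"install", "configure", "error"},
    }
    print(route("what is the termination clause", categories, default="all"))  # contracts

    keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. from a keyword/BM25 index
    vector_hits = ["doc1", "doc5", "doc3"]   # e.g. top-k from the vector store
    print(rrf_fuse([keyword_hits, vector_hits]))  # ['doc1', 'doc3', 'doc5', 'doc7']
```

RRF only looks at ranks, not raw scores, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales. The constant k=60 is the value usually cited for RRF, but it is worth tuning on your own queries.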
Sentence window retrieval might help as well, if your data splits cleanly into sentences.
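A minimal sketch of the sentence-window idea, assuming plain-text input and a naive regex sentence splitter: each sentence is matched on its own (so the match is precise), but the retriever returns that sentence together with its neighbours (so the LLM still gets context). A word-overlap score stands in for embedding similarity, and all function names here are hypothetical. In LlamaIndex, I believe the corresponding pieces are `SentenceWindowNodeParser` and `MetadataReplacementPostProcessor`, but check the docs for your version.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def build_sentence_nodes(text: str, window_size: int = 1) -> list[dict]:
    """One node per sentence; each also stores `window_size` neighbours per side."""
    sentences = split_sentences(text)
    nodes = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        nodes.append({"sentence": sentence, "window": " ".join(sentences[lo:hi])})
    return nodes


def _tokens(s: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def overlap_score(query: str, sentence: str) -> int:
    # Toy relevance: number of shared lowercase words. Stands in for embedding similarity.
    return len(_tokens(query) & _tokens(sentence))


def retrieve_window(query: str, nodes: list[dict]) -> str:
    """Match against single sentences, but return the best match's window."""
    best = max(nodes, key=lambda n: overlap_score(query, n["sentence"]))
    return best["window"]


if __name__ == "__main__":
    doc = ("The pump must be primed before use. Open valve A first. "
           "Then fill the chamber with water. Close valve A when full.")
    nodes = build_sentence_nodes(doc, window_size=1)
    print(retrieve_window("how do I fill the chamber", nodes))
    # Open valve A first. Then fill the chamber with water. Close valve A when full.
```

The trade-off is that small, sentence-sized embeddings give sharper matches across a large corpus, while the window keeps the answer from being cut off mid-thought; `window_size` controls how much surrounding text comes back.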
Of course, all of this requires re-embedding the data and rebuilding the indexes. Planning for indexes of this size is key.