from llama_index.core.tools import QueryEngineTool  # import path for recent LlamaIndex versions

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context related to the network paper."
    ),
)
----------------------
The LlamaIndex documentation suggests that, as the number of documents scales up, the standard RAG stack (top-k retrieval + basic text splitting) doesn't perform well. In such a scenario, you may want to use structured information to help with more precise retrieval. This could be particularly useful if you have a question that's only relevant to a subset of your PDFs. Using structured information can ensure that the relevant PDFs get returned beyond raw embedding similarity with chunks [source].
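As a rough illustration of the idea (a toy sketch in plain Python, not the LlamaIndex API; the class and field names here are made up for the example), structured metadata on each chunk lets you narrow the candidate set before any embedding similarity is computed:

```python
# Toy sketch of metadata-filtered retrieval. Each chunk carries structured
# metadata (e.g. which PDF it came from); a filter keeps only matching
# chunks, and similarity search would then run on that smaller set.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    metadata: dict  # e.g. {"source_pdf": "network.pdf"}

def filtered_candidates(chunks, **filters):
    """Keep only chunks whose metadata matches every filter key/value."""
    return [
        c for c in chunks
        if all(c.metadata.get(k) == v for k, v in filters.items())
    ]

chunks = [
    Chunk("TCP congestion control ...", {"source_pdf": "network.pdf"}),
    Chunk("Quarterly revenue ...", {"source_pdf": "finance.pdf"}),
]

# Only chunks from network.pdf survive the filter.
candidates = filtered_candidates(chunks, source_pdf="network.pdf")
```

In LlamaIndex itself this role is played by metadata filters on the vector store rather than a hand-rolled list comprehension, but the pre-filtering principle is the same.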
In the case of having multiple PDFs, you could consider creating a separate tool for each PDF as shown in the example from the LlamaIndex GitHub repository. Here, a dictionary of tools is created where each tool is associated with a specific document [source].
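The shape of that pattern can be sketched as follows (a hedged stand-in: the PDF names and the query-function stub are invented for illustration and are not the actual repository example):

```python
# One tool per document: a dictionary maps each PDF name to its own
# query callable. In the real example each entry would wrap a
# QueryEngineTool over that PDF's vector index.
pdf_names = ["network.pdf", "storage.pdf", "security.pdf"]

def make_query_fn(name):
    # Stand-in for building a per-document query engine.
    return lambda q: f"[{name}] answer to: {q}"

tools = {name: make_query_fn(name) for name in pdf_names}

# The agent (or your routing code) picks the tool keyed by document.
answer = tools["network.pdf"]("What is TCP?")
```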
However, you're correct in noting that this approach may not scale well for a large number of PDFs. One possible solution could be to group related PDFs together and create a tool for each group, rather than for each individual PDF. This would reduce the number of tools you need to manage.
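A minimal sketch of that grouping idea, assuming you can assign each PDF to a topic group (group names, PDF names, and the query stubs below are all illustrative, not LlamaIndex APIs):

```python
# Group related PDFs under one tool each, so the number of tools grows
# with the number of groups rather than the number of documents.
groups = {
    "networking": ["tcp.pdf", "routing.pdf", "qos.pdf"],
    "security": ["tls.pdf", "auth.pdf"],
}

def make_group_tool(group_name):
    # Stand-in for an index/query engine built over every PDF in the group.
    return lambda q: f"[{group_name}] answer to: {q}"

tools = {g: make_group_tool(g) for g in groups}

# Reverse lookup: route a question about any PDF to its group's tool.
pdf_to_tool = {pdf: tools[g] for g, pdfs in groups.items() for pdf in pdfs}

# Five PDFs are covered by only two tools.
answer = pdf_to_tool["tls.pdf"]("How does the handshake work?")
```

The trade-off is retrieval precision within a group versus the overhead of managing many tools; where to draw the grouping boundary depends on how distinct your documents' topics are.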
Remember, the best approach will depend on your specific use case and the nature of your data. You may need to experiment with different strategies to find what works best for you.
----------------------