The community member is a newbie to LlamaIndex and has created a basic chatbot that can reply to questions from a series of documents. They are currently using one index and one query engine, but have learned that it is possible to have one index (and one query engine) per input PDF. They have around 200 large, unrelated PDF documents to be used in the chat. The community member is unsure whether it would be better to have one giant index with metadata containing the project name, or 200 separate indexes and query engines under a "main" query engine.
In the comments, another community member suggests creating separate indexes if the documents are completely unrelated, as this would increase the accuracy compared to combining them together. The original poster was concerned about potential limitations on the number of indexes and query engines that can be created, but was reassured that there are no such limitations.
@kapa.ai Hello there! I am a newbie to LlamaIndex. I created a first basic chatbot that can reply to questions from a series of documents. I am using 1 index and 1 query engine (and retriever). However, the training material shows that it is also possible to have 1 index per input PDF (and one query engine). I have roughly 200 large PDFs to be ingested into the chat. They are all unrelated (each PDF is a different project). What would be better: 1 giant index (with the metadata containing the name of the project) or 200 indexes and 200 query engines under a "main" query engine?
I would suggest creating separate indexes if the docs are completely unrelated. This would increase the accuracy compared to combining them into a single index, since retrieval for one project can't pull in chunks from another.
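To make the "many indexes under a main query engine" idea concrete, here is a minimal sketch using LlamaIndex's `RouterQueryEngine`: one `VectorStoreIndex` per PDF, each wrapped in a `QueryEngineTool` whose description names the project, with an LLM selector routing each question to the right tool. The `project_name` helper and the description wording are my own assumptions for illustration; the actual filenames and descriptions would come from your 200 projects. Building the indexes also requires an embedding model and LLM to be configured (e.g. an OpenAI API key), which is why the LlamaIndex imports are kept inside the function.

```python
from pathlib import Path


def project_name(pdf_path: str) -> str:
    """Derive a project label from a PDF filename.

    Hypothetical convention: 'acme_bridge.pdf' -> 'acme_bridge'.
    """
    return Path(pdf_path).stem


def build_router(pdf_paths: list[str]):
    """Build one index + query engine per PDF, routed by an LLM selector.

    Sketch only: assumes llama-index is installed and an LLM/embedding
    model is configured (e.g. via an OpenAI API key).
    """
    # Imported lazily so the pure-Python helper above works without llama-index.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
    from llama_index.core.query_engine import RouterQueryEngine
    from llama_index.core.selectors import LLMSingleSelector
    from llama_index.core.tools import QueryEngineTool

    tools = []
    for path in pdf_paths:
        docs = SimpleDirectoryReader(input_files=[path]).load_data()
        index = VectorStoreIndex.from_documents(docs)
        tools.append(
            QueryEngineTool.from_defaults(
                query_engine=index.as_query_engine(),
                # The description is what the selector reads to pick a tool,
                # so it should clearly identify the project.
                description=f"Answers questions about project {project_name(path)}.",
            )
        )

    return RouterQueryEngine(
        selector=LLMSingleSelector.from_defaults(),
        query_engine_tools=tools,
    )


# Usage (assuming the PDFs exist and credentials are configured):
# router = build_router(["acme_bridge.pdf", "zen_tower.pdf"])
# response = router.query("What is the budget for the Acme bridge?")
```

With 200 tools, a single LLM selector prompt can get long; an alternative worth considering is keeping one giant index with a `project` metadata field per document and querying with a metadata filter, which trades routing overhead for filter bookkeeping.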