I need to scan more than 100 Word documents in a folder and run queries over them. What is the best practice for this? I think I can still do it with the basic code below, but I'm not sure whether that is advisable. Do I need a different kind of index? Do I need to persist the vectors in a vector database and query on top of that?
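# Import path used by older llama_index releases that still ship GPTVectorStoreIndex
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

# Load every file in the 'pdf' folder and build one vector index over them
documents = SimpleDirectoryReader('pdf').load_data()
index = GPTVectorStoreIndex.from_documents(documents)

# Ask a question against the index
query_engine = index.as_query_engine()
response = query_engine.query("Explain the concept of encoders and decoders")
print(response)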
I would start with a single index, like in your example, and see how it works from there. A lot of it depends on how long the documents are and how varied the topics are.
One quick thing that will help is to set top k a little higher (the default is 2), so that more chunks are retrieved for each query, e.g. index.as_query_engine(similarity_top_k=5)
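To make the top k suggestion concrete, here is a toy sketch of what that setting controls in a vector index: the query is compared against every stored chunk, and only the k most similar chunks are passed on to the LLM. This is plain Python, not LlamaIndex internals; the `retrieve` helper, the chunk names, and the 3-dimensional "embeddings" are all made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, top_k=2):
    """Return the names of the top_k chunks most similar to query_vec."""
    ranked = sorted(
        chunks,
        key=lambda name: cosine(query_vec, chunks[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical embeddings; real ones have hundreds or thousands of dimensions.
chunks = {
    "encoders.docx#1": [0.9, 0.1, 0.0],
    "decoders.docx#1": [0.8, 0.3, 0.1],
    "attention.docx#4": [0.6, 0.6, 0.2],
    "holiday_policy.docx#2": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]

print(retrieve(query, chunks, top_k=2))  # only the two closest chunks
print(retrieve(query, chunks, top_k=3))  # one more chunk of context
```

With 100+ documents, relevant material is more likely to be spread over several files, so a top k of 2 can easily miss useful chunks. The trade-off is that a larger k sends more text to the LLM per query.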