Updated 2 years ago

I need to scan over 100 of word

I need to scan over 100 Word documents placed in a folder and run queries over them. What would be the best practice for this? I think I can still do it with the basic code below, but I'm not sure whether that is advisable. Do I need a different kind of index? Do I need to persist the vectors in a vector database and run queries on top of it?

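# Import path assumed from the llama_index releases that shipped GPTVectorStoreIndex; newer releases may differ
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

# Load every file in the 'pdf' folder and build a single vector index over them
documents = SimpleDirectoryReader('pdf').load_data()
index = GPTVectorStoreIndex.from_documents(documents)

# Query the index with the default query engine settings
query_engine = index.as_query_engine()
response = query_engine.query("Explain the concept of encoders and decoders")
print(response)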
1 comment
I would start with a single index, like in your example, and see how it works from there. A lot depends on how long the documents are and how varied the topics are.

One quick thing that will help is to set a slightly higher top k (the default is 2), so that more chunks are retrieved for each query:

query_engine = index.as_query_engine(similarity_top_k=3)
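
To see why a higher top k can matter once 100+ documents share one index, here is a minimal, library-free sketch of top-k similarity retrieval. It is not LlamaIndex's implementation: the chunk names, the 2-D vectors, and the `cosine` and `top_k` helpers are all made up for illustration, standing in for real embeddings and the real retriever.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k):
    """Return the names of the k chunks most similar to the query."""
    ranked = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical chunk embeddings (real ones have hundreds of dimensions).
chunks = {
    "encoders.docx#1": [0.9, 0.1],
    "decoders.docx#4": [0.8, 0.3],
    "attention.docx#2": [0.6, 0.6],
    "billing.docx#7": [0.0, 1.0],
}
query = [1.0, 0.2]  # hypothetical embedding of the user's question

print(top_k(query, chunks, 2))  # only the two closest chunks reach the LLM
print(top_k(query, chunks, 3))  # a third, still relevant chunk is included
```

With k=2, the toy "attention" chunk never reaches the model even though it is related to the question; with k=3 it does. The trade-off is that each extra chunk adds tokens to the prompt, so it is worth raising k gradually.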