

Documents

Could somebody please advise me on how to retrieve more comprehensive results from multiple documents using GPTVectorStoreIndex? I have a collection of scientific documents about a specific plant, extracted from PDFs and stored in JSON and CSV formats. I'm using the following code to load all of them:
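```python
# Import paths as in legacy llama_index (pre-0.10); newer releases moved these readers
from llama_index import SimpleDirectoryReader
from llama_index.readers.file.tabular_reader import PandasCSVReader
from llama_index.readers.json import JSONReader

docs = []  # collects documents from every directory in all_dirs
for dir_path in all_dirs:
    # Route .csv and .json files to their dedicated readers
    dir_reader = SimpleDirectoryReader(dir_path, file_extractor={
        ".csv": PandasCSVReader(),
        ".json": JSONReader(),
    })
    docs.extend(dir_reader.load_data())
```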
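To cite authors later, each loaded document needs author information in its metadata. As far as I know, SimpleDirectoryReader accepts a `file_metadata` callable that maps a file path to a metadata dict (check that your release has it). The sketch below assumes a hypothetical filename convention `Author_Year_title.ext`; `author_metadata` is an illustrative helper, not a library function, so adapt the parsing to however your files are actually named.

```python
from pathlib import Path


def author_metadata(file_path: str) -> dict:
    """Hypothetical helper: derive citation metadata from names like 'Smith_2019_leaf-extracts.json'."""
    name = Path(file_path).name
    stem = Path(file_path).stem          # file name without the final extension
    parts = stem.split("_", 2)           # at most three pieces: author, year, title
    meta = {"file_name": name}
    if len(parts) == 3 and parts[1].isdigit():
        meta.update(author=parts[0], year=int(parts[1]), title=parts[2].replace("-", " "))
    return meta  # files that don't follow the convention keep only their file name


print(author_metadata("data/Smith_2019_leaf-extracts.json"))

# Assumed usage inside the loading loop (verify the parameter exists in your version):
# dir_reader = SimpleDirectoryReader(dir_path, file_extractor={...}, file_metadata=author_metadata)
```

Whether this metadata reaches documents produced by custom readers such as JSONReader may depend on the version, so it is worth checking `docs[0].metadata` after loading.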


After loading, I create an index and query it:


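```python
from llama_index import GPTVectorStoreIndex

index = GPTVectorStoreIndex.from_documents(docs, service_context=service_context)
# Query the index built above (load_index_from_storage(storage_context) would reload a persisted one instead)
query_engine = index.as_query_engine(similarity_top_k=4)
response = query_engine.query(input_text)
```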


The issue is that the query always returns a single response drawn from just the one document that best matches my query, even though I know other documents contain additional relevant information about the plant. How can I change my approach to retrieve more comprehensive results from all relevant documents in my index?
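Note that `similarity_top_k` counts retrieved chunks (nodes), not documents, so if one document dominates the similarity ranking, all four slots can come from it. One generic remedy is to cap how many chunks any single document may contribute to the top-k. The sketch below is a standalone pure-Python illustration of that idea, not a llama_index API; the `Chunk` type, the `max_per_doc` parameter, and the toy 2-D vectors are all hypothetical stand-ins for what a vector store would return.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str             # which source document the chunk came from
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_with_doc_cap(query_emb: list[float], chunks: list[Chunk],
                       top_k: int = 4, max_per_doc: int = 1) -> list[Chunk]:
    """Rank chunks by similarity, but let each document contribute at most max_per_doc chunks."""
    ranked = sorted(chunks, key=lambda c: cosine(query_emb, c.embedding), reverse=True)
    picked: list[Chunk] = []
    per_doc: dict[str, int] = {}
    for chunk in ranked:
        if per_doc.get(chunk.doc_id, 0) >= max_per_doc:
            continue  # this document has already used its quota
        picked.append(chunk)
        per_doc[chunk.doc_id] = per_doc.get(chunk.doc_id, 0) + 1
        if len(picked) == top_k:
            break
    return picked


chunks = [
    Chunk("paper_A", "A1", [1.0, 0.0]),
    Chunk("paper_A", "A2", [0.9, 0.1]),
    Chunk("paper_A", "A3", [0.8, 0.2]),
    Chunk("paper_B", "B1", [0.7, 0.3]),
    Chunk("paper_C", "C1", [0.0, 1.0]),
]
print([c.text for c in top_k_with_doc_cap([1.0, 0.0], chunks, top_k=3)])  # ['A1', 'B1', 'C1']
```

A plain top-3 here would return A1, A2 and A3, all from paper_A. Raising `max_per_doc` trades breadth for depth; in practice you would also drop chunks below some similarity threshold, since C1 is only included because a slot was still free.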
Instead of one index, you could split your data into multiple indices by category or topic, or even one index per document.

Then use a sub-question query engine or a router query engine on top to route each query to the right data.
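The routing idea can be illustrated without an LLM. A router query engine normally asks an LLM to choose among indices based on their descriptions; the sketch below swaps in plain keyword overlap so it runs standalone. The index names, descriptions, stopword list and the `route` function are hypothetical.

```python
import re

# Hypothetical stopword list so filler words don't count as matches
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "what", "does", "do", "to", "for"}


def keywords(text: str) -> set[str]:
    """Lowercased alphabetic words, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def route(query: str, descriptions: dict[str, str], max_hits: int = 2) -> list[str]:
    """Pick up to max_hits index names whose descriptions share the most keywords with the query."""
    q = keywords(query)
    scored = [(len(q & keywords(desc)), name) for name, desc in descriptions.items()]
    hits = [(score, name) for score, name in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))  # most overlap first, ties broken by name
    return [name for _, name in hits[:max_hits]]


descriptions = {
    "chemistry": "chemical compounds, alkaloids and extraction methods",
    "cultivation": "growing conditions, soil, watering and harvest timing",
    "pharmacology": "medicinal effects, dosage and toxicity of extracts",
}
print(route("What soil and watering does the plant need?", descriptions))  # ['cultivation']
print(route("dosage and toxicity of alkaloids", descriptions))  # ['pharmacology', 'chemistry']
```

Querying every selected index and merging the answers is roughly what a multi-index router or a sub-question engine does, with an LLM handling the selection and the question decomposition instead of word overlap.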
Thank you, Logan, I will try this approach. The catch is that all my documents cover the same topic but were published at different times by different people, and I want to collect what each author says about a particular issue and cite them.
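For collecting what each author says and citing them, one option is to search each author's material separately and keep the best match per author, rather than letting one global top-k decide. The sketch below uses a toy keyword-overlap scorer in place of embeddings; the `Passage` fields, the `min_overlap` threshold and `per_author_findings` are hypothetical. In the setup suggested above, each author group would correspond to its own index.

```python
import re
from dataclasses import dataclass

STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "was", "were", "what", "to", "for"}


@dataclass
class Passage:
    author: str  # hypothetical metadata fields, e.g. parsed from file names
    year: int
    text: str


def keywords(text: str) -> set[str]:
    """Lowercased alphabetic words, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def per_author_findings(query: str, passages: list[Passage], min_overlap: int = 1) -> list[str]:
    """Search each author's passages separately; return one cited best match per author."""
    q = keywords(query)
    by_author: dict[tuple[str, int], list[Passage]] = {}
    for p in passages:
        by_author.setdefault((p.author, p.year), []).append(p)
    findings = []
    for (author, year), items in sorted(by_author.items()):
        best = max(items, key=lambda item: len(q & keywords(item.text)))
        if len(q & keywords(best.text)) >= min_overlap:  # skip authors with nothing relevant
            findings.append(f"{author} ({year}): {best.text}")
    return findings


passages = [
    Passage("Smith", 2019, "Leaf extracts showed strong antioxidant activity."),
    Passage("Smith", 2019, "Samples were collected in spring."),
    Passage("Lee", 2021, "Antioxidant activity was highest in roots."),
    Passage("Garcia", 2015, "The plant tolerates drought."),
]
for line in per_author_findings("antioxidant activity", passages):
    print(line)
```

Passing these labeled, per-author snippets to the LLM as separate context entries gives it what it needs to attribute each statement to its source.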
Hello, mariamaak,
I'm curious if you've made any headway in resolving your issue. I'm facing a similar situation where many of my documents revolve around a common subject, but each presents some distinct nuances. If you have any insights or solutions you'd be willing to share, it would be greatly appreciated. Thank you in advance.