A community member is indexing a large dataset of 1,600 product reviews and finds that the generated summary only considers a few of the reviews. The index contains 229 document IDs, each covering roughly 7 reviews. They suspect that LlamaIndex stops after finding relevant information in the first few documents, and ask whether there are settings in LlamaIndex to ensure all documents are reviewed for relevancy before responding.
In the comments, one community member points out that the asker is using a VectorStoreIndex, which by default returns only the top-k most similar nodes, and recommends controlling this via index.as_query_engine(similarity_top_k=2). Another suggests adding a decision layer that determines whether the user is asking for a broad summary or a pointed question.
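To illustrate why a vector index surfaces only a handful of documents, here is a toy sketch of top-k retrieval in plain Python (no LlamaIndex dependency; the vectors and the k value are made up for illustration):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve_top_k(query_vec, doc_vecs, k=2):
    # Rank every document by similarity, but hand only the k best
    # to the LLM. The rest are never seen, which is why a summary
    # over 229 docs can read as if it used only a couple of them.
    ranked = sorted(doc_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

docs = {
    "doc_a": [0.9, 0.1],
    "doc_b": [0.8, 0.3],
    "doc_c": [0.1, 0.9],
}
print(retrieve_top_k([1.0, 0.0], docs, k=2))  # → ['doc_a', 'doc_b']
```

Raising similarity_top_k widens the window, but for a full summary of 1,600 reviews it would have to cover every node, which defeats the point of similarity retrieval.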
There is no explicitly marked answer in the comments.
We're indexing large datasets that we save as text files. The one in question is a set of 1,600 product reviews collected from various product review sites. The issue seems to be that if I put them all in one index (which we store for later recall) and ask for a summary of the reviews, only a few of the reviews are considered in the analysis. In this case there are 229 doc IDs in the docstore file of the index, and each doc contains similar information covering approximately 7 reviews. Is this because when I query the index, LlamaIndex sees relevant information for my query in the first couple of docs and just stops there?
As a follow-up, are there settings in LlamaIndex I should use to ensure that all docs are reviewed for relevancy before responding? Thoughts and help much appreciated!
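One answer not spelled out in the thread: for whole-corpus summaries, summarize hierarchically so every document is read once, rather than retrieving top-k (in LlamaIndex this is roughly what a SummaryIndex or the tree_summarize response mode does). A minimal map-reduce sketch, with a stand-in summarize() stub in place of a real LLM call:

```python
def summarize(texts):
    # Stand-in for an LLM summarization call: just join the inputs,
    # so the control flow can be shown without an API key.
    return " | ".join(texts)

def tree_summarize(docs, fanout=3):
    # Reduce level by level: summarize groups of `fanout` docs, then
    # summarize the summaries, until one summary remains. Every
    # document is read at the leaf level, unlike top-k retrieval,
    # which silently drops everything outside the top k.
    level = list(docs)
    while len(level) > 1:
        level = [summarize(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]

reviews = [f"review {i}" for i in range(7)]
print(tree_summarize(reviews))  # every review appears in the result
```

The trade-off is cost: this makes one LLM call per group at each level, so summarizing all 229 docs is many calls rather than one.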
I meant for that and all the other variables, such as temperature and top-p, when they're not configured in the Settings call. Shouldn't there be a listing of the defaults somewhere?
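For context on the follow-up: in recent LlamaIndex versions, sampling options like temperature are not attributes of the global Settings object itself; they are constructor parameters of the LLM you assign to Settings.llm, and their defaults are defined by that LLM class. A hedged configuration sketch, assuming the llama-index-core and llama-index-llms-openai packages (the model name is illustrative):

```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# Temperature and other sampling options live on the LLM object;
# Settings just holds the configured LLM for the whole pipeline.
# Anything not set here falls back to the OpenAI class defaults,
# which is where to look for the "listing" of default values.
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0.1)
```

So the listing the poster is asking about is effectively the parameter list of the chosen LLM class, not a separate Settings table.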