Unexpected behaviour when no match is found using the llamacloudindex
At a glance
The community member is using the LlamaCloudIndex with similarity_top_k=3 and noticed that when no match is found, source_nodes contains only a single chunk, which is the entire document. They are surprised by this behaviour, as they expected the top similar nodes to be returned even if they are not very similar.
In the comments, another community member asks whether any postprocessing step, such as a similarity postprocessor, has been added; the original poster replies "no". They then share their code, which only initializes the LlamaCloudIndex and uses the as_query_engine and aquery methods.
Another community member, Logan M, is tagged and asked to take a look at the issue. A final comment suggests that the LlamaCloud retriever defaults to an "auto" mode, in which it automatically decides between top-k chunk retrieval and retrieving entire files based on metadata.
I'm using the LlamaCloudIndex with similarity_top_k=3, and I noticed that when no match is found, source_nodes contains only one chunk, and that chunk is the whole document. Is that expected behaviour? I'm surprised, as I thought there would always be source nodes since we're looking at top similarities, i.e. it would return the most similar nodes even if they aren't similar at all.
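The setup described in the thread can be sketched roughly as follows. The index name, project name, and API key are placeholders (not values from the thread), and running this requires LlamaCloud credentials:

```python
import asyncio

from llama_index.indices.managed.llama_cloud import LlamaCloudIndex


async def main() -> None:
    # Connect to an existing LlamaCloud index; name, project_name,
    # and api_key below are placeholders, not values from the thread.
    index = LlamaCloudIndex(
        name="my-index",
        project_name="my-project",
        api_key="llx-...",
    )

    # Ask for the 3 most similar chunks, as in the question.
    query_engine = index.as_query_engine(similarity_top_k=3)
    response = await query_engine.aquery("A question with no good match")

    # The surprising behaviour: on a poor match this can contain a
    # single source node holding the whole document, rather than the
    # 3 loosely similar chunks one might expect.
    for node in response.source_nodes:
        print(node.score, len(node.text))


asyncio.run(main())
```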
I'm pretty sure that by default, the LlamaCloud retriever uses "auto" mode, where it automatically decides between top-k chunk retrieval and retrieving entire files based on metadata.
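If that auto behaviour is the cause, the retriever can reportedly be pinned to plain chunk retrieval via a retrieval_mode argument. This is a sketch under that assumption; the exact mode names and parameter names (e.g. "chunks", dense_similarity_top_k) are not confirmed by the thread and should be checked against the current LlamaCloud docs:

```python
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

# Placeholder connection details, not values from the thread.
index = LlamaCloudIndex(
    name="my-index",
    project_name="my-project",
    api_key="llx-...",
)

# Force top-k chunk retrieval instead of the automatic mode, which
# may decide to return entire files based on metadata.
# Mode and parameter names here are assumptions; verify them against
# the LlamaCloud retriever documentation.
retriever = index.as_retriever(
    retrieval_mode="chunks",
    dense_similarity_top_k=3,
)

nodes = retriever.retrieve("A question with no good match")
for node in nodes:
    print(node.score, node.metadata.get("file_name"))
```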