Retrieval

At a glance

The post describes an issue: when the PDF has only a few pages, the target document appears in the top 20 results, but with a 100-page PDF it is returned only after 100 items. One reply explains that retrieval in LlamaIndex uses similarity search, so the position of the content in the dataset should not matter. The same member suggests setting similarity_top_k to 20 to get the top 20 results for a query. The original poster concludes that the query needs optimizing, or a query transformation, so that the target answer lands in the top 20. The thread also covers how to use retrieval and a query engine together, and how to fetch the retrieved nodes used to generate a response.

habout632

If I use only a few pages, I get the result in the top 20, but when the PDF has 100 pages, the target doc is returned only after 100 items.

10 comments

WhiteFang_Jr

Retrieval in LlamaIndex uses a similarity algorithm to find the content in your dataset that is closest to your query.

So it should not matter if the required content appears after the first 100 items.
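To illustrate the reply above, here is a minimal pure-Python sketch of top-k similarity retrieval. It assumes cosine similarity over small hand-written vectors; it is not LlamaIndex's actual implementation, where the embedding model and vector store do this work. It shows that every chunk is scored against the query, so a chunk's position in the PDF does not by itself affect its rank.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Score every chunk against the query and return the k best (index, score) pairs."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embedding models produce hundreds of dimensions.
chunks = [
    [1.0, 0.0, 0.0],  # chunk 0 (imagine page 1)
    [0.0, 1.0, 0.0],  # chunk 1
    [0.7, 0.7, 0.0],  # chunk 2
    [0.0, 0.0, 1.0],  # chunk 3
    [0.1, 0.9, 0.1],  # chunk 4 (imagine page 100)
]
query = [0.0, 1.0, 0.0]
print(top_k(query, chunks, k=2))
```

Position does not matter here, but the number of chunks does: a 100-page PDF produces far more chunks competing for the same k slots, which is consistent with the target sliding further down the ranking.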

habout632

Yes, you are correct.

habout632

Is there any documentation on optimizing retrieval results? I'd like the target to appear in the top 20.

WhiteFang_Jr

If you want the top 20 results for your query, just set similarity_top_k to 20:

query_engine = index.as_query_engine(similarity_top_k=20)

habout632

OK, I think I have to optimize my query or apply a query transformation so that the target answer shows up in the top 20.
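To tell whether a rewritten or transformed query actually moves the target into the top 20, it helps to measure the target's rank before and after. The sketch below uses hypothetical helper names (rank_of, hit_at_k) and made-up rankings, and it assumes you already know which chunk ID holds the answer.

```python
from typing import Optional, Sequence


def rank_of(target_id: str, ranked_ids: Sequence[str]) -> Optional[int]:
    """Return the 1-based rank of target_id in a ranked result list, or None if absent."""
    for position, node_id in enumerate(ranked_ids, start=1):
        if node_id == target_id:
            return position
    return None


def hit_at_k(target_id: str, ranked_ids: Sequence[str], k: int = 20) -> bool:
    """True if target_id appears within the first k results."""
    rank = rank_of(target_id, ranked_ids)
    return rank is not None and rank <= k


# Made-up rankings for one question, before and after rewriting the query.
# Assumption: in a real run the IDs would be collected with a large similarity_top_k,
# e.g. [n.node.node_id for n in response.source_nodes].
before = (
    [f"other-{i}" for i in range(104)] + ["target"] + [f"other-{i}" for i in range(104, 150)]
)
after = [f"other-{i}" for i in range(7)] + ["target"] + [f"other-{i}" for i in range(7, 150)]

for label, ranking in (("original query", before), ("rewritten query", after)):
    print(label, "rank:", rank_of("target", ranking), "in top 20:", hit_at_k("target", ranking))
```

Tracking the rank, not just the hit, shows whether a change helps even when it does not yet cross the top-20 threshold.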

Sleigh65

How do I implement retrieval and the query engine together?

WhiteFang_Jr

When you define your query_engine and run a query, you get back not only the LLM response but also the retrieved nodes that were used to generate it:


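response = query_engine.query("ask your query here")
# Retrieved nodes (NodeWithScore objects) used to generate the answer:
print(response.source_nodes)
# Each one carries a similarity score and the underlying node:
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:100])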
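The answer above reflects how a query engine combines the two steps: it retrieves, then generates, and returns the retrieved context alongside the answer. The toy sketch below mirrors that shape in plain Python. ToyQueryEngine, ScoredChunk, ToyResponse, keyword_retrieve and stub_synthesize are all hypothetical names; keyword overlap stands in for embedding similarity and a stub stands in for the LLM, so this is not LlamaIndex's code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ScoredChunk:
    text: str
    score: float


@dataclass
class ToyResponse:
    response: str
    source_nodes: List[ScoredChunk] = field(default_factory=list)


class ToyQueryEngine:
    """Hypothetical stand-in: a query engine = a retriever + a response synthesizer."""

    def __init__(self, retrieve: Callable[[str], List[ScoredChunk]],
                 synthesize: Callable[[str, List[ScoredChunk]], str]):
        self.retrieve = retrieve
        self.synthesize = synthesize

    def query(self, question: str) -> ToyResponse:
        nodes = self.retrieve(question)            # step 1: retrieval
        answer = self.synthesize(question, nodes)  # step 2: generation from retrieved context
        return ToyResponse(response=answer, source_nodes=nodes)  # keep context for inspection


CORPUS = [
    "LlamaIndex builds indexes over documents.",
    "Retrievers return the top-k most similar chunks.",
    "Bananas are yellow.",
]


def keyword_retrieve(question: str, k: int = 2) -> List[ScoredChunk]:
    """Toy retriever: score = number of shared lowercase words (stands in for embeddings)."""
    q_words = set(question.lower().split())
    scored = [
        ScoredChunk(t, float(len(q_words & set(t.lower().rstrip(".").split()))))
        for t in CORPUS
    ]
    return sorted(scored, key=lambda c: c.score, reverse=True)[:k]


def stub_synthesize(question: str, nodes: List[ScoredChunk]) -> str:
    """Stub in place of an LLM call: just echoes the best-scoring chunk."""
    return nodes[0].text if nodes else "No context found."


engine = ToyQueryEngine(keyword_retrieve, stub_synthesize)
resp = engine.query("what do retrievers return")
print(resp.response)
for node in resp.source_nodes:
    print(node.score, node.text)
```

If you only need the nodes without generating an answer, LlamaIndex also exposes the retrieval step on its own, e.g. `index.as_retriever(similarity_top_k=20).retrieve("ask your query here")`.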

Sleigh65

I'm using the create-llama package and have edited index.py to use a local LLM and embedding model.

Sleigh65

How many arguments does query_engine take? Can the top-k argument be included along with tree_summarize?

WhiteFang_Jr

Yes
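On the tree_summarize question: similarity_top_k controls how many chunks are retrieved, while the response mode controls how those chunks are combined into an answer, so the two are independent and can be passed together, e.g. `index.as_query_engine(similarity_top_k=20, response_mode="tree_summarize")`. The sketch below is a simplified, hypothetical illustration of bottom-up tree summarization in plain Python. It assumes a fixed number of texts per summarization call (`fan_in`) and a stub in place of the LLM; LlamaIndex's own implementation instead sizes each group to fit the model's context window.

```python
from typing import Callable, List


def tree_summarize(chunks: List[str], summarize: Callable[[List[str]], str],
                   fan_in: int = 3) -> str:
    """Bottom-up hierarchical summarization.

    Summarizes `fan_in` texts at a time, then repeats on the resulting summaries
    until a single text remains. Always makes at least one summarize call.
    """
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    level = list(chunks)
    while True:
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        if len(level) == 1:
            return level[0]


calls: List[int] = []


def stub_summarize(texts: List[str]) -> str:
    """Stand-in for an LLM call: records the group size and joins the inputs."""
    calls.append(len(texts))
    return "(" + " + ".join(texts) + ")"


# similarity_top_k decides how many chunks come in; tree_summarize decides how they combine.
top_k_chunks = [f"c{i}" for i in range(1, 8)]  # e.g. 7 retrieved chunks
summary = tree_summarize(top_k_chunks, stub_summarize)
print(summary)
print("summarize calls:", len(calls))
```

With 7 chunks and `fan_in=3`, the first level makes three calls and the second level one more, so raising top-k mainly increases the number of summarization calls rather than changing how the mode works.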
