Hi everyone.

I'm having trouble querying after indexing a docx file. The file contains a description of the content I need to search for, but when I query, I can't retrieve that content. Instead, the result comes from a different part of the file.

Please let me know how to handle this.
Thank you all very much.

9 comments

WhiteFang_Jr

You can check the nodes it is fetching based on your query and then check further from there:

# Check the nodes it used to generate the answer:
print(response.source_nodes)

# If you are storing the index locally, list all stored nodes like this:
print(index.docstore.docs)
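
The raw print above can be hard to read when nodes are long. A minimal sketch of a more compact view follows. It assumes each item in response.source_nodes exposes .score and .node.get_content(), as LlamaIndex's NodeWithScore does. The helper name summarize_source_nodes and the stand-in objects are hypothetical, used only so the snippet runs without an index.

```python
from types import SimpleNamespace


def summarize_source_nodes(source_nodes, preview_chars=80):
    """Return one line per retrieved node: rank, similarity score, text preview."""
    lines = []
    for rank, item in enumerate(source_nodes, start=1):
        # Flatten newlines so each node fits on a single line.
        text = item.node.get_content().replace("\n", " ")
        # Some retrievers return no score, so handle None explicitly.
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"{rank}. score={score} text={text[:preview_chars]!r}")
    return lines


def _fake_node(text, score):
    """Hypothetical stand-in for a NodeWithScore (assumption, not library code)."""
    return SimpleNamespace(score=score, node=SimpleNamespace(get_content=lambda: text))


# In real use, pass response.source_nodes instead of this made-up list.
fake_source_nodes = [
    _fake_node("Musician A once acted in movie B.", 0.82),
    _fake_node("Musician A released an album.", None),
]
for line in summarize_source_nodes(fake_source_nodes):
    print(line)
```

If the passage you expect is missing from this list, the problem is in retrieval (chunking, embeddings, top_k) rather than in the prompt or the LLM.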

Brent

Thanks, @WhiteFang_Jr.

However, I found that query_engine.query does not return the correct node, even when the query uses keywords similar to those in the docx file.

This does not happen everywhere, but there are a few passages I cannot retrieve.

index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine(text_qa_template=qa_prompt_tmpl, streaming=True, service_context=service_context)

response = query_engine.query(f"{input_text}")
print(response.source_nodes)

WhiteFang_Jr

Okay, so you will have to go through the nodes and check how you can improve your data to get better results.

You could check this post on how to improve your RAG experience: https://gradient.ai/blog/rag-101-for-enterprise

Brent

Thank you. I will explore those methods further. But is there a similar section in the LlamaIndex documentation?

WhiteFang_Jr

Not in the documentation, but I think there was a post on the LlamaIndex blog. Let me check.

WhiteFang_Jr

https://blog.llamaindex.ai/evaluating-the-ideal-chunk-size-for-a-rag-system-using-llamaindex-6207e5d3fec5

This one

Brent

I am using this piece of code with a modified chunk_size, and I'm encountering a situation that I can't find an explanation for:

query_engine = RetrieverQueryEngine.from_args(retriever, text_qa_template=prompt_tmpl, service_context=service_context)
response = query_engine.query(input_text)

In my data, it talks about a musician named A. It also mentions that 'Musician A once acted in movie B'. However, when I query 'Has A ever acted in any movies?', I receive the answer 'There is no data mentioning that Mr. A has ever acted in movies'. But when I ask 'Has he ever acted in any movies?', I get back the result 'Mr. A has acted in a movie called B'. I am very curious why there are two different results like this.

WhiteFang_Jr

Check the nodes that are returned in the response and verify whether the correct nodes are picked in each of these scenarios.

You can print nodes like this: print(response.source_nodes)
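
To see exactly where the two phrasings diverge, it may help to compare the retrieved node IDs rather than read the printed nodes by eye. This is a minimal sketch: it assumes each item in response.source_nodes exposes .node.node_id, as LlamaIndex's NodeWithScore does. The helper name diff_retrieved_nodes, the stand-in objects, and the node IDs are hypothetical.

```python
from types import SimpleNamespace


def diff_retrieved_nodes(nodes_a, nodes_b):
    """Compare two retrieval results by node ID."""
    ids_a = {item.node.node_id for item in nodes_a}
    ids_b = {item.node.node_id for item in nodes_b}
    return {
        "shared": sorted(ids_a & ids_b),
        "only_first": sorted(ids_a - ids_b),
        "only_second": sorted(ids_b - ids_a),
    }


def _fake(node_id):
    """Hypothetical stand-in for a NodeWithScore (assumption, not library code)."""
    return SimpleNamespace(node=SimpleNamespace(node_id=node_id))


# In real use, these would be the source_nodes of the two responses,
# e.g. one for 'Has A ever acted...' and one for 'Has he ever acted...'.
named_query_nodes = [_fake("n1"), _fake("n2"), _fake("n3")]
pronoun_query_nodes = [_fake("n2"), _fake("n3"), _fake("n4")]

result = diff_retrieved_nodes(named_query_nodes, pronoun_query_nodes)
print(result)
```

The nodes listed under only_second are the ones to read first: if the sentence about the movie lives there, the first phrasing never retrieved it, which would explain the "no data" answer.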

Brent

I am retrieving 6 nodes per query, and 2 of the nodes differ between the 2 questions above.

retriever = QueryFusionRetriever(
    indexes,
    similarity_top_k=6,
    num_queries=3,
    mode="reciprocal_rerank",
    use_async=True,
    verbose=True,
)
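
For context on why two phrasings can end up with different nodes under this configuration: with num_queries=3 the retriever works from several query variants, and mode="reciprocal_rerank" merges their ranked results. The sketch below shows generic reciprocal rank fusion in plain Python. It is an illustration of the technique, not the library's internal code; the function name, the constant k=60 (a commonly used default), and the node IDs are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, top_k=6):
    """Fuse several ranked lists of node IDs into a single ranking.

    Each node earns 1 / (k + rank) for every list it appears in, so a node
    that ranks well across several query variants beats a node that ranks
    well for only one of them.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, node_id in enumerate(ranked, start=1):
            scores[node_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by node ID for determinism.
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [node_id for node_id, _ in fused[:top_k]]


# Made-up results for three query variants (node IDs are hypothetical).
rankings = [
    ["bio", "albums", "movie_b"],
    ["bio", "movie_b", "tours"],
    ["albums", "bio", "awards"],
]
fused = reciprocal_rank_fusion(rankings, top_k=3)
print(fused)
```

Because the final cut keeps only the top_k fused results, a small change in the wording of the question can change the variants, shift one node's rank, and push the relevant node in or out of the 6 that reach the LLM.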

///////////////////////////////////////////
There is one more issue. When a description in my data is long, the result does not capture the entire description, only the beginning. How can I make it capture everything?
