For PDFs, the page label and filename are present in the source nodes of the response object.
The chat memory system maintains the conversation history. This keeps the context available so users can ask follow-up questions.
LlamaIndex offers different chat memory approaches (for example ChatMemoryBuffer and ChatSummaryMemoryBuffer), which you can pick from depending on your use case.
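To make the buffer idea concrete, here is a minimal sketch of a chat memory that keeps only the most recent messages. It is a toy written for illustration, not LlamaIndex's implementation: the class name `SimpleChatBuffer` and the `max_messages` parameter are made up here, and a real buffer typically limits by tokens rather than by message count.

```python
from collections import deque
from typing import Deque, Dict, List


class SimpleChatBuffer:
    """Toy chat memory that keeps only the most recent messages (hypothetical helper)."""

    def __init__(self, max_messages: int = 6) -> None:
        # A deque with maxlen silently drops the oldest message once it is full.
        self._messages: Deque[Dict[str, str]] = deque(maxlen=max_messages)

    def put(self, role: str, content: str) -> None:
        """Store one message of the conversation."""
        self._messages.append({"role": role, "content": content})

    def get(self) -> List[Dict[str, str]]:
        """Return the retained history, oldest first."""
        return list(self._messages)


memory = SimpleChatBuffer(max_messages=2)
memory.put("user", "What is on page 3?")
memory.put("assistant", "Page 3 covers installation.")
memory.put("user", "And the page after that?")
history = memory.get()  # only the last two messages survive
```

The retained history is what gets sent along with the next question, which is why a follow-up like "And the page after that?" can still be understood.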
I tried multiple times to access the source nodes but didn't succeed. I'm using this approach for retrieving:
response = Settings.llm.complete(query_with_context)
response_text = str(response)  # Convert the response object to string
source_documents = [{"filename": doc['filename'], "text": doc['text']} for doc in response_docs]
response_data = {
'response': response_text,
'sources': source_documents
}
return jsonify(response_data)
You are interacting with the LLM directly here. That is why there are no source nodes.
Source nodes come back when the query goes through a RAG pipeline.
With this approach, even the page label won't come through.
I didn't understand. I thought I was building a RAG application. What modifications do I need to make to access the nodes?
RAG has several stages:
- First, you create your vectors.
- Second, based on your query, certain nodes are pulled from your vector data set.
- Third, you use those nodes to create the final answer.
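The three stages above can be sketched in plain Python. This is a toy under loud assumptions: word overlap stands in for embedding similarity, the sample chunks and the helper names `retrieve` and `build_prompt` are invented for illustration, and no LLM is called. The point is only that the retrieved nodes exist as a separate result, which is where source nodes come from.

```python
from typing import Dict, List

# Stage 1: "index" the chunks. A real index stores embeddings; this toy stores word sets.
chunks = [
    {"filename": "guide.pdf", "page_label": "1", "text": "Install the package with pip."},
    {"filename": "guide.pdf", "page_label": "2", "text": "Configure the database connection string."},
]
index = [(set(c["text"].lower().split()), c) for c in chunks]


def retrieve(query: str, top_k: int = 1) -> List[Dict[str, str]]:
    """Stage 2: return the top_k chunks sharing the most words with the query."""
    words = set(query.lower().split())
    ranked = sorted(index, key=lambda item: len(words & item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]


def build_prompt(query: str, nodes: List[Dict[str, str]]) -> str:
    """Stage 3: put the retrieved text in front of the question for the LLM."""
    context = "\n".join(n["text"] for n in nodes)
    return f"{context}\n\n{query}"


question = "How do I configure the database?"
nodes = retrieve(question)             # these play the role of the source nodes
prompt = build_prompt(question, nodes)  # this is what would be sent to the LLM
```

If only the final prompt is passed to the LLM, the `nodes` list is lost unless the code keeps it around, which is exactly what a query engine does for you.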
Now, what I'm seeing here is that you are interacting with the LLM directly:
Settings.llm.complete(query_with_context)
I think I misunderstood the documentation and I'm confused now. Can you tell me what to do? This is the vector part of my code:
# Create the index from documents
documents = list(chunks_collection.find({}))
llama_documents = [Document(text=doc['text'], extra_info={"filename": doc["filename"]}) for doc in documents]
index = VectorStoreIndex.from_documents(llama_documents, storage_context=storage_context, embed_model=Settings.embed_model, service_context=service_context)
index.set_index_id("my_index")
index.storage_context.persist(persist_dir="./storage")
# Load or create indices
index = load_index_from_storage(storage_context, index_id="my_index")
retriever = index.as_retriever(similarity_top_k=10)
# Create an instance of your custom engine
custom_prompt_template = MongoDBContextPrompt(template="{context}\n\n{query}")
query_engine = CustomRetrieverQueryEngine(retriever=retriever, llm=Settings.llm, prompt_template=custom_prompt_template)
What do I need to change in the query function? @WhiteFang_Jr
You have the query_engine, use it to run the query.
This will return a response whose source nodes contain the metadata, with page_label in it:
response = query_engine.query("Your query here")
Thank you. I have two more questions: how can I use these nodes in my code, and what can I do if the code is not answering correctly, for example when it gives only part of the answer?
Nodes carry metadata, such as the page label of the page the text was extracted from.
So you can also show the source text that was used to generate the final answer.
For your second question: if your code is not giving the correct answer, I would suggest you try to identify the root cause first.
For example, the root cause could be that the prompt is not good, in which case you should try different prompts.
@WhiteFang_Jr can you tell me how to modify this code to get the sources?
response_text = query_engine.query(query_with_context)
source_documents = [{"filename": doc['filename'], "text": doc['text']} for doc in response_docs]
response_data = {
'response': response_text,
'sources': source_documents
}
return jsonify(response_data)
I did this:
response_data = {
'response': response_text,
'sources': source_documents,
'response source': response_text.source_nodes
}
ERROR:root:Error processing query: 'dict' object has no attribute 'source_nodes'
Traceback (most recent call last):
File "c:\Users\asus\Documents\PFEPROJECT\app.py", line 227, in query
'response source': response_text.source_nodes
AttributeError: 'dict' object has no attribute 'source_nodes'
To get the source nodes you need to check the response object.
response = query_engine.query("Your query here")
# Now iterate over the nodes
for node in response.source_nodes:
    print(node)  # This will print the entire node
Now you can extract the required items from each node, such as the metadata and source text, and add them to the final response object that you return.
Like this?
response_text = query_engine.query(query_with_context)
source_documents = [{"filename": doc['filename'], "text": doc['text']} for doc in response_docs]
response_data = {
'response': response_text,
'sources': source_documents ,
}
for node in response_text.source_nodes:
    print(node)
return jsonify(response_data)
@WhiteFang_Jr
No. What exactly do you want to return in the response object?
I want to return the chunk, the document name and the page number @WhiteFang_Jr
Is there any documentation about it? @WhiteFang_Jr
Have you checked what's inside the node?
You get all these details inside the node.
final_response = {}
response = query_engine.query("Your query here")
final_response['response'] = response.response
count = 0
for source in response.source_nodes:
    source_dict = {}
    # This dict will contain metadata for page label and filename
    source_dict['extra_info'] = source.node.extra_info
    source_dict['text'] = source.node.text
    final_response['source_' + str(count + 1)] = source_dict
    count = count + 1
return final_response
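The same loop can be wrapped in a function that returns the three things asked for (chunk text, document name, page number) as a flat list, which is easier to pass to jsonify. This is a sketch under assumptions: the helper name `response_to_payload` is made up, it assumes each source exposes `node.metadata` (the name current LlamaIndex releases use for what the snippet above calls `extra_info`), and the stand-in objects exist only so the example runs without an index or an LLM.

```python
from types import SimpleNamespace
from typing import Any, Dict


def response_to_payload(response: Any) -> Dict[str, Any]:
    """Flatten a query-engine response into plain dicts and strings."""
    sources = []
    for source in response.source_nodes:
        metadata = source.node.metadata or {}
        sources.append({
            "filename": metadata.get("filename"),
            # None when the chunk was indexed without a page label
            "page_label": metadata.get("page_label"),
            "text": source.node.text,
        })
    return {"response": str(response.response), "sources": sources}


# Stand-in objects (hypothetical) mimicking the shape of a real response.
fake_node = SimpleNamespace(
    text="Install with pip.",
    metadata={"filename": "guide.pdf", "page_label": "3"},
)
fake_response = SimpleNamespace(
    response="Use pip.",
    source_nodes=[SimpleNamespace(node=fake_node)],
)
payload = response_to_payload(fake_response)
```

In the Flask view this would be used as `return jsonify(response_to_payload(response))`, where `response` is whatever `query_engine.query(...)` returned. Note that this only works if the query engine returns a response object and not a plain dict.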
@ᴷᵉⁿˢʰⁱHOUSSNI I would say the info is there in the node metadata by default. You can also explore creating nodes manually; that way you control what goes into the metadata, in case you want to add a few more things in the future.
I think it's not working because I'm getting the documents from MongoDB, where there are only the different chunk texts and no mention of a page label.
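If the page label is missing because the MongoDB records never stored it, one option is to save it at chunking time and copy it into the metadata when the documents are built. The sketch below only shows the metadata step. The `page_label` field in the records is an assumption (nothing in the thread confirms the collection has it), and `build_metadata` is a made-up helper name.

```python
from typing import Any, Dict


def build_metadata(record: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the fields that should travel with the chunk into the index."""
    metadata: Dict[str, Any] = {"filename": record["filename"]}
    # "page_label" is a hypothetical field: it only exists if the chunking
    # step stored it in MongoDB alongside the text.
    if record.get("page_label") is not None:
        metadata["page_label"] = str(record["page_label"])
    return metadata


# Sample records standing in for chunks_collection.find({})
records = [
    {"filename": "report.pdf", "text": "First chunk", "page_label": 4},
    {"filename": "report.pdf", "text": "Second chunk"},
]
metadatas = [build_metadata(r) for r in records]
```

Each resulting dict would then be passed where the existing code passes `extra_info={"filename": doc["filename"]}`, so that the page label shows up in the source nodes later. Records without the field simply produce metadata with the filename only.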