
Updated last year

just add on it how to provide some list

Just to add on: how can I provide a list of the documents the AI used as references to form the answer? Any sample code would be welcome.
36 comments
You can get the source text that was used to form the response like this:
llm_response = index.as_query_engine().query("what is this doc about")
sources = []  # one dict per source node
for source in llm_response.source_nodes:
    source_dict = {}
    # This is where you get info like filename, page number, etc.
    if source.node.extra_info:
        source_dict['extra_info'] = source.node.extra_info

    source_dict['similarity_score'] = source.score
    source_dict['text'] = source.node.text
    sources.append(source_dict)

To add more info, attach metadata when creating the Document or node objects.
https://gpt-index.readthedocs.io/en/latest/core_modules/data_modules/documents_and_nodes/usage_documents.html#customizing-documents
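A minimal sketch of turning `source_nodes` into a short, deduplicated reference list to show under the answer. It assumes each source exposes `.score` and `.node.metadata` as in the snippet above. `FakeNode`, `FakeSource`, and `collect_references` are hypothetical stand-ins, used only so the example runs without an index or an API key.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeNode:
    # Hypothetical stand-in for a node: text plus a metadata dict.
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class FakeSource:
    # Hypothetical stand-in for one entry of response.source_nodes.
    node: FakeNode
    score: Optional[float] = None


def collect_references(source_nodes, key="file_name"):
    """Return one (name, score) pair per distinct file, keeping the best score."""
    best = {}
    for source in source_nodes:
        name = source.node.metadata.get(key, "<unknown>")
        score = source.score if source.score is not None else 0.0
        if name not in best or score > best[name]:
            best[name] = score
    # Highest-scoring files first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


sources = [
    FakeSource(FakeNode("chunk 1", {"file_name": "a/guide.pdf"}), 0.81),
    FakeSource(FakeNode("chunk 2", {"file_name": "a/faq.txt"}), 0.64),
    FakeSource(FakeNode("chunk 3", {"file_name": "a/guide.pdf"}), 0.77),
]
references = collect_references(sources)
for name, score in references:
    print(f"{name} (score {score:.2f})")
```

With a real query result, `response.source_nodes` would be passed in place of the hand-built `sources` list.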
Thanks for the quick response. I got the following error when I tried to follow the reference above:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-42-bdbcda253318> in <cell line: 2>()
1 # Building the index file
----> 2 index = VectorStoreIndex.from_documents(
3 documents,
4 service_context=service_context,
5 show_progress=True

/usr/local/lib/python3.10/dist-packages/llama_index/indices/base.py in from_documents(cls, documents, storage_context, service_context, show_progress, **kwargs)
94
95 with service_context.callback_manager.as_trace("index_construction"):
---> 96 for doc in documents:
97 docstore.set_document_hash(doc.get_doc_id(), doc.hash)
98 nodes = service_context.node_parser.get_nodes_from_documents(

TypeError: 'SimpleDirectoryReader' object is not iterable
Any thoughts on troubleshooting?
What changes did you make? Can you share your code?
here is the code:

!pip install langchain
!pip install llama-index

import os
import json
import openai
import sys

from llama_index import VectorStoreIndex, SimpleDirectoryReader, LangchainEmbedding, PromptHelper, LLMPredictor, ServiceContext, set_global_service_context
from llama_index import load_index_from_storage, StorageContext, QuestionAnswerPrompt
from llama_index.llms import OpenAI
from llama_index.indices.postprocessor.node import SimilarityPostprocessor
from IPython.display import Markdown, display

from llama_index import Document

# Create a document with filename in metadata


document = Document(
    text='text',
    metadata={
        'filename': '<doc_file_name>',
        'category': '<category>'
    }
)

document.metadata = {'filename': '<doc_file_name>'}

filename_fn = lambda filename: {'file_name': filename}

os.environ['OPENAI_API_KEY'] = "NA"

service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo", temperature=0))
set_global_service_context(service_context)
openai.api_key = os.environ["OPENAI_API_KEY"]

path_content_source = "a"
path_vector_db = "b"

QA_PROMPT_TMPL = (
"We have provided context information below. \n"
"---------------------\n"
"{context_str}"
"\n---------------------\n"
"Given this information, Explain and add conclusion at the end: {query_str}\n"
)
QA_PROMPT = QuestionAnswerPrompt(QA_PROMPT_TMPL)

documents = SimpleDirectoryReader(path_content_source, file_metadata=filename_fn)

index = VectorStoreIndex.from_documents(
documents,
service_context=service_context,
show_progress=True
)


index.set_index_id("nhs")
index.storage_context.persist(path_vector_db)
storage_context = StorageContext.from_defaults(persist_dir=path_vector_db)
index = load_index_from_storage(storage_context=storage_context,service_context=service_context)


query_engine = index.as_query_engine(
service_context=service_context,
text_qa_template=QA_PROMPT,
node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.2)],
verbose=True,
)


while True:
    query_text = input("Enter your question (or 'exit' to quit): ")

    if query_text.lower() == "exit":
        print("Exiting the program...")
        break

    response = query_engine.query(query_text)
    response_ext = Markdown(f"{response.response}").data

    for source_node in response.source_nodes:
        filename = source_node.node.metadata.get('filename')
        print(filename)

    if response_ext == "None":
        response_ext = "We were unable to find the answer in our current knowledge database. Kindly ask another question."

    print(response_ext)
I am new to setting up document metadata and retrieving it from the response.
Please advise, @WhiteFang_Jr
Can you try running this:
for source_node in response.source_nodes:
    filename = source_node.node.extra_info.get('filename', None)
    print(filename)
But my error was encountered in the indexing section:

# Building the index file

index = VectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
    show_progress=True
)

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-53-bdbcda253318> in <cell line: 2>()
1 # Building the index file
----> 2 index = VectorStoreIndex.from_documents(
3 documents,
4 service_context=service_context,
5 show_progress=True

/usr/local/lib/python3.10/dist-packages/llama_index/indices/base.py in from_documents(cls, documents, storage_context, service_context, show_progress, **kwargs)
94
95 with service_context.callback_manager.as_trace("index_construction"):
---> 96 for doc in documents:
97 docstore.set_document_hash(doc.get_doc_id(), doc.hash)
98 nodes = service_context.node_parser.get_nodes_from_documents(

TypeError: 'SimpleDirectoryReader' object is not iterable
By the way, is the following code necessary if I am building a standard index?

from llama_index import Document

# Create a document with filename in metadata


document = Document(
    text='text',
    metadata={
        'filename': '<doc_file_name>',
        'category': '<category>'
    }
)

document.metadata = {'filename': '<doc_file_name>'}

filename_fn = lambda filename: {'file_name': filename}
Oh, I only checked the last part. Did this occur when you added the file_metadata part to the Reader?
See, if you are going to add your text using the Document class, then SimpleDirectoryReader is not required, since it only returns Document objects, the same as what you create with the Document class.
I don't quite understand. How should I change my code to ensure the metadata is included in the indexing, so that when the query provides a response I can also show the reference information, like the file name or other information I will include in the future?
documents = SimpleDirectoryReader(path_content_source, file_metadata=filename_fn)
This part should work! Let me try running it once on my end.
I believe this is what I wrote in the code. I'll wait for your response. Thanks for the help.
Hey!!
documents = SimpleDirectoryReader(path_content_source, file_metadata=filename_fn).load_data()

It will work now
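The error came from passing the reader object itself to `from_documents`, which loops over its argument (line 96 of the traceback). A small sketch of the same failure, using a hypothetical `TinyReader` class in place of `SimpleDirectoryReader` so it runs without llama_index installed:

```python
class TinyReader:
    # Hypothetical stand-in for a directory reader: it only holds configuration,
    # and load_data() is what actually produces the list of documents.
    def __init__(self, paths):
        self.paths = paths

    def load_data(self):
        return [
            {"text": f"contents of {p}", "metadata": {"file_name": p}}
            for p in self.paths
        ]


reader = TinyReader(["a/guide.pdf", "a/faq.txt"])

try:
    for doc in reader:  # same mistake as passing the reader to from_documents
        pass
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

documents = reader.load_data()  # a plain list, safe to iterate
print(len(documents))
```

The reader is a loader object, not a collection, so the documents only exist once `load_data()` has been called.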
Yes, it is working. I believe the reference documentation was just missing .load_data()
Thanks again for the quick response.
@Logan M for your reference: the metadata doc is missing .load_data()
https://gpt-index.readthedocs.io/en/latest/core_modules/data_modules/documents_and_nodes/usage_documents.html#metadata

It should be like this:

documents = SimpleDirectoryReader(path_content_source, file_metadata=filename_fn).load_data()
Thanks for the clarification. By the way, I am still confused by the code. I tried both:

# Assuming 'response' is the response from your query
for source_node in response.source_nodes:
    filename = source_node.node.metadata.get('file_name')
    print(filename)

for source_node in response.source_nodes:
    filename = source_node.node.extra_info.get('filename', None)
    print(filename)

The first one, with file_name, gives me the full path of the file used as a reference, which is nice. The second one returns nothing. Any thoughts here?
@WhiteFang_Jr
In the second one you used filename, whereas in the first one you used file_name.
Can you check if this is the cause?
here is the code related to those filename vs file_name:

document = Document(
    text='text',
    metadata={
        'filename': '<doc_file_name>',
        'category': '<category>'
    }
)

document.metadata = {'filename': '<doc_file_name>'}

filename_fn = lambda filename: {'file_name': filename}

Which line should I remove?
Just try changing it to:
for source_node in response.source_nodes:
    filename = source_node.node.extra_info.get('file_name', None)
    print(filename)
yes. this is working. have tried.
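The mismatch can be reproduced with a plain dict: whatever key the `file_metadata` callable returns is the only key that a later lookup will find. A minimal sketch, where the path `a/guide.pdf` is a made-up value for illustration:

```python
# Same callable as in the thread: the key it returns is the key stored in metadata.
filename_fn = lambda filename: {'file_name': filename}

# Hypothetical path, used only for illustration.
metadata = filename_fn("a/guide.pdf")

missing = metadata.get('filename')   # this key was never set by filename_fn
found = metadata.get('file_name')    # matches the key filename_fn produced

print(missing)
print(found)
```

This is why the lookup with `'filename'` came back empty while `'file_name'` returned the path, regardless of whether the dict was reached through `metadata` or `extra_info`.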
Also you do not need all of these

Just this part
filename_fn = lambda filename: {'file_name': filename}
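If more reference details are wanted later, the same callable can return additional keys. A sketch, assuming the `file_metadata` callable receives the file path as a string, as it does in the thread. The function name `build_file_metadata` and the `base_name`, `extension`, and `category` keys are hypothetical choices, not part of the library.

```python
import os


def build_file_metadata(file_path, category="uncategorized"):
    """Return the metadata dict to attach to every document loaded from file_path."""
    base_name = os.path.basename(file_path)
    return {
        "file_name": file_path,                       # same key the thread uses
        "base_name": base_name,                       # shorter label for display
        "extension": os.path.splitext(base_name)[1],  # file suffix, dot included
        "category": category,                         # hypothetical extra field
    }


info = build_file_metadata("a/guide.pdf")
print(info)
```

It would be passed the same way as the lambda, for example `SimpleDirectoryReader(path_content_source, file_metadata=build_file_metadata).load_data()`, and each key can then be read back from the source node's metadata.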
The second one was coming back empty before, right? Now it should give you the desired file name.
OK, let me try to simplify my code, which might be causing duplication and confusion. And you are correct, the second one came back None, so I am removing it now.
This is for second one
If you see, it is using extra_info to extract metadata info
yes. i find extra_info is also working :-).
@autratec mind posting the updated code?