Thanks for the quick response. I got the following error when I tried to follow some thoughts from the reference above:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-42-bdbcda253318> in <cell line: 2>()
1 # Building the index file
----> 2 index = VectorStoreIndex.from_documents(
3 documents,
4 service_context=service_context,
5 show_progress=True
/usr/local/lib/python3.10/dist-packages/llama_index/indices/base.py in from_documents(cls, documents, storage_context, service_context, show_progress, **kwargs)
94
95 with service_context.callback_manager.as_trace("index_construction"):
---> 96 for doc in documents:
97 docstore.set_document_hash(doc.get_doc_id(), doc.hash)
98 nodes = service_context.node_parser.get_nodes_from_documents(
TypeError: 'SimpleDirectoryReader' object is not iterable
Any thoughts on troubleshooting?
What are changes that you did? Can you share your code?
here is the code:
!pip install langchain
!pip install llama-index
import os
import json
import openai
import sys
from llama_index import VectorStoreIndex, SimpleDirectoryReader, LangchainEmbedding, PromptHelper, LLMPredictor, ServiceContext, set_global_service_context
from llama_index import load_index_from_storage, StorageContext, QuestionAnswerPrompt
from llama_index.llms import OpenAI
from llama_index.indices.postprocessor.node import SimilarityPostprocessor
from IPython.display import Markdown, display
from llama_index import Document
# Create a document with filename in metadata
document = Document(
    text='text',
    metadata={
        'filename': '<doc_file_name>',
        'category': '<category>'
    }
)
document.metadata = {'filename': '<doc_file_name>'}
filename_fn = lambda filename: {'file_name': filename}
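For context on what that last line does: the file_metadata callable is invoked once per file path, and the dict it returns is attached as metadata to every Document loaded from that file. A minimal sketch of that behavior (the path 'docs/report.txt' is just an illustrative example):

```python
# The file_metadata callable receives each file's path and returns a dict
# that SimpleDirectoryReader attaches to the resulting Document's metadata.
filename_fn = lambda filename: {'file_name': filename}

# Called once per file; the key chosen here ('file_name') is the key you
# must later look up on response.source_nodes.
meta = filename_fn('docs/report.txt')
print(meta)  # {'file_name': 'docs/report.txt'}
```

Note that the key is 'file_name' (with an underscore), which matters later when reading the metadata back out of the query response.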
os.environ['OPENAI_API_KEY'] = "NA"
service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo", temperature=0))
set_global_service_context(service_context)
openai.api_key = os.environ["OPENAI_API_KEY"]
path_content_source = "a"
path_vector_db = "b"
QA_PROMPT_TMPL = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, explain and add a conclusion at the end: {query_str}\n"
)
QA_PROMPT = QuestionAnswerPrompt(QA_PROMPT_TMPL)
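To see what the query engine does with this template: it substitutes the retrieved document chunks for {context_str} and the user's question for {query_str} before sending the result to the LLM. A sketch using plain str.format (the context and question strings here are made-up examples):

```python
QA_PROMPT_TMPL = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, explain and add a conclusion at the end: {query_str}\n"
)

# The query engine fills the template like this before calling the LLM.
filled = QA_PROMPT_TMPL.format(
    context_str="Example retrieved chunk of text.",
    query_str="Example user question?",
)
print(filled)
```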
documents = SimpleDirectoryReader(path_content_source, file_metadata=filename_fn)
index = VectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
    show_progress=True
)
index.set_index_id("nhs")
index.storage_context.persist(path_vector_db)
storage_context = StorageContext.from_defaults(persist_dir=path_vector_db)
index = load_index_from_storage(storage_context=storage_context, service_context=service_context)
query_engine = index.as_query_engine(
    service_context=service_context,
    text_qa_template=QA_PROMPT,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.2)],
    verbose=True,
)
while True:
    query_text = input("Enter your question (or 'exit' to quit): ")
    if query_text.lower() == "exit":
        print("Exiting the program...")
        break
    response = query_engine.query(query_text)
    response_ext = Markdown(f"{response.response}").data
    for source_node in response.source_nodes:
        filename = source_node.node.metadata.get('filename')
        print(filename)
    if response_ext == "None":
        response_ext = "We were unable to find the answer in our current knowledge database. Kindly ask another question."
    print(response_ext)
I am new to setting up document metadata and retrieving it from the response.
Please help provide your advice @WhiteFang_Jr
Can you try running this
for source_node in response.source_nodes:
    filename = source_node.node.extra_info.get('filename', None)
    print(filename)
But my error was encountered in the indexing section:
# Building the index file
index = VectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
    show_progress=True
)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-53-bdbcda253318> in <cell line: 2>()
1 # Building the index file
----> 2 index = VectorStoreIndex.from_documents(
3 documents,
4 service_context=service_context,
5 show_progress=True
/usr/local/lib/python3.10/dist-packages/llama_index/indices/base.py in from_documents(cls, documents, storage_context, service_context, show_progress, **kwargs)
94
95 with service_context.callback_manager.as_trace("index_construction"):
---> 96 for doc in documents:
97 docstore.set_document_hash(doc.get_doc_id(), doc.hash)
98 nodes = service_context.node_parser.get_nodes_from_documents(
TypeError: 'SimpleDirectoryReader' object is not iterable
BTW, will this code still be necessary if I do the standard indexing?
from llama_index import Document
# Create a document with filename in metadata
document = Document(
    text='text',
    metadata={
        'filename': '<doc_file_name>',
        'category': '<category>'
    }
)
document.metadata = {'filename': '<doc_file_name>'}
filename_fn = lambda filename: {'file_name': filename}
Oh, I checked the last part directly. This occurred when you added the file_metadata part to the Reader?
See, if you are going to add your text using the Document class, then SimpleDirectoryReader is not required, as it returns Document objects, the same as what you create using the Document class.
I don't quite understand. How should I change my code to ensure the metadata is included in the indexing, so that when the query provides a response, I can also show the reference information, like the file name or other information I will include in the future?
documents = SimpleDirectoryReader(path_content_source, file_metadata=filename_fn)
This part should work! Let me try running it once at my end.
I believe this is what I wrote in the code. Waiting for your response. Thanks for the help.
Hey!!
documents = SimpleDirectoryReader(path_content_source, file_metadata=filename_fn).load_data()
It will work now
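To make the fix concrete: SimpleDirectoryReader(...) by itself is a reader object, not a collection, so from_documents fails when it tries to loop over it; calling .load_data() is what actually reads the files and returns the iterable list of Documents. A minimal mock illustrating the difference (this is a sketch, not the real llama_index class):

```python
# Mock reader showing why the original call failed: the reader object
# itself defines no iteration, so iterating it raises TypeError.
class FakeDirectoryReader:
    def __init__(self, path):
        self.path = path

    def load_data(self):
        # The real reader parses the files under self.path;
        # here we just fake two loaded documents.
        return [{'text': 'doc one'}, {'text': 'doc two'}]

reader = FakeDirectoryReader('a')
try:
    list(reader)  # mirrors VectorStoreIndex.from_documents(reader)
except TypeError as e:
    print(e)      # 'FakeDirectoryReader' object is not iterable

documents = reader.load_data()  # mirrors the fixed call with .load_data()
print(len(documents))           # 2
```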
Yes, it is working. I believe the reference document just missed: .load_data()
Again, thanks for the quick response.
Thanks for the clarification. BTW, I am still confused with the code. I tried both:
# Assuming 'response' is the response from your query
for source_node in response.source_nodes:
    filename = source_node.node.metadata.get('file_name')
    print(filename)

for source_node in response.source_nodes:
    filename = source_node.node.extra_info.get('filename', None)
    print(filename)
The first section with file_name gives me the full path of the file used for reference, which is nice. The second one comes back with nothing. Any thoughts here?
@WhiteFang_Jr
In the second one you have used filename, whereas in the first one you have used file_name.
Can you check if this is the case?
Here is the code related to filename vs file_name:
document = Document(
    text='text',
    metadata={
        'filename': '<doc_file_name>',
        'category': '<category>'
    }
)
document.metadata = {'filename': '<doc_file_name>'}
filename_fn = lambda filename: {'file_name': filename}
Which line should I remove?
Just try changing:
for source_node in response.source_nodes:
    filename = source_node.node.extra_info.get('file_name', None)
    print(filename)
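The root cause is just a mismatched dictionary key: file_metadata=filename_fn stored the path under 'file_name', so looking up 'filename' returns None. A small sketch with a mocked-up metadata dict (the path is an illustrative example):

```python
# Mock of a retrieved node's metadata: filename_fn stored the path under
# the key 'file_name', so any other key comes back as None.
node_metadata = {'file_name': '/content/docs/report.txt'}

print(node_metadata.get('file_name'))        # /content/docs/report.txt
print(node_metadata.get('filename', None))   # None
```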
Yes, this is working. I have tried it.
Also, you do not need all of these. Just this part:
filename_fn = lambda filename: {'file_name': filename}
The second one was coming back empty before, right? Now it must be giving you the desired file name.
OK, let me try to simplify my code, which might cause duplication and confusion. And you are correct, the second one came back None, so I am removing it now.
If you see, it is using extra_info to extract metadata info
Yes, I find extra_info is also working :-).
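As to why both work: in recent llama_index versions, extra_info is kept as a backward-compatible alias of metadata, so both names read the same underlying dict. A minimal sketch of that alias pattern (FakeNode is a made-up stand-in, not the real node class):

```python
# Sketch of an attribute alias: extra_info is exposed as a property that
# simply returns the same dict as metadata, so either lookup works.
class FakeNode:
    def __init__(self, metadata):
        self.metadata = metadata

    @property
    def extra_info(self):
        return self.metadata

node = FakeNode({'file_name': 'report.txt'})
print(node.extra_info.get('file_name'))  # report.txt
print(node.extra_info is node.metadata)  # True
```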
@autratec mind posting the updated code?