Thanks for the quick response. I got the following error when I tried to follow some thoughts from the reference above:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-42-bdbcda253318> in <cell line: 2>()
1 # Building the index file
----> 2 index = VectorStoreIndex.from_documents(
3 documents,
4 service_context=service_context,
5 show_progress=True
/usr/local/lib/python3.10/dist-packages/llama_index/indices/base.py in from_documents(cls, documents, storage_context, service_context, show_progress, **kwargs)
94
95 with service_context.callback_manager.as_trace("index_construction"):
---> 96 for doc in documents:
97 docstore.set_document_hash(doc.get_doc_id(), doc.hash)
98 nodes = service_context.node_parser.get_nodes_from_documents(
TypeError: 'SimpleDirectoryReader' object is not iterable
Any thoughts on troubleshooting?
What are changes that you did? Can you share your code?
here is the code:
!pip install langchain
!pip install llama-index
import os
import json
import openai
import sys
from llama_index import VectorStoreIndex, SimpleDirectoryReader, LangchainEmbedding, PromptHelper, LLMPredictor, ServiceContext, set_global_service_context
from llama_index import load_index_from_storage, StorageContext, QuestionAnswerPrompt
from llama_index.llms import OpenAI
from llama_index.indices.postprocessor.node import SimilarityPostprocessor
from IPython.display import Markdown, display
from llama_index import Document
# Create a document with filename in metadata
document = Document(
    text='text',
    metadata={
        'filename': '<doc_file_name>',
        'category': '<category>'
    }
)
document.metadata = {'filename': '<doc_file_name>'}
filename_fn = lambda filename: {'file_name': filename}
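For context on what that last line does: the file_metadata callable is invoked once per file path, and the dict it returns is attached as metadata to every Document loaded from that file. A minimal sketch of that behavior (the path 'docs/report.txt' is just an illustrative example):

```python
# The file_metadata callable receives each file's path and returns a dict
# that SimpleDirectoryReader attaches to the resulting Document's metadata.
filename_fn = lambda filename: {'file_name': filename}

# Called once per file; the key chosen here ('file_name') is the key you
# must later look up on response.source_nodes.
meta = filename_fn('docs/report.txt')
print(meta)  # {'file_name': 'docs/report.txt'}
```

Note that the key is 'file_name' (with an underscore), which matters later when reading the metadata back out of the query response.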
os.environ['OPENAI_API_KEY'] = "NA"
service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo", temperature=0))
set_global_service_context(service_context)
openai.api_key = os.environ["OPENAI_API_KEY"]
path_content_source = "a"
path_vector_db = "b"
QA_PROMPT_TMPL = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, explain and add a conclusion at the end: {query_str}\n"
)
QA_PROMPT = QuestionAnswerPrompt(QA_PROMPT_TMPL)
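To see what the query engine does with this template: it substitutes the retrieved document chunks for {context_str} and the user's question for {query_str} before sending the result to the LLM. A sketch using plain str.format (the context and question strings here are made-up examples):

```python
QA_PROMPT_TMPL = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, explain and add a conclusion at the end: {query_str}\n"
)

# The query engine fills the template like this before calling the LLM.
filled = QA_PROMPT_TMPL.format(
    context_str="Example retrieved chunk of text.",
    query_str="Example user question?",
)
print(filled)
```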
documents = SimpleDirectoryReader(path_content_source, file_metadata=filename_fn)
index = VectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
    show_progress=True
)
index.set_index_id("nhs")
index.storage_context.persist(path_vector_db)
storage_context = StorageContext.from_defaults(persist_dir=path_vector_db)
index = load_index_from_storage(storage_context=storage_context, service_context=service_context)
query_engine = index.as_query_engine(
    service_context=service_context,
    text_qa_template=QA_PROMPT,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.2)],
    verbose=True,
)
while True:
    query_text = input("Enter your question (or 'exit' to quit): ")
    if query_text.lower() == "exit":
        print("Exiting the program...")
        break
    response = query_engine.query(query_text)
    response_ext = Markdown(f"{response.response}").data
    for source_node in response.source_nodes:
        filename = source_node.node.metadata.get('filename')
        print(filename)
    if response_ext == "None":
        response_ext = "We were unable to find the answer in our current knowledge database. Kindly ask another question."
    print(response_ext)
I am new to setting up document metadata and retrieving it from the response.
Please help provide your advice @WhiteFang_Jr
Can you try running this
for source_node in response.source_nodes:
    filename = source_node.node.extra_info.get('filename', None)
    print(filename)
But my error was encountered in the indexing section:
# Building the index file
index = VectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
    show_progress=True
)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-53-bdbcda253318> in <cell line: 2>()
1 # Building the index file
----> 2 index = VectorStoreIndex.from_documents(
3 documents,
4 service_context=service_context,
5 show_progress=True
/usr/local/lib/python3.10/dist-packages/llama_index/indices/base.py in from_documents(cls, documents, storage_context, service_context, show_progress, **kwargs)
94
95 with service_context.callback_manager.as_trace("index_construction"):
---> 96 for doc in documents:
97 docstore.set_document_hash(doc.get_doc_id(), doc.hash)
98 nodes = service_context.node_parser.get_nodes_from_documents(
TypeError: 'SimpleDirectoryReader' object is not iterable
BTW, will this code still be necessary if I do the standard indexing?
from llama_index import Document
# Create a document with filename in metadata
document = Document(
    text='text',
    metadata={
        'filename': '<doc_file_name>',
        'category': '<category>'
    }
)
document.metadata = {'filename': '<doc_file_name>'}
filename_fn = lambda filename: {'file_name': filename}
Oh, I checked the last part directly. This occurred when you added the file_metadata part to the Reader?
See, if you are going to add your text using the Document class, then SimpleDirectoryReader is not required, as it returns Document objects, the same as what you create using the Document class.
I don't quite understand. How should I change my code to ensure the metadata is included in the indexing, so that when the query provides a response, I can also show the reference information, like the file name or other information I will include in the future?
documents = SimpleDirectoryReader(path_content_source, file_metadata=filename_fn)
This part should work! Let me try running it once at my end.
I believe this is what I wrote in the code. Waiting for your response. Thanks for the help.
Hey!!
documents = SimpleDirectoryReader(path_content_source, file_metadata=filename_fn).load_data()
It will work now
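To make the fix concrete: SimpleDirectoryReader(...) by itself is a reader object, not a collection, so from_documents fails when it tries to loop over it; calling .load_data() is what actually reads the files and returns the iterable list of Documents. A minimal mock illustrating the difference (this is a sketch, not the real llama_index class):

```python
# Mock reader showing why the original call failed: the reader object
# itself defines no iteration, so iterating it raises TypeError.
class FakeDirectoryReader:
    def __init__(self, path):
        self.path = path

    def load_data(self):
        # The real reader parses the files under self.path;
        # here we just fake two loaded documents.
        return [{'text': 'doc one'}, {'text': 'doc two'}]

reader = FakeDirectoryReader('a')
try:
    list(reader)  # mirrors VectorStoreIndex.from_documents(reader)
except TypeError as e:
    print(e)      # 'FakeDirectoryReader' object is not iterable

documents = reader.load_data()  # mirrors the fixed call with .load_data()
print(len(documents))           # 2
```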
Yes, it is working. I believe the reference document just missed: .load_data()
Again, thanks for the quick response.
Thanks for the clarification. BTW, I am still confused with the code. I tried both:
# Assuming 'response' is the response from your query
for source_node in response.source_nodes:
    filename = source_node.node.metadata.get('file_name')
    print(filename)

for source_node in response.source_nodes:
    filename = source_node.node.extra_info.get('filename', None)
    print(filename)
The first section with file_name gives me the full path of the file used for reference, which is nice. The second one comes back with nothing. Any thoughts here?
@WhiteFang_Jr
In the second one you have used filename, whereas in the first one you have used file_name.
Can you check if this is the case?
Here is the code related to filename vs file_name:
document = Document(
    text='text',
    metadata={
        'filename': '<doc_file_name>',
        'category': '<category>'
    }
)
document.metadata = {'filename': '<doc_file_name>'}
filename_fn = lambda filename: {'file_name': filename}
Which line should I remove?
Just try changing:
for source_node in response.source_nodes:
    filename = source_node.node.extra_info.get('file_name', None)
    print(filename)
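The root cause is just a mismatched dictionary key: file_metadata=filename_fn stored the path under 'file_name', so looking up 'filename' returns None. A small sketch with a mocked-up metadata dict (the path is an illustrative example):

```python
# Mock of a retrieved node's metadata: filename_fn stored the path under
# the key 'file_name', so any other key comes back as None.
node_metadata = {'file_name': '/content/docs/report.txt'}

print(node_metadata.get('file_name'))        # /content/docs/report.txt
print(node_metadata.get('filename', None))   # None
```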
Yes, this is working. I have tried it.
Also, you do not need all of these. Just this part:
filename_fn = lambda filename: {'file_name': filename}
The second one was coming back empty before, right? Now it must be giving you the desired file name.
OK, let me try to simplify my code, which might cause duplication and confusion. And you are correct, the second one came back None, so I am removing it now.
If you see, it is using extra_info to extract metadata info
Yes, I find extra_info is also working :-).
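As to why both work: in recent llama_index versions, extra_info is kept as a backward-compatible alias of metadata, so both names read the same underlying dict. A minimal sketch of that alias pattern (FakeNode is a made-up stand-in, not the real node class):

```python
# Sketch of an attribute alias: extra_info is exposed as a property that
# simply returns the same dict as metadata, so either lookup works.
class FakeNode:
    def __init__(self, metadata):
        self.metadata = metadata

    @property
    def extra_info(self):
        return self.metadata

node = FakeNode({'file_name': 'report.txt'})
print(node.extra_info.get('file_name'))  # report.txt
print(node.extra_info is node.metadata)  # True
```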
@autratec mind posting the updated code?