A community member is new to Llama Parse and is trying to implement it in a Llama 3.1-based RAG app that uses Langchain. They are having trouble using Document.to_langchain_format() to make the document object usable in Langchain, encountering an "Attribute Error: Tuple object has no attribute 'metadata'" error. The community member has tried passing in a function that returns a metadata dict, and reports that Kapa.ai has not been much help.
In the comments, another community member suggests that file_metadata should be a function, not a dict, and provides a sample fix. The original community member tries this but still encounters issues, now with a "list object..." error. They share the full traceback, which indicates the problem is with the Document.to_langchain_format() call. Another community member suggests that the correct approach would be to use a list comprehension to convert each document in llama_parse_documents to Langchain format.
There is no explicitly marked answer, but the community members work together to try to resolve the issue.
I'm new to Llama Parse, and I'm trying to implement it in a small Llama 3.1-based RAG app that uses Langchain. I am trying to use Document.to_langchain_format() to make the document object usable in Langchain, but I keep running into "Attribute Error: Tuple object has no attribute 'metadata'", despite passing in a function that returns a metadata dict. Kapa.ai hasn't been much help. Is anyone able to give me some assistance here?
if extension == '.pdf':
    from llama_parse import LlamaParse
    from llama_index.core import SimpleDirectoryReader
    from llama_index.core.schema import Document

    parser = LlamaParse(result_type="markdown")  # "markdown" and "text" are available
    file_extractor = {".pdf": parser}
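For anyone following along, the suggestion in this thread that file_metadata should be a function rather than a dict could look roughly like the sketch below. The helper name get_file_metadata and the keys it returns are made up for illustration; the commented-out SimpleDirectoryReader call is an assumption about how the rest of the snippet continues, since that part was not posted.

```python
from pathlib import Path


def get_file_metadata(file_path: str) -> dict:
    """Hypothetical helper: build a metadata dict for one file path."""
    path = Path(file_path)
    return {"file_name": path.name, "file_type": path.suffix}


# Assumed usage with the reader (not run here, needs llama_index installed):
#   SimpleDirectoryReader(
#       input_files=[file_name],
#       file_extractor=file_extractor,
#       file_metadata=get_file_metadata,  # pass the function itself, not a dict
#   ).load_data()
example = get_file_metadata("reports/heart_disease.pdf")
```

The point is that the reader calls the function once per file with that file's path, so a ready-made dict (or the result of calling the function yourself) will not work in its place.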
@Logan M here's the traceback:
Traceback (most recent call last):
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
             ^^^^^^
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 590, in code_to_exec
    exec(code, module.__dict__)
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/Project - Streamlit Front-End for Question-Answering App/QA_LLM_Pinecone.py", line 195, in <module>
    data = load_document(file_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/Project - Streamlit Front-End for Question-Answering App/QA_LLM_Pinecone.py", line 33, in load_document
    loader = Document.to_langchain_format(llama_parse_documents)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/llama_index/core/schema.py", line 717, in to_langchain_format
    metadata = self.metadata or {}
               ^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'metadata'
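The last two frames of the traceback suggest what is going on: to_langchain_format is an instance method, so calling it on the Document class hands the whole list in as self, and self.metadata then fails. A minimal sketch of the failure and of the list comprehension fix suggested in this thread is below. StandInDocument is a hypothetical stand-in so the example runs without llama_index installed, and it returns a plain dict rather than a real Langchain document; with the real library the fix would presumably be the same comprehension over llama_parse_documents.

```python
from dataclasses import dataclass, field


@dataclass
class StandInDocument:
    """Hypothetical stand-in for llama_index's Document, for illustration only."""

    text: str
    metadata: dict = field(default_factory=dict)

    def to_langchain_format(self) -> dict:
        # Instance method: `self` must be a single document, not a list.
        metadata = self.metadata or {}
        return {"page_content": self.text, "metadata": metadata}


llama_parse_documents = [
    StandInDocument("page one", {"file_name": "a.pdf"}),
    StandInDocument("page two"),
]

# Calling the method on the class passes the whole list in as `self`,
# which reproduces the AttributeError from the traceback.
error_message = None
try:
    StandInDocument.to_langchain_format(llama_parse_documents)
except AttributeError as err:
    error_message = str(err)

# Converting one document at a time avoids the error.
langchain_documents = [doc.to_langchain_format() for doc in llama_parse_documents]
```

In load_document, that would mean replacing the Document.to_langchain_format(llama_parse_documents) call on line 33 with the comprehension, so each parsed document is converted on its own.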