Updated 6 months ago

Hi Everyone,

At a glance

A community member new to Llama Parse is trying to use it in a Llama 3.1 based RAG app built with LangChain. They call Document.to_langchain_format() to make the parsed documents usable in LangChain, but hit "AttributeError: 'tuple' object has no attribute 'metadata'". Passing in a function that returns a metadata dict did not help, and neither did Kapa.ai.

In the comments, another community member points out that file_metadata should be a function, not a dict, and provides a sample fix. The original poster tries this and now gets "'list' object has no attribute 'metadata'" instead. The full traceback shows the failure is in the Document.to_langchain_format() call, which is being handed the whole list of documents. The commenter suggests a list comprehension that calls to_langchain_format() on each document in llama_parse_documents.

There is no explicitly marked answer, but the original poster thanks the commenter once the list-comprehension fix has been suggested.

Hi Everyone,

I'm new to Llama Parse and trying to implement it in a small Llama 3.1 based RAG app that uses LangChain. I am trying to use Document.to_langchain_format() to make the document object usable in LangChain, but I keep running into "AttributeError: 'tuple' object has no attribute 'metadata'", despite passing in a function that returns a metadata dict. Kapa.ai hasn't been much help. Is anyone able to give me some assistance here?

def get_meta(file):
    filename, extension = os.path.splitext(file)
    metadata_dict = {
        'filepath': {"filename": extension}
    }
    return metadata_dict

def load_document(file):
    import os
    name, extension = os.path.splitext(file)
    os.environ['LLAMA_CLOUD_API_KEY'] = 'llx-Ks8gd2ve9Qwwu0RrHn44RsMcrg79GtrYUFKTMJa4UwSpeFxX'

    if extension == '.pdf':
        from llama_parse import LlamaParse
        from llama_index.core import SimpleDirectoryReader
        from llama_index.core.schema import Document

        parser = LlamaParse(result_type="markdown")  # "markdown" and "text" are available
        file_extractor = {".pdf": parser}

        llama_parse_documents = SimpleDirectoryReader(input_files=[file], file_extractor=file_extractor, file_metadata={}).load_data(),

        loader = Document.to_langchain_format(llama_parse_documents)
file_metadata is supposed to be a function, not a dict
llama_parse_documents = SimpleDirectoryReader(
  input_files=[file], 
  file_extractor=file_extractor, 
  file_metadata=lambda filename: {}
).load_data()
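For reference, a minimal sketch of what a callable for file_metadata could look like, assuming the reader calls it with one file path and expects a dict back. The function name and the dict keys are illustrative choices, not something LlamaIndex requires.

```python
import os


def get_meta(file_path: str) -> dict:
    """Return a flat metadata dict for a single file path (illustrative keys)."""
    file_name = os.path.basename(file_path)
    _, extension = os.path.splitext(file_name)
    return {"file_name": file_name, "extension": extension}


# Hypothetical path, used only to show the shape of the returned dict.
meta = get_meta("/tmp/report.pdf")
print(meta)
```

The function itself, not its result, would then be passed along, e.g. file_metadata=get_meta.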
@Logan M no luck :(. I tried that, and also tried passing in this function:
filename_fn = lambda filename: {"file_name": file}

Now I get the error as 'list object...' instead
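A likely reason the error changed from 'tuple' to 'list': the original assignment ended with a trailing comma after .load_data(), which in Python wraps the result in a one-element tuple, and the suggested snippet has no such comma. A small sketch with a stand-in function (load_data here is a placeholder, not the real reader):

```python
def load_data():
    # Stand-in for SimpleDirectoryReader(...).load_data(), which returns a list.
    return ["doc-1", "doc-2"]


with_comma = load_data(),     # trailing comma: a 1-tuple containing the list
without_comma = load_data()   # no comma: the list itself

print(type(with_comma).__name__, type(without_comma).__name__)
```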
What's the full traceback? That would probably help narrow down the issue.
loader = Document.to_langchain_format(llama_parse_documents) this seems wrong
I would expect something like

documents = [x.to_langchain_format() for x in llama_parse_documents]
@Logan M here's the traceback:
Traceback (most recent call last):
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
             ^^^^^^
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 590, in code_to_exec
    exec(code, module.__dict__)
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/Project - Streamlit Front-End for Question-Answering App/QA_LLM_Pinecone.py", line 195, in <module>
    data = load_document(file_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/Project - Streamlit Front-End for Question-Answering App/QA_LLM_Pinecone.py", line 33, in load_document
    loader = Document.to_langchain_format(llama_parse_documents)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/llama_index/core/schema.py", line 717, in to_langchain_format
    metadata = self.metadata or {}
               ^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'metadata'
yea, my fix above will fix that one
@Logan M hell yeah thank you
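To illustrate why the traceback ends at self.metadata, here is a self-contained sketch using a hypothetical stand-in class (FakeDocument is not the real LlamaIndex Document, and its return value is a plain dict rather than a LangChain object). Calling the method on the class with a list puts the list in the place of self, while the list comprehension calls it once per document.

```python
class FakeDocument:
    """Hypothetical stand-in for a LlamaIndex Document (illustration only)."""

    def __init__(self, text, metadata=None):
        self.text = text
        self.metadata = metadata

    def to_langchain_format(self):
        # Same attribute access as the failing line in the traceback.
        metadata = self.metadata or {}
        return {"page_content": self.text, "metadata": metadata}


docs = [FakeDocument("a", {"file_name": "x.pdf"}), FakeDocument("b")]

# Failing pattern: the whole list is passed where `self` is expected.
try:
    FakeDocument.to_langchain_format(docs)
except AttributeError as exc:
    error_message = str(exc)

# Suggested pattern: convert each document individually.
converted = [d.to_langchain_format() for d in docs]
print(error_message)
print(len(converted))
```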