
Updated 6 months ago

How to store metadata in LlamaIndex

At a glance

The community members are discussing how to store metadata in LlamaIndex. One community member suggests adding metadata to the extra_info of input documents, giving an example that sets the file_name. Another community member notes that although they stored metadata this way, the query results were different from what they expected.

The community members discuss the issue further. One says they have indexed several PDF files with LlamaIndex, but when asking "What are these documents about?", they receive a response saying that no specific documents have been mentioned. The community members suggest this is because a vector index retrieves only the top 2 nodes by default, and that using a ListIndex with response_mode="tree_summarize", or a TreeIndex, would let the LLM read the whole index and answer a question about the documents as a whole.

There is no explicitly marked answer in the provided information.

Useful resources
How to store metadata in LlamaIndex?
Please advise me
You can add metadata to the extra_info of input documents

Plain Text
document.extra_info = {'file_name': 'text.txt'}


https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_documents.html
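To illustrate why attaching metadata to extra_info is useful, here is a minimal pure-Python stand-in (the Document class and split_into_chunks helper below are illustrative stand-ins, not the real llama_index API): metadata set on a document is copied onto every chunk derived from it, so it stays available at retrieval time.

```python
class Document:
    """Toy stand-in for a document carrying text plus metadata."""
    def __init__(self, text, extra_info=None):
        self.text = text
        self.extra_info = extra_info or {}

def split_into_chunks(doc, chunk_size):
    """Split a document's text, copying its extra_info onto each chunk."""
    return [
        Document(doc.text[i:i + chunk_size], dict(doc.extra_info))
        for i in range(0, len(doc.text), chunk_size)
    ]

doc = Document("LlamaIndex stores metadata alongside text.",
               extra_info={"file_name": "text.txt"})
chunks = split_into_chunks(doc, chunk_size=20)
for c in chunks:
    print(c.extra_info["file_name"])  # every chunk keeps the metadata
```

The point is that metadata travels with the chunks, not that the splitting logic matches LlamaIndex's internals.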
Yes, I have stored it like this, but the query output was a bit different than expected
Example
I have several documents
If I ask "What are these documents about?"
it didn't know
I have indexed several PDF files using LlamaIndex.
And if I ask like this "What are these documents about?"
I get this response.
"I cannot provide a reliable response based on the available knowledge base as no specific documents have been mentioned. Please provide more information or upload the documents you are referring to, and I will be happy to assist you."

I want to get a response about the documents, since I have already indexed the files.
Please help
If you are using a vector index, it's not going to know what all the documents are about, because it only retrieves the top 2 nodes by default

For a question like that, the LLM would have to read the entire index (either a ListIndex with response_mode="tree_summarize" or a TreeIndex will do that)
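The contrast can be sketched in plain Python (the helper names below are illustrative stand-ins, not the llama_index API): a vector index hands the LLM only the top-k nodes, while tree_summarize folds summaries of every node together bottom-up, so nothing is left out.

```python
def vector_retrieve(nodes, similarity_top_k=2):
    # A vector index returns only the top-k most similar nodes, so a
    # "what are these documents about?" question never sees most of
    # the corpus. (Similarity scoring elided for brevity.)
    return nodes[:similarity_top_k]

def tree_summarize(nodes, summarize):
    # A ListIndex with response_mode="tree_summarize" reads every node,
    # summarizing pairs bottom-up until a single answer remains.
    while len(nodes) > 1:
        nodes = [summarize(nodes[i:i + 2]) for i in range(0, len(nodes), 2)]
    return nodes[0]

nodes = ["chunk about PDFs", "chunk about taxes", "chunk about invoices"]
fake_summarize = lambda pair: " + ".join(pair)  # stands in for an LLM call

print(vector_retrieve(nodes))                 # sees only 2 of 3 chunks
print(tree_summarize(nodes, fake_summarize))  # sees all 3
```

This is why the vector index answers "no specific documents have been mentioned": the question is about the whole corpus, but the prompt only ever contains two chunks.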