
Updated 5 days ago

@kapa.ai I was trying to store a list under the "associated_images" key in the metadata, but I got this error: ValueError: Value for metadata associated_images must be one of (str, int, float, None)
12 comments
Please stand by @cmosguy as I search the knowledge base. I normally take about 15 seconds or so. :wave:
----------------------
Based on the information provided in the knowledge sources, I can help explain the error you're encountering and suggest a solution.

The error you're seeing is related to the metadata values that can be stored in LlamaIndex. According to the examples in the knowledge sources, metadata values must be one of the following types: string, integer, float, or None.

To resolve this issue, you can convert your list to a string format before storing it in the metadata. A common approach is to use JSON serialization. Here's how you can modify your code:

Plain Text
import json

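# Assuming your list is called 'associated_images' and 'document' is a LlamaIndex Document,
# store the JSON string under the metadata key rather than on the document itself
document.metadata["associated_images"] = json.dumps(associated_images)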


This approach is demonstrated in several examples from the LlamaIndex documentation. For instance:

MongoDB Atlas Vector Search RAG OpenAI shows:

Plain Text
document["writers"] = json.dumps(document["writers"])
document["languages"] = json.dumps(document["languages"])
document["genres"] = json.dumps(document["genres"])


Similarly, MongoDB Atlas Vector Search RAG Fireworks demonstrates:

Plain Text
document["cuisine"] = json.dumps(document["cuisine"])
document["attributes"] = json.dumps(document["attributes"])
document["menu"] = json.dumps(document["menu"])


By converting your list to a JSON string, you ensure that the metadata value is a string, which is one of the accepted types. When you need to use this data later, you can parse it back into a list using json.loads().
Remember, this approach is necessary because the metadata check that raises this error accepts only simple data types (str, int, float, None), which keeps metadata compatible across the various vector stores and databases.
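
As a minimal sketch of that round trip, using only the standard library: the `is_flat` helper below is a hypothetical stand-in that mimics the type check behind the error message (it is not LlamaIndex's own code), and the image paths are made up for illustration.

```python
import json

# Hypothetical data: the list we would like to attach to a document.
associated_images = ["images/fig1.png", "images/fig2.png"]


def is_flat(metadata: dict) -> bool:
    """Illustrative check: True if every value is str, int, float, or None."""
    return all(
        value is None or isinstance(value, (str, int, float))
        for value in metadata.values()
    )


# Storing the list directly would fail a flat-metadata check.
raw_metadata = {"title": "report", "associated_images": associated_images}

# Serializing the list to a JSON string makes the value a plain str.
flat_metadata = {**raw_metadata, "associated_images": json.dumps(associated_images)}

# Parse the JSON string back into a list when the data is needed again.
restored = json.loads(flat_metadata["associated_images"])
```

The trade-off is that the stored value is now an opaque string, so any consumer of the metadata has to know to call json.loads() on that key.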
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai so something like this:

associated_images_json = json.dumps(
    [
        {
            "file_path": img["image_doc"].image_path,
        }
        for img in associated_images
    ]
)
updated_md_doc = Document(
    text=md_doc.text,
    metadata={**md_doc.metadata, "associated_images": associated_images_json},
)
Let me check the knowledge base for you again. One moment! :books:
----------------------
Yes, your approach is correct and aligns well with the best practices for handling complex data types in LlamaIndex metadata. Let's break down your code and explain why it works:

Plain Text
associated_images_json = json.dumps(
    [
        {
            "file_path": img["image_doc"].image_path,
        }
        for img in associated_images
    ]
)
updated_md_doc = Document(
    text=md_doc.text,
    metadata={**md_doc.metadata, "associated_images": associated_images_json},
)


This code does the following:

  1. It creates a list of dictionaries, where each dictionary contains the file_path of an associated image.
  2. The json.dumps() function is used to convert this list of dictionaries into a JSON string.
  3. The JSON string is then assigned to the "associated_images" key in the metadata dictionary.

This approach solves the ValueError you encountered earlier because now you're storing a string (the JSON-encoded data) in the metadata, which is one of the accepted types (str, int, float, None) for metadata values in LlamaIndex.

Your solution is similar to the examples found in the LlamaIndex documentation. For instance, the Multi-Modal RAG Evaluation example shows a similar pattern for handling image metadata:

Plain Text
source_image_nodes = [
    score_img_node.node.metadata["file_path"]
    for score_img_node in result.metadata["image_nodes"]
]


While this example doesn't use JSON encoding directly, it demonstrates the concept of storing image-related information in metadata.

When you need to use this data later, you can retrieve it from the metadata and parse it back into a Python object using json.loads():

Plain Text
import json

associated_images = json.loads(updated_md_doc.metadata["associated_images"])

This approach allows you to store complex data structures in the metadata while adhering to LlamaIndex's requirements for metadata values. It's a good solution for your use case of storing associated image information.
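
If the encode and decode steps are needed in more than one place, they can be wrapped in a pair of small helpers. The sketch below is only an illustration: `FakeImageDoc`, `pack_images`, and `unpack_images` are hypothetical names, and `FakeImageDoc` stands in for whatever object exposes `image_path` in your code so that the example runs without LlamaIndex installed.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeImageDoc:
    """Hypothetical stand-in for an image document exposing `image_path`."""

    image_path: str


def pack_images(images: list) -> str:
    """Encode the file paths of the given image entries as a JSON string."""
    return json.dumps([{"file_path": img["image_doc"].image_path} for img in images])


def unpack_images(metadata: dict) -> list:
    """Decode the stored JSON string; return an empty list if the key is absent."""
    return json.loads(metadata.get("associated_images", "[]"))


images = [
    {"image_doc": FakeImageDoc("img/a.png")},
    {"image_doc": FakeImageDoc("img/b.png")},
]
metadata = {"source": "notes.md", "associated_images": pack_images(images)}
paths = [entry["file_path"] for entry in unpack_images(metadata)]
```

Defaulting to "[]" when the key is missing is a design choice made for this sketch; raising an error instead may suit pipelines where every document is expected to carry the key.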
----------------------
How can I update metadata for a document before it is indexed?
@kapa.ai let's say I have a Document. How do I update or modify its metadata?
Let me check the knowledge base for you again. One moment! :books:
----------------------
To update or modify the metadata of an existing Document in LlamaIndex, you have a few options. I'll explain the methods based on the information provided in the knowledge sources.

  1. Directly modifying the metadata:
    You can directly access and modify the metadata dictionary of a Document object. Here's an example:
Plain Text
   document.metadata["new_key"] = "new_value"
   


This approach is shown in the Defining and Customizing Documents section of the documentation.

  2. Using the update_ref_doc method:
    If you're working with an index, you can use the update_ref_doc method to update a document and its corresponding nodes. This method is equivalent to deleting the document and then inserting it again with the updated information. Here's how you can use it:
Plain Text
   # Assuming 'index' is your LlamaIndex index and 'document' is your updated Document object
   index.update_ref_doc(document)
   


This method is described in the BaseIndex API reference.

  3. For specific vector stores:
    Some vector stores have their own update methods. For example, with Chroma:
Plain Text
   doc_to_update = chroma_collection.get(limit=1)
   doc_to_update["metadatas"][0] = {
       **doc_to_update["metadatas"][0],
       **{"author": "Paul Graham"},
   }
   chroma_collection.update(
       ids=[doc_to_update["ids"][0]], metadatas=[doc_to_update["metadatas"][0]]
   )
   


This example is from the ChromaIndexDemo.
Remember, when updating metadata, ensure that the values are of the correct type (str, int, float, or None). If you need to store complex data structures, consider using JSON serialization as discussed in our previous conversation.
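
One way to apply that rule consistently is to run every metadata update through a small sanitizing step before assigning it to the document. The sketch below assumes a hypothetical helper named `flatten_metadata`; it is not part of LlamaIndex and uses only the standard library.

```python
import json


def flatten_metadata(metadata: dict) -> dict:
    """Return a copy in which non-scalar values are JSON-encoded strings."""
    flat = {}
    for key, value in metadata.items():
        if value is None or isinstance(value, (str, int, float)):
            # Already one of the accepted simple types: keep as-is.
            flat[key] = value
        else:
            # Lists, dicts, and similar values are serialized to a string.
            flat[key] = json.dumps(value)
    return flat


# Hypothetical existing metadata and the updates to merge into it.
original = {"author": "Paul Graham", "page": 3}
updates = {"tags": ["essay", "startups"], "reviewed": None}
merged = flatten_metadata({**original, **updates})
```

The result could then be assigned to a document's metadata dictionary. Note that json.dumps() raises a TypeError for values it cannot serialize, so custom objects would need to be converted to plain lists or dicts first.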

Also, note that some metadata keys might be excluded from LLM or embedding processes. You can control this using excluded_llm_metadata_keys and excluded_embed_metadata_keys as described in the Advanced Metadata Customization section.

LlamaIndex Documentation provides more detailed information on working with Document metadata.
----------------------