Updated 10 months ago

Payload - Qdrant

At a glance

A community member encountered an issue where metadata added directly through the Qdrant SDK wasn't being returned by LlamaIndex's retrieval process, despite being visible in the Qdrant UI. The metadata was only accessible when added through LlamaIndex's interface.

Through discussion, it was revealed that LlamaIndex only uses the _node_content field for metadata, which contains the serialized node information. When adding metadata directly through Qdrant, it doesn't get incorporated into this field, explaining why it's not retrieved.

The solution, which was confirmed to work, is to loop through every point, extract the payload containing _node_content, add the extra information, reserialize it back into _node_content, update the payload with the new _node_content, and call cl.overwrite_payload on each point. The additional metadata then becomes part of the node content that LlamaIndex retrieves.

I'm currently using LlamaIndex with Qdrant as the vector database.

When adding metadata to nodes via LlamaIndex like this:

document.metadata = {
    "source_id": source_id,
    "document_name": document_name
}


and retrieving it with:

retriever = index.as_retriever(...)
retrieved_nodes = retriever.retrieve(query)


I can access the added metadata through retrieved_nodes[0].metadata.

However, when I add metadata using Qdrant's Python SDK (https://qdrant.tech/documentation/concepts/payload/#:~:text=%7D-,You%20don%E2%80%99t%20need%20to%20know%20the%20ids%20of%20the%20points%20you%20want%20to%20modify.%20The%20alternative%20is%20to%20use%20filters.,-http), the metadata isn't returned by LlamaIndex's retrieval process, even though it's visible in the Qdrant UI.

What does LlamaIndex do differently that allows the metadata to be returned upon retrieval?
14 comments
This is how I am adding the metadata to an existing record using the Qdrant SDK:

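from qdrant_client import models

# client is an existing QdrantClient; collection_name, meta_name, meta_value,
# filter_name and filter_value are defined elsewhere.
print(f"Adding key: {meta_name}, value: {meta_value}")
client.set_payload(
    collection_name=collection_name,
    payload={meta_name: meta_value},
    # Select the points to update with a filter instead of explicit ids.
    points=models.Filter(
        must=[
            models.FieldCondition(
                key=filter_name,
                match=models.MatchValue(value=filter_value),
            ),
        ],
    ),
)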
I think you can start looking from here: https://github.com/run-llama/llama_index/blob/f5263896121721de1051ce58338a1e0ea6950ca7/llama-index-integrations/vector_stores/llama-index-vector-stores-qdrant/llama_index/vector_stores/qdrant/base.py#L254


See how it is done on the LlamaIndex side, then check the Qdrant UI to see how it is reflected in that particular node.

Then do the same directly. There could be some difference: maybe the metadata ends up somewhere else when you add it directly.
Yea the source code is a good place to start
Thank you, I'll check it
Did you find the answer to this? I'm having this issue too!
I've never been able to replicate this. Metadata always returns fine for me (assuming that llama-index both inserted/created the index and queried it).
Strange. It is returned by the Qdrant web service. Is llama-index returning all the payload fields, or just the _node_content? The Qdrant web service returns this:

Whereas the as_retriever node for that point returns this:

Thoughts?
As above: after inserting the data using llama-index, I later manually add more data to the payload using a Qdrant call, i.e. og tag information.
_node_content is the entire node serialized -- it takes that then creates a node
It won't use other fields for metadata I think
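To illustrate the point above, here is a minimal sketch of why a key added next to _node_content would not appear in the retrieved node's metadata. The payload contents and the helper rebuild_metadata are hypothetical, and the sketch assumes the serialized node is JSON with a top-level "metadata" key. It is a stand-in for the behaviour described, not LlamaIndex's actual code.

```python
import json

# Hypothetical payload: LlamaIndex inserted the node, then a later
# set_payload call added "og_title" at the top level of the payload.
payload = {
    "_node_content": json.dumps({
        "text": "example chunk",
        "metadata": {"source_id": "abc", "document_name": "report.pdf"},
    }),
    "source_id": "abc",
    "document_name": "report.pdf",
    "og_title": "Added later via the Qdrant SDK",
}


def rebuild_metadata(payload: dict) -> dict:
    """Stand-in for the retrieval side: only _node_content is consulted."""
    node = json.loads(payload["_node_content"])
    return node.get("metadata", {})


metadata = rebuild_metadata(payload)
print(metadata)  # "og_title" is absent: it never made it into _node_content
```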
Thanks! That's the issue then. It would be a shame to have to make another call for each returned item. What is the best llama-index call to update the payload instead?
I think one approach is to:

1) loop through every point
2) extract each payload, which contains the _node_content
3) add the extra information
4) reserialize it back to _node_content
5) update the payload to use this new _node_content
6) call cl.overwrite_payload on each point
Yea that would work
It worked perfectly! Thanks for your help Logan.
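The six steps above could be sketched roughly as follows. This is a sketch under assumptions, not the code the poster actually ran: merge_into_node_content, rewrite_collection and extra_for_point are hypothetical names, the serialized node is assumed to be JSON with a top-level "metadata" key, and client is assumed to be a qdrant_client.QdrantClient (only its scroll and overwrite_payload methods are used).

```python
import json


def merge_into_node_content(payload: dict, extra: dict) -> dict:
    """Return a new payload with `extra` merged into the serialized node."""
    node = json.loads(payload["_node_content"])          # step 2
    node.setdefault("metadata", {}).update(extra)        # step 3
    new_payload = dict(payload)
    new_payload.update(extra)  # also keep the keys top-level for Qdrant filters
    new_payload["_node_content"] = json.dumps(node)      # steps 4 and 5
    return new_payload


def rewrite_collection(client, collection_name, extra_for_point, batch_size=100):
    """Scroll over all points and overwrite each payload that gets extra data.

    `extra_for_point` is a caller-supplied function mapping a point to the
    dict of extra metadata for it (or an empty dict to skip the point).
    """
    offset = None
    updated = 0
    while True:
        # step 1: page through the collection
        points, offset = client.scroll(
            collection_name=collection_name,
            limit=batch_size,
            offset=offset,
            with_payload=True,
            with_vectors=False,
        )
        for point in points:
            extra = extra_for_point(point)
            if not extra or "_node_content" not in (point.payload or {}):
                continue
            client.overwrite_payload(                    # step 6
                collection_name=collection_name,
                payload=merge_into_node_content(point.payload, extra),
                points=[point.id],
            )
            updated += 1
        if offset is None:  # no further pages
            break
    return updated
```

Because overwrite_payload replaces the whole payload, the sketch copies the existing payload first so that the other fields are preserved. It is probably worth trying this on a copy of the collection before running it on real data.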