Payload - Qdrant

At a glance

A community member encountered an issue where metadata added directly through the Qdrant SDK wasn't being returned by LlamaIndex's retrieval process, despite being visible in the Qdrant UI. The metadata was only accessible when added through LlamaIndex's interface.

Through discussion, it was revealed that LlamaIndex only uses the _node_content field for metadata, which contains the serialized node information. When adding metadata directly through Qdrant, it doesn't get incorporated into this field, explaining why it's not retrieved.

The solution, which was confirmed to work, is to loop through every point, extract the payload containing _node_content, add the extra information to the deserialized node, reserialize it back into _node_content, and write the updated payload back with cl.overwrite_payload on each point. This makes the additional metadata part of the node content that LlamaIndex retrieves.

Sayan

I'm currently using LlamaIndex with Qdrant as the vector database.

When adding metadata to nodes via LlamaIndex like this:

document.metadata = {
    "source_id": source_id,
    "document_name": document_name
}

and retrieving it with:

retriever = index.as_retriever(...)
retrieved_nodes = retriever.retrieve(query)

I can access the added metadata through retrieved_nodes[0].metadata.

However, when I add metadata using Qdrant's Python SDK (https://qdrant.tech/documentation/concepts/payload/#:~:text=%7D-,You%20don%E2%80%99t%20need%20to%20know%20the%20ids%20of%20the%20points%20you%20want%20to%20modify.%20The%20alternative%20is%20to%20use%20filters.,-http), the metadata isn't returned by LlamaIndex's retrieval process, even though it's visible on the Qdrant UI.

What does LlamaIndex do differently that allows the metadata to be returned upon retrieval?

14 comments

Sayan

This is how I am adding the metadata to an existing record using the Qdrant SDK:

from qdrant_client import models

# Set one extra payload key on every point matched by the filter.
print(f"Adding key: {meta_name}, value: {meta_value}")
client.set_payload(
    collection_name=collection_name,
    payload={meta_name: meta_value},
    points=models.Filter(
        must=[
            models.FieldCondition(
                key=filter_name,
                match=models.MatchValue(value=filter_value),
            ),
        ],
    ),
)

WhiteFang_Jr

I think you can start looking from here: https://github.com/run-llama/llama_index/blob/f5263896121721de1051ce58338a1e0ea6950ca7/llama-index-integrations/vector_stores/llama-index-vector-stores-qdrant/llama_index/vector_stores/qdrant/base.py#L254

See how it is being done from LlamaIndex side, then check the qdrant UI to see how it is reflected in that particular node.

Then do the same directly. There could be some difference; maybe the metadata is being added somewhere else when you are doing it directly.

Logan M

Yea the source code is a good place to start

Sayan

Thank you, I'll check it

holodeck

Did you find the answer to this? I'm having this issue too!

Logan M

I've never been able to replicate this. Metadata always returns fine for me (assuming that llama-index both inserted/created the index and queried it).

holodeck

Strange. It is returned by the Qdrant web service. Is llama-index returning all the payload tags or just the _node_content? The Qdrant web service returns this:

Whereas the as_retriever node for that point returns this:

Thoughts?

holodeck

As above: after inserting the data using llama-index, I'm manually adding more data to the payload later using a Qdrant call, i.e. og tag information.

Logan M

_node_content is the entire node serialized -- it takes that then creates a node

Logan M

It won't use other fields for metadata I think
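
To illustrate the point above, here is a simplified sketch (not LlamaIndex's actual code) of why a key added next to _node_content goes unseen. It assumes the payload stores the node as a JSON string under _node_content with a "metadata" dictionary inside it; the function name metadata_seen_by_retriever and the sample values are hypothetical.

```python
import json


def metadata_seen_by_retriever(payload: dict) -> dict:
    """Simplified stand-in: rebuild the node from _node_content only."""
    node = json.loads(payload["_node_content"])
    return node.get("metadata", {})


# Payload as it might look after a direct set_payload call (values are made up).
payload = {
    # Written by LlamaIndex at insert time: the whole node, serialized.
    "_node_content": json.dumps(
        {"text": "hello", "metadata": {"source_id": "doc-1"}}
    ),
    # Top-level copy of the original metadata.
    "source_id": "doc-1",
    # Added later with the Qdrant SDK; lives outside _node_content.
    "og_title": "Added later via set_payload",
}

retrieved = metadata_seen_by_retriever(payload)
print(retrieved)  # {'source_id': 'doc-1'}
```

The og_title key is visible in the Qdrant UI because it is part of the payload, but it never reaches the rebuilt node because it is not inside the serialized content.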

holodeck

Thanks! That's the issue then. It would be a shame to have to make another call for each returned item. What is the best llama-index call to update the payload instead?

holodeck

I 'think' an approach is to:

1) loop through every point
2) extract each payload, which contains the _node_content
3) add the extra information
4) reserialize it back to _node_content
5) update the payload to use this new _node_content
6) call cl.overwrite_payload on each point
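
The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code holodeck actually ran: it assumes _node_content is a JSON string with a "metadata" dictionary inside it, and that client is a qdrant_client.QdrantClient (the scroll and overwrite_payload calls are used with their keyword arguments). The helper names add_metadata_to_node_content and rewrite_all_points, and the extra_for_point callback, are hypothetical.

```python
import json


def add_metadata_to_node_content(payload: dict, extra: dict) -> dict:
    """Return a copy of the payload with `extra` merged into _node_content."""
    node = json.loads(payload["_node_content"])        # step 2: deserialize
    node.setdefault("metadata", {}).update(extra)      # step 3: add the extras
    new_payload = dict(payload)
    new_payload.update(extra)   # also keep a top-level copy for Qdrant filters
    new_payload["_node_content"] = json.dumps(node)    # steps 4-5: reserialize
    return new_payload


def rewrite_all_points(client, collection_name, extra_for_point, batch_size=100):
    """Scroll through the collection and overwrite each point's payload.

    `extra_for_point` is a hypothetical callback: it receives a record and
    returns the dict of extra metadata for that point (or an empty dict).
    """
    offset = None
    updated = 0
    while True:
        # step 1: page through every point, payload only
        records, offset = client.scroll(
            collection_name=collection_name,
            limit=batch_size,
            offset=offset,
            with_payload=True,
            with_vectors=False,
        )
        for record in records:
            payload = record.payload or {}
            if "_node_content" not in payload:
                continue  # not written by LlamaIndex; leave untouched
            extra = extra_for_point(record)
            if not extra:
                continue
            # step 6: replace the stored payload for this one point
            client.overwrite_payload(
                collection_name=collection_name,
                payload=add_metadata_to_node_content(payload, extra),
                points=[record.id],
            )
            updated += 1
        if offset is None:  # no further pages
            break
    return updated
```

A call might look like rewrite_all_points(cl, "my_collection", lambda r: {"og_title": "..."}), where the collection name and the lambda are placeholders. Because overwrite_payload replaces the whole payload, the sketch copies the existing payload first so that other keys are not lost.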

Logan M

Yea that would work

holodeck

It worked perfectly! Thanks for your help Logan.
