The community members are experiencing an issue where the embedding is null when uploading to Qdrant, but manually executing the embedding computation yields non-null results. They have determined that this issue is not specific to Qdrant, as they can reproduce it with Chroma as well. The issue seems to be related to the llama-index library, where the embedding field of the node is not being populated when retrieving the data. A community member suggests that there may be a PR to attach the embedding to the node in the vector database being used.
The community members have also encountered a strange issue where the problem only occurs in a Docker container, but not in their local setup. They have narrowed it down to a difference in the version of the llama-index library (0.8.41 in Docker vs. 0.8.46 locally), and have found that the newer version works fine locally. However, they are still unsure why the Docker version of the retriever is failing in this way.
After some investigation, a community member suggests that the issue may be related to the query being passed in the Docker container: it is not a string but a Chainlit message that needs to be unwrapped. This turns out to be the correct diagnosis, and the community members find that pinning Chainlit to a specific patch version resolves the issue.
I.e. embedding_model.get_text_embedding(documents[0].text) returns a value; however, for an index built with VectorStoreIndex.from_documents, index.as_retriever().retrieve('foo')[0].embedding is null/empty
It's not that it's empty; it's just that llama-index is not populating the embedding field of the node when retrieving (the embedding is stored separately from the node)
There could be a PR to attach the embedding to the node in the vector db you are using
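A minimal sketch of the behavior described above, assuming (as the message says) that embeddings live in the vector store separately from node content and are not re-attached on retrieval. The store layout, class, and function names here are illustrative, not llama-index's actual internals:

```python
# Toy model: node content and embeddings are kept in separate maps,
# mimicking a vector store that does not re-attach vectors on retrieval.
# This is NOT llama-index's real code.

class Node:
    def __init__(self, text, embedding=None):
        self.text = text
        self.embedding = embedding  # stays None unless explicitly attached

node_store = {"n1": Node("hello world")}
embedding_store = {"n1": [0.1, 0.2, 0.3]}

def retrieve(node_id):
    # plain retrieval returns the node as stored -- embedding is None
    return node_store[node_id]

def retrieve_with_embedding(node_id):
    # the fix the thread hints at: look the vector back up and attach it
    node = node_store[node_id]
    node.embedding = embedding_store[node_id]
    return node

print(retrieve("n1").embedding)                 # None
print(retrieve_with_embedding("n1").embedding)  # [0.1, 0.2, 0.3]
```

This is why the manual get_text_embedding call returns a value while the retrieved node's embedding field appears null: the vector exists in the store, but nothing copies it back onto the node.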
  result = self.index.as_retriever(similarity_top_k=10).retrieve(query)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/llama_index/indices/base_retriever.py", line 22, in retrieve
    return self._retrieve(str_or_query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 81, in _retrieve
    if query_bundle.embedding is None and len(query_bundle.embedding_strs) > 0:
Wow - indeed this was the right pointer. I have no idea why inside Docker the message is not a string (it indeed is a string in the local setup), but it is a Chainlit message which needs to be unwrapped. @disiok thanks!
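A defensive unwrapping step along these lines would tolerate both setups. It assumes the incoming object is either a plain string or a Chainlit Message carrying its text in a .content attribute; the helper name is ours, not part of either library:

```python
# Hypothetical helper: coerce whatever Chainlit hands us into a plain string
# before passing it to the retriever. Assumes (per the thread) that a Chainlit
# Message keeps its text in a .content attribute.

def unwrap_query(query) -> str:
    if isinstance(query, str):
        return query
    # duck-type a Chainlit Message (or anything message-like)
    content = getattr(query, "content", None)
    if isinstance(content, str):
        return content
    raise TypeError(f"cannot extract query text from {type(query).__name__}")

# usage:
#   result = self.index.as_retriever(similarity_top_k=10).retrieve(unwrap_query(query))
```

Duck-typing on .content rather than importing chainlit keeps the retriever code working whether or not the caller wraps the query, which is exactly the difference between the local and Docker setups described above.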