
Updated 6 months ago

Hey. I'm trying to use Qdrant with

At a glance

A community member is trying to use Qdrant with InstructorEmbedding, but hits a Pydantic error when building the index with a storage_context: PydanticSerializationError: Unable to serialize unknown type: <class 'numpy.ndarray'>. Another community member suspects that InstructorEmbedding is returning a NumPy array instead of a plain list of floats, tests the theory, and confirms it. They agree to patch the class, and the original poster notes that this is the first bug they have reported whose fix will ship in a public project, which they count as a small win.

Hey. I'm trying to use Qdrant with InstructorEmbedding. When I try to build the index with storage_context, it raises a Pydantic error: PydanticSerializationError: Unable to serialize unknown type: <class 'numpy.ndarray'>

Here is my code
from qdrant_client import QdrantClient  # was missing in the original snippet

from llama_index.embeddings import InstructorEmbedding
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
    SimpleDirectoryReader,
)
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore

# connect to a local Qdrant instance
qdrant = QdrantClient("http://localhost:6333")

embed_model = InstructorEmbedding(model_name="hkunlp/instructor-base")

# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

service_context = ServiceContext.from_defaults(
    llm=None, embed_model=embed_model, chunk_size=512
)
vector_store = QdrantVectorStore(client=qdrant, collection_name="paul_graham")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# This works (default in-memory vector store)
index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

# This raises PydanticSerializationError (Qdrant-backed storage_context)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)
11 comments
Can you share the full traceback?
It's pretty long.
hmmmm I THINK it's because InstructorEmbedding is returning a numpy array instead of a plain list of floats
(that traceback was very helpful btw)
let me test that theory
>>> embeds = embed_model.get_text_embedding("Hello world!")
>>> type(embeds)
<class 'numpy.ndarray'>
>>> 
ok, will patch that class then
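Until a patched release is available, one possible stopgap is to convert each embedding to a plain list of Python floats before it reaches the vector store. The sketch below is only an illustration, not the actual patch: to_float_list is a hypothetical helper name, and it relies on the object exposing a tolist() method, which NumPy arrays do. It handles a single 1-D embedding, not a batch.

```python
import json
from typing import Any, List


def to_float_list(embedding: Any) -> List[float]:
    """Convert one 1-D embedding into a plain list of Python floats.

    Hypothetical helper, not part of llama_index. Accepts anything
    iterable; objects with a .tolist() method (such as numpy.ndarray)
    are unpacked through it first so native Python scalars come out.
    """
    if hasattr(embedding, "tolist"):
        embedding = embedding.tolist()
    return [float(x) for x in embedding]


# A tuple stands in for the model output here so the snippet runs
# without NumPy installed; a numpy.ndarray goes through the same path.
plain = to_float_list((0.25, 0.5, 1))

# A plain list of floats serializes without a custom encoder.
payload = json.dumps({"vector": plain})

# Assumed usage with the embed_model from the question (not run here):
# embeds = to_float_list(embed_model.get_text_embedding("Hello world!"))
```

With the result of get_text_embedding passed through this helper, type(embeds) should be list rather than numpy.ndarray, which is the shape the serializer expects.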
omg. First time I have reported a bug and it will be shipped to something public.
p.s. these small wins in life 😉
Thanks for reporting this!! :dotsCATJAM: