Updated last year

Hi! I wanted to try the fastembed library with the Qdrant retriever, but I'm getting this error:

Plain Text
ValueError: shapes (176,768) and (1536,) not aligned: 768 (dim 1) != 1536 (dim 0)
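The mismatch is just linear algebra: the 176 stored nodes were embedded with bge-base-en-v1.5 (768 dimensions), while the query was embedded with a 1536-dimension model (OpenAI's ada-002, the library default at the time). A minimal numpy sketch that reproduces the same shape error:

```python
import numpy as np

# 176 stored node vectors, embedded with bge-base-en-v1.5 (768 dims)
doc_vectors = np.ones((176, 768))
# query vector from the default ada-002 embed model (1536 dims)
query_vector = np.ones(1536)

try:
    # the similarity step is a dot product; the dimensions must match
    scores = np.dot(doc_vectors, query_vector)
except ValueError as err:
    print(err)
    # → shapes (176,768) and (1536,) not aligned: 768 (dim 1) != 1536 (dim 0)
```

So the fix is to make the query-time embed model match the one used at ingest time.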


My code:

Plain Text
qdrant_vectorstore_client = qdrant_client.QdrantClient(
    location=":memory:"
)

fastembed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5")
vector_store = QdrantVectorStore(
    client=qdrant_vectorstore_client,
    collection_name=collection_name,
)
pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(),
        fastembed_model,
    ],
    vector_store=vector_store,
)
documents = [...]
pipeline.run(documents=documents)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
retriever = index.as_retriever(similarity_top_k=3)

nodes_with_sources = retriever.retrieve("...")  # <----- the error occurs here, why?
@Logan M please, help 😜
I think it is because the storage context's default embedding model is ada-002, but I'm not defining a storage context in my code, so it must be getting a default somewhere, but where?
okay, it is in the VectorStoreIndex.from_vector_store part. Now I need to figure out how to replace ada-002 with my embedding model
I mean service context, not storage context. It is the service context, right?
right, need to add service context
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)
That service context should have your embed model
@Logan M Thanks! Now I'm looking at the arguments of ServiceContext, and it has everything I've defined in IngestionPipeline (splitter, embedding model). Do I need to remove IngestionPipeline and declare my splitter in the service context? Or is my code okay?

Plain Text
qdrant_vectorstore_client = qdrant_client.QdrantClient(
    location=":memory:"
)

fastembed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5")
vector_store = QdrantVectorStore(
    client=qdrant_vectorstore_client,
    collection_name=collection_name,
)
pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(),
        fastembed_model,
    ],
    vector_store=vector_store,
)
documents = [...]
pipeline.run(documents=documents)

service_context = ServiceContext.from_defaults(embed_model=fastembed_model)  # <----- adding this
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)
retriever = index.as_retriever(similarity_top_k=3)

nodes_with_sources = retriever.retrieve("...")
your code is fine -- you can add the splitter if you want to the service context, but eh, not too important
@Logan M But will it split based on what I've declared in IngestionPipeline, or will ServiceContext override the configs from the ingestion pipeline?
Nodes will be split based on the ingestion pipeline -- the nodes are already split and inserted by the time you call from_vector_store
If you used index.insert(document) or from_documents(), then it uses the service context
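To make the two ingest paths concrete, here is a hedged sketch assuming the v0.9-era ServiceContext API (names like `fastembed_model`, `documents`, and `TokenTextSplitter` come from the code above; exact import paths and keyword names vary across llama_index versions, so treat this as illustrative, not definitive):

```python
# Path 1 (the thread's pattern): IngestionPipeline's own transformations
# (splitter + embed model) decide how nodes are split and embedded.
# The service context is then only consulted at query time.

# Path 2: build the index straight from documents -- here the service
# context's settings drive splitting and embedding during ingest too.
service_context = ServiceContext.from_defaults(
    embed_model=fastembed_model,        # replaces the ada-002 default
    text_splitter=TokenTextSplitter(),  # applied on this ingest path
)
index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)
```

Either way, keeping the same embed model on the ingest side and the query side is what avoids the dimension-mismatch error.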