Hi all, when trying to use OpenAI's 'text-embedding-3-large' embeddings with a Qdrant client, I get a "ValueError: shapes (469,3072) and (1536,) not aligned: 3072 (dim 1) != 1536 (dim 0)" error.

I think this has to do with the embedding model producing vectors of a size that Qdrant doesn't expect, but I'm not sure how to fix it. I've tried setting the Qdrant client's vector size parameter to 3072, but this doesn't help -- the error becomes "ValueError: operands could not be broadcast together with shapes (469,3072) (1536,)". Any ideas would be appreciated, thanks!
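For context, the mismatch can be reproduced outside of Qdrant entirely. The 469 is the number of stored chunks, 3072 is the output dimension of text-embedding-3-large, and 1536 is the output dimension of text-embedding-ada-002 (the default); this sketch uses NumPy just to mirror the shapes in the traceback:

```python
import numpy as np

# 469 stored chunks embedded with text-embedding-3-large (3072 dims)
stored = np.zeros((469, 3072))
# a query embedded with the default ada-002 model (1536 dims)
query = np.zeros(1536)

try:
    stored.dot(query)  # similarity scoring needs matching dimensions
except ValueError as e:
    print(e)  # shapes (469,3072) and (1536,) not aligned: 3072 (dim 1) != 1536 (dim 0)
```

So the error means the stored vectors and the query vector were produced by two different embedding models.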

Plain Text
client = qdrant_client.QdrantClient(location=":memory:")
client.create_collection(
    collection_name="collection_name",
    vectors_config=VectorParams(size=3072, distance=Distance.EUCLID),  # FIXME
)

vector_store = QdrantVectorStore(client=client, collection_name="collection_name")
8 comments
How did you set up your embedding model? Somewhere in your pipeline it seems like it's using ada-002, maybe?
Hmm, I was using ada-002 before but I changed it to text-embedding-3-large in my ingestion pipeline:

Plain Text
# create the ingestion pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=200, chunk_overlap=10),  # adjust chunk size and overlap
        TitleExtractor(),  # metadata extraction (extracts title)
        OpenAIEmbedding(model="text-embedding-3-large"),  # embeddings are calculated as part of the pipeline
    ],
    vector_store=vector_store,  # set vector store to qdrant store
)
@Logan M I actually found your answer to this old question (https://github.com/run-llama/llama_index/issues/1029). What did you mean by "starting fresh"? Thanks!
starting fresh means not with an existing index (i.e. need to re-embed all your data)
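To make "starting fresh" concrete: a vector collection is created with a fixed vector size, so switching embedding models means dropping the old collection and re-embedding every document into a new one sized for the new model. A plain-Python sketch of the idea (the `Collection` class here is hypothetical, standing in for a Qdrant collection):

```python
class Collection:
    """Stand-in for a Qdrant collection: the vector size is fixed at creation."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []

    def add(self, vec):
        # a collection rejects vectors that don't match its configured size
        if len(vec) != self.dim:
            raise ValueError(f"expected dim {self.dim}, got {len(vec)}")
        self.vectors.append(vec)

old = Collection(dim=1536)    # built back when ada-002 (1536 dims) was the embed model
fresh = Collection(dim=3072)  # "starting fresh": new collection sized for text-embedding-3-large
fresh.add([0.0] * 3072)       # every document gets re-embedded into the new collection
```

The old 1536-dim collection can't accept (or be queried with) 3072-dim vectors, which is why re-embedding into a fresh index is required.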
Ah ok, got it. I think I'm doing that? Every time I run my Streamlit app, a user uploads a PDF which is then indexed, but I'm still getting the error :/
when you query the index, are you setting the embed model as well?

Plain Text
index = VectorStoreIndex.from_vector_store(
    vector_store,
    service_context=ServiceContext.from_defaults(embed_model=embed_model),
)
I wasn't, but this worked -- thanks so much!!