Hi all, when trying to use OpenAI's 'text-embedding-3-large' embeddings with a Qdrant client, I get a "ValueError: shapes (469,3072) and (1536,) not aligned: 3072 (dim 1) != 1536 (dim 0)" error.

I think this has to do with the embedding model producing vectors of a size that Qdrant doesn't expect, but I'm not sure how to fix it. I've tried setting the Qdrant client's vector size parameter to 3072, but this doesn't help -- the error becomes "ValueError: operands could not be broadcast together with shapes (469,3072) (1536,)". Any ideas would be appreciated, thanks!
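For context, the mismatch can be reproduced outside of Qdrant entirely. The 469 is the number of stored chunks, 3072 is the output dimension of text-embedding-3-large, and 1536 is the output dimension of text-embedding-ada-002 (the default); this sketch uses NumPy just to mirror the shapes in the traceback:

```python
import numpy as np

# 469 stored chunks embedded with text-embedding-3-large (3072 dims)
stored = np.zeros((469, 3072))
# a query embedded with the default ada-002 model (1536 dims)
query = np.zeros(1536)

try:
    stored.dot(query)  # similarity scoring needs matching dimensions
except ValueError as e:
    print(e)  # shapes (469,3072) and (1536,) not aligned: 3072 (dim 1) != 1536 (dim 0)
```

So the error means the stored vectors and the query vector were produced by two different embedding models.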

Plain Text
client = qdrant_client.QdrantClient(location=":memory:")
client.create_collection(
    collection_name="collection_name",
    vectors_config=VectorParams(size=3072, distance=Distance.EUCLID),  # FIXME
)

vector_store = QdrantVectorStore(client=client, collection_name="collection_name")
8 comments
How did you set up your embedding model? Somewhere in your pipeline it seems like it's using ada-002, maybe?
Hmm, I was using ada-002 before but I changed it to text-embedding-3-large in my ingestion pipeline:

Plain Text
# create the ingestion pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=200, chunk_overlap=10),  # adjust chunk size and overlap
        TitleExtractor(),  # metadata extraction (extracts title)
        OpenAIEmbedding(model="text-embedding-3-large"),  # embeddings are calculated as part of the pipeline
    ],
    vector_store=vector_store,  # set vector store to qdrant store
)
@Logan M I actually found your answer to this old question (https://github.com/run-llama/llama_index/issues/1029). What did you mean by "starting fresh"? Thanks!
starting fresh means not with an existing index (i.e. need to re-embed all your data)
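To make "starting fresh" concrete: a vector collection is created with a fixed vector size, so switching embedding models means dropping the old collection and re-embedding every document into a new one sized for the new model. A plain-Python sketch of the idea (the `Collection` class here is hypothetical, standing in for a Qdrant collection):

```python
class Collection:
    """Stand-in for a Qdrant collection: the vector size is fixed at creation."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []

    def add(self, vec):
        # a collection rejects vectors that don't match its configured size
        if len(vec) != self.dim:
            raise ValueError(f"expected dim {self.dim}, got {len(vec)}")
        self.vectors.append(vec)

old = Collection(dim=1536)    # built back when ada-002 (1536 dims) was the embed model
fresh = Collection(dim=3072)  # "starting fresh": new collection sized for text-embedding-3-large
fresh.add([0.0] * 3072)       # every document gets re-embedded into the new collection
```

The old 1536-dim collection can't accept (or be queried with) 3072-dim vectors, which is why re-embedding into a fresh index is required.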
Ah ok, got it. I think I'm doing that? Every time I run my Streamlit app, a user uploads a PDF which is then indexed, but I'm still getting the error :/
when you query the index, are you setting the embed model as well?

Plain Text
index = VectorStoreIndex.from_vector_store(
    vector_store,
    service_context=ServiceContext.from_defaults(embed_model=embed_model),
)
I wasn't, but this worked -- thanks so much!!