richmond7284
Joined September 25, 2024
I am getting the error below when I attempt semantic chunking with an embedding model for which the dimensions parameter is specified.

Code:

from llama_index.core import Settings
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

embed_model = OpenAIEmbedding(model="text-embedding-3-large", dimensions=1024)
llm = OpenAI(model="gpt-3.5-turbo", temperature=0, max_tokens=256)
Settings.llm = llm
Settings.embed_model = embed_model

# Semantic chunking strategy
splitter = SemanticSplitterNodeParser(
    buffer_size=1,
    breakpoint_percentile_threshold=95,
    embed_model=Settings.embed_model,
)

nodes = splitter.get_nodes_from_documents([transcript_doc])
index.insert_nodes(nodes)

------


Error:

packages/llama_index/embeddings/openai/base.py", line 180, in get_embeddings
data = client.embeddings.create(input=list_of_text, model=engine, **kwargs).data
TypeError: create() got an unexpected keyword argument 'dimensions'
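For anyone hitting the same TypeError: the dimensions keyword is only accepted by recent versions of the OpenAI Python SDK (it shipped alongside the text-embedding-3 models, in openai 1.10.0 if I remember the changelog correctly), so an older pinned client raises exactly this error. A small helper to sanity-check the version reported by pip show openai (the helper name is my own, for illustration):

```python
def supports_dimensions(openai_version: str) -> bool:
    """Return True if this openai-python version should accept the
    `dimensions` kwarg on embeddings.create (added, to my knowledge,
    in 1.10.0)."""
    # Compare only the major.minor components of the version string.
    major, minor = (int(p) for p in openai_version.split(".")[:2])
    return (major, minor) >= (1, 10)

print(supports_dimensions("1.3.8"))   # older client: False
print(supports_dimensions("1.12.0"))  # recent client: True
```

If the installed version predates 1.10.0, upgrading the client (pip install -U openai) should make the dimensions parameter go through.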
3 comments
Hi Asad,

Thanks for asking this question.

From a glance at your code, yes, you should be able to run the vector search query using the MongoDB vector search aggregation pipeline with LlamaIndex:

https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/vector_stores/mongodb.py#L160-L183

Please ensure you have a vector search index definition on your MongoDB Atlas database collection.
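For reference, an Atlas Vector Search index definition typically looks like the sketch below (expressed here as the JSON document you would create via the Atlas UI, CLI, or driver). The path and numDimensions values are assumptions: the path must match the field LlamaIndex stores vectors under ("embedding" by default), and numDimensions must equal your embedding model's output size.

```python
# Sketch of an Atlas Vector Search index definition. Adjust "numDimensions"
# to your embedding size (e.g. 1536 for text-embedding-3-small at defaults)
# and "path" to the field your documents store vectors in.
index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": 1536,
            "similarity": "cosine",
        }
    ]
}
```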

Happy to answer any follow-up questions you have.
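The linked source builds a $vectorSearch aggregation stage internally before calling aggregate() on the collection; a stripped-down sketch of that stage (the defaults here are illustrative, not the library's exact values):

```python
def build_vector_search_stage(query_embedding, index_name="vector_index",
                              path="embedding", limit=4, num_candidates=40):
    """Sketch of the $vectorSearch aggregation stage that
    MongoDBAtlasVectorSearch assembles for a query."""
    return {
        "$vectorSearch": {
            "index": index_name,          # Atlas search index name
            "path": path,                 # document field holding the vector
            "queryVector": query_embedding,
            "numCandidates": num_candidates,
            "limit": limit,               # number of results returned
        }
    }

stage = build_vector_search_stage([0.1] * 1024)
```

In normal use you would not build this yourself: querying through index.as_retriever() or index.as_query_engine() on a VectorStoreIndex backed by the store runs the pipeline for you.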
Wondering if I can get some assistance here.

I have the code below for an ingestion pipeline into a MongoDB vector database, but I am getting a pydantic validation error.


from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

vector_store = MongoDBAtlasVectorSearch(
    mongo_client,
    db_name=DB_NAME,
    collection_name=COLLECTION_NAME,
    index_name="vector_index",
)

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=500, chunk_overlap=0),
        OpenAIEmbedding(model="text-embedding-3-small", dimensions=256),
    ],
    vector_store=vector_store,
)

# Ingest directly into the vector database
pipeline.run(documents=llama_documents)

index = VectorStoreIndex.from_vector_store(vector_store)


Error:

---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-12-6edc00a10dce> in <cell line: 11>()
9 print(type(vector_store))
10
---> 11 pipeline = IngestionPipeline(
12 transformations=[
13 SentenceSplitter(chunk_size=500, chunk_overlap=0),

1 frames
/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py in __init__(__pydantic_self__, **data)
    339         values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340         if validation_error:
--> 341             raise validation_error
    342         try:
    343             object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 1 validation error for IngestionPipeline
vector_store
value is not a valid dict (type=type_error.dict)
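A plausible cause (my reading, not confirmed in this thread): pydantic v1 validates the vector_store field against the base vector store class IngestionPipeline was defined with; if the installed MongoDBAtlasVectorSearch comes from a different package lineage (e.g. the old llama_index.legacy namespace vs. the current llama-index-vector-stores-mongodb integration), the isinstance check fails and pydantic falls back to treating the value as a dict, hence "value is not a valid dict". A plain-Python sketch of that validation logic (both class names are stand-ins, not real llama_index classes):

```python
class CoreBaseVectorStore:
    """Stands in for the base class IngestionPipeline expects."""

class LegacyMongoVectorStore:
    """Stands in for a store imported from a mismatched namespace."""

def validate_vector_store(value):
    """Rough sketch of pydantic v1's behaviour for a model-typed field."""
    if isinstance(value, CoreBaseVectorStore):
        return value                  # right lineage: accepted as-is
    if isinstance(value, dict):
        return CoreBaseVectorStore()  # dicts get coerced into the model
    raise TypeError("value is not a valid dict (type=type_error.dict)")
```

If this is what is happening, the usual fix is upgrading both packages together (pip install -U llama-index llama-index-vector-stores-mongodb) so the classes come from the same lineage.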
12 comments