Find answers from the community

Rami

Hi everyone, thanks for the great work with llama_index. I just wanted to point out that the vector stores mongodb and deeplake might need updating to BasePydanticVectorStore, like the recent update of AstraDB. Sorry in advance if this is not the right place to post, I am new here.

https://github.com/run-llama/llama_index/blob/61011d7721c5c95b15abfb840630be4b98a9beb5/llama_index/vector_stores/mongodb.py#L35

https://github.com/run-llama/llama_index/blob/61011d7721c5c95b15abfb840630be4b98a9beb5/llama_index/vector_stores/deeplake.py#L30

https://github.com/run-llama/llama_index/blob/61011d7721c5c95b15abfb840630be4b98a9beb5/llama_index/vector_stores/astra.py#L39
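
For context, here is a rough sketch of what that migration tends to look like, modeled loosely on how the AstraDB store was updated. The field names and the PrivateAttr usage below are illustrative assumptions, not the actual mongodb.py implementation:

Python
# Illustrative sketch only: the general shape of moving a store from the old
# VectorStore protocol to BasePydanticVectorStore.
from typing import Any, Optional

from llama_index.bridge.pydantic import PrivateAttr
from llama_index.vector_stores.types import BasePydanticVectorStore


class MongoDBAtlasVectorSearch(BasePydanticVectorStore):
    stores_text: bool = True
    flat_metadata: bool = True

    # Non-serializable client objects become private attributes so that
    # pydantic validation does not choke on them.
    _mongodb_client: Any = PrivateAttr()

    def __init__(self, mongodb_client: Optional[Any] = None, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self._mongodb_client = mongodb_client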
2 comments
I am trying to create an ingestion pipeline with MongoDB Atlas, and I am getting this error. Any help would be appreciated:
Python
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
from llama_index.ingestion import IngestionPipeline, DocstoreStrategy
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.storage.index_store import MongoIndexStore

mongo_vector_store = MongoDBAtlasVectorSearch(
    mongodb_client=mongodb_client,
    db_name=f"{collection_name}_db",
    collection_name=collection_name,
    index_name="llm_experiments_index"
)
mongo_doc_store = MongoDocumentStore.from_uri(
    uri=mongo_uri,
    db_name=f"{collection_name}_db"
)

ingestion_pipeline = IngestionPipeline(
    transformations=[embed_model],
    docstore=mongo_doc_store,
    docstore_strategy=DocstoreStrategy.UPSERTS,
    vector_store=mongo_vector_store,  # <-- IntelliSense error here, see below
)


error:
Plain Text
ValidationError: 1 validation error for IngestionPipeline
vector_store
  value is not a valid dict (type=type_error.dict)


IntelliSense from PyCharm:
Plain Text
Expected type 'Optional[BasePydanticVectorStore]', got 'MongoDBAtlasVectorSearch' instead
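
One workaround, assuming the installed release still has MongoDBAtlasVectorSearch subclassing the old VectorStore protocol rather than BasePydanticVectorStore, is to run the pipeline without a vector_store and push the resulting nodes to Mongo yourself (embed_model, mongo_doc_store, mongo_vector_store, and documents are assumed to be defined as in the snippet above):

Python
# Workaround sketch: keep the docstore-based dedup, but add nodes to the
# Mongo vector store manually instead of passing it to IngestionPipeline.
# Note: without an attached vector_store the pipeline may fall back to
# duplicate detection only rather than true upserts -- check the logs.
ingestion_pipeline = IngestionPipeline(
    transformations=[embed_model],
    docstore=mongo_doc_store,
    docstore_strategy=DocstoreStrategy.UPSERTS,
)

nodes = ingestion_pipeline.run(documents=documents)
mongo_vector_store.add(nodes)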
2 comments
When using Postgres pgvector, does LlamaIndex use HNSW by default or something else when running an ingestion pipeline with pgvector as the vector store? Thanks.
Example:
SQL
CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops) WITH (m = 10, ef_construction = 64);
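
Not sure what the default is in your version, but one way to check, and if needed create, the HNSW index yourself is to go straight to Postgres. The connection string and the table/column names below (data_embeddings, embedding) are placeholders; adjust them to whatever table LlamaIndex created for you:

Python
# Sketch: inspect the indexes on the embeddings table, then create an HNSW
# index if none exists. Table and column names are placeholders.
import psycopg2

conn = psycopg2.connect("postgresql://user:password@localhost:5432/vector_db")
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT indexname, indexdef FROM pg_indexes WHERE tablename = %s",
        ("data_embeddings",),
    )
    print(cur.fetchall())  # shows whether an hnsw/ivfflat index already exists

    cur.execute(
        """
        CREATE INDEX IF NOT EXISTS embeddings_hnsw_idx
        ON data_embeddings USING hnsw (embedding vector_cosine_ops)
        WITH (m = 10, ef_construction = 64);
        """
    )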
3 comments
Hi everyone, I have a question on index.refresh_ref_docs: does this not work for de-duplication when using a vector store like DeepLake or Mongo?
https://github.com/run-llama/llama_index/blob/cc739d10069a7f2ac653d6d019fbeb18a891fea2/llama_index/indices/base.py#L310

Edit: Searching through past messages, I found the message below. Is this still the recommended way to deal with duplicates?

https://discord.com/channels/1059199217496772688/1163880111074971790/1163900056718553169
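
For what it is worth, refresh_ref_docs can only de-duplicate if each Document keeps a stable id across runs; a minimal sketch of that pattern, assuming an existing index and a placeholder load_my_documents loader, looks like this:

Python
# Sketch: refresh_ref_docs matches documents by their doc id, so assign a
# stable id_ (e.g. the file path) when constructing Documents.
# "load_my_documents" and "index" are assumed to exist already.
from llama_index import Document

documents = [
    Document(text=text, id_=path)  # stable id across runs
    for path, text in load_my_documents()
]

# Returns a list of booleans indicating which documents were re-ingested.
refreshed = index.refresh_ref_docs(documents)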

53 comments
Hi everyone, thanks for the great product. I have a question regarding documents, collections, and multi-tenancy. I am a bit confused about the boundaries of these when it comes to querying data.

When I send a query, I understand that my boundary is a collection, and I can filter through documents in this collection using metadata. Can I send a query that spans multiple collections?

I read the blog post regarding multi-tenancy that LlamaIndex just posted on Twitter, and it suggests using metadata for multi-tenancy. Is the recommended approach to have one collection, say for one app, that indexes all the information and topics, and then to filter using metadata?
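
In case it helps, here is a minimal sketch of the metadata-filter approach that blog post describes: tag nodes with a tenant key at ingest time and filter on it at query time. The key name "user" and the tenant value are assumptions, not anything LlamaIndex mandates:

Python
# Sketch: restrict a query to one tenant's documents via metadata filters.
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

query_engine = index.as_query_engine(
    filters=MetadataFilters(filters=[ExactMatchFilter(key="user", value="tenant_a")]),
)
response = query_engine.query("What do tenant A's documents say about pricing?")
print(response)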
2 comments