Find answers from the community

Rami

Hi everyone, thanks for the great work with llama_index. I just wanted to point out that the vector stores mongodb and deeplake might need updating to BasePydanticVectorStore, like the recent update of AstraDB. Sorry in advance if this is not the right place to post, I am new here.

https://github.com/run-llama/llama_index/blob/61011d7721c5c95b15abfb840630be4b98a9beb5/llama_index/vector_stores/mongodb.py#L35

https://github.com/run-llama/llama_index/blob/61011d7721c5c95b15abfb840630be4b98a9beb5/llama_index/vector_stores/deeplake.py#L30

https://github.com/run-llama/llama_index/blob/61011d7721c5c95b15abfb840630be4b98a9beb5/llama_index/vector_stores/astra.py#L39
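
For context, here is a rough sketch of what that migration tends to look like, modeled loosely on how the AstraDB store was updated. The field names and the PrivateAttr usage below are illustrative assumptions, not the actual mongodb.py implementation:

Python
# Illustrative sketch only: the general shape of moving a store from the old
# VectorStore protocol to BasePydanticVectorStore.
from typing import Any, Optional

from llama_index.bridge.pydantic import PrivateAttr
from llama_index.vector_stores.types import BasePydanticVectorStore


class MongoDBAtlasVectorSearch(BasePydanticVectorStore):
    stores_text: bool = True
    flat_metadata: bool = True

    # Non-serializable client objects become private attributes so that
    # pydantic validation does not choke on them.
    _mongodb_client: Any = PrivateAttr()

    def __init__(self, mongodb_client: Optional[Any] = None, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self._mongodb_client = mongodb_client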
2 comments
I am trying to create an ingestion pipeline with MongoDB Atlas, and I am getting this error. Any help would be appreciated:
Python
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
from llama_index.ingestion import IngestionPipeline, DocstoreStrategy
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.storage.index_store import MongoIndexStore

mongo_vector_store = MongoDBAtlasVectorSearch(
    mongodb_client=mongodb_client,
    db_name=f"{collection_name}_db",
    collection_name=collection_name,
    index_name="llm_experiments_index"
)
mongo_doc_store = MongoDocumentStore.from_uri(
    uri=mongo_uri,
    db_name=f"{collection_name}_db"
)

ingestion_pipeline = IngestionPipeline(
    transformations=[embed_model],
    docstore=mongo_doc_store,
    docstore_strategy=DocstoreStrategy.UPSERTS,
    vector_store=mongo_vector_store,  # <-- IntelliSense error here, see below
)


error:
Plain Text
ValidationError: 1 validation error for IngestionPipeline
vector_store
  value is not a valid dict (type=type_error.dict)


IntelliSense from PyCharm:
Plain Text
Expected type 'Optional[BasePydanticVectorStore]', got 'MongoDBAtlasVectorSearch' instead
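
One workaround, assuming the installed release still has MongoDBAtlasVectorSearch subclassing the old VectorStore protocol rather than BasePydanticVectorStore, is to run the pipeline without a vector_store and push the resulting nodes to Mongo yourself (embed_model, mongo_doc_store, mongo_vector_store, and documents are assumed to be defined as in the snippet above):

Python
# Workaround sketch: keep the docstore-based dedup, but add nodes to the
# Mongo vector store manually instead of passing it to IngestionPipeline.
# Note: without an attached vector_store the pipeline may fall back to
# duplicate detection only rather than true upserts -- check the logs.
ingestion_pipeline = IngestionPipeline(
    transformations=[embed_model],
    docstore=mongo_doc_store,
    docstore_strategy=DocstoreStrategy.UPSERTS,
)

nodes = ingestion_pipeline.run(documents=documents)
mongo_vector_store.add(nodes)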
2 comments
When using Postgres pgvector, does LlamaIndex use HNSW by default or something else when running an ingestion pipeline with pgvector as the vector store? Thanks.
Example:
SQL
CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops) WITH (m = 10, ef_construction = 64);
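
Not sure what the default is in your version, but one way to check, and if needed create, the HNSW index yourself is to go straight to Postgres. The connection string and the table/column names below (data_embeddings, embedding) are placeholders; adjust them to whatever table LlamaIndex created for you:

Python
# Sketch: inspect the indexes on the embeddings table, then create an HNSW
# index if none exists. Table and column names are placeholders.
import psycopg2

conn = psycopg2.connect("postgresql://user:password@localhost:5432/vector_db")
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT indexname, indexdef FROM pg_indexes WHERE tablename = %s",
        ("data_embeddings",),
    )
    print(cur.fetchall())  # shows whether an hnsw/ivfflat index already exists

    cur.execute(
        """
        CREATE INDEX IF NOT EXISTS embeddings_hnsw_idx
        ON data_embeddings USING hnsw (embedding vector_cosine_ops)
        WITH (m = 10, ef_construction = 64);
        """
    )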
3 comments
Hi everyone, I have a question on index.refresh_ref_docs: does this not work for de-duplication when using a vector store like DeepLake or Mongo?
https://github.com/run-llama/llama_index/blob/cc739d10069a7f2ac653d6d019fbeb18a891fea2/llama_index/indices/base.py#L310

Edit: Searching through past messages, I found the message below. Is this still the recommended way to deal with duplicates?

https://discord.com/channels/1059199217496772688/1163880111074971790/1163900056718553169
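
For what it is worth, refresh_ref_docs can only de-duplicate if each Document keeps a stable id across runs; a minimal sketch of that pattern, assuming an existing index and a placeholder load_my_documents loader, looks like this:

Python
# Sketch: refresh_ref_docs matches documents by their doc id, so assign a
# stable id_ (e.g. the file path) when constructing Documents.
# "load_my_documents" and "index" are assumed to exist already.
from llama_index import Document

documents = [
    Document(text=text, id_=path)  # stable id across runs
    for path, text in load_my_documents()
]

# Returns a list of booleans indicating which documents were re-ingested.
refreshed = index.refresh_ref_docs(documents)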

53 comments
Hi everyone, thanks for the great product. I have a question regarding documents, collections, and multi-tenancy. I am a bit confused about the boundaries of these when it comes to querying data.

When I send a query, I understand that my boundary is a collection, and I can filter through documents in this collection using metadata. Can I send a query that spans multiple collections?

I read the blog post regarding multi-tenancy that LlamaIndex just posted on Twitter, and it suggests using metadata for multi-tenancy. Is the recommended approach to have one collection, say for one app, that indexes all the information and topics, and then to filter using metadata?
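
In case it helps, here is a minimal sketch of the metadata-filter approach that blog post describes: tag nodes with a tenant key at ingest time and filter on it at query time. The key name "user" and the tenant value are assumptions, not anything LlamaIndex mandates:

Python
# Sketch: restrict a query to one tenant's documents via metadata filters.
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

query_engine = index.as_query_engine(
    filters=MetadataFilters(filters=[ExactMatchFilter(key="user", value="tenant_a")]),
)
response = query_engine.query("What do tenant A's documents say about pricing?")
print(response)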
2 comments