How are you checking if Milvus is updated?
Looking at the code, insert directly calls add() on the vector store
It looks like Milvus actually stores the text, so you don't really need Mongo
I think all you need is the vector store
You can try turning on debug logs to see if the Milvus insert is working too:
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
That is what I thought. However, if there are no documents yet, it doesn't allow me to load the index (it assumes there is at least one index). I resorted to separating things with Mongo simply because things were not working.
Now I am explicitly creating an index if it doesn't exist, and I can load it back. The problem is that I cannot seem to add documents to it once it is created with from_documents.
Here I check Milvus after the inserts are done and no new documents are there
Does the exception get raised on the index_struct delete or the vector_store delete?
But just to confirm, something like this didn't work? (just looking at docs and source code here haha)
vector_store = MilvusVectorStore(
    host='localhost',
    port=19530,
    overwrite=False
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex([], storage_context=storage_context)
index.insert(Document("my doc", doc_id="blah"))
index.delete("blah")
The exception is not raised. It seems that if you attempt to delete a document that is not there, llama doesn't complain and just continues
The difference in the docs is that you set overwrite=True, which will essentially delete the Milvus collection and replace it with the new documents.
This is something I cannot do, since the collection is being built over time
Looks like the delete on the vector store raises an error
So if it didn't raise an error on delete, I think it worked fine
If I create a GPTVectorStoreIndex and then try to reload it like this: load_index_from_storage(self.storage_context), I get the following error: ValueError: No index in storage context, check if you specified the right persist_dir. So I need to provide the index_id as well, which is alien to me at this point as it is all being handled by Llama.
Now, if I try to load the GPTVectorStoreIndex with GPTVectorStoreIndex([], storage_context=self.storage_context), I get the following error:
IndexError: list index out of range
Of course, this is given that:
storage_context = StorageContext.from_defaults(
    # docstore=self.__get_document_store(),
    # index_store=self.__get_index_store(),
    vector_store=self.__get_vector_store()
)
Right, when using a 3rd party vector store, you don't actually need to load the index
Just set the vector store and load it as "empty"
I know the docs need to do a better job of covering this
I think you need to also set the persist dir in the storage context when loading
storage_context = StorageContext.from_defaults(vector_store=vector_store, persist_dir=persist_dir)
index = load_index_from_storage(storage_context, service_context=service_context)
That is another issue :). The persist dir is a local directory, which led me to use Mongo as an index_store. Using Mongo as the index store, I can use load_index_from_storage and I think it works (at least no errors are raised). The remaining issue is adding documents to the vector store DB. With overwrite=False it seems the vector store DB is not taking the newly added documents. Or, better said, the in-memory index is adding them but they are not being persisted to the DB. Trying index.vector_store.persist(...) won't do the trick because it asks for a directory
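For reference, the Mongo-backed storage wiring described above looks roughly like this. This is a sketch, not a verified setup: the Mongo URI, db_name, and Milvus connection details are placeholder assumptions for a local deployment.

```python
from llama_index import StorageContext, load_index_from_storage
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.storage.index_store import MongoIndexStore
from llama_index.vector_stores import MilvusVectorStore

# Placeholder connection details -- adjust for your deployment
MONGO_URI = "mongodb://localhost:27017"

storage_context = StorageContext.from_defaults(
    # Docstore and index metadata live in Mongo instead of a local persist_dir
    docstore=MongoDocumentStore.from_uri(uri=MONGO_URI, db_name="llama_index"),
    index_store=MongoIndexStore.from_uri(uri=MONGO_URI, db_name="llama_index"),
    # Embeddings (and text) live in Milvus; overwrite=False keeps the collection
    vector_store=MilvusVectorStore(host="localhost", port=19530, overwrite=False),
)

# With the index metadata in Mongo, no local persist_dir is needed here
index = load_index_from_storage(storage_context)
```

This requires running Milvus and MongoDB instances, so treat it as configuration wiring rather than something runnable standalone.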
Hmm, this is starting to go over my head haha. I think I need to take a second to set up and try Milvus myself later today
Many thanks for your help. If you want, I can pass you the script and the Docker file; that way it is very easy to try stuff
I'm back, figured it out!
Had to make one small bug fix to allow creating an empty milvus index
Then, I used this to connect to milvus
import pymilvus
from langchain.chat_models import ChatOpenAI
from llama_index import GPTVectorStoreIndex, ServiceContext, StorageContext, LLMPredictor, Document
from llama_index.vector_stores import MilvusVectorStore
llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.0))
sc = ServiceContext.from_defaults(llm_predictor=llm_predictor)
vector_store = MilvusVectorStore(
    overwrite=False,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex([], storage_context=storage_context)
Then I used index.insert() to insert two documents, about swimming and golfing
I closed the script, and then reloaded everything with the same code above, and then queried about the two documents I inserted
Checking response.source_nodes, the documents were there
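For anyone following along, the reload-and-query step can be sketched like this. The query string and document topics are just illustrative, and it assumes the same Milvus connection plus an OpenAI key in the environment.

```python
from llama_index import GPTVectorStoreIndex, StorageContext
from llama_index.vector_stores import MilvusVectorStore

# Reconnect to the existing Milvus collection without wiping it
vector_store = MilvusVectorStore(overwrite=False)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Loading as "empty" -- the previously inserted docs live in Milvus
index = GPTVectorStoreIndex([], storage_context=storage_context)

query_engine = index.as_query_engine()
response = query_engine.query("Which document talks about golfing?")
print(response)

# The earlier inserts should show up among the retrieved source nodes
for source_node in response.source_nodes:
    print(source_node.node.get_text())
```

Since this needs a live Milvus instance and LLM access, it is configuration-dependent and shown here only as a sketch of the flow.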
I'll make a PR with this minor change
Were you able to add a third document after loading?
Perfect! Hope this gets merged soon
Merged! Should be in the next release (or you can install directly from source too)