How are you checking if Milvus is updated?
Looking at the code, insert directly calls add() on the vector store
It looks like Milvus actually stores the text, so you don't really need Mongo
I think all you need is the vector store
You can try turning on debug logs to see if the Milvus insert is working too:
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
That is what I thought. However, if there are no documents yet, it doesn't allow me to load the index (it assumes there is at least one index). I resorted to separating things with Mongo simply because things were not working.
Now I am explicitly creating an index if it doesn't exist, and I can load it back. The problem is that I cannot seem to add documents to it once it is created with from_documents.
Here I check Milvus after the inserts are done and no new documents are there
Does the exception get raised on the index_struct delete or the vector_store delete?
But just to confirm, something like this didn't work? (just looking at docs and source code here haha)
vector_store = MilvusVectorStore(
    host='localhost',
    port=19530,
    overwrite=False
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex([], storage_context=storage_context)
index.insert(Document("my doc", doc_id="blah"))
index.delete("blah")
The exception is not raised. It seems that if you attempt to delete a document that is not there, llama doesn't complain and just continues
The difference in the docs is that you set overwrite=True, which will essentially delete the Milvus collection and replace it with the new documents.
This is something I cannot do, since the collection is being built over time
Looks like the delete on the vector store raises an error
So if it didn't raise an error on delete, I think it worked fine
If I create a GPTVectorStoreIndex and then try to reload it like this: load_index_from_storage(self.storage_context), I get the following error: ValueError: No index in storage context, check if you specified the right persist_dir. So I need to provide the index_id as well, which is alien to me at this point as it is all being handled by Llama.
Now, if I try to load the GPTVectorStoreIndex with GPTVectorStoreIndex([], storage_context=self.storage_context), I get the following error:
IndexError: list index out of range
Of course, this is given that:
storage_context = StorageContext.from_defaults(
    # docstore=self.__get_document_store(),
    # index_store=self.__get_index_store(),
    vector_store=self.__get_vector_store()
)
Right, when using a 3rd party vector store, you don't actually need to load the index
Just set the vector store and load it as "empty"
I know the docs need to do a better job of covering this
I think you need to also set the persist dir in the storage context when loading
storage_context = StorageContext.from_defaults(vector_store=vector_store, persist_dir=persist_dir)
index = load_index_from_storage(storage_context, service_context=service_context)
That is another issue :). The persist dir is a local directory, which led me to use Mongo as an index_store. Using Mongo as the index store, I can use load_index_from_storage and I think it works (at least no errors are raised). The remaining issue is adding documents to the vector store DB. With overwrite=False it seems the vector store DB is not taking the newly added documents. Or, better said, the in-memory index is adding them but they are not being persisted to the DB. Trying index.vector_store.persist(...) won't do the trick because it asks for a directory
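For reference, the Mongo-backed storage wiring described above looks roughly like this. This is a sketch, not a verified setup: the Mongo URI, db_name, and Milvus connection details are placeholder assumptions for a local deployment.

```python
from llama_index import StorageContext, load_index_from_storage
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.storage.index_store import MongoIndexStore
from llama_index.vector_stores import MilvusVectorStore

# Placeholder connection details -- adjust for your deployment
MONGO_URI = "mongodb://localhost:27017"

storage_context = StorageContext.from_defaults(
    # Docstore and index metadata live in Mongo instead of a local persist_dir
    docstore=MongoDocumentStore.from_uri(uri=MONGO_URI, db_name="llama_index"),
    index_store=MongoIndexStore.from_uri(uri=MONGO_URI, db_name="llama_index"),
    # Embeddings (and text) live in Milvus; overwrite=False keeps the collection
    vector_store=MilvusVectorStore(host="localhost", port=19530, overwrite=False),
)

# With the index metadata in Mongo, no local persist_dir is needed here
index = load_index_from_storage(storage_context)
```

This requires running Milvus and MongoDB instances, so treat it as configuration wiring rather than something runnable standalone.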
Hmm, this is starting to go over my head haha. I think I need to take a second to set up and try Milvus myself later today
Many thanks for your help. If you want, I can pass you the script and the Docker file; that way it is very easy to try stuff
I'm back, figured it out!
Had to make one small bug fix to allow creating an empty milvus index
Then, I used this to connect to milvus
import pymilvus
from langchain.chat_models import ChatOpenAI
from llama_index import GPTVectorStoreIndex, ServiceContext, StorageContext, LLMPredictor, Document
from llama_index.vector_stores import MilvusVectorStore
llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.0))
sc = ServiceContext.from_defaults(llm_predictor=llm_predictor)
vector_store = MilvusVectorStore(
    overwrite=False,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex([], storage_context=storage_context)
Then I used index.insert() to insert two documents, about swimming and golfing
I closed the script, and then reloaded everything with the same code above, and then queried about the two documents I inserted
Checking response.source_nodes, the documents were there
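For anyone following along, the reload-and-query step can be sketched like this. The query string and document topics are just illustrative, and it assumes the same Milvus connection plus an OpenAI key in the environment.

```python
from llama_index import GPTVectorStoreIndex, StorageContext
from llama_index.vector_stores import MilvusVectorStore

# Reconnect to the existing Milvus collection without wiping it
vector_store = MilvusVectorStore(overwrite=False)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Loading as "empty" -- the previously inserted docs live in Milvus
index = GPTVectorStoreIndex([], storage_context=storage_context)

query_engine = index.as_query_engine()
response = query_engine.query("Which document talks about golfing?")
print(response)

# The earlier inserts should show up among the retrieved source nodes
for source_node in response.source_nodes:
    print(source_node.node.get_text())
```

Since this needs a live Milvus instance and LLM access, it is configuration-dependent and shown here only as a sketch of the flow.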
I'll make a PR with this minor change
Were you able to add a third document after loading?
Perfect! Hope this gets merged soon
Merged! Should be in the next release (or you can install directly from source too)