
Updated 4 months ago

Logan M A dumb question What happens

A dumb question: what happens when we recreate a vector index with an external vector store like Weaviate? Does it add new objects every time we recreate the index (even if the underlying document is the same)? I see it is defining classes with a node ID (Gpt_Index_5128082748824505963_Node), so I think it keeps adding objects under new classes every time we recreate an index. I would like more control over it: can I specify the class for the objects I am going to add?
42 comments
If you recreate the index (using from_documents), yea it will upload it again.

I'd also note though that most vector indexes have some kind of namespace or collection name attribute, so that you can easily store multiple different indexes i.e. using weaviate

Why are you re-creating the index though? Once it's created once, you can connect to it again by setting up the same vector store object and using index = VectorStoreIndex.from_vector_store(vector_store)
If a user uploads an updated document, the expectation would be to have the index updated.
Trying to think through how to handle the update flow.
I would need to provide multitenancy to keep the data separate for multiple clients.
A lot of vector DBs have terrible support for this kind of workflow. You can insert new user docs, but detecting duplicates and/or updating is still super hard.
Working on making this better with llama index! But the limiting factor is these vector DBs.
If you set the doc id of each input document,
you can call vector_store.delete(doc_id) and then insert again,
assuming you kept track of the doc ids.
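The delete-then-insert update flow described above can be sketched with a toy in-memory store. This is only an illustration of the idea: ToyVectorStore and upsert are hypothetical names, not the llama_index or Weaviate API, and a real store would hold embeddings rather than raw text.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVectorStore:
    """In-memory stand-in for a vector DB (hypothetical, for illustration only)."""

    # node_id -> (doc_id, chunk text); every chunk gets a fresh node id
    nodes: dict = field(default_factory=dict)
    _counter: int = 0

    def insert(self, doc_id, chunks):
        # Inserting never checks for duplicates, mirroring the behaviour
        # described in the thread: re-inserting the same document adds it again.
        for chunk in chunks:
            self._counter += 1
            self.nodes[f"node-{self._counter}"] = (doc_id, chunk)

    def delete(self, doc_id):
        # Deletion is keyed on the source document id, not on node ids.
        self.nodes = {
            node_id: value
            for node_id, value in self.nodes.items()
            if value[0] != doc_id
        }


def upsert(store, doc_id, chunks):
    """Replace all chunks of one document: delete by doc id, then insert again."""
    store.delete(doc_id)
    store.insert(doc_id, chunks)


store = ToyVectorStore()
upsert(store, "report-2023", ["old intro", "old body"])
upsert(store, "report-2023", ["new intro"])  # the old chunks are gone afterwards
```

The key point is that the caller has to pick stable doc ids and remember them; without that, the store has no way to tell an update from a new document.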
Thanks. I don't think I can search for objects without the class name, even if I set a doc_id as a property and keep track of it, but let me see. Weaviate's create returns a UUID for the newly created object. I wish the vector store could return a list of UUIDs.
By the way, the Weaviate vector store class takes a class_prefix and then appends "Node" to it:
    def __init__(
        self,
        weaviate_client: Optional[Any] = None,
        class_prefix: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
I can possibly use that, but if you guys ever decide to change the structure the code would break... I would be using non-API stuff in my code 🙂
That's fair. Although, heads up: the current vector_store.delete() function deletes by looking at doc_ids (which are not modified), not node ids.
Cool, thanks. Stay around on this thread for a couple of days; a few of my dumb/dumber questions will be coming your way while I explore this.
I'm always around :dotsCATJAM:
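For the multitenancy point raised earlier, one option is to derive a stable class_prefix per client, so that every client's nodes land under their own class and recreating an index reuses the same class name. A minimal sketch follows; tenant_class_prefix is a hypothetical helper, and the naming rule it follows (capital first letter, then only letters, digits and underscores) is an assumption about Weaviate class names that should be checked against the Weaviate docs.

```python
import hashlib
import re


def tenant_class_prefix(tenant_id: str) -> str:
    """Build a deterministic, class-name-safe prefix for one tenant (hypothetical helper)."""
    # Replace anything that is not a letter or digit, and cap the length.
    cleaned = re.sub(r"[^0-9A-Za-z]", "_", tenant_id).strip("_")[:32]
    # A short hash of the raw id keeps prefixes distinct when two tenant ids
    # clean up to the same string (e.g. "acme corp" and "acme_corp").
    digest = hashlib.sha1(tenant_id.encode("utf-8")).hexdigest()[:8]
    return f"Tenant_{cleaned}_{digest}"


prefix = tenant_class_prefix("acme corp")
# The prefix would then be passed as class_prefix when building the vector store,
# e.g. WeaviateVectorStore(weaviate_client=client, class_prefix=prefix)
```

Because the prefix depends only on the tenant id, the same client always maps to the same class, which avoids piling up objects under a new auto-generated class on every rebuild.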
@Logan M good afternoon - quick question , is it possible to load stored index from weaviate ?
This fails:
    def load_indx():
        class_prefix = 'Test123'
        # construct vector store
        vector_store = WeaviateVectorStore(weaviate_client=client, class_prefix=class_prefix)
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        index = load_index_from_storage(storage_context)
        return index
    ValueError                                Traceback (most recent call last)
    Cell In[38], line 1
    ----> 1 ret_indx = load_indx()

    Cell In[37], line 7, in load_indx()
          5 vector_store = WeaviateVectorStore(weaviate_client=client,class_prefix=class_prefix)
          6 storage_context = StorageContext.from_defaults(vector_store=vector_store)
    ----> 7 index = load_index_from_storage(storage_context)
          8 return index

    File ~/opt/anaconda3/envs/LLMTools/lib/python3.9/site-packages/llama_index/indices/loading.py:36, in load_index_from_storage(storage_context, index_id, **kwargs)
         33 indices = load_indices_from_storage(storage_context, index_ids=index_ids, **kwargs)
         35 if len(indices) == 0:
    ---> 36     raise ValueError(
         37         "No index in storage context, check if you specified the right persist_dir."
         38     )
         39 elif len(indices) > 1:
         40     raise ValueError(
         41         f"Expected to load a single index, but got {len(indices)} instead. "
         42         "Please specify index_id."
         43     )

    ValueError: No index in storage context, check if you specified the right persist_dir.
Looks like it looks for index_structs and doesn't find any.
Are we supposed to query the vector store directly, without loading it into memory?
you don't need to use load_index_from_storage with vector db integrations
You can setup the vector store, then do index = VectorStoreIndex.from_vector_store(vector_store)
Great, that can work. I just need a way to pass in the class_prefix for now 🙂 I see it takes in GraphQL as well.
I get this error:
    Cell In[47], line 7, in load_indx()
          5 vector_store = WeaviateVectorStore(weaviate_client=client,class_prefix=class_prefix)
          6 storage_context = StorageContext.from_defaults(vector_store=vector_store)
    ----> 7 index = GPTVectorStoreIndex.from_vector_store(vector_store)
          8 return index

    AttributeError: type object 'GPTVectorStoreIndex' has no attribute 'from_vector_store'
What version of llama index do you have?

If you don't want to upgrade, the alternative is setting up the vector store and storage context, and initializing with an empty array

storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex([], storage_context=storage_context)
I will upgrade, but the docs don't show this API.
This worked, thanks:
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex([], storage_context=storage_context)
But from_vector_store is better, of course.
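The two ways of reconnecting discussed in this part of the thread can be folded into one small helper that prefers from_vector_store when the installed version has it and otherwise falls back to the empty-list constructor. This is a sketch only: connect_to_existing_index is a hypothetical name, and the classes are passed in as arguments so the helper itself makes no assumption about import paths.

```python
def connect_to_existing_index(index_cls, storage_context_cls, vector_store):
    """Attach an index object to data that already lives in the vector store.

    index_cls and storage_context_cls are expected to behave like the
    VectorStoreIndex and StorageContext classes used in this thread.
    """
    factory = getattr(index_cls, "from_vector_store", None)
    if factory is not None:
        # Newer versions: build the index straight from the vector store.
        return factory(vector_store)
    # Older versions: wrap the store in a storage context and pass an empty
    # list, so nothing new is uploaded.
    storage_context = storage_context_cls.from_defaults(vector_store=vector_store)
    return index_cls([], storage_context=storage_context)


# Usage with the objects from this thread (the import paths below are an
# assumption and depend on the installed llama_index version):
#
#   from llama_index import StorageContext, VectorStoreIndex
#   from llama_index.vector_stores import WeaviateVectorStore
#
#   vector_store = WeaviateVectorStore(weaviate_client=client, class_prefix="Test123")
#   index = connect_to_existing_index(VectorStoreIndex, StorageContext, vector_store)
```

Either branch only connects to what is already stored; neither one re-embeds or re-uploads documents.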
@Logan M another rather simple question. This saves the index in the underlying storage just fine:
    GPTVectorStoreIndex.from_documents(documents=documents, service_context=service_context, storage_context=storage_context)
But how can I save an index in a separate operation? I want to save the same index in both the vector store and file storage, and I do not want to create the index twice.
So in one operation I just want to create the index, and in another I want to save it.
When you are using a vector store DB like Weaviate, it does not store to disk. Everything is stored in the vector DB automatically for you.
Actually, I think I misunderstood your question. I'm not sure what you are asking 🤔
Currently, if the storage context is backed by a vector store, the create-index operation creates the index and stores it in the Weaviate DB. This is fine in most cases, but I guess create does two things: creation and storage. I was wondering if there is a way to create an index (without any storage context), then store the index in the Weaviate DB separately, and then save the same index to disk. So basically two index.storage_context.persist() calls: one for the Weaviate DB and another for disk.
Since the index does not let you set the storage context after its creation, it becomes tricky. I can create the index separately with a disk-based and a vector-store-based storage context, but I think that is not as efficient.
Hmmm, interesting use case. Is there a reason you want to save in both weaviate and to disk?
Yes. I have been using disk storage, so I wanted to keep that flow intact while trying Weaviate out.