LlamaIndex


Updated 5 months ago

Logan M A dumb question What happens


At a glance

The community member is asking about the behavior of recreating a vector index with an external vector store like Weaviate. They are concerned that it may add new objects every time the index is recreated, even if the underlying document is the same. They would like more control over the class assignment for the objects being added.

The comments suggest that recreating the index (using from_documents) will indeed upload the data again. However, most vector indexes have a namespace or collection name attribute, which can be used to store multiple different indexes. The community members discuss ways to handle updates, such as deleting and re-inserting documents, and the challenges faced with vector databases in supporting such workflows.

The community members also discuss loading the stored index from Weaviate, and there are some issues with the load_index_from_storage function. The solution is to use VectorStoreIndex.from_vector_store instead, which does not require loading the index from storage.

Finally, the community members discuss the desire to save the index in both the vector store and file storage, without creating the index twice. However, there does not seem to be a clear solution provided in the comments.


A dumb question - what happens when we recreate a vector index with an external vector store like Weaviate? Does it add new objects every time we recreate the index (even if the underlying document is the same)? I see it is defining classes with a node id (Gpt_Index_5128082748824505963_Node), so I think it keeps adding objects under new classes every time we recreate an index. I would like more control over it - can I specify the class for the objects I am going to add?


42 comments

If you recreate the index (using from_documents), yea it will upload it again.

I'd also note though that most vector indexes have some kind of namespace or collection name attribute, so that you can easily store multiple different indexes, for example when using Weaviate.

Why are you re-creating the index though? Once it's created once, you can connect to it again by setting up the same vector store object and using index = VectorStoreIndex.from_vector_store(vector_store)

If a user uploads an updated document, the expectation would be that the index gets updated.

Trying to think through how to handle the update flow.

I would also need to provide multitenancy to keep the data separate for multiple clients.

A lot of vector dbs have terrible support for this kind of workflow. You can insert new user docs, but detecting duplicates and/or updating is still super hard.

Working on making this better with llama index! But the limiting factor is these vector dbs

If you set the doc id of each input document

You can call vector_store.delete(doc_id) and then insert again

Assuming you kept track of the doc ids
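A minimal sketch of the delete-then-reinsert flow suggested above. It does not use LlamaIndex or Weaviate: `InMemoryStore` and `upsert` are hypothetical stand-ins that only illustrate why a stable doc id per source document makes updates possible.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a vector store; not a LlamaIndex class."""

    # Maps a source doc id to the text chunks stored for that document.
    chunks: Dict[str, List[str]] = field(default_factory=dict)

    def insert(self, doc_id: str, text: str, chunk_size: int = 20) -> None:
        # Split the text into fixed-size chunks, all tagged with the same doc id.
        pieces = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        self.chunks.setdefault(doc_id, []).extend(pieces)

    def delete(self, doc_id: str) -> None:
        # Remove every chunk that came from this source document.
        self.chunks.pop(doc_id, None)


def upsert(store: InMemoryStore, doc_id: str, text: str) -> None:
    """Replace a document: delete by doc id, then insert the new version."""
    store.delete(doc_id)
    store.insert(doc_id, text)


store = InMemoryStore()
upsert(store, "contract-42", "first version of the contract text")
upsert(store, "contract-42", "second version")
print(store.chunks["contract-42"])
```

Without the `delete` step, the second call would append to the chunks of the first version, which is the duplication the original question worries about.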

😅

Thanks. I don't think I can search for objects without the class name, even if I set a doc_id as a property and keep track of it, but let me see. Weaviate's create returns a uuid for the newly created object. I wish the vector store could return a list of uuids.

By the way, the Weaviate vector store class takes in class_prefix and then appends "Node" to it:

def __init__(
    self,
    weaviate_client: Optional[Any] = None,
    class_prefix: Optional[str] = None,
    **kwargs: Any,
) -> None:
    ...

I can possibly use that but if you guys ever decide to change the structure the code would break...I will be using non-api stuff in my code 🙂
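One way the `class_prefix` argument could serve the multitenancy need mentioned earlier is to derive one prefix per client. The sketch below is only an illustration: `tenant_class_prefix` is a hypothetical helper, and it assumes that class names must start with an uppercase letter and contain only letters, digits, and underscores.

```python
import re


def tenant_class_prefix(tenant_id: str) -> str:
    """Build a class prefix for one tenant (hypothetical helper).

    Assumption: a valid class name starts with an uppercase letter and
    contains only ASCII letters, digits, and underscores.
    """
    # Replace every character outside the assumed allowed set.
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", tenant_id)
    # Make sure the result starts with a letter.
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "T" + cleaned
    # Uppercase only the first character; leave the rest untouched.
    return cleaned[0].upper() + cleaned[1:]


print(tenant_class_prefix("acme-corp"))
```

The result would then be passed as `class_prefix` when constructing the vector store for that client. Note that distinct tenant ids can collapse to the same prefix (for example "acme-corp" and "acme_corp"), so a real system would need its own uniqueness check.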

That's fair. Although, heads up: the current vector_store.delete() function deletes by looking at doc_ids (which are not modified), not node ids.

Cool, thanks. Stay around on this thread for a couple of days; a few of my dumb/dumber questions are coming your way while I explore this.

I'm always around :dotsCATJAM:

@Logan M good afternoon - quick question , is it possible to load stored index from weaviate ?

This fails:

def load_indx():
    class_prefix = 'Test123'
    # construct vector store
    vector_store = WeaviateVectorStore(weaviate_client=client, class_prefix=class_prefix)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = load_index_from_storage(storage_context)
    return index

ValueError Traceback (most recent call last)
Cell In[38], line 1
----> 1 ret_indx = load_indx()

Cell In[37], line 7, in load_indx()
5 vector_store = WeaviateVectorStore(weaviate_client=client,class_prefix=class_prefix)
6 storage_context = StorageContext.from_defaults(vector_store=vector_store)
----> 7 index = load_index_from_storage(storage_context)
8 return index

File ~/opt/anaconda3/envs/LLMTools/lib/python3.9/site-packages/llama_index/indices/loading.py:36, in load_index_from_storage(storage_context, index_id, **kwargs)
33 indices = load_indices_from_storage(storage_context, index_ids=index_ids, **kwargs)
35 if len(indices) == 0:
---> 36 raise ValueError(
37 "No index in storage context, check if you specified the right persist_dir."
38 )
39 elif len(indices) > 1:
40 raise ValueError(
41 f"Expected to load a single index, but got {len(indices)} instead. "
42 "Please specify index_id."
43 )

ValueError: No index in storage context, check if you specified the right persist_dir.

Looks like it looks for index_structs and doesn't find any.

Are we supposed to directly query the vector store ..without loading it into memory ?

you don't need to use load_index_from_storage with vector db integrations

You can setup the vector store, then do index = VectorStoreIndex.from_vector_store(vector_store)

Great, that can work. I just need a way to pass in the class_prefix for now 🙂 I see it takes in GraphQL as well.

I get this error

Cell In[47], line 7, in load_indx()
5 vector_store = WeaviateVectorStore(weaviate_client=client,class_prefix=class_prefix)
6 storage_context = StorageContext.from_defaults(vector_store=vector_store)
----> 7 index = GPTVectorStoreIndex.from_vector_store(vector_store)
8 return index

AttributeError: type object 'GPTVectorStoreIndex' has no attribute 'from_vector_store'

What version of llama index do you have?

If you don't want to upgrade, the alternative is setting up the vector store and storage context, and initializing with an empty array


storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex([], storage_context=storage_context)

I will upgrade, but the docs don't show this API.

storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex([], storage_context=storage_context)

That worked, thanks.

But from_vector_store is better, of course.
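The two ways of attaching to an already-populated vector store discussed above can be combined into one version-tolerant helper. This is a sketch under assumptions: `connect_index` is a hypothetical name, and the index class and storage context factory are passed in as arguments so the snippet runs without LlamaIndex installed.

```python
from typing import Any, Callable


def connect_index(
    index_cls: Any,
    vector_store: Any,
    make_storage_context: Callable[[Any], Any],
) -> Any:
    """Attach to an existing vector store without uploading documents again."""
    # Newer releases: use the dedicated constructor when the class has it.
    if hasattr(index_cls, "from_vector_store"):
        return index_cls.from_vector_store(vector_store)
    # Older releases: build the index from an empty document list instead.
    storage_context = make_storage_context(vector_store)
    return index_cls([], storage_context=storage_context)


# Intended call with the names used in this thread (not executed here):
# index = connect_index(
#     VectorStoreIndex,
#     vector_store,
#     lambda vs: StorageContext.from_defaults(vector_store=vs),
# )
```

Either branch avoids `from_documents`, so nothing is embedded or uploaded a second time.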

@Logan M another rather simple question - this will save the index in the underlying storage just fine:

GPTVectorStoreIndex.from_documents(documents=documents, service_context=service_context, storage_context=storage_context)

But how can I save an index in a separate operation? I want to save the same index in both the vector store and file storage, and I do not want to create the index twice.

So in one operation I just want to create the index, and in another I want to save it.

When you are using a vector store db like Weaviate, it does not store to disk. Everything is stored in the vector db automatically for you.

Actually, I think I misunderstood your question. I'm not sure what you are asking 🤔

Currently, if the storage context is backed by a vector store, the create index operation will create the index and store it in the Weaviate db. This is fine in most cases, but I guess create does two things: creation and storage. I was wondering if there is a way to create an index (without any storage context), then store the index in the Weaviate db separately, and then save the same index to disk. So basically two index.storage_context.persist() calls: one for the Weaviate db and another for disk.

Since the index does not let you set the storage context after its creation, it becomes tricky. I can create the index separately with a disk-based and a vector-store-based storage context, but I think that is not as efficient.

Hmmm, interesting use case. Is there a reason you want to save in both weaviate and to disk?

Yes, I have been using disk storage, so I wanted to keep that flow intact while trying Weaviate out.