At a glance

The community member is looking for examples on how to append or delete nodes from an existing vector store or document store. The comments provide some suggestions, such as using index.insert(document), index.insert_nodes(nodes), and index.delete_ref_doc(ref_doc_id). However, the community member is struggling with creating the index object without passing the nodes parameter, and understanding the relationship between different components like VectorStoreIndex, StorageContext, and ServiceContext.

One community member suggests using VectorStoreIndex.from_vector_store(vector_store) to create the index without passing the nodes, and another suggests using auto_merging_context = StorageContext.from_defaults(docstore=docstore, vector_store=vector_store) and automerging_retriever = AutoMergingRetriever(retriever, auto_merging_context, verbose=True) to handle the auto-merging retrieval. However, there is no explicitly marked answer in the comments.

Hello πŸ‘‹
I see a lot of tutorials about creating indexes and querying them, but I have not seen an example of appending/deleting nodes from existing vector store/document store. Could you point to such an example ? πŸ™
8 comments
index.insert(document)

index.insert_nodes(nodes)

index.delete_ref_doc(ref_doc_id)
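For context, a minimal sketch of how those calls fit together (the documents and ids here are made up):

Plain Text
from llama_index.core import Document, VectorStoreIndex

# Build an index from an initial batch of documents
index = VectorStoreIndex.from_documents([Document(text="first doc", id_="doc-1")])

# Append another document later
index.insert(Document(text="second doc", id_="doc-2"))

# Remove a document's nodes by its ref_doc_id
index.delete_ref_doc("doc-1", delete_from_docstore=True)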
Thanks. But what is blocking me is the creation of the index object. Right now I am creating it with a "nodes" parameter: VectorStoreIndex(nodes=nodes), which inserts the nodes into the index. Now if I start a new session and want to update the index, how can I create it without the nodes parameter? (then call insert/delete)
I see it can take an IndexStruct as input, but there is no example of this anywhere
@Logan M in short: how can I create a VectorStoreIndex without passing nodes to it?
Are you using a vector db integration? Or the default?

If using the default
Plain Text
# First session: persist the index (docstore, index store, vector store) to disk
index.storage_context.persist(persist_dir="./storage")

# A later session: reload the index without re-passing nodes
from llama_index.core import StorageContext, load_index_from_storage
index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage"))
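Once reloaded, the index supports the same insert/insert_nodes/delete_ref_doc calls as above, since its docstore and index store are restored from ./storage.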


If using a db integration

Plain Text
# Reconnect to an existing vector db collection; no nodes are passed
index = VectorStoreIndex.from_vector_store(vector_store)
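For illustration, a minimal sketch of reconnecting and then mutating the index (the Milvus parameters and the new_nodes variable are assumptions, not part of the answer above):

Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

# Reconnect to an existing collection; overwrite=False keeps the stored data
vector_store = MilvusVectorStore(uri="http://localhost:19530", dim=384, overwrite=False, collection_name="myCollection")
index = VectorStoreIndex.from_vector_store(vector_store)

# The index can now be updated without ever passing a nodes parameter
index.insert_nodes(new_nodes)  # new_nodes: nodes parsed in this session (hypothetical)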
@Logan M Thank you so much for taking the time.
I still have trouble understanding the logic of VectorStore, DocStore, IndexStore, VectorStoreIndex, StorageContext, and ServiceContext.
I am struggling to find a clear explanation of how all of these fit together, and whether we could/should use all of them.
Here is my current code :

Plain Text
#### Build index
from llama_index.core import StorageContext
from llama_index.core.node_parser import HierarchicalNodeParser
from llama_index.vector_stores.milvus import MilvusVectorStore
from llama_index.storage.docstore.redis import RedisDocumentStore

docstore = RedisDocumentStore.from_host_and_port(host="127.0.0.1", port=6379, namespace="llama_index")
vector_store = MilvusVectorStore(uri="http://localhost:19530", dim=384, overwrite=True, collection_name="myCollection")
storage_context = StorageContext.from_defaults(docstore=docstore, vector_store=vector_store)

# Chunk
node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=[2048, 512, 128])
nodes = node_parser.get_nodes_from_documents(docs)

# NOTE: I'd rather add the nodes before embedding. There is no point in storing embeddings in the docstore
docstore.add_documents(nodes)

# Compute embeddings (calling the embed model as a transform attaches them to the nodes)
embedded_nodes = huggingface_embedding_model(nodes)

# Add embedded nodes to my vector store
vector_store.add(embedded_nodes)


Here I create my vector_store_index, but it needs the docstore to perform auto-merging retrieval, so the query fails!

Plain Text
#### Inference
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import AutoMergingRetriever
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core.query_engine import RetrieverQueryEngine

# VectorStoreIndex is created without docstore ???
vector_store_index = VectorStoreIndex.from_vector_store(vector_store, embed_model=huggingface_embedding_model)

retriever = vector_store_index.as_retriever(
    similarity_top_k=20,
    vector_store_query_mode="default",
)

# Fails: vector_store_index.storage_context has no docstore holding the parent nodes
automerging_retriever = AutoMergingRetriever(
    retriever,
    vector_store_index.storage_context,
    verbose=True,
)

rerank = SentenceTransformerRerank(top_n=5, model="BAAI/bge-reranker-base")

auto_merging_engine = RetrieverQueryEngine.from_args(
    automerging_retriever,
    llm=llm.model,
    node_postprocessors=[rerank],
)

auto_merging_engine.query("what is the best model?")


Maybe I am doing it the wrong way
The interface could probably be improved for the auto-merging retriever, but you can do

Plain Text
# Build a storage context that includes the docstore, so parent nodes can be fetched
auto_merging_context = StorageContext.from_defaults(docstore=docstore, vector_store=vector_store)

automerging_retriever = AutoMergingRetriever(retriever, auto_merging_context, verbose=True)
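Putting that fix together with the earlier code, the inference section would look roughly like this (a sketch reusing the docstore, vector_store, embed model, and llm objects from above):

Plain Text
# Storage context that includes the docstore, so the retriever can
# fetch parent nodes when enough children are retrieved
auto_merging_context = StorageContext.from_defaults(docstore=docstore, vector_store=vector_store)

vector_store_index = VectorStoreIndex.from_vector_store(vector_store, embed_model=huggingface_embedding_model)
retriever = vector_store_index.as_retriever(similarity_top_k=20)

automerging_retriever = AutoMergingRetriever(retriever, auto_merging_context, verbose=True)

auto_merging_engine = RetrieverQueryEngine.from_args(
    automerging_retriever,
    llm=llm.model,
    node_postprocessors=[SentenceTransformerRerank(top_n=5, model="BAAI/bge-reranker-base")],
)
print(auto_merging_engine.query("what is the best model?"))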