At a glance

The community member is looking for examples on how to append or delete nodes from an existing vector store or document store. The comments provide some suggestions, such as using index.insert(document), index.insert_nodes(nodes), and index.delete_ref_doc(ref_doc_id). However, the community member is struggling with creating the index object without passing the nodes parameter, and understanding the relationship between different components like VectorStoreIndex, StorageContext, and ServiceContext.

One community member suggests using VectorStoreIndex.from_vector_store(vector_store) to create the index without passing the nodes, and another suggests using auto_merging_context = StorageContext.from_defaults(docstore=docstore, vector_store=vector_store) and automerging_retriever = AutoMergingRetriever(retriever, auto_merging_context, verbose=True) to handle the auto-merging retrieval. However, there is no explicitly marked answer in the comments.

Hello πŸ‘‹
I see a lot of tutorials about creating indexes and querying them, but I have not seen an example of appending/deleting nodes from existing vector store/document store. Could you point to such an example ? πŸ™
8 comments
index.insert(document)

index.insert_nodes(nodes)

index.delete_ref_doc(ref_doc_id)
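For context, a minimal sketch of how those calls fit together (the documents and ids here are made up):

Plain Text
from llama_index.core import Document, VectorStoreIndex

# Build an index from an initial batch of documents
index = VectorStoreIndex.from_documents([Document(text="first doc", id_="doc-1")])

# Append another document later
index.insert(Document(text="second doc", id_="doc-2"))

# Remove a document's nodes by its ref_doc_id
index.delete_ref_doc("doc-1", delete_from_docstore=True)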
Thanks. But what is blocking me is the creation of the index object. Right now I am creating it with a "nodes" parameter: VectorStoreIndex(nodes=nodes), which inserts the nodes into the index. Now if I start a new session and want to update the index, how can I create it without the nodes parameter? (then call insert/delete)
I see it can take an IndexStruct as input, but there is no example of this anywhere
@Logan M in short: how can I create a VectorStoreIndex without passing nodes to it?
Are you using a vector db integration? Or the default?

If using the default
Plain Text
# First session: persist the index (docstore, index store, vector store) to disk
index.storage_context.persist(persist_dir="./storage")

# A later session: reload the index without re-passing nodes
from llama_index.core import StorageContext, load_index_from_storage
index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage"))
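Once reloaded, the index supports the same insert/insert_nodes/delete_ref_doc calls as above, since its docstore and index store are restored from ./storage.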


If using a db integration

Plain Text
# Reconnect to an existing vector db collection; no nodes are passed
index = VectorStoreIndex.from_vector_store(vector_store)
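For illustration, a minimal sketch of reconnecting and then mutating the index (the Milvus parameters and the new_nodes variable are assumptions, not part of the answer above):

Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

# Reconnect to an existing collection; overwrite=False keeps the stored data
vector_store = MilvusVectorStore(uri="http://localhost:19530", dim=384, overwrite=False, collection_name="myCollection")
index = VectorStoreIndex.from_vector_store(vector_store)

# The index can now be updated without ever passing a nodes parameter
index.insert_nodes(new_nodes)  # new_nodes: nodes parsed in this session (hypothetical)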
@Logan M Thank you so much for taking the time.
I still have trouble understanding the logic of VectorStore, DocStore, IndexStore, VectorStoreIndex, StorageContext, and ServiceContext.
I am struggling to find a clear explanation of how all of these fit together, and whether we could/should use all of them.
Here is my current code :

Plain Text
#### Build index
from llama_index.core import StorageContext
from llama_index.core.node_parser import HierarchicalNodeParser
from llama_index.vector_stores.milvus import MilvusVectorStore
from llama_index.storage.docstore.redis import RedisDocumentStore

docstore = RedisDocumentStore.from_host_and_port(host="127.0.0.1", port=6379, namespace="llama_index")
vector_store = MilvusVectorStore(uri="http://localhost:19530", dim=384, overwrite=True, collection_name="myCollection")
storage_context = StorageContext.from_defaults(docstore=docstore, vector_store=vector_store)

# Chunk
node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=[2048, 512, 128])
nodes = node_parser.get_nodes_from_documents(docs)

# NOTE: I'd rather add the nodes before embedding. There is no point in storing embeddings in the docstore
docstore.add_documents(nodes)

# Compute embeddings (calling the embed model as a transform attaches them to the nodes)
embedded_nodes = huggingface_embedding_model(nodes)

# Add embedded nodes to my vector store
vector_store.add(embedded_nodes)


Here I create my vector_store_index, but it needs the docstore to perform auto-merging retrieval, so the query fails!

Plain Text
#### Inference
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import AutoMergingRetriever
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core.query_engine import RetrieverQueryEngine

# VectorStoreIndex is created without docstore ???
vector_store_index = VectorStoreIndex.from_vector_store(vector_store, embed_model=huggingface_embedding_model)

retriever = vector_store_index.as_retriever(
    similarity_top_k=20,
    vector_store_query_mode="default",
)

# Fails: vector_store_index.storage_context has no docstore holding the parent nodes
automerging_retriever = AutoMergingRetriever(
    retriever,
    vector_store_index.storage_context,
    verbose=True,
)

rerank = SentenceTransformerRerank(top_n=5, model="BAAI/bge-reranker-base")

auto_merging_engine = RetrieverQueryEngine.from_args(
    automerging_retriever,
    llm=llm.model,
    node_postprocessors=[rerank],
)

auto_merging_engine.query("what is the best model?")


Maybe I am doing it the wrong way
The interface could probably be improved for the auto-merging retriever, but you can do

Plain Text
# Build a storage context that includes the docstore, so parent nodes can be fetched
auto_merging_context = StorageContext.from_defaults(docstore=docstore, vector_store=vector_store)

automerging_retriever = AutoMergingRetriever(retriever, auto_merging_context, verbose=True)
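Putting that fix together with the earlier code, the inference section would look roughly like this (a sketch reusing the docstore, vector_store, embed model, and llm objects from above):

Plain Text
# Storage context that includes the docstore, so the retriever can
# fetch parent nodes when enough children are retrieved
auto_merging_context = StorageContext.from_defaults(docstore=docstore, vector_store=vector_store)

vector_store_index = VectorStoreIndex.from_vector_store(vector_store, embed_model=huggingface_embedding_model)
retriever = vector_store_index.as_retriever(similarity_top_k=20)

automerging_retriever = AutoMergingRetriever(retriever, auto_merging_context, verbose=True)

auto_merging_engine = RetrieverQueryEngine.from_args(
    automerging_retriever,
    llm=llm.model,
    node_postprocessors=[SentenceTransformerRerank(top_n=5, model="BAAI/bge-reranker-base")],
)
print(auto_merging_engine.query("what is the best model?"))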