Updated 2 months ago

Do I need to reload an index after every ingestion?
No, but you may need to create a new query_engine instance after insertion.
I am saving the index object at the start, then doing the insertions and creating a query engine instance every time I am doing a query. Will this work?
If you insert after saving the index, the latest insertions might not get saved.

"creating query engine instances every time I am doing a query. Will this work?"
Yeah, this part is fine.
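To illustrate why building a fresh query engine per query is safe, here is a minimal, stdlib-only sketch. `ToyIndex` and `ToyQueryEngine` are hypothetical stand-ins, not LlamaIndex classes, and the sketch assumes the worst case: an engine that captures the index's nodes at the moment it is created.

```python
class ToyIndex:
    """Hypothetical in-memory index: just a list of text nodes."""

    def __init__(self):
        self.nodes = []

    def insert(self, text):
        self.nodes.append(text)

    def as_query_engine(self):
        # Assumption: the engine snapshots the nodes at creation time.
        return ToyQueryEngine(list(self.nodes))


class ToyQueryEngine:
    def __init__(self, nodes):
        self.nodes = nodes

    def query(self, term):
        # Naive keyword "retrieval": return every node containing the term.
        return [n for n in self.nodes if term in n]


index = ToyIndex()
index.insert("alpha report")
stale_engine = index.as_query_engine()  # created before the next insert
index.insert("beta report")

print(stale_engine.query("beta"))             # []
print(index.as_query_engine().query("beta"))  # ['beta report']
```

In this sketch, creating an engine only copies a list, so doing it once per query costs little while guaranteeing that every query sees the latest insertions.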
Then what's the best way to do dynamic insertion? Or do we need to save the index after every insertion?
Not after every insertion. Once all the insertions have taken place, you can persist the index.
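A minimal, stdlib-only sketch of both points: an insert made after saving is missing from the saved copy, and a single persist after a batch of inserts captures all of them. `ToyIndex`, `persist`, and `load` are hypothetical helpers that write nodes to a JSON file; in LlamaIndex the corresponding save step is `index.storage_context.persist(persist_dir="./storage")`.

```python
import json
import os
import tempfile


class ToyIndex:
    """Hypothetical in-memory index that can be saved to a JSON file."""

    def __init__(self, nodes=None):
        self.nodes = list(nodes or [])

    def insert(self, text):
        self.nodes.append(text)  # in memory only, not on disk yet

    def persist(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.nodes, f)

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as f:
            return cls(json.load(f))


path = os.path.join(tempfile.mkdtemp(), "index.json")
index = ToyIndex()
index.persist(path)                # saved while empty
index.insert("doc 1")              # inserted after saving...
print(ToyIndex.load(path).nodes)   # [] -> the insertion was not saved

for text in ["doc 2", "doc 3"]:    # a batch of insertions
    index.insert(text)
index.persist(path)                # one persist after the whole batch
print(ToyIndex.load(path).nodes)   # ['doc 1', 'doc 2', 'doc 3']
```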

Persisting takes time to complete.
I am using a vector DB, and hence want to keep the index instance in memory to reduce query latency. In my use case, the user can insert into an index over time, so I can't use the "save the index after all insertions" approach :/.
Ah okay, so with a vector DB you don't have to reload! Once the data is inserted, you are ready to use it on the go.
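A minimal sketch of why a vector DB needs no reload: the index object is only a handle, and each insert writes straight through to the external store, so any other handle sees it immediately. `FakeVectorDB` and `DBIndex` are hypothetical stand-ins for a client such as Chroma or Qdrant, not real client APIs.

```python
class FakeVectorDB:
    """Hypothetical external store: collections of (text, vector) rows."""

    def __init__(self):
        self.collections = {}

    def upsert(self, collection, text, vector):
        self.collections.setdefault(collection, []).append((text, vector))

    def rows(self, collection):
        return list(self.collections.get(collection, []))


class DBIndex:
    """Hypothetical index handle: holds a connection, not the data."""

    def __init__(self, db, collection):
        self.db, self.collection = db, collection

    def insert(self, text):
        # Toy "embedding" (text length); a real system calls an embedding model.
        self.db.upsert(self.collection, text, [float(len(text))])

    def texts(self):
        return [text for text, _ in self.db.rows(self.collection)]


db = FakeVectorDB()
writer = DBIndex(db, "docs")
writer.insert("new document")

# A second handle (e.g. another request) sees the insert with no reload.
reader = DBIndex(db, "docs")
print(reader.texts())  # ['new document']
```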
lol I should have asked at the beginning whether it was for a local vector index or a third-party one 😅
@WhiteFang_Jr
Suppose we have a third-party vector store such as Chroma or Qdrant. How can we add a new document to an existing collection in the store? Currently, I am building the index from the existing collection and then adding the new documents to the index. Is this the best approach, or is there a better way to accomplish this?
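```python
from llama_index.core.node_parser import SentenceSplitter  # llama_index >= 0.10 path

# Split the new documents into chunks, then insert them into the existing index
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=128)
nodes = splitter.get_nodes_from_documents(docs)
index.insert_nodes(nodes=nodes)
```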
Yeah, this works for both the in-memory vector store index and third-party vector DBs.
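For intuition about `chunk_size=1024, chunk_overlap=128` in the snippet above: each chunk shares its last `chunk_overlap` units with the start of the next one. The sketch below is a simplified, hypothetical word-based sliding window (`chunk_words` is not a LlamaIndex function); `SentenceSplitter` itself counts tokens and prefers to keep sentences intact, so its exact boundaries will differ.

```python
def chunk_words(text, chunk_size, chunk_overlap):
    """Hypothetical word-level splitter with overlapping windows."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(text, chunk_size=4, chunk_overlap=1):
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```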
I think we don't need to build the index every time if we are using a vector DB; we can keep the index instance in memory and use it whenever required. @WhiteFang_Jr, please correct me if this is wrong.
Yep. For a third-party vector DB, only a connection is made when loading the index, so nothing is lost there.
If you are loading the index from local disk, it has to load all the embeddings and nodes, which increases the loading time and therefore the time for every query response.

For a third-party vector DB, both approaches work. I prefer keeping the index in memory.
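A minimal sketch of the "keep it in memory" preference: the expensive load from disk runs once, and later queries reuse the cached object. `load_index_from_disk`, `get_index`, and `answer` are hypothetical; the loader only counts its calls, standing in for the step that reads all embeddings and nodes.

```python
from functools import lru_cache

LOAD_CALLS = 0


def load_index_from_disk(persist_dir):
    """Hypothetical expensive loader (would read every embedding and node)."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return {"persist_dir": persist_dir, "nodes": ["node-1", "node-2"]}


@lru_cache(maxsize=None)
def get_index(persist_dir):
    # First call loads from disk; later calls return the same in-memory object.
    return load_index_from_disk(persist_dir)


def answer(question, persist_dir="./storage"):
    index = get_index(persist_dir)
    return f"answered {question!r} using {len(index['nodes'])} node(s)"


for q in ["q1", "q2", "q3"]:
    answer(q)
print(LOAD_CALLS)  # 1
```

The trade-off is memory: the cached index stays resident for the life of the process, and a copy loaded from disk will not see inserts made by another process until it is reloaded.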
When building a product where different query requests target different documents, I believe we need to load from local disk for every request, don't we? Or is there a better way to manage this?
Not necessarily. If your system adds documents to an existing index, you don't need to load the index every time.

Create the index when the server starts; every new document can then be inserted into that index, with no need to reload it.
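A minimal sketch of the create-once-at-startup pattern, with a lock so that concurrent insert and query requests do not interleave. `AppState`, `on_startup`, `handle_insert`, and `handle_query` are hypothetical names, not part of any web framework, and the index is a plain list standing in for a real index object.

```python
import threading


class AppState:
    """Hypothetical server state holding one long-lived index."""

    def __init__(self):
        self.index = None
        self.lock = threading.Lock()


state = AppState()


def on_startup():
    # Build (or connect to) the index once, when the server starts.
    state.index = []


def handle_insert(text):
    with state.lock:  # serialize writes to the shared index
        state.index.append(text)


def handle_query(term):
    with state.lock:  # read a consistent view
        return [t for t in state.index if term in t]


on_startup()
threads = [threading.Thread(target=handle_insert, args=(f"doc {i}",)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(handle_query("doc")))  # 5
```

Whether a lock is actually needed depends on the index object or vector DB client in use; this sketch assumes the worst case of an object that is not thread-safe.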
True, but in a multi-user system where multiple users can ask questions about their own documents concurrently, we may need to load an index for each user, right?
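One way to avoid reloading on every request in a multi-user setup is to load each user's index at most once and cache it by user ID. A minimal sketch under that assumption: `UserIndexCache` and the `load_user_index` loader are hypothetical, and with a vector DB the cached value would typically be a lightweight handle (for example, to a per-user collection) rather than the data itself.

```python
import threading


class UserIndexCache:
    """Hypothetical per-user cache: each user's index is loaded at most once."""

    def __init__(self, loader):
        self._loader = loader  # callable: user_id -> index object
        self._indexes = {}
        self._lock = threading.Lock()

    def get(self, user_id):
        with self._lock:
            if user_id not in self._indexes:
                self._indexes[user_id] = self._loader(user_id)
            return self._indexes[user_id]


loads = []


def load_user_index(user_id):
    loads.append(user_id)  # record each (expensive) load
    return {"owner": user_id, "nodes": []}


cache = UserIndexCache(load_user_index)
for user in ["alice", "bob", "alice", "alice", "bob"]:
    cache.get(user)
print(loads)  # ['alice', 'bob']
```

Holding one lock while loading keeps the sketch simple but serializes first-time loads across users; a larger deployment might lock per user or cap the cache size.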