
Updated 3 months ago

Insert

Hi all, I'm using LlamaIndex with Weaviate. Is there any way to add documents to an existing index incrementally? Or must the entire index be rebuilt every time? I'm trying to allow people to add data to an existing index, but it just returns "store is read-only" and from what I can tell, LlamaIndex doesn't support updating. Is this the case?
23 comments
Hmm, you should be able to do index.insert(document)
Hi Logan! I just tried doing index.insert(documents) and I get an error
I'm giving it a list of documents, the same way I did when I originally created the index. Maybe I need to try it with a single document
need to do one at a time (sadly)
for document in documents:
  index.insert(document)
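The loop above can be wrapped in a small helper. This is only a sketch: it assumes `index` is any object exposing the `insert(document)` method mentioned in this thread, and `insert_documents` is a hypothetical name, not part of LlamaIndex.

```python
from typing import Any, Iterable


def insert_documents(index: Any, documents: Iterable[Any]) -> int:
    """Insert documents into an existing index one at a time.

    Assumes `index` exposes insert(document), as discussed in the thread.
    Returns the number of documents inserted.
    """
    count = 0
    for document in documents:
        # insert() takes a single document, not a list of documents.
        index.insert(document)
        count += 1
    return count
```

Returning the count makes it easy to log how many documents were actually sent to the index.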
I just tried it with a single document, and no error, but it doesn't seem to work. The metadata in the document isn't appearing in weaviate.
heh better send that as a py file instead
I'm not setting the service context, maybe that's it?
In the original code where I create the index I use this:

# Create the parser and nodes
parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(documents)

# construct vector store
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Pages", text_key="text")

# setting up the storage for the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# set up the index
index = VectorStoreIndex(nodes, storage_context=storage_context)
How do you know it's not appearing in weaviate? There's no difference really between calling insert(document) and your code sample
I'm querying weaviate for the metadata. Each document I add has a url and a timestamp (the pages are scraped and timestamped)
I query weaviate:
{{
  Get {{
    Pages(
      where: {{
        operator: Equal
        path: ["websiteAddress"]
        valueString: "{website_address}"
      }}
    ) {{
      timestamp
    }}
  }}
}}
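The doubled braces in the query above suggest it was pasted from a Python f-string, where `{{` and `}}` render as single braces and only `{website_address}` is substituted. A minimal sketch of that, assuming this is how the query is built (`build_timestamp_query` is a hypothetical helper name):

```python
def build_timestamp_query(website_address: str) -> str:
    """Return the GraphQL query for the timestamps stored for one website."""
    # Inside an f-string, {{ and }} become literal { and };
    # only {website_address} is replaced with the argument.
    return f"""
{{
  Get {{
    Pages(
      where: {{
        operator: Equal
        path: ["websiteAddress"]
        valueString: "{website_address}"
      }}
    ) {{
      timestamp
    }}
  }}
}}
"""
```

Note that the address is interpolated as-is, so a value containing a double quote would break the query; this sketch does no escaping.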
I'm trying to insert another document that has the same website and a different timestamp, but when I run that query it only returns the timestamp that was already in there
It works fine when the index is originally constructed. If I have multiple timestamps, they're all returned by that query
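One way to check whether an insert actually landed is to run the query before and after, then compare the timestamps. The sketch below assumes the response is a dict shaped like `{"data": {"Get": {"Pages": [{"timestamp": ...}]}}}`, which is the usual layout for a Weaviate GraphQL `Get` result; both helper names are hypothetical.

```python
from typing import Any, Dict, List


def extract_timestamps(response: Dict[str, Any], class_name: str = "Pages") -> List[Any]:
    """Pull the timestamp field out of each object in a Get response.

    Assumes the shape {"data": {"Get": {class_name: [{"timestamp": ...}]}}}.
    Missing or null levels yield an empty list rather than an exception.
    """
    data = response.get("data") or {}
    get_section = data.get("Get") or {}
    objects = get_section.get(class_name) or []
    return [obj.get("timestamp") for obj in objects]


def new_timestamps(before: List[Any], after: List[Any]) -> List[Any]:
    """Return the timestamps present after the insert but not before it."""
    return sorted(set(after) - set(before))
```

If `new_timestamps` comes back empty after calling `insert`, the new document either was not written or was written without the expected metadata.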
Am I setting up everything correctly for an existing weaviate store?
that seems fine to me 🤔 Like I mentioned, insert() calls all the same functions as the normal constructor. So not really sure what the difference here is that could be causing it

You could also try repeating the initial constructor, but with new nodes. It will just append if the vector store already exists
It's ... sort of working 🙂 I'm going to take another look. One thing I found -- there's an insert_nodes function in https://github.com/run-llama/llama_index/blob/main/llama_index/indices/vector_store/base.py as well
Does that work? Would it be faster to create the nodes and insert them that way?
Rather than iterating over all documents
I mean, you'd end up doing all the same processing, so speed would be the same. But up to you if you want more control over how documents are chunked into nodes
Gotcha - just wasn't sure if it was any faster to process all the documents at once and then insert them, rather than do each one at a time. It sounds like they're equivalent though
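For comparison, the node-based route discussed above might look like the sketch below. It assumes `parser` exposes `get_nodes_from_documents(documents)` and `index` exposes `insert_nodes(nodes)`, both named earlier in this thread; `insert_as_nodes` itself is a hypothetical helper.

```python
from typing import Any, List


def insert_as_nodes(index: Any, parser: Any, documents: List[Any]) -> int:
    """Chunk all documents into nodes up front, then insert the nodes in one call.

    Assumes parser.get_nodes_from_documents() and index.insert_nodes() exist,
    as referenced in the thread. Returns the number of nodes inserted.
    """
    # Chunking here gives control over how documents are split into nodes.
    nodes = parser.get_nodes_from_documents(documents)
    index.insert_nodes(nodes)
    return len(nodes)
```

As noted in the replies, the same processing happens either way, so the benefit is control over chunking rather than speed.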
Thanks for the help!