LlamaIndex

Updated 6 months ago

Insert

Hi all, I'm using LlamaIndex with Weaviate. Is there any way to add documents to an existing index incrementally, or must the entire index be rebuilt every time? I'm trying to let people add data to an existing index, but it just returns "store is read-only", and from what I can tell, LlamaIndex doesn't support updating. Is this the case?

23 comments

Hmm, you should be able to do index.insert(document)

Hi Logan! I just tried index.insert(documents) and I get an error.

I'm giving it a list of documents, the same way I originally created the index. Maybe I need to try it with a single document.

need to do one at a time (sadly)

```python
for document in documents:
    index.insert(document)
```
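Because `insert()` takes a single document, a list has to be looped over. Here is a minimal sketch of a helper that wraps that loop. It assumes only that the index exposes an `insert(document)` method, as in the snippet above; `insert_documents` and `RecordingIndex` are hypothetical names, and `RecordingIndex` is a stand-in so the example runs without Weaviate or an embedding model.

```python
from typing import Any, Iterable, Protocol


class SupportsInsert(Protocol):
    """Anything with an insert(document) method, e.g. a LlamaIndex index."""

    def insert(self, document: Any) -> None: ...


def insert_documents(index: SupportsInsert, documents: Iterable[Any]) -> int:
    """Insert documents one at a time (hypothetical helper); return the count."""
    count = 0
    for document in documents:
        index.insert(document)
        count += 1
    return count


class RecordingIndex:
    """Stand-in index that just records what it was given (illustration only)."""

    def __init__(self) -> None:
        self.inserted: list = []

    def insert(self, document: Any) -> None:
        self.inserted.append(document)


fake_index = RecordingIndex()
n = insert_documents(fake_index, ["doc-a", "doc-b"])
print(n, fake_index.inserted)  # 2 ['doc-a', 'doc-b']
```

With a real index, each `insert()` call chunks, embeds, and stores that one document.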

I just tried it with a single document. No error this time, but it doesn't seem to work: the metadata in the document isn't appearing in Weaviate.

Heh, better send that as a .py file instead.

I'm not setting the service context; maybe that's it?

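In the original code where I create the index, I use this:

```python
# Imports assume the legacy (pre-0.10) llama_index package layout
from llama_index import StorageContext, VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser
from llama_index.vector_stores import WeaviateVectorStore

# `client` is an existing weaviate.Client; `documents` is a list of Documents

# Create the parser and nodes
parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(documents)

# Construct the vector store
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Pages", text_key="text")
# Set up the storage for the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Set up the index
index = VectorStoreIndex(nodes, storage_context=storage_context)
```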

How do you know it's not appearing in Weaviate? There's really no difference between calling insert(document) and your code sample.

I'm querying Weaviate for the metadata. When I add a document, it has a URL and a timestamp (the pages are scraped and timestamped).

I query Weaviate like this (the doubled braces are there because the query is built in a Python f-string):

```
{{
  Get {{
    Pages(
      where: {{
        operator: Equal
        path: ["websiteAddress"]
        valueString: "{website_address}"
      }}
    ) {{
      timestamp
    }}
  }}
}}
```
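In a Python f-string, `{{` and `}}` produce literal braces while `{website_address}` is substituted, which is why the query is written that way. A minimal sketch of building it (the function name `build_timestamp_query` is hypothetical, and it assumes the address contains no double quotes, since nothing here escapes them):

```python
def build_timestamp_query(website_address: str) -> str:
    """Build the GraphQL Get query; {{ }} are literal braces, {website_address} is filled in."""
    return f"""
{{
  Get {{
    Pages(
      where: {{
        operator: Equal
        path: ["websiteAddress"]
        valueString: "{website_address}"
      }}
    ) {{
      timestamp
    }}
  }}
}}
"""


query = build_timestamp_query("https://example.com")
print(query)
```

With the v3 `weaviate` Python client, a raw GraphQL string like this can be sent with `client.query.raw(query)`; check that against the client version you have installed.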

I'm inserting another document with the same website address but a different timestamp, and when I run that query it only returns the timestamp that was already there.

It works fine when the index is first constructed: if I have multiple timestamps, they're all returned by that query.
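One way to check programmatically whether a newly inserted document made it into Weaviate is to compare the timestamps the query returns with the ones you expect. A minimal sketch, assuming the usual GraphQL response shape `{"data": {"Get": {"Pages": [...]}}}` and a `timestamp` property as in the query above; the helper names and the sample timestamps are hypothetical:

```python
from typing import Any, Dict, Iterable, List


def extract_timestamps(response: Dict[str, Any], class_name: str = "Pages") -> List[str]:
    """Pull timestamp values out of a Weaviate GraphQL Get response."""
    # Missing keys (or a null result list) are treated as "no objects found".
    objects = response.get("data", {}).get("Get", {}).get(class_name) or []
    return [obj["timestamp"] for obj in objects if "timestamp" in obj]


def missing_timestamps(response: Dict[str, Any], expected: Iterable[str]) -> List[str]:
    """Return the expected timestamps that the query did not return, sorted."""
    found = set(extract_timestamps(response))
    return sorted(set(expected) - found)


# Hypothetical response in which only the original document's timestamp came back
sample = {"data": {"Get": {"Pages": [{"timestamp": "2023-08-01T10:00:00"}]}}}
print(missing_timestamps(sample, ["2023-08-01T10:00:00", "2023-08-15T09:30:00"]))
# ['2023-08-15T09:30:00']
```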

Am I setting everything up correctly for an existing Weaviate store?

That seems fine to me 🤔 Like I mentioned, insert() calls all the same functions as the normal constructor, so I'm not really sure what difference could be causing this.

You could also try repeating the initial constructor, but with the new nodes. It will just append if the vector store already exists.

It's ... sort of working 🙂 I'm going to take another look. One thing I found: there's also an insert_nodes function in https://github.com/run-llama/llama_index/blob/main/llama_index/indices/vector_store/base.py

Does that work? Would it be faster to create the nodes and insert them that way, rather than iterating over all the documents?

I mean, you'd end up doing all the same processing, so the speed would be the same. But it's up to you if you want more control over how documents are chunked into nodes.
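A toy model of the point above: chunking happens per document, so splitting documents one at a time or all up front produces the same nodes from the same work. `chunk_text` is a simple fixed-width character chunker written for illustration, not the splitter LlamaIndex actually uses, and all function names here are hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Toy fixed-width character chunker with overlap (illustration only)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop at the first chunk that reaches the end of the text.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def nodes_one_at_a_time(docs: List[str], chunk_size: int, overlap: int) -> List[str]:
    """Like calling insert(document) in a loop: chunk each document as it comes."""
    nodes: List[str] = []
    for doc in docs:
        nodes.extend(chunk_text(doc, chunk_size, overlap))
    return nodes


def nodes_up_front(docs: List[str], chunk_size: int, overlap: int) -> List[str]:
    """Like parsing all documents first, then inserting the nodes in one batch."""
    per_doc = [chunk_text(doc, chunk_size, overlap) for doc in docs]
    return [node for chunks in per_doc for node in chunks]


docs = ["abcdefghij", "xyz"]
print(nodes_up_front(docs, chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'xyz']
```

Batching mainly buys control: you choose how nodes are built before handing them to the index.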

Gotcha. I just wasn't sure whether it was any faster to process all the documents at once and then insert them, rather than one at a time. It sounds like they're equivalent, though.

Thanks for the help!
