Updated last year

If I insert the same document into a Pinecone index using LlamaIndex, should I expect it to update the existing one or create a duplicate? Currently it's creating a duplicate, which is undesirable.
Yeah, currently it's expected that inserting the same document again will create a duplicate, especially with Pinecone

No vector db provides an easy way to manage this right now 😅 We're still working on it.

For now, you can use the doc_id of the original document to delete it from the index

Then you can re-insert the new document

This of course assumes you set the doc_id of the input documents to something consistent 😅

The doc_id is also available on the source nodes of a response, as response.source_nodes[0].node.ref_doc_id
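A minimal sketch of the delete-then-re-insert approach described above. It assumes the index object exposes `delete_ref_doc()` and `insert()`, as `VectorStoreIndex` does in LlamaIndex releases I'm aware of (older releases used `index.delete(doc_id)` instead, so check your version). The helper name `replace_documents` is made up for illustration.

```python
from typing import Any, Iterable


def replace_documents(index: Any, documents: Iterable[Any]) -> int:
    """Delete any previously indexed copy of each document, then insert it again.

    Relies on every document having a stable doc_id across runs.
    Returns the number of documents processed.
    """
    replaced = 0
    for doc in documents:
        # Assumption: delete_ref_doc() removes all nodes derived from this doc_id.
        index.delete_ref_doc(doc.doc_id, delete_from_docstore=True)
        # Re-insert the fresh version so only one copy remains in the index.
        index.insert(doc)
        replaced += 1
    return replaced
```

This only avoids duplicates if the doc_id is the same on every run, so it's worth deriving it from something stable such as a file id rather than leaving it auto-generated.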
Since the doc IDs are {id}_part_X, I decided to try storing just the id as metadata and then deleting the vectors using Pinecone directly.
I tried something like this:
                pinecone_index.delete(
                    filter={
                        "file_id": {"$eq": file_id},
                    }
                )

                # Get document from file path
                docs = SimpleDirectoryReader(input_files=[file_path]).load_data()

                # customize document doc_id and metadata
                logging.info("Adding doc_id to documents")
                for i in range(len(docs)):
                    # Give each part a consistent doc_id
                    docs[i].doc_id = f"{file_id}_part_{i}"

                    # Add azure path to metadata
                    docs[i].metadata["azure_path"] = azure_paths[x]
                    docs[i].metadata["file_id"] = file_id

                    # Exclude azure path from embedding
                    docs[i].excluded_embed_metadata_keys = ["azure_path", "file_id"]

                documents.extend(docs)

but the vectors aren't being deleted. Does the metadata get passed through to the Pinecone vectors?
It does 🤔 That's how the current delete function is coded too
It might help to look at the source code? Here's the file for everything Pinecone-related: https://github.com/jerryjliu/llama_index/blob/main/llama_index/vector_stores/pinecone.py
Weird. I don't know why this didn't work then
The only thing I can think of is that the file_id values aren't what you expected? 🤔
I'm also not a pinecone expert though, so there might be something I'm missing lol
Figured it out! I had passed the namespace when initializing pinecone_index, but it needs to be passed to the delete call instead.
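For anyone hitting the same thing, here is a rough sketch of the working call. It assumes `pinecone_index` is a Pinecone client `Index` whose `delete()` accepts `filter` and `namespace` keyword arguments, and that the vectors carry a `file_id` metadata field as in the snippet above. The helper name `delete_vectors_for_file` is hypothetical.

```python
from typing import Any, Optional


def delete_vectors_for_file(
    pinecone_index: Any, file_id: str, namespace: Optional[str] = None
) -> None:
    """Delete every vector whose file_id metadata matches, within one namespace."""
    pinecone_index.delete(
        # Match on the metadata stored alongside each vector.
        filter={"file_id": {"$eq": file_id}},
        # The namespace goes on the delete call itself; without it the delete
        # targets the default namespace and leaves these vectors untouched.
        namespace=namespace,
    )
```

Usage would look like `delete_vectors_for_file(pinecone_index, file_id, namespace="my-namespace")`, where the namespace value is whatever the vectors were upserted under.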