DuckDB vector store limitations

At a glance

The post discusses the DuckDB vector store, where the original poster could call .delete() but not .add() or .upsert(). Community members explain that upsert() is not a vector store method and that .add() takes a list of node objects rather than document IDs. They give examples of creating nodes and note that nodes must be embedded before they are added. The poster also asks how to delete data after loading a persisted index file, when no document ID is at hand to pass to .delete().

Jatin.K

I created a DuckDB vector store (https://docs.llamaindex.ai/en/stable/examples/vector_stores/DuckDBDemo/), but it only lets me use .delete and not other methods like .add or .upsert.

Attachment

28 comments

WhiteFang_Jr

I don't think upsert is present for DuckDB, but I can see add and delete in it.

How are you doing add or delete?

For add: https://github.com/run-llama/llama_index/blob/d5b7511a3c51937abf7b21402b826e28de58aabd/llama-index-integrations/vector_stores/llama-index-vector-stores-duckdb/llama_index/vector_stores/duckdb/base.py#L287C21-L287C40

Logan M

yea, upsert() is not a method on any vector store

Jatin.K

I'm only able to do delete; add isn't supported either: vector_store.delete("doc_id")

Jatin.K

Why is it displayed in the DuckDB LlamaIndex docs?

Attachment

WhiteFang_Jr

Can you share how you are adding?
It takes a list of nodes when adding.

WhiteFang_Jr

The code for adding is mentioned here

Jatin.K

vector_store.add("doc_id")

WhiteFang_Jr

It takes a list of nodes

Jatin.K

Can you give example?

WhiteFang_Jr

It would be something like this:

vector_store.add([node1,node2,node3])

Jatin.K

okay, and nodes will be doc_IDs?

WhiteFang_Jr

No, a node is the object that you create from your docs.

WhiteFang_Jr

You can find more about nodes here: https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/#nodes

Jatin.K

Okay, I am using only Document objects to build the index: index = VectorStoreIndex.from_documents(documents). Shall I use nodes just to be able to call .add, given that doc_ID already works for me with .delete?

Jatin.K

Moreover, just to confirm: node_id is to nodes what doc_ID is to documents?
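
On the node_id question: in LlamaIndex a node carries its own node_id and, when it was produced from a document, a ref_doc_id that points back at that document; the vector store's delete() is keyed on that ref_doc_id. The sketch below is a plain-Python toy model of that relationship, not LlamaIndex code. ToyDocument, ToyNode, and split_into_nodes are hypothetical names used only for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyDocument:
    text: str
    doc_id: str


@dataclass
class ToyNode:
    text: str
    ref_doc_id: str  # ID of the document this chunk came from
    node_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def split_into_nodes(doc: ToyDocument, chunk_size: int = 20) -> List[ToyNode]:
    """Cut a document into fixed-size character chunks (stand-in for a real splitter)."""
    return [
        ToyNode(text=doc.text[i:i + chunk_size], ref_doc_id=doc.doc_id)
        for i in range(0, len(doc.text), chunk_size)
    ]


doc = ToyDocument(
    text="DuckDB stores vectors. Nodes are chunks of documents.",
    doc_id="doc-1",
)
nodes = split_into_nodes(doc)

# Every node has its own node_id, but all of them share the parent's doc_id.
for node in nodes:
    print(node.node_id, "->", node.ref_doc_id)
```

So one document ID maps to many node IDs, which is why deleting by document ID can remove several stored rows at once.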

Jatin.K

I am providing node1, but it's throwing the following error:

Attachment

WhiteFang_Jr

You are providing the ID; a node is an object.

WhiteFang_Jr

Try doing this:

# Once you have Document objects from SimpleDirectoryReader, parse them into nodes.
from llama_index.core.node_parser import SentenceSplitter

parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# Then pass these nodes to the `add` method.
vector_store.add(nodes)

WhiteFang_Jr

If you want to test it with a single node, try this:

from llama_index.core.schema import TextNode
node1 = TextNode(text="<text_chunk>", id_="<node_id>")
node2 = TextNode(text="<text_chunk>", id_="<node_id>")

vector_store.add([node1,node2])

Jatin.K

Fixed that error, but now I am getting an "embedding not set" error

Attachment

Jatin.K

same error with your example

Attachment

Logan M

You need to embed the nodes before adding them

Logan M

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding()

# Embed the node texts in one batch, then attach each vector to its node.
node_texts = [node.text for node in nodes]
embeddings = embed_model.get_text_embedding_batch(node_texts)
for node, embedding in zip(nodes, embeddings):
    node.embedding = embedding

vector_store.add(nodes)
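
The "embedding not set" error follows from the fact that add() stores whatever vectors the nodes already carry; it does not compute them. A minimal plain-Python sketch of that contract, assuming a toy store and a fake embedding function (ToyNode, ToyVectorStore, and fake_embed are hypothetical and are not part of LlamaIndex):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyNode:
    node_id: str
    text: str
    embedding: Optional[List[float]] = None


class ToyVectorStore:
    def __init__(self) -> None:
        self.rows: Dict[str, List[float]] = {}

    def add(self, nodes: List[ToyNode]) -> List[str]:
        # Validate everything first so a failed call stores nothing.
        for node in nodes:
            if node.embedding is None:
                raise ValueError(f"embedding not set for node {node.node_id}")
        for node in nodes:
            self.rows[node.node_id] = node.embedding
        return [node.node_id for node in nodes]


def fake_embed(text: str, dim: int = 4) -> List[float]:
    """Deterministic stand-in for a real embedding model; not semantically meaningful."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


store = ToyVectorStore()
nodes = [ToyNode("n1", "first chunk"), ToyNode("n2", "second chunk")]

try:
    store.add(nodes)  # fails: the nodes carry no embeddings yet
    failed = False
except ValueError:
    failed = True

for node in nodes:  # embed first, then add
    node.embedding = fake_embed(node.text)
added_ids = store.add(nodes)
```

With a real store, a real embedding model takes the place of fake_embed; the ordering (embed, then add) is the part that carries over.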

Jatin.K

Thanks, it's working now! One last thing before I go: if I use a single file in SimpleDirectoryReader, it throws the error below, which goes away if I use the commented-out directory approach.

Attachment

WhiteFang_Jr

You need to add .load_data() at the end of SimpleDirectoryReader

WhiteFang_Jr

documents=SimpleDirectoryReader(..).load_data()

Jatin.K

thanks y'all, cheers.

Jatin.K

One last thing: how do I delete data [Document/Node object] when I am loading a persisted index file, as in the screenshot below? In that case I don't have a document ID to pass to vector_store.delete("doc_ID"), which is what I currently use after SimpleDirectoryReader().
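
One possible approach to this last question, offered as a sketch and not as a confirmed answer: since delete() is keyed on the ID of the source document, that ID can be read back from nodes that were loaded or retrieved, provided they still carry a ref_doc_id. The helper below is plain Python and duck-typed. unique_ref_doc_ids is a hypothetical name, and whether nodes coming back from a persisted DuckDB store expose ref_doc_id is an assumption to verify.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def unique_ref_doc_ids(nodes: Iterable[Any]) -> List[str]:
    """Return distinct ref_doc_id values in first-seen order, skipping nodes without one."""
    seen = set()
    ordered: List[str] = []
    for node in nodes:
        # Retrieval results may wrap the node; unwrap if a `.node` attribute is present.
        inner = getattr(node, "node", node)
        ref_id = getattr(inner, "ref_doc_id", None)
        if ref_id is not None and ref_id not in seen:
            seen.add(ref_id)
            ordered.append(ref_id)
    return ordered


# Stand-in objects for demonstration; real nodes would come from the loaded index.
loaded = [
    SimpleNamespace(ref_doc_id="doc-a"),
    SimpleNamespace(node=SimpleNamespace(ref_doc_id="doc-b")),
    SimpleNamespace(ref_doc_id="doc-a"),
    SimpleNamespace(ref_doc_id=None),
]
doc_ids = unique_ref_doc_ids(loaded)
```

Each recovered ID could then be passed to vector_store.delete(...), the same call already used earlier in the thread.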

Attachment
