Updated last year

Hi I am having a problem with using

Hi, I am having a problem using LanceDB as a vector store. My documents include metadata, but the metadata is lost after I construct the vector store index. When I persist the storage context, docstore.json is empty. This problem does not occur when I use the simple vector store or the FAISS vector store. Could you please let me know what I am doing wrong in the following code?
How I create the document:
Plain Text
doc = Document(
    text=record['text'],
    extra_info={k: v for k, v in record.items() if k in extra_info_fields},
)


How I create the index:
Plain Text
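# Import paths assume the legacy (pre-0.10) llama_index package layout.
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import LanceDBVectorStore

vector_store = LanceDBVectorStore(uri="lancedb_storage")
lancedb_storage_context = StorageContext.from_defaults(vector_store=vector_store)

lancedb_vector_retriever_index = VectorStoreIndex(
    input_documents,
    storage_context=lancedb_storage_context
)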


How I persist the context:
Plain Text
lancedb_storage_context.persist(persist_dir="lancedb_storage")


Sample retrieved node:
Plain Text
NodeWithScore(node=TextNode(id_='3660f5fe-fca9-4244-9b04-d64685aa796f', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='None', node_type=None, metadata={}, hash=None)}, hash='b60d4828f419a879a0ec2619aaf35fa5b677a1c5b67c6b59bc28123ddf04641f', text="...", start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.3688541352748871)
2 comments
Hmmm, this looks like a bug (the metadata being missing). LanceDB isn't heavily used, so I guess this got missed 😅 I thiiiink I can fix this.

In most vector db integrations, the entire index is stored in the db itself, which is why the docstore is empty.
Thanks, a quick work-around from my end:
Plain Text
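# Import path assumes the legacy (pre-0.10) llama_index package layout.
from llama_index.node_parser import SimpleNodeParser

# Split the documents into nodes up front so the same nodes go to both stores.
node_parser = SimpleNodeParser()
doc_nodes = node_parser.get_nodes_from_documents(input_documents)

# Add the nodes (with their metadata) to the docstore explicitly.
lancedb_storage_context.docstore.add_documents(doc_nodes)

# Build the index from the nodes instead of the raw documents.
lancedb_vector_retriever_index = VectorStoreIndex(
    doc_nodes,
    storage_context=lancedb_storage_context
)

lancedb_storage_context.persist(persist_dir="lancedb_storage")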

Manually adding the nodes to the docstore can help persist the metadata.
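One way to check whether the metadata actually survives is to scan retrieved results for empty metadata dicts. The snippet below is only a rough sketch: the helper name `ids_missing_metadata` is made up for illustration, and it assumes each result exposes `.node.id_` and `.node.metadata`, as the `NodeWithScore` repr in the question suggests. Stand-in objects are used so it runs without llama_index installed.

```python
from types import SimpleNamespace


def ids_missing_metadata(results):
    """Return the ids of retrieved nodes whose metadata dict is empty.

    Hypothetical helper: `results` is any iterable of objects shaped like
    NodeWithScore, i.e. each has a `.node` with `.id_` and `.metadata`.
    """
    missing = []
    for result in results:
        node = result.node  # the wrapped node (a TextNode in the sample above)
        if not node.metadata:
            missing.append(node.id_)
    return missing


# Stand-in data shaped like the sample retrieved node (values are invented).
fake_results = [
    SimpleNamespace(node=SimpleNamespace(id_="a", metadata={}), score=0.37),
    SimpleNamespace(node=SimpleNamespace(id_="b", metadata={"source": "x"}), score=0.21),
]

print(ids_missing_metadata(fake_results))  # prints ['a']
```

With a real index, the results would presumably come from something like `lancedb_vector_retriever_index.as_retriever().retrieve("some query")`. An empty list back from the helper would suggest the metadata made it through.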