
Updated last year

Hey! Anyone using Lancedb as a

Hey! Anyone using LanceDB as a vector store? I'm struggling to keep the metadata of my documents. I can find the fields in the LanceDB table, as separate columns, but when I query the db with LlamaIndex the metadata fields are missing. I saw others had this problem before, but I couldn't find a solution.
6 comments
Seems like a missing feature -- the query function for LanceDBVectorStore does not use the metadata when fetching the top-k nodes.
Ohh, thank you for the fast answer!
It shouuuuld be a somewhat easy fix -- but not entirely sure (I've never used lancedb)
I am trying to monkey patch it. Hope I can get it to work.
Thanks for the help, it works now. I worked out a very basic solution that will do until someone with better skills comes along. For those in need, here is the code.
  1. Define a new query function and customise the metadata to your needs.
# Import paths assume the legacy (pre-0.10) llama_index package layout;
# adjust them to match your installed version.
from typing import Any

from llama_index.schema import NodeRelationship, RelatedNodeInfo, TextNode
from llama_index.vector_stores.lancedb import (
    LanceDBVectorStore,
    _to_lance_filter,
    _to_llama_similarities,
)
from llama_index.vector_stores.types import VectorStoreQuery, VectorStoreQueryResult


def modified_query(
    self,
    query: VectorStoreQuery,
    **kwargs: Any,
) -> VectorStoreQueryResult:
    """Query index for top k most similar nodes."""
    if query.filters is not None:
        if "where" in kwargs:
            raise ValueError(
                "Cannot specify filter via both query and kwargs. "
                "Use kwargs only for lancedb specific items that are "
                "not supported via the generic query interface."
            )
        where = _to_lance_filter(query.filters)
    else:
        where = kwargs.pop("where", None)

    table = self.connection.open_table(self.table_name)
    lance_query = (
        table.search(query.query_embedding)
        .limit(query.similarity_top_k)
        .where(where)
        .nprobes(self.nprobes)
    )

    if self.refine_factor is not None:
        lance_query.refine_factor(self.refine_factor)

    results = lance_query.to_df()
    nodes = []
    for _, item in results.iterrows():
        node = TextNode(
            text=item.text,
            id_=item.id,
            relationships={
                NodeRelationship.SOURCE: RelatedNodeInfo(node_id=item.doc_id),
            },
            # Customize this to match the metadata columns in your table.
            metadata={"file_path": item.file_path},
        )
        nodes.append(node)

    return VectorStoreQueryResult(
        nodes=nodes,
        similarities=_to_llama_similarities(results),
        ids=results["id"].tolist(),
    )
  2. Monkey patch: LanceDBVectorStore.query = modified_query
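If you'd rather not hard-code each metadata field, here is a rough sketch of the same idea that treats every non-core column as metadata. It runs on its own without LanceDB or LlamaIndex: `DemoStore`, `patched_query` and `row_to_metadata` are hypothetical names made up for illustration, and the column names in `RESERVED_COLUMNS` are an assumption about the table schema, so check them against your own table.

```python
from typing import Any, Dict, Iterable, List, Mapping

# Assumed names of the non-metadata columns; check your own table schema.
RESERVED_COLUMNS = frozenset({"id", "doc_id", "text", "vector", "_distance"})


def row_to_metadata(
    row: Mapping[str, Any],
    reserved: Iterable[str] = RESERVED_COLUMNS,
) -> Dict[str, Any]:
    """Return every column of a result row that is not a reserved column."""
    skip = set(reserved)
    return {key: value for key, value in row.items() if key not in skip}


class DemoStore:
    """Hypothetical stand-in for a vector store whose query() drops metadata."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows

    def query(self) -> List[Dict[str, Any]]:
        # Mimics the original behaviour: metadata columns are ignored.
        return [{"id": r["id"], "text": r["text"], "metadata": {}} for r in self.rows]


def patched_query(self: DemoStore) -> List[Dict[str, Any]]:
    """Replacement query() that keeps the extra columns as metadata."""
    return [
        {"id": r["id"], "text": r["text"], "metadata": row_to_metadata(r)}
        for r in self.rows
    ]


# The instance is created before the patch on purpose.
store = DemoStore(
    [
        {
            "id": "n1",
            "doc_id": "d1",
            "text": "hello",
            "vector": [0.1, 0.2],
            "file_path": "a.txt",
            "page": 3,
        }
    ]
)

# Monkey patch: assigning on the class also affects existing instances,
# because methods are looked up on the class at call time.
DemoStore.query = patched_query

print(store.query())
```

In the real patched function, the equivalent change would presumably be `metadata=row_to_metadata(item.to_dict())` inside the `iterrows()` loop, since each `item` there is a pandas Series.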