gabbyjai
Joined September 25, 2024
Hi, I am currently playing around with the SQLAutoVectorQueryEngine but am having some difficulty. Specifically, I always run into this error message: ValueError: Metadata filters not implemented for SimpleVectorStore yet.

After studying the source code, I believe the root cause is in VectorIndexAutoRetriever. When the retriever builds the structured schema for the user's query, the prompt tells the LLM to return an empty array for the filters when no filter is needed. That empty array is later transformed into an empty dictionary and passed as MetadataFilters to the index retriever. It seems an empty dictionary is not allowed for SimpleVectorStore either. The example in the documentation uses the Pinecone vector store, so it may work there (I don't have Pinecone, so I can't reproduce it).
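The pitfall described above can be reproduced in plain Python. This is a minimal sketch, not LlamaIndex code: FilterSpec, to_filter_dict and SimpleStoreStub are hypothetical stand-ins, and the stub assumes the store rejects any filters value that is not None, which would match the error message. The key point is that an empty dict is falsy but still "is not None".

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FilterSpec:
    """Hypothetical stand-in for one filter in the LLM's structured query spec."""
    key: str
    value: Any


def to_filter_dict(filters: List[FilterSpec]) -> Dict[str, Any]:
    # An empty list of filters becomes an empty dict, not None.
    return {f.key: f.value for f in filters}


class SimpleStoreStub:
    """Hypothetical store that, like the error suggests, supports no filters at all."""

    def query(self, filters: Optional[Dict[str, Any]] = None) -> List[str]:
        # Assumed check: anything other than None counts as "filters requested".
        if filters is not None:
            raise ValueError("Metadata filters not implemented for SimpleVectorStore yet.")
        return ["node-1", "node-2"]


store = SimpleStoreStub()
empty = to_filter_dict([])  # the LLM returned no filters
print(empty, bool(empty), empty is not None)  # {} False True
try:
    store.query(filters=empty)
except ValueError as err:
    print("raised:", err)
```

Because the check is on "is not None" rather than on truthiness, the empty dict still triggers the error even though no filtering was requested.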

My suggestion is that, when creating the index retriever, we only assign filters if they are not empty, for instance:

filters = MetadataFilters(filters=query_spec.filters) if query_spec.filters else None
retriever = VectorIndexRetriever(
    self._index,
    filters=filters,
    similarity_top_k=similarity_top_k,
)
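A minimal sketch of the proposed guard, runnable without LlamaIndex. StubStore and build_filters are hypothetical illustrations, not library APIs, and the stub assumes the store rejects any filters value other than None. Normalizing an empty filter list to None before building the retriever means the store never sees an empty filters object.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_filters(filters: List[Tuple[str, Any]]) -> Optional[Dict[str, Any]]:
    # Mirrors "MetadataFilters(...) if query_spec.filters else None":
    # an empty list is falsy, so it maps to None instead of an empty filter object.
    return dict(filters) if filters else None


class StubStore:
    """Hypothetical store that rejects any filters value other than None."""

    def query(self, filters: Optional[Dict[str, Any]] = None) -> str:
        if filters is not None:
            raise ValueError("filters not supported")
        return "ok"


store = StubStore()
print(store.query(filters=build_filters([])))  # ok
print(build_filters([("year", 2020)]))  # {'year': 2020}
```

Stores that do support metadata filters would still receive filters whenever the LLM actually produces some.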

Please let me know what you think. Thanks!
1 comment
Hi, I am having a problem using LanceDB as the vector store. My documents include metadata, but the metadata is lost after I construct the vector store index. When I persist the storage context, docstore.json is empty. This problem does not occur when I use the simple vector store or the FAISS vector store. Could you please let me know what I am doing wrong in the following code?
How I create the document:
doc = Document(
    text=record['text'],
    extra_info={k: v for k, v in record.items() if k in extra_info_fields},
)
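The extra_info comprehension above keeps only whitelisted keys from each record. A standalone sketch with a made-up record (the record contents and the select_metadata helper are hypothetical) shows what metadata each Document should end up with, which is a useful baseline when checking what survives indexing.

```python
from typing import Any, Dict, Iterable


def select_metadata(record: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    # Same filtering as the snippet above: keep only keys listed in `fields`.
    wanted = set(fields)
    return {k: v for k, v in record.items() if k in wanted}


record = {"text": "Quarterly report text", "author": "A. Writer", "year": 2023, "internal_id": 17}
extra_info_fields = ["author", "year"]
print(select_metadata(record, extra_info_fields))  # {'author': 'A. Writer', 'year': 2023}
```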


How I create the index:
vector_store = LanceDBVectorStore(uri="lancedb_storage")
lancedb_storage_context = StorageContext.from_defaults(vector_store=vector_store)

lancedb_vector_retriever_index = VectorStoreIndex(
    input_documents,
    storage_context=lancedb_storage_context,
)


How I persist the context:
lancedb_storage_context.persist(persist_dir="lancedb_storage")


Sample retrieved node:
NodeWithScore(node=TextNode(id_='3660f5fe-fca9-4244-9b04-d64685aa796f', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='None', node_type=None, metadata={}, hash=None)}, hash='b60d4828f419a879a0ec2619aaf35fa5b677a1c5b67c6b59bc28123ddf04641f', text="...", start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.3688541352748871)
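To narrow down where the metadata disappears, one option is to compare each retrieved node's metadata against the keys that were put on the documents. The sketch below is hypothetical and uses plain dicts in place of Document and NodeWithScore objects; with real objects the same check would read the node's metadata field, which is the one shown as metadata={} in the output above.

```python
from typing import Dict, Iterable, List, Set


def missing_metadata_keys(
    expected_keys: Iterable[str], retrieved: List[Dict[str, object]]
) -> List[Set[str]]:
    # For each retrieved node's metadata dict, report which expected keys are absent.
    expected = set(expected_keys)
    return [expected - set(meta) for meta in retrieved]


# Metadata as it appears in the retrieved node above (empty), plus a healthy example.
retrieved_metadata = [{}, {"author": "A. Writer", "year": 2023}]
print(missing_metadata_keys(["author", "year"], retrieved_metadata))
```

Running the same check against indexes built with the simple, FAISS and LanceDB stores would show whether only the LanceDB path drops the keys.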
2 comments
Hi, I have a question about composability. Most of the examples in your documentation only show two levels of indices (i.e. a graph index on top of many document indices). In theory, we could extend this to more than two levels, so I'd like to ask whether that's feasible, and whether there is any example or reference I could look at.

For more context, the problem I am currently working on may require a top-level index on top of multiple graph indices, where each graph index may consist of many document indices.
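The three-level layout described above (a top-level index over several graph indices, each over many document indices) is a tree. As a rough, library-agnostic sketch, not LlamaIndex's composable graph API, the hypothetical IndexNode class below routes a query downward using a toy keyword match on each child's summary until it reaches leaf document indices. In this toy, nothing in the routing depends on depth, so adding a third level is the same recursion as two.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexNode:
    """Hypothetical node in a hierarchy of indices; leaves stand for document indices."""
    name: str
    summary: str
    children: List["IndexNode"] = field(default_factory=list)

    def route(self, query: str) -> List[str]:
        # Leaf: a document index that would actually answer the query.
        if not self.children:
            return [self.name]
        words = set(query.lower().split())
        # Toy router: descend into children whose summary shares a word with the query.
        matched = [c for c in self.children if words & set(c.summary.lower().split())]
        hits: List[str] = []
        for child in matched:
            hits.extend(child.route(query))
        return hits


doc_a = IndexNode("doc_a", "revenue figures 2023")
doc_b = IndexNode("doc_b", "hiring plan")
doc_c = IndexNode("doc_c", "product roadmap")
finance_graph = IndexNode("finance_graph", "revenue hiring", [doc_a, doc_b])
product_graph = IndexNode("product_graph", "roadmap", [doc_c])
top = IndexNode("top", "company overview", [finance_graph, product_graph])

print(top.route("What were the revenue figures?"))  # ['doc_a']
```

A real setup would replace the keyword match with whatever summary-based selection the library provides at each level.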
6 comments