gabbyjai

Hi I am currently playing around the

Hi, I am currently playing around with the SQLAutoVectorQueryEngine, but I am facing some difficulty. Specifically, I always run into this error message: ValueError: Metadata filters not implemented for SimpleVectorStore yet.

After studying the source code, I believe the root cause is in the VectorIndexAutoRetriever. When the retriever creates the structured schema of the user's query, the LLM returns an empty array for the filters if no filter is needed, as the prompt instructs. This empty array is later transformed into an empty dictionary and used as the MetadataFilters for the index retriever. It seems that an empty dictionary is not allowed for SimpleVectorStore. The example in the documentation uses the Pinecone vector store, so it may work there (I don't have Pinecone, so I cannot replicate it).

My suggestion is that, when creating the index retriever, we only assign filters if the list is not empty, for instance:

filters = MetadataFilters(filters=query_spec.filters) if query_spec.filters else None
retriever = VectorIndexRetriever(
    self._index,
    filters=filters,
    similarity_top_k=similarity_top_k,
)
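
To show the guard in isolation, here is a minimal sketch. The `MetadataFilters` and `ExactMatchFilter` classes below are simplified stand-ins defined only so the snippet runs without the library, and `build_filters` is a hypothetical helper name, not something that exists in the codebase:

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Simplified stand-ins (assumption): they only mimic the shape of the
# library classes so this sketch is runnable on its own.
@dataclass
class ExactMatchFilter:
    key: str
    value: str


@dataclass
class MetadataFilters:
    filters: List[ExactMatchFilter] = field(default_factory=list)


def build_filters(spec_filters: List[ExactMatchFilter]) -> Optional[MetadataFilters]:
    """Hypothetical helper: return None when the LLM inferred no filters,
    so the vector store never receives an empty filter object."""
    return MetadataFilters(filters=spec_filters) if spec_filters else None


# No filters inferred: the retriever would be created with filters=None.
print(build_filters([]))

# One filter inferred: it is wrapped as usual.
print(build_filters([ExactMatchFilter(key="author", value="gabbyjai")]))
```

The idea is that a store without filter support only fails when it is actually handed a filter object, so passing None for the empty case should avoid the error.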

Please let me know what you think. Thanks!

1 comment

gabbyjai

Hi I am having a problem with using

Hi, I am having a problem with using LanceDB as the vector store. My documents include metadata, but the metadata is lost after I construct the vector store index. When I persist the storage context, the docstore.json is empty. This problem does not occur when I use the simple vector store or the FAISS vector store. Could you please let me know what I am doing wrong in the following code?

How I create the document:

doc = Document(
    text=record['text'],
    extra_info={k: v for k, v in record.items() if k in extra_info_fields},
)

How I create the index:

vector_store = LanceDBVectorStore(uri="lancedb_storage")
lancedb_storage_context = StorageContext.from_defaults(vector_store=vector_store)

lancedb_vector_retriever_index = VectorStoreIndex(
    input_documents,
    storage_context=lancedb_storage_context,
)

How I persist the context:

lancedb_storage_context.persist(persist_dir="lancedb_storage")

Sample retrieved node:

NodeWithScore(node=TextNode(id_='3660f5fe-fca9-4244-9b04-d64685aa796f', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='None', node_type=None, metadata={}, hash=None)}, hash='b60d4828f419a879a0ec2619aaf35fa5b677a1c5b67c6b59bc28123ddf04641f', text="...", start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.3688541352748871)

2 comments

gabbyjai

Hi I have a question about the

Hi, I have a question about composability. Most of the examples in your documentation show only two levels of indices (i.e. a graph index on top of many document indices). In theory, we could expand this to more than two levels, so I'd like to ask whether that is feasible, and whether there is any example or reference that I could take a look at.

For more context, the current problem I am working on may require me to have a top level index on top of multiple graph indices, and each graph index may consist of many document indices.
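
To make the shape I have in mind concrete, here is a toy sketch in plain Python. It does not use the library at all: `DocIndex`, `ComposedIndex`, `depth`, and `leaf_names` are hypothetical names made up purely to illustrate the three-level nesting.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DocIndex:
    """Leaf level: an index over a single document (illustrative only)."""
    name: str


@dataclass
class ComposedIndex:
    """An index built on top of other indices (illustrative only)."""
    name: str
    children: List[Union["ComposedIndex", DocIndex]] = field(default_factory=list)


def depth(node: Union[ComposedIndex, DocIndex]) -> int:
    """Count the number of levels in the hierarchy rooted at node."""
    if isinstance(node, DocIndex):
        return 1
    return 1 + max((depth(child) for child in node.children), default=0)


def leaf_names(node: Union[ComposedIndex, DocIndex]) -> List[str]:
    """Collect the names of all document indices reachable from node."""
    if isinstance(node, DocIndex):
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


# Level 2: graph indices, each over several document indices (level 3).
graph_a = ComposedIndex("graph_a", [DocIndex("doc_a1"), DocIndex("doc_a2")])
graph_b = ComposedIndex("graph_b", [DocIndex("doc_b1")])

# Level 1: a top-level index over the graph indices.
top = ComposedIndex("top", [graph_a, graph_b])

print(depth(top))       # 3
print(leaf_names(top))  # ['doc_a1', 'doc_a2', 'doc_b1']
```

What I am asking is whether the composability API supports this kind of recursive nesting, where a query at the top level is routed down through the graph indices to the document indices.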

6 comments
