Hello everyone,
I'm encountering some problems with storing metadata in Neo4j using the VectorStoreIndex. I'm creating nodes with important metadata. Here is the relevant part of my code:
def get_metadata(filename):
    for item in metadata:
        if item["url"] == filename:
            return item
    return {}
documents = SimpleDirectoryReader(dir_documents, file_metadata=get_metadata).load_data()
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
nodes = node_parser.get_nodes_from_documents(documents)
transformations = [title_extractor, qa_extractor]
neo4j_vector_store = Neo4jVectorStore(neo_username, neo_password, neo_url, embed_dim, hybrid_search=True, index_name=index_name)
storage_context = StorageContext.from_defaults(vector_store=neo4j_vector_store)
vector_index = get_vector_index(neo_username, neo_password, neo_url, embed_dim, index_name)
vector_index.insert_nodes(nodes, transformations=transformations, storage_context=storage_context)
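To rule out the lookup itself, here is a minimal standalone sketch of how get_metadata behaves. It assumes metadata is a list of dicts with a "url" key (the sample entries and the path are hypothetical), and it assumes the reader may hand the callback a full path rather than a bare file name, which I have not confirmed. The get_metadata_by_basename variant is not part of my original code and is only there for comparison.

```python
import os

# Hypothetical sample of the `metadata` list; the real entries are not shown here.
metadata = [
    {"url": "report.pdf", "author": "alice"},
    {"url": "notes.txt", "author": "bob"},
]


def get_metadata(filename):
    """Same lookup as above: exact string match on the "url" field."""
    for item in metadata:
        if item["url"] == filename:
            return item
    return {}


def get_metadata_by_basename(path):
    """Hypothetical variant that compares file names only, ignoring directories."""
    name = os.path.basename(path)
    for item in metadata:
        if os.path.basename(item["url"]) == name:
            return item
    return {}


# Hypothetical path, standing in for whatever the reader passes to the callback.
full_path = os.path.join("docs", "report.pdf")

print(get_metadata("report.pdf"))           # exact match on the bare name
print(get_metadata(full_path))              # exact match fails on a full path
print(get_metadata_by_basename(full_path))  # basename comparison still matches
```

If the second call comes back empty in the real setup as well, the documents would be loaded without any of the custom metadata before the parser or Neo4j are involved.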
I then built a query engine that is used by an agent. When the agent retrieves these nodes, they do not carry the metadata I imported earlier. Checking the nodes created in Neo4j, I noticed that they not only lack this metadata, but that the metadata has also been vectorized as part of the content. As a result, I can't access that metadata when retrieving regular chunks.
Here is the code for the query engine:
node_postprocessors = [
    MetadataReplacementPostProcessor(target_metadata_key="window"),
    SimilarityPostprocessor(similarity_cutoff=0.5),
]
index_query_engine = index.as_query_engine(similarity_top_k=doc_similarity_top_k, node_postprocessors=node_postprocessors)
Is this a bug? Is it a problem with using SentenceWindowNodeParser, or with how I include the metadata? What can I do to ensure that the metadata is stored correctly and can be retrieved as expected?
Any help or guidance would be greatly appreciated.
Thank you!