Updated 2 months ago

Storing Metadata in Neo4j Using VectorStoreIndex

At a glance
The community member is having trouble storing metadata in Neo4j through VectorStoreIndex. They create nodes that carry important metadata, but the retrieved nodes come back without it. They suspect the metadata is being vectorized as part of the content, which makes it inaccessible. They share the relevant code snippets and ask how to make sure the metadata is stored correctly and can be retrieved as expected. One comment suggests passing the transformations to the VectorStoreIndex constructor rather than to the insert_nodes method. In a follow-up, the community member reports resolving the issue by adding the metadata after creating all the nodes, because the problem was with the get_metadata function being sent to the SentenceWindowNodeParser.
Hello everyone,
I'm encountering some problems with storing metadata in Neo4j using the VectorStoreIndex. I'm creating nodes with important metadata. Here is the relevant part of my code:
def get_metadata(filename):
    for item in metadata:
        if item["url"] == filename:
            return item
    return {}

documents = SimpleDirectoryReader(dir_documents, file_metadata=get_metadata).load_data()
node_parser = SentenceWindowNodeParser.from_defaults(window_size=3, window_metadata_key="window", original_text_metadata_key="original_text")
nodes = node_parser.get_nodes_from_documents(documents)
transformations = [title_extractor, qa_extractor]
neo4j_vector_store = Neo4jVectorStore(neo_username, neo_password, neo_url, embed_dim, hybrid_search=True, index_name=index_name)
storage_context = StorageContext.from_defaults(vector_store=neo4j_vector_store)
vector_index = get_vector_index(neo_username, neo_password, neo_url, embed_dim, index_name)
vector_index.insert_nodes(nodes, transformations=transformations, storage_context=storage_context)

I then built a query engine that is used by an agent. When the agent retrieves these nodes, they do not have the metadata I imported earlier. Checking the nodes created in Neo4j, I noticed that they not only lack this metadata, but the metadata has also been vectorized as part of the content. As a result, I can't access that metadata when retrieving normal chunks.
Here is the code for the query engine:
node_postprocessors = [MetadataReplacementPostProcessor(target_metadata_key="window"), SimilarityPostprocessor(similarity_cutoff=0.5)]
index_query_engine = index.as_query_engine(similarity_top_k=doc_similarity_top_k, node_postprocessors=node_postprocessors)

Is this an error? Is it a problem with using SentenceWindowNodeParser, or with how I include the metadata? What can I do to ensure that the metadata is stored correctly and can be retrieved as expected?
Any help or guidance would be greatly appreciated.
Thank you!
2 comments
I don't think you can pass transformations to insert_nodes? I think that has to go in the constructor?

VectorStoreIndex(..., transformations=transformations)
Thank you, that is true. I resolved the issue by adding the metadata after creating all the nodes, because the failing point was the get_metadata function being sent to the SentenceWindowNodeParser.
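For anyone landing here later, here is a rough sketch of what "adding metadata after creating all nodes" could look like. The thread does not show the actual fix, so this is only one possible shape of it. `StubNode` is a hypothetical stand-in so the snippet runs without LlamaIndex installed; it mimics only the `metadata` dict that LlamaIndex nodes carry. The `file_name` key, the sample records, and the choice to match `"url"` against the file's base name are all assumptions, not details from the original post.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class StubNode:
    """Hypothetical stand-in for a parsed node: text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def get_metadata(filename, records):
    """Return the record whose "url" equals the file's base name, else {}."""
    base = Path(filename).name  # assumption: tolerate full paths as well as bare names
    for item in records:
        if item["url"] == base:
            return item
    return {}


def attach_metadata(nodes, records, source_key="file_name"):
    """Merge the matching record into each node's metadata, in place.

    Existing keys (for example the sentence window) are kept; only keys
    present in the matching record are added or overwritten.
    """
    for node in nodes:
        source = node.metadata.get(source_key, "")
        node.metadata.update(get_metadata(source, records))
    return nodes


# Illustrative data only; the real `metadata` list is not shown in the thread.
records = [{"url": "guide.txt", "author": "alice"}]
nodes = [
    StubNode("First sentence.", {"file_name": "guide.txt", "window": "..."}),
    StubNode("Unrelated.", {"file_name": "other.txt"}),
]
attach_metadata(nodes, records)
print(nodes[0].metadata)
```

With real LlamaIndex nodes the same loop would run over the output of `node_parser.get_nodes_from_documents(documents)` before the nodes are handed to the index. It would also be worth checking which source key your reader actually puts into `node.metadata`, since that depends on how the documents were loaded.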