```
%pip install llama-index-vector-stores-neo4jvector
!pip install llama-index
```

```python
import os
import openai

os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"
openai.api_key = os.environ["OPENAI_API_KEY"]
```

```python
from llama_index.vector_stores.neo4jvector import Neo4jVectorStore

username = "neo4j"
password = "pleaseletmein"
url = "bolt://localhost:7687"
embed_dim = 1536

neo4j_vector = Neo4jVectorStore(username, password, url, embed_dim)
```

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext

# load documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()

storage_context = StorageContext.from_defaults(vector_store=neo4j_vector)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```

```python
query_engine = index.as_query_engine()
response = query_engine.query("What happened at interleaf?")
```
"OPENAI_API_KEY"
, "neo4j"
, "pleaseletmein"
, and "bolt://localhost:7687"
with your actual OpenAI API key, Neo4j username, password, and URL respectively.Neo4jGraphStore
class. Here is an example:from llama_index.graph_stores.neo4j import Neo4jGraphStore username = "neo4j" password = "retractor-knot-thermocouples" url = "bolt://44.211.44.239:7687" database = "your_database_name" # replace with your database name graph_store = Neo4jGraphStore( username=username, password=password, url=url, database=database, )
"your_database_name"
with the name of your Neo4j database. This will create a Neo4jGraphStore
instance that connects to the specified database.node_label
and index_name
are parameters used when setting up a Neo4jGraphStore
or Neo4jVectorStore
.node_label
: This is a string that represents the label of the nodes in the Neo4j graph. In Neo4j, a label is a named graph construct that is used to group nodes into sets; all nodes labeled with the same label belongs to the same set. Many nodes can have the same label. In LlamaIndex, node_label
is used to specify which set of nodes in the Neo4j graph you are working with.index_name
: This is a string that represents the name of the index in the Neo4jVectorStore
. An index in a database is a data structure that improves the speed of data retrieval operations. In LlamaIndex, index_name
is used to specify the name of the index for faster data retrieval.from llama_index.vector_stores.neo4jvector import Neo4jVectorStore username = "neo4j" password = "password" url = "bolt://localhost:7687" node_label = "Entity" # replace with your node label index_name = "my_index" # replace with your index name embedding_dimension = 768 # replace with your embedding dimension vector_store = Neo4jVectorStore( username=username, password=password, url=url, node_label=node_label, index_name=index_name, embedding_dimension=embedding_dimension, )
"Entity"
and "my_index"
with your actual node label and index name respectively.next
together with the vector embeddingsnext
, you can use the SentenceSplitter
class in LlamaIndex to chunk the document, and then create a Neo4jGraphStore
to store the chunks and their relations in a Neo4j graph. Here is a step-by-step guide:SentenceSplitter
class in LlamaIndex to do this. Here is an example:from llama_index.core.node_parser import SentenceSplitter # specify the chunk size chunk_size = 128 # create a SentenceSplitter instance splitter = SentenceSplitter(chunk_size=chunk_size) # chunk the document nodes = splitter.get_nodes_from_documents([document])
Replace `document` with your actual document and `chunk_size` with the desired chunk size.

2. Create a `Neo4jGraphStore` to store the chunks and their relations in a Neo4j graph. Here is an example:

```python
from llama_index.graph_stores.neo4j import Neo4jGraphStore

username = "neo4j"
password = "password"
url = "bolt://localhost:7687"
database = "database_name"  # replace with your database name

# create a Neo4jGraphStore instance
graph_store = Neo4jGraphStore(
    username=username,
    password=password,
    url=url,
    database=database,
)

# add nodes to the graph
for node in nodes:
    graph_store.add_node(node)

# add 'next' relations between nodes
for i in range(len(nodes) - 1):
    graph_store.add_relation(nodes[i], nodes[i + 1], "next")
```
"database_name"
with the name of your Neo4j database.OpenAIEmbedding
class in LlamaIndex to generate vector embeddings for the chunks. Here is an example:RecursiveSplitter
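As a sketch of what that embedding step does, consider the helper below. It is illustrative, not part of LlamaIndex: it assumes `embed_model` exposes a `get_text_embedding(text)` method returning a list of floats (as `OpenAIEmbedding` does) and that each node exposes `get_content()` and an `embedding` attribute (as LlamaIndex nodes do).

```python
def embed_nodes(nodes, embed_model):
    """Attach a vector embedding to each chunk node in place.

    `embed_model` is any object with get_text_embedding(text) -> list[float],
    e.g. an OpenAIEmbedding instance; `nodes` are chunk objects exposing
    get_content() and an `embedding` attribute.
    """
    for node in nodes:
        node.embedding = embed_model.get_text_embedding(node.get_content())
    return nodes
```

With LlamaIndex installed, `embed_nodes(nodes, OpenAIEmbedding())` would populate the `embedding` field that the vector store later indexes.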
`RecursiveSplitter` in LlamaIndex is a class that allows you to recursively split a document into smaller chunks based on a specified chunk size. This can be useful when you want to break down a large document into manageable pieces for processing or analysis. Here is an example of how to use `RecursiveSplitter`:

```python
from llama_index.core.node_parser import RecursiveSplitter

# specify the chunk size
chunk_size = 128

# create a RecursiveSplitter instance
splitter = RecursiveSplitter(chunk_size=chunk_size)

# chunk the document
nodes = splitter.get_nodes_from_documents([document])
```
Replace `document` with your actual document and `chunk_size` with the desired chunk size. The `get_nodes_from_documents` method will return a list of `Node` objects, each representing a chunk of the document. Each `Node` object has a `text` attribute that contains the chunk text, and a `children` attribute that contains any sub-chunks.
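To make the recursive idea concrete without relying on the library, the splitting can be sketched as a plain function. This is illustrative only: the actual parser works on `Document` objects and also records the parent/child relationships between chunks.

```python
def recursive_split(text, chunk_size):
    """Recursively halve `text` until every piece fits within chunk_size,
    preferring to break at a space near the midpoint."""
    if len(text) <= chunk_size:
        return [text]
    mid = len(text) // 2
    # look for a space at or before the midpoint to avoid cutting words
    cut = text.rfind(" ", 0, mid + 1)
    if cut <= 0:
        cut = mid
    left, right = text[:cut], text[cut:].lstrip()
    return recursive_split(left, chunk_size) + recursive_split(right, chunk_size)
```

Each recursion level halves the text, so a document of length n yields chunks after O(log(n / chunk_size)) levels.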
`SemanticSplitterNodeParser` in LlamaIndex is a class that allows you to split a document into chunks based on semantic similarity rather than a fixed chunk size. This means that each chunk will contain sentences that are semantically related to each other. Here is an example of how to use `SemanticSplitterNodeParser`:

```python
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

# create an OpenAIEmbedding instance
embed_model = OpenAIEmbedding()

# create a SemanticSplitterNodeParser instance
splitter = SemanticSplitterNodeParser(
    buffer_size=1, breakpoint_percentile_threshold=95, embed_model=embed_model
)

# chunk the document
nodes = splitter.get_nodes_from_documents([document])
```
Replace `document` with your actual document. The `buffer_size` parameter specifies the number of sentences to group together when evaluating semantic similarity, and the `breakpoint_percentile_threshold` parameter specifies the percentile of cosine dissimilarity that must be exceeded between a group of sentences and the next to form a node.

The `get_nodes_from_documents` method will return a list of `Node` objects, each representing a chunk of the document. Each `Node` object has a `text` attribute that contains the chunk text.
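The role of `breakpoint_percentile_threshold` can be illustrated without the library: compute the cosine dissimilarity between consecutive sentence embeddings and split wherever it reaches the chosen percentile. This is a simplified sketch (the helper names are illustrative, the percentile is a simple nearest-rank estimate, and the real parser also applies the `buffer_size` grouping before comparing):

```python
import math

def cosine_dissimilarity(a, b):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

def semantic_breakpoints(embeddings, percentile=95):
    """Return indices i such that a split should occur between sentence i
    and sentence i + 1: where dissimilarity reaches the given percentile
    of all consecutive-pair dissimilarities."""
    dists = [cosine_dissimilarity(embeddings[i], embeddings[i + 1])
             for i in range(len(embeddings) - 1)]
    if not dists:
        return []
    cutoff = sorted(dists)[min(len(dists) - 1, int(len(dists) * percentile / 100))]
    return [i for i, d in enumerate(dists) if d >= cutoff]
```

For example, two identical embeddings followed by two orthogonal ones produce a single breakpoint at the topic shift.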
`SentenceWindowNodeParser` in LlamaIndex is a class that splits a document into individual sentences and includes a "window" of surrounding sentences in the metadata for each sentence. This can be useful when you want to generate embeddings that have a very specific scope. Here is an example of how to create a `SentenceWindowNodeParser`:

```python
from llama_index.core.node_parser import SentenceWindowNodeParser

# create a SentenceWindowNodeParser instance with default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    # how many sentences on either side to capture
    window_size=3,
    # the metadata key that holds the window of surrounding sentences
    window_metadata_key="window",
    # the metadata key that holds the original sentence
    original_text_metadata_key="original_sentence",
)
```
The `window_size` parameter specifies how many sentences on either side of each sentence to capture in the window. The `window_metadata_key` and `original_text_metadata_key` parameters specify the metadata keys that hold the window of surrounding sentences and the original sentence, respectively.

Once you have created a `SentenceWindowNodeParser` instance, you can use it to parse a document into individual sentences with surrounding windows. Here is an example:

```python
# parse the document
nodes = node_parser.get_nodes_from_documents([document])
```
Replace `document` with your actual document. The `get_nodes_from_documents` method will return a list of `Node` objects, each representing a sentence and its surrounding window.

`SentenceWindowNodeParser` in LlamaIndex has its unique advantages and potential drawbacks compared to other parsers:

- `SentenceWindowNodeParser` splits documents into individual sentences, which allows it to capture more fine-grained details. For example, it can capture specific words or phrases that might be missed when using a parser that splits documents into larger chunks.
- `SentenceWindowNodeParser` includes a "window" of surrounding sentences in the metadata for each sentence. This provides additional contextual information that can be useful for understanding the meaning of each sentence.
- `SentenceWindowNodeParser` is useful for generating embeddings that have a very specific scope. By adjusting the window size, you can control the amount of context included with each sentence.
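Conceptually, the windowing can be sketched without the library. This illustrative function mimics what the parser stores under the `window` and `original_sentence` metadata keys (the real parser also performs the sentence splitting itself and returns `Node` objects):

```python
def sentence_windows(sentences, window_size=3):
    """For each sentence, capture up to window_size neighbors on either side."""
    records = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        records.append({
            "original_sentence": sentence,
            "window": " ".join(sentences[lo:hi]),
        })
    return records
```

Note that sentences near the start or end of the document simply get a truncated window, just as the parser does.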
To get the vector embeddings, you would first need to parse your documents into nodes using the `SentenceWindowNodeParser`. After that, you can use the `get_embedding()` method on each node to get its vector embedding.

```python
from llama_index.core.node_parser import SentenceWindowNodeParser

# create a SentenceWindowNodeParser instance with default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_sentence",
)

# parse the document
nodes = node_parser.get_nodes_from_documents([document])

# get the embedding for the first node
# (note: get_embedding() raises an error if no embedding has been set,
# so the nodes must first be embedded, e.g. with an embedding model)
embedding = nodes[0].get_embedding()
print(embedding)
```
Replace `document` with your actual document. The `get_nodes_from_documents` method will return a list of `Node` objects, each representing a sentence and its surrounding window. The `get_embedding()` method will return the vector embedding for a node.

To store the nodes created by the `SentenceWindowNodeParser` in a Neo4j database, you can use the `add_relation` method of the `Neo4jGraphStore` class in LlamaIndex. Here is a step-by-step guide:

1. Parse the document into nodes using the `SentenceWindowNodeParser`. Here is an example:

```python
from llama_index.core.node_parser import SentenceWindowNodeParser

# create a SentenceWindowNodeParser instance with default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_sentence",
)

# parse the document
nodes = node_parser.get_nodes_from_documents([document])
```
Replace `document` with your actual document.

2. Create a `Neo4jGraphStore` to store the nodes and their relations in a Neo4j graph. Here is an example:

```python
from llama_index.graph_stores.neo4j import Neo4jGraphStore

username = "neo4j"
password = "password"
url = "bolt://localhost:7687"
database = "database_name"  # replace with your database name

# create a Neo4jGraphStore instance
graph_store = Neo4jGraphStore(
    username=username,
    password=password,
    url=url,
    database=database,
)

# add nodes to the graph
for node in nodes:
    graph_store.add_node(node)

# add 'next' relations between nodes
for i in range(len(nodes) - 1):
    graph_store.add_relation(nodes[i], nodes[i + 1], "next")
```
"database_name"
with the name of your Neo4j database. The add_relation
method adds a relation of type "next" between each node and the next node in the list.Neo4jGraphStore
class in LlamaIndex does not have an add_relation
method. Instead, it has an upsert_triplet
method that you can use to add a relation between two nodes. Here is how you can modify the previous example to use the upsert_triplet
method:from llama_index.graph_stores.neo4j import Neo4jGraphStore username = "neo4j" password = "password" url = "bolt://localhost:7687" database = "database_name" # replace with your database name # create a Neo4jGraphStore instance graph_store = Neo4jGraphStore( username=username, password=password, url=url, database=database, ) # add nodes to the graph for node in nodes: graph_store.upsert_node(node) # add 'next' relations between nodes for i in range(len(nodes) - 1): graph_store.upsert_triplet(nodes[i].id, "next", nodes[i + 1].id)
"database_name"
with the name of your Neo4j database. The upsert_triplet
method adds a relation of type "next" between each node and the next node in the list.GraphStore
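The chaining logic itself reduces to emitting one (subject, relation, object) triplet per adjacent pair of chunk IDs. A minimal, library-free sketch (the helper name is illustrative):

```python
def next_triplets(node_ids, relation="next"):
    """Build the (subject, relation, object) triplets that chain
    consecutive chunks, one triplet per adjacent pair of IDs."""
    return [(node_ids[i], relation, node_ids[i + 1])
            for i in range(len(node_ids) - 1)]
```

Each resulting triplet maps directly onto one `upsert_triplet` call.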
To query a `GraphStore` in LlamaIndex, you can use the `query` method of the `GraphStore` class. Here is an example:

```python
from llama_index.core import QueryBundle

# create a QueryBundle instance
query_bundle = QueryBundle(query_str="your_query")

# query the graph store
response = graph_store.query(query_bundle)
```
"your_query"
with your actual query. The query
method will return a response from the graph store.GraphStore
you are using. For example, the SimpleGraphStore
class in LlamaIndex does not support the query
method and will raise a NotImplementedError
if you try to use it.Node
To give a node multiple labels, LlamaIndex has a `Node` class with a `labels` attribute that can hold multiple labels for the node. You can add multiple labels to a node by assigning them to the `labels` attribute, which is a list:

```python
from llama_index.core import Node

# create a Node instance
node = Node(id_="node1", text="This is a node.")

# add multiple labels to the node
node.labels = ["label1", "label2", "label3"]
```
"node1"
, "This is a node."
, "label1"
, "label2"
, and "label3"
with your actual node ID, node text, and labels respectively.Neo4jGraphStore
class in LlamaIndex only supports a single label for each node.