You can attach metadata to nodes through the `extra_info` parameter of the `Node` object. This metadata can be any key-value pairs that you want to associate with the node:

```python
from llama_index.data_structs.node import Node

nodes = [
    Node('The Shawshank Redemption', extra_info={
        "author": "Stephen King",
        "theme": "Friendship",
    }),
    Node('The Godfather', extra_info={
        "director": "Francis Ford Coppola",
        "theme": "Mafia",
    }),
    Node("Inception", extra_info={
        "director": "Christopher Nolan",
    }),
]
```
You can then build the index on top of a vector store that supports metadata filtering, such as `SupabaseVectorStore`, `PineconeVectorStore`, `DocArrayHnswVectorStore`, or `DocArrayInMemoryVectorStore`:

```python
from llama_index import VectorStoreIndex
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores import PineconeVectorStore

vector_store = PineconeVectorStore(index_name="your_index_name")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context)
```
At query time, you can then filter on this metadata with `MetadataFilters` and `ExactMatchFilter`. You can include one or more of your metadata keys in the filters.
**Source** (<https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/SupabaseVectorIndexDemo.html>)
Remember to replace `"your_index_name"` with the name of your index, and `"theme"` and `"Mafia"` with your desired key and value for filtering.
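The exact-match filtering described above can be sketched in plain Python. This is a conceptual illustration only, not the LlamaIndex `MetadataFilters` API; the helper name and dict layout are made up for the example:

```python
def exact_match_filter(nodes, key, value):
    """Keep only nodes whose metadata contains `key` with exactly `value`."""
    return [n for n in nodes if n.get("extra_info", {}).get(key) == value]

nodes = [
    {"text": "The Shawshank Redemption",
     "extra_info": {"author": "Stephen King", "theme": "Friendship"}},
    {"text": "The Godfather",
     "extra_info": {"director": "Francis Ford Coppola", "theme": "Mafia"}},
    {"text": "Inception",
     "extra_info": {"director": "Christopher Nolan"}},
]

matches = exact_match_filter(nodes, key="theme", value="Mafia")
print([n["text"] for n in matches])  # → ['The Godfather']
```

A real vector store applies the same key-value check on its side, combined with vector similarity.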
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

In LlamaIndex, your data is stored in `Node` objects. Each node typically represents a chunk of text from a document. Here's an example:

```python
from llama_index.data_structs.node import Node, DocumentRelationship

node1 = Node(text="<text_chunk>", doc_id="<node_id>")
node2 = Node(text="<text_chunk>", doc_id="<node_id>")
```
`<text_chunk>` is where you specify your data: replace `<text_chunk>` with the actual text data you want to store in the node. Similarly, replace `<node_id>` with a unique identifier for each node.

You can also attach your data to a `Document` object, along with any metadata:

```python
from llama_index import Document

document = Document(
    'text',
    extra_info={
        'filename': '<doc_file_name>',
        'category': '<category>',
    }
)
```
Replace `'text'` with your actual text data, `<doc_file_name>` with the filename of the document, and `<category>` with the category of the document.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
The error `ModuleNotFoundError: No module named 'llama_index.data_structs.node'` typically occurs when the `llama_index` package is not installed in your Python environment, or your Python environment is not set up correctly.

First, install the `llama_index` package using pip. Open your terminal and run the following command:

```shell
pip install llama-index
```

Then verify that `llama_index` is installed. If you're using a virtual environment, ensure that it's activated when you run your Python script. After that, the import should work:

```python
from llama_index.data_structs.node import Node
```
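If the import still fails, it can help to confirm which environment your script is actually running in. A small stdlib-only sketch (the helper name is mine, just a diagnostic aid) that checks whether a module is importable:

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if `module_name` is importable in the current environment."""
    return importlib.util.find_spec(module_name) is not None

print(is_installed("json"))         # stdlib module → True
print(is_installed("llama_index"))  # False unless the package is installed here
```

Running this from the same interpreter that raises the error tells you whether pip installed the package into a different environment.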
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

LlamaIndex creates `Document` objects from text data, which are then parsed into `Node` objects. For a PDF, you can first extract the text with a library such as PyPDF2 and then create `Document` objects from the extracted text. Here's a basic example of how you might do this:

```python
import PyPDF2
from llama_index import Document

# Open the PDF file in binary mode
with open('your_file.pdf', 'rb') as file:
    # Create a PDF file reader object
    reader = PyPDF2.PdfFileReader(file)

    # Get the text content from the PDF
    text = ''
    for page_num in range(reader.numPages):
        page = reader.getPage(page_num)
        text += page.extractText()

# Create a Document object from the text
document = Document(text)
```
Replace `'your_file.pdf'` with the path to your PDF file. After creating the `Document` object, you can then parse it into `Node` objects as per the usual LlamaIndex workflow.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
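As a rough, stdlib-only illustration of that parsing step: a node parser essentially splits the extracted text into (possibly overlapping) chunks, one per node. The chunk size, overlap, and helper name below are arbitrary choices for the sketch, not LlamaIndex defaults:

```python
def chunk_text(text: str, chunk_size: int = 1024, overlap: int = 64) -> list:
    """Split text into overlapping fixed-size chunks, one per future node."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks

print(len(chunk_text("a" * 3000, chunk_size=1024, overlap=64)))  # → 4
```

The overlap keeps sentences that straddle a chunk boundary visible in both neighboring chunks, which is why node parsers typically overlap chunks as well.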
You can add metadata to `Document` objects. This metadata can be any key-value pairs that you want to associate with the document, and it is passed through the `extra_info` parameter of the `Document` object.

One option is to set the metadata when you create the `Document` object:

```python
from llama_index import Document

document = Document(
    'text',
    extra_info={
        'filename': '<doc_file_name>',
        'category': '<category>',
    }
)
```
Replace `'text'` with your actual text data, `<doc_file_name>` with the filename of the document, and `<category>` with the category of the document.

Another option is to set the metadata after the `Document` object has been created:

```python
document.extra_info = {'filename': '<doc_file_name>'}
```

Again, replace `<doc_file_name>` with the filename of the document.

A third option is to use `SimpleDirectoryReader` with a `file_metadata` hook, which will automatically run the hook on each file to set each document's `extra_info` field:

```python
from llama_index import SimpleDirectoryReader

filename_fn = lambda filename: {'file_name': filename}

# automatically sets the extra_info of each document according to filename_fn
documents = SimpleDirectoryReader('./data', file_metadata=filename_fn).load_data()
```
Replace `'./data'` with the path to your directory of documents. The `extra_info` dictionary of each document will show up in the `extra_info` of each source node created from that document, so the index can use this information in queries and responses.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
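The `file_metadata` hook pattern above can be illustrated with the standard library alone. This sketch mimics the behavior (read each file, attach the dict returned by the hook) but is not the LlamaIndex reader itself; the function name and dict shape are invented for the example:

```python
import os
import tempfile

def load_with_metadata(directory, file_metadata):
    """Read every file in `directory` and attach per-file metadata,
    mimicking the file_metadata hook pattern."""
    docs = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        with open(path, encoding="utf-8") as f:
            docs.append({"text": f.read(), "extra_info": file_metadata(path)})
    return docs

# Demo on a throwaway directory with a single file
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "a.txt"), "w", encoding="utf-8") as f:
        f.write("hello")
    docs = load_with_metadata(d, lambda p: {"file_name": os.path.basename(p)})
    print(docs[0]["extra_info"]["file_name"])  # → a.txt
```

The hook is called once per file with the file path, so anything derivable from the path (filename, extension, parent folder as a category) can become node metadata for free.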