
ethan
Joined September 25, 2024
How can I use LlamaIndex to query a PDF that contains both text and images?
Hi, I have two sets of nodes and want to find the correspondence between them based on the similarity (embedding distance) between nodes. I created two VectorStoreIndexes and persisted both to the same ./storage directory. My first approach is to retrieve all nodes from index1, use each one to query index2, and then decide whether to take or pass on each result. I'm stuck on the step of retrieving all nodes from index1, which seems like it should be straightforward. I even tried index1.vector_store._data.text_id_to_ref_doc_id.keys(), but that seems to contain the nodes of both index1 and index2. Do you have any suggestions?
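The matching step described above (query set 2 with each node from set 1, then take or pass) amounts to a nearest-neighbour search with a similarity cutoff. Below is a library-independent sketch, assuming each node's embedding is already available as a list of floats keyed by node ID; `match_nodes`, `cosine_similarity`, the 0.8 threshold and the toy vectors are all hypothetical. In LlamaIndex itself, the node IDs that belong to one index would presumably come from that index's own `index_struct.nodes_dict` (the attribute mentioned in the LanceDB question on this page) rather than from the shared vector store.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_nodes(
    set1: Dict[str, Vector],
    set2: Dict[str, Vector],
    threshold: float = 0.8,
) -> Dict[str, Optional[Tuple[str, float]]]:
    """For each node in set1, find its most similar node in set2 (hypothetical helper).

    Returns {set1_id: (best_set2_id, score)} when the best score reaches
    `threshold`, or {set1_id: None} when the match is rejected ("pass").
    """
    matches: Dict[str, Optional[Tuple[str, float]]] = {}
    for id1, vec1 in set1.items():
        best: Optional[Tuple[str, float]] = None
        for id2, vec2 in set2.items():
            score = cosine_similarity(vec1, vec2)
            if best is None or score > best[1]:
                best = (id2, score)
        # "Take" the best match only if it is similar enough; otherwise "pass".
        matches[id1] = best if best is not None and best[1] >= threshold else None
    return matches


# Hypothetical toy embeddings standing in for the two sets of nodes.
set1 = {"a1": [1.0, 0.0], "a2": [0.0, 1.0], "a3": [-1.0, -1.0]}
set2 = {"b1": [0.9, 0.1], "b2": [0.1, 0.9]}
print(match_nodes(set1, set2))  # a1 -> b1, a2 -> b2, a3 -> None (below threshold)
```

This brute-force comparison costs |set1| × |set2| similarity computations; for large sets, letting the vector store answer one top-1 query per node is the more practical route, with the threshold check applied to the returned score.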
ethan

Docstore

@Logan M How can I iterate through all nodes in a VectorStoreIndex? Previously, VectorStoreIndex.index_struct.nodes_dict or VectorStoreIndex.docstore.docs worked for this purpose. This stopped working after I customized the storage_context to use an external vector store (LanceDB), with a storage context like this:
```text
StorageContext(docstore=<llama_index.core.storage.docstore.simple_docstore.SimpleDocumentStore object at 0x7fe31e89bd30>,
               index_store=<llama_index.core.storage.index_store.simple_index_store.SimpleIndexStore object at 0x7fe31e8994e0>,
               vector_stores={'default': <llama_index.vector_stores.lancedb.base.LanceDBVectorStore object at 0x7fe31e89b3a0>,
                              'image': <llama_index.core.vector_stores.simple.SimpleVectorStore object at 0x7fe31e89ba30>},
               graph_store=<llama_index.core.graph_stores.simple.SimpleGraphStore object at 0x7fe31e89b9a0>)
```

Then all I did was:
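```python
from llama_index.core import VectorStoreIndex

# `nodes` is my list of TextNodes; `storage_context` is the one shown above
index = VectorStoreIndex(nodes, storage_context=storage_context)
index.storage_context.persist()  # writes to ./storage by default
```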

After that, ./storage/docstore.json is empty. Is this expected? If I don't customize the storage_context, ./storage/docstore.json does contain the TextNodes I used to create the VectorStoreIndex.

I was hoping that if I switched to LanceDB or any other vector store, LlamaIndex's behavior with respect to the docstore etc. would remain unchanged. Am I missing something obvious? Thanks for any help.