
Updated 4 months ago

Is there a way to get the embeddings out

At a glance

The post asks if there is a way to get the embeddings out of an index loaded from a persisted VectorStoreIndex on disk. A community member responds with a "hacky solution" that involves using the index.ref_doc_info to get a mapping of each ingested document and the nodes that came from that document, and then accessing the embedding vector directly from the index.vector_store._data.embedding_dict. However, the community member notes that this is not an easy or straightforward solution.

Is there a way to get the embeddings out of an index loaded from a persisted VectorStoreIndex on disk?
1 comment
Like the actual embedding vector?

mmm not easily right now.

Here's my hacky solution. You need the node_id, so here I use index.ref_doc_info to get a mapping of each ingested document to the nodes that came from that document, then look the vector up by node_id:

>>> from llama_index import VectorStoreIndex, Document
>>> index = VectorStoreIndex.from_documents([Document.example()])
>>> index.ref_doc_info
{'e03d1380-fe7d-4828-bb3e-680afeb07bfc': RefDocInfo(node_ids=['a1cbafa9-e2c7-4cea-92b1-3834acb2aa6c'], metadata={'filename': 'README.md', 'category': 'codebase'})}
>>> vector = index.vector_store._data.embedding_dict['a1cbafa9-e2c7-4cea-92b1-3834acb2aa6c']
>>> print(len(vector))
1536
>>> 
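To apply the same trick to an index loaded back from disk and to every node at once, something like the sketch below might work. It assumes the older `llama_index` package layout used in the snippet above (newer releases move these imports under `llama_index.core`), and it assumes the index uses the default in-memory vector store, since `_data.embedding_dict` is a private attribute and other vector stores may not have it. The helper names `collect_embeddings` and `load_persisted_index`, and the `./storage` directory in the usage note, are hypothetical.

```python
from typing import Dict, List


def collect_embeddings(index) -> Dict[str, Dict[str, List[float]]]:
    """Return {ref_doc_id: {node_id: embedding}} for every ingested document."""
    # Private attribute of the default in-memory vector store (an assumption;
    # this may change between releases and is absent on external stores).
    embedding_dict = index.vector_store._data.embedding_dict

    result: Dict[str, Dict[str, List[float]]] = {}
    for doc_id, info in index.ref_doc_info.items():
        # info.node_ids lists the nodes that came from this document.
        result[doc_id] = {
            node_id: embedding_dict[node_id]
            for node_id in info.node_ids
            if node_id in embedding_dict  # skip nodes with no stored vector
        }
    return result


def load_persisted_index(persist_dir: str):
    """Load an index that was saved with index.storage_context.persist()."""
    # Imported lazily so collect_embeddings can be used on its own.
    from llama_index import StorageContext, load_index_from_storage

    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    return load_index_from_storage(storage_context)
```

Usage would be roughly `embeddings = collect_embeddings(load_persisted_index("./storage"))`, after which `embeddings[doc_id][node_id]` is the raw vector for that node.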