Is there a way to get the embeddings out

At a glance

The post asks if there is a way to get the embeddings out of an index loaded from a persisted VectorStoreIndex on disk. A community member responds with a "hacky solution" that involves using the index.ref_doc_info to get a mapping of each ingested document and the nodes that came from that document, and then accessing the embedding vector directly from the index.vector_store._data.embedding_dict. However, the community member notes that this is not an easy or straightforward solution.

skittythecat

Is there a way to get the embeddings out of an index loaded from a persisted VectorStoreIndex on disk?

1 comment

Logan M

Like the actual embedding vector?

mmm not easily right now.

Here's my hacky solution. You need the node_id, so here I use index.ref_doc_info to get a mapping of each ingested document to the nodes that came from that document, then look the node_id up in the vector store's embedding dict:


>>> from llama_index import VectorStoreIndex, Document
>>> index = VectorStoreIndex.from_documents([Document.example()])
>>> index.ref_doc_info
{'e03d1380-fe7d-4828-bb3e-680afeb07bfc': RefDocInfo(node_ids=['a1cbafa9-e2c7-4cea-92b1-3834acb2aa6c'], metadata={'filename': 'README.md', 'category': 'codebase'})}
>>> vector = index.vector_store._data.embedding_dict['a1cbafa9-e2c7-4cea-92b1-3834acb2aa6c']
>>> print(len(vector))
1536
>>>
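To pull every embedding rather than one, the same lookup can be wrapped in a loop. This is only a rough sketch: `collect_embeddings` is a made-up helper name, and it assumes the index uses the default in-memory vector store, since it relies on the same private `_data.embedding_dict` attribute as above (which may change between versions and won't exist on external vector stores).

```python
from typing import Dict, List


def collect_embeddings(index) -> Dict[str, List[float]]:
    """Map each node_id known to the index to its stored embedding vector.

    Hypothetical helper: relies on the private `_data.embedding_dict`
    attribute of the default in-memory vector store.
    """
    embedding_dict = index.vector_store._data.embedding_dict
    embeddings: Dict[str, List[float]] = {}
    # ref_doc_info maps each ingested document id to the node ids created from it.
    for ref_doc_id, info in index.ref_doc_info.items():
        for node_id in info.node_ids:
            # Skip node ids that have no stored vector instead of raising KeyError.
            if node_id in embedding_dict:
                embeddings[node_id] = embedding_dict[node_id]
    return embeddings
```

Calling `collect_embeddings(index)` on an index loaded back from disk should give a plain dict of node_id to vector, as long as the persisted index was built with the default vector store.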
