hey @Logan M

At a glance

The post asks how to extract the nodes with embeddings from the GPTVectorStoreIndex() object. A community member provides a solution using index.docstore.docs and index.vector_store._data.embedding_dict, but another community member notes that this approach does not work for PineconeVectorStore as it does not have the _data attribute. The discussion suggests that different vector stores may store embeddings differently, and a community member recommends using a debugger to investigate where Pinecone stores the embeddings. Another community member mentions that a pull request may be needed to attach the embeddings to the result nodes for Pinecone.


Siddhant Saurabh

hey @Logan M
How can I get the nodes back from GPTVectorStoreIndex() together with their embeddings?
Here is the sample code:


    def store_index(self, documents, payload, service_context):
        with self.lock:
            parser = service_context.node_parser
            nodes = parser.get_nodes_from_documents(documents)

            storage_context = self.get_pinecone_storage_context(payload, toquery=False)

            storage_context.docstore.add_documents(nodes)

            pc_index = GPTVectorStoreIndex(
                nodes,
                storage_context=storage_context,
                service_context=service_context,
            )

            if "oldDocumentId" in payload:
                self.delete_old_vector(payload)
                
        return pc_index

Can we extract the nodes with their embeddings from pc_index?

6 comments

WhiteFang_Jr

Yes, you can extract these details:


    # Node objects, keyed by node ID
    nodes = index.docstore.docs
    # Embeddings, keyed by the same node IDs
    embedding_dict = index.vector_store._data.embedding_dict

    for node_id, node in nodes.items():
        # Print the node object
        print(node)
        # Print the embedding associated with that node
        print(embedding_dict[node_id])
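
A slightly more defensive variant of the loop above could look like the sketch below. It assumes only the two attribute paths mentioned in this thread (`index.docstore.docs` and `index.vector_store._data.embedding_dict`). The helper name `pair_nodes_with_embeddings` and the stand-in objects are made up for illustration and are not part of the library. When the vector store has no `_data` attribute, the embedding comes back as `None` instead of raising an error.

```python
from types import SimpleNamespace


def pair_nodes_with_embeddings(index):
    """Return {node_id: (node, embedding)}; embedding is None if unavailable."""
    nodes = index.docstore.docs
    # Assumption from this thread: only the default in-memory store exposes
    # _data.embedding_dict. Other stores may not have _data at all.
    data = getattr(index.vector_store, "_data", None)
    embedding_dict = getattr(data, "embedding_dict", None) or {}
    return {
        node_id: (node, embedding_dict.get(node_id))
        for node_id, node in nodes.items()
    }


# Hypothetical stand-ins that mimic the two shapes discussed in this thread.
simple_index = SimpleNamespace(
    docstore=SimpleNamespace(docs={"n1": "node one"}),
    vector_store=SimpleNamespace(
        _data=SimpleNamespace(embedding_dict={"n1": [0.1, 0.2]})
    ),
)
store_without_data = SimpleNamespace(
    docstore=SimpleNamespace(docs={"n1": "node one"}),
    vector_store=SimpleNamespace(),  # no _data attribute
)

print(pair_nodes_with_embeddings(simple_index))
print(pair_nodes_with_embeddings(store_without_data))
```

Because `_data` is a private attribute, this may break between library versions, so treat it as a workaround rather than a supported interface.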

Logan M

Yeah, that's the only way right now. We should probably make this easier at some point.

Siddhant Saurabh

@WhiteFang_Jr
I'm getting this error: 'PineconeVectorStore' object has no attribute '_data'
on this line: embedding_dict = index.vector_store._data.embedding_dict

WhiteFang_Jr

Ah okay, I guess different vector stores keep their embeddings in different places.
My example only applies to GPTVectorStoreIndex used directly, with its default vector store.

You could use a debugger to check where Pinecone stores the embeddings, for example by setting a breakpoint right at the index creation step.
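
As a rough sketch of that kind of inspection: the helper below lists an object's instance attributes and the types of their values, which is a quick way to spot where a store might keep its data. `summarize_attributes` and `FakeVectorStore` are hypothetical names used only for this example. A real vector store will have different attributes.

```python
def summarize_attributes(obj):
    """Map each instance attribute name to the type name of its value."""
    return {name: type(value).__name__ for name, value in vars(obj).items()}


class FakeVectorStore:
    """Hypothetical stand-in; a real store will expose other attributes."""

    def __init__(self):
        self._client = object()
        self._namespace = "demo"


store = FakeVectorStore()
for name, type_name in summarize_attributes(store).items():
    print(f"{name}: {type_name}")

# With a real index, you could pause right after it is created and explore
# interactively instead:
# breakpoint()  # then, in pdb: p vars(pc_index.vector_store)
```

In the original snippet, the natural place for the commented-out `breakpoint()` call would be right after `pc_index` is assigned.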

WhiteFang_Jr

https://github.com/run-llama/llama_index/blob/9b798f819bb0afa6dabf418f8f2db87a31125d5e/llama_index/vector_stores/pinecone.py#L226

Logan M

Pinecone is definitely different. Someone would have to open a PR to attach the embeddings to the result nodes.
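
Until something like that exists, one possible workaround is to ask Pinecone for the vectors directly. The sketch below makes several assumptions that are not confirmed in this thread: that you still hold the raw Pinecone index object you used to build the storage context, that vectors are stored under the node IDs, and that the client's `fetch(ids=...)` response exposes a `vectors` mapping whose entries have a `values` list. `fetch_embeddings` and `FakePineconeIndex` are hypothetical names. The fake class exists only so the example runs without a Pinecone account.

```python
from types import SimpleNamespace


def fetch_embeddings(pinecone_index, node_ids):
    """Return {node_id: embedding} for the IDs the index actually returns."""
    # Assumed response shape: response.vectors maps ID -> object with .values
    response = pinecone_index.fetch(ids=list(node_ids))
    return {
        vec_id: list(vec.values) for vec_id, vec in response.vectors.items()
    }


class FakePineconeIndex:
    """Hypothetical stand-in that mimics the assumed fetch() response shape."""

    def __init__(self, stored):
        self._stored = stored

    def fetch(self, ids):
        found = {
            vec_id: SimpleNamespace(values=self._stored[vec_id])
            for vec_id in ids
            if vec_id in self._stored
        }
        return SimpleNamespace(vectors=found)


fake_index = FakePineconeIndex({"n1": [0.1, 0.2], "n2": [0.3, 0.4]})
print(fetch_embeddings(fake_index, ["n1", "missing"]))
```

IDs that the index does not know about are simply absent from the result, so it is worth comparing the returned keys with the node IDs from `index.docstore.docs`.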
