
I have embeddings and text stored on my local machine and want to create a VectorStoreIndex from them, but it's not working.

Hello folks! Happy New Year!

Just one query!

I have embeddings and text stored on my local machine, and I want to create a VectorStoreIndex out of them, but it's not working. Here is the code. Can anyone please look into it?

Plain Text
import faiss
import numpy as np
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores.faiss import FaissVectorStore

dim = 1536
doc1_index = faiss.IndexFlatL2(dim)
doc1_documents = []
for i, doc in enumerate(response):
    source = doc["_source"]
    doc1_index.add(np.asarray([source["content_vector"]]))
    doc1_documents.append(Document(text=source["content"]))

doc1_vector_store = FaissVectorStore(faiss_index=doc1_index)
storage_context = StorageContext.from_defaults(vector_store=doc1_vector_store)
doc1_llama_index = VectorStoreIndex.from_vector_store(doc1_documents, storage_context=storage_context)


Output:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[28], line 3
      1 doc1_vector_store = FaissVectorStore(faiss_index=doc1_index)
      2 storage_context = StorageContext.from_defaults(vector_store=doc1_vector_store)
----> 3 doc1_llama_index = VectorStoreIndex.from_vector_store(doc1_documents, storage_context=storage_context)

File c:\Users\61097809\loganalytics\graph_venv\Lib\site-packages\llama_index\core\indices\vector_store\base.py:94, in VectorStoreIndex.from_vector_store(cls, vector_store, embed_model, **kwargs)
     87 @classmethod
     88 def from_vector_store(
     89     cls,
   (...)
     92     **kwargs: Any,
     93 ) -> "VectorStoreIndex":
---> 94     if not vector_store.stores_text:
     95         raise ValueError(
     96             "Cannot initialize from a vector store that does not store text."
     97         )
     99     kwargs.pop("storage_context", None)

AttributeError: 'list' object has no attribute 'stores_text'


Thanks!
This doesn't seem like correct syntax? from_vector_store() takes the vector store as the first argument.

Faiss probably isn't the best choice for this pattern, since it doesn't store the document contents.

You probably want something like this instead (assuming content_vector is a list of floats):

Plain Text
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.schema import TextNode
from llama_index.vector_stores.faiss import FaissVectorStore

# Build TextNodes that carry both the text and the pre-computed embedding,
# so nothing has to be re-embedded at index time.
nodes = []
for doc in response:
    source = doc["_source"]
    nodes.append(TextNode(text=source["content"], embedding=source["content_vector"]))

doc1_vector_store = FaissVectorStore(faiss_index=doc1_index)
storage_context = StorageContext.from_defaults(vector_store=doc1_vector_store)
doc1_llama_index = VectorStoreIndex(nodes=nodes, storage_context=storage_context)
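For reference, a minimal usage sketch (assuming an embed model is configured, e.g. via Settings.embed_model, since the query string itself still has to be embedded at query time):

Plain Text
# Minimal retrieval sketch: the stored nodes keep their pre-computed
# embeddings; only the query string is embedded here.
retriever = doc1_llama_index.as_retriever(similarity_top_k=2)
for result in retriever.retrieve("example query"):
    print(result.score, result.node.get_content()[:80])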
@Logan M Thanks for reaching out!

Happy New Year to you!

With the above code, I am receiving the following error, attached as an image.
[Attachment: image.png]
If I provide the embedding model details, it starts calculating embeddings from scratch for all the nodes, which I do not want, since I already have them pre-computed.
How do you know it's embedding from scratch? There's specific code to skip embeddings if they are already there...
https://github.com/run-llama/llama_index/blob/ad3be7fec4fa0f032661f9783c462a45cf3f6a3f/llama-index-core/llama_index/core/indices/utils.py#L154
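Roughly, that helper does something like this (a paraphrased sketch, not the exact library source):

Plain Text
# Paraphrased sketch of the skip logic: nodes that already carry an
# embedding are left alone; only the rest go to the embed model.
def embed_nodes(nodes, embed_model):
    to_embed = [n for n in nodes if n.embedding is None]
    texts = [n.get_content() for n in to_embed]
    for node, emb in zip(to_embed, embed_model.get_text_embedding_batch(texts)):
        node.embedding = emb
    return nodes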

You'll need to pass in the embed model anyway in order to do retrieval, though, which is also why it's still asking for an embed model.
You could confirm this by passing in OpenAI embeddings with a fake API key. If all the nodes have embeddings, it won't get called πŸ‘€
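For instance, a sketch of that check (the key below is deliberately fake; if every node already has an embedding, the build never hits the OpenAI API):

Plain Text
from llama_index.embeddings.openai import OpenAIEmbedding

# Deliberately fake key: an actual embedding call would fail loudly,
# so a clean build confirms the pre-computed embeddings are reused.
embed_model = OpenAIEmbedding(api_key="sk-fake-key")
doc1_llama_index = VectorStoreIndex(
    nodes=nodes, storage_context=storage_context, embed_model=embed_model
)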
Thank you so much @Logan M
It indeed helped!