Hello, I'm getting some weird AssertionError when using FAISS vector store

At a glance

The community member is experiencing an AssertionError when using the FAISS vector store. They have provided the code they are using to index and load documents, and the error they are encountering when querying the index.

The community members discuss the issue and suggest that the problem may be related to a dimension mismatch between the embedding model used to create the index and the one used to query it. They also recommend ensuring that the service context is properly set when loading the index from disk.

The issue appears to be resolved by adding the service context when loading the index from disk, as suggested by one of the community members.

Hello, I'm getting some weird AssertionError when using FAISS vector store. Any idea?
Adding the code in the thread below.
I am first indexing my documents using this:

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=10,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
ctx = ServiceContext.from_defaults(
    llm=llm,
    embed_model=HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    ),
    node_parser=node_parser,
)

# Setting up the FAISS vector store - d = 768 to match the mpnet-base-v2 model
d = 768
faiss_index = faiss.IndexFlatL2(d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

sentence_index = VectorStoreIndex.from_documents(
    all_docs,
    service_context=ctx,
    storage_context=storage_context,
    show_progress=True,
)
sentence_index.storage_context.persist()
I am loading my documents from the vector store using this code:
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
ctx = ServiceContext.from_defaults(
    llm=llm,
    embed_model=HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    ),
    node_parser=node_parser,
)

vector_store = FaissVectorStore.from_persist_dir("./storage")
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./storage"
)
index = load_index_from_storage(storage_context=storage_context)

qa_template = Prompt(PROMPT)
query_engine = index.as_query_engine(
    similarity_top_k=5,
    text_qa_template=qa_template,
    # the target key defaults to window to match the node_parser's default
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
    streaming=True,
)
But when I ask a question, I'm getting this:

Traceback (most recent call last):
  File "/mnt/d/Dev/poetry_llamaWindow/QueryFAISS.py", line 75, in <module>
    window_response = query_engine.query(query)
  File "/home/jax/.cache/pypoetry/virtualenvs/poetry-llamawindow-mJZojT2l-py3.10/lib/python3.10/site-packages/llama_index/indices/query/base.py", line 23, in query
    response = self._query(str_or_query_bundle)
  File "/home/jax/.cache/pypoetry/virtualenvs/poetry-llamawindow-mJZojT2l-py3.10/lib/python3.10/site-packages/llama_index/query_engine/retriever_query_engine.py", line 169, in _query
    nodes = self.retrieve(query_bundle)
  File "/home/jax/.cache/pypoetry/virtualenvs/poetry-llamawindow-mJZojT2l-py3.10/lib/python3.10/site-packages/llama_index/query_engine/retriever_query_engine.py", line 117, in retrieve
    nodes = self._retriever.retrieve(query_bundle)
  File "/home/jax/.cache/pypoetry/virtualenvs/poetry-llamawindow-mJZojT2l-py3.10/lib/python3.10/site-packages/llama_index/indices/base_retriever.py", line 22, in retrieve
    return self._retrieve(str_or_query_bundle)
  File "/home/jax/.cache/pypoetry/virtualenvs/poetry-llamawindow-mJZojT2l-py3.10/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 75, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File "/home/jax/.cache/pypoetry/virtualenvs/poetry-llamawindow-mJZojT2l-py3.10/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 151, in _get_nodes_with_embeddings
    query_result = self._vector_store.query(query, **self._kwargs)
  File "/home/jax/.cache/pypoetry/virtualenvs/poetry-llamawindow-mJZojT2l-py3.10/lib/python3.10/site-packages/faiss/__init__.py", line 308, in replacement_search
    assert d == self.d
AssertionError
Any clue what this is about? It does seem related to the FAISS vector store after googling it a bit, but I can only find some old references from back in the GPT Index days.
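
For context, faiss's Python wrapper asserts that the query vectors' width d equals the index's self.d before searching. A minimal sketch reproducing the same AssertionError (not from the thread; dimensions chosen for illustration):

import numpy as np
import faiss

index = faiss.IndexFlatL2(768)                     # index built for 768-dim vectors
query = np.random.rand(1, 1536).astype("float32")  # query embedded at a different width
index.search(query, 5)                             # raises AssertionError (d == self.d)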
Hmmm took a peek at faiss source code


Did you create the index with the same embedding model that you are querying with?

I thiiiiiiink it seems to be a dimension mismatch 🤔
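
One quick way to check that hypothesis (a sketch, assuming the model name from the thread) is to compare the embedding model's output width against the index dimension:

from sentence_transformers import SentenceTransformer
import faiss

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
print(model.get_sentence_embedding_dimension())  # 768 for all-mpnet-base-v2

faiss_index = faiss.IndexFlatL2(768)
print(faiss_index.d)                             # must match the embedding width above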
hey @Logan M - I know that mpnet-base-v2 has 768 dimensions, and that's the one I've set here:

d = 768
faiss_index = faiss.IndexFlatL2(d)
Attachment: image.png (embedding model printout):
model
SentenceTransformer(
(0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
(2): Normalize()
)
How are you setting the service context? You should make sure to pass it in when you load from disk too.
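
A minimal sketch of that suggestion, reusing the ctx and storage_context from the loading snippet above (without a service context, load_index_from_storage most likely falls back to the default OpenAI embedding model, whose 1536-dim query vectors won't match the 768-dim FAISS index):

# Pass the same ServiceContext used at index time, so queries are
# embedded with the same 768-dim model the FAISS index was built with.
index = load_index_from_storage(
    storage_context=storage_context,
    service_context=ctx,
)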
@Logan M, yup, this seems to have fixed it - adding the service context in load_index_from_storage. Thanks again 🙏