How does vector similarity work I have a

At a glance

A community member asks why an exact sentence they sent as a query does not appear in any of the returned nodes, even though the sentence is present in one of the documents. Another community member explains that the embedding generated from a single sentence differs from the embedding generated for an entire chunk or node, because the node carries more context, which biases its embedding. Searching for exact sentences is therefore not a good way to test vector similarity, which is more about similarity in overall semantics. They suggest augmenting the vector search with a keyword index and point to an example that sets up a custom retriever to do this.


zzainab

How does vector similarity work? I have a group of documents, and I sent a query consisting of an exact sentence found in one of them. When I checked the returned nodes, none of them contained the sentence I sent. Can anyone clarify why this might happen?

1 comment

Logan M

The embedding generated from a single sentence will be different from the embedding generated for an entire chunk/node. The node has more context, which biases its embedding compared to that of a single sentence.

Hence, typing exact sentences is not a good way to test vector similarity. It's more about the similarities in overall semantics.
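To make the dilution effect concrete, here is a minimal sketch. It assumes a toy bag-of-words count vector as the "embedding", which is only a stand-in for a real embedding model; the example sentence, the chunk, and the function names are all made up for illustration. The point it shows is that a chunk containing the query sentence verbatim still scores well below a perfect match, because the other sentences in the chunk pull its vector in other directions.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. An assumption, not a real embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical data: the chunk contains the sentence word for word.
sentence = "The warranty covers battery replacement for two years."
chunk = (
    "Our support policy has several parts. Shipping is free on large orders. "
    + sentence
    + " Returns are accepted within thirty days of delivery."
)

score_self = cosine(embed(sentence), embed(sentence))   # sentence vs. itself
score_chunk = cosine(embed(sentence), embed(chunk))     # sentence vs. whole chunk

print(f"sentence vs. itself: {score_self:.3f}")
print(f"sentence vs. chunk:  {score_chunk:.3f}")
```

With a real embedding model the numbers will differ, but the same thing tends to happen: the chunk's vector represents everything in the chunk, so another node with closer overall meaning can outrank the one that literally contains the sentence.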

You can augment vector search by also using a keyword index. There is an example here that sets up a custom retriever to do just that:
https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_engine/CustomRetrievers.ipynb
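As a rough idea of what combining the two looks like, here is a library-free sketch. It is not the LlamaIndex API and not the code from the notebook: the toy word-count scoring, the node texts, and the names `vector_retrieve`, `keyword_retrieve`, and `hybrid_retrieve` are all assumptions for illustration. It shows a case where the top vector hit misses the node containing the exact sentence, and a keyword lookup recovers it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_retrieve(query, nodes, top_k=1):
    """Rank node ids by toy cosine similarity (stand-in for a vector index)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        nodes,
        key=lambda nid: cosine(q, Counter(tokenize(nodes[nid]))),
        reverse=True,
    )
    return ranked[:top_k]


def keyword_retrieve(query, nodes):
    """Ids of nodes containing the query as a case-insensitive substring."""
    needle = query.lower()
    return [nid for nid, text in nodes.items() if needle in text.lower()]


def hybrid_retrieve(query, nodes, top_k=1, mode="OR"):
    """Combine both result sets: 'OR' takes the union, 'AND' the intersection."""
    vec_ids = set(vector_retrieve(query, nodes, top_k))
    kw_ids = set(keyword_retrieve(query, nodes))
    keep = vec_ids & kw_ids if mode == "AND" else vec_ids | kw_ids
    return [nid for nid in nodes if nid in keep]  # keep original node order


# Hypothetical nodes: only "policy" contains the query sentence verbatim.
nodes = {
    "policy": (
        "Our support policy has several parts. Shipping is free on large orders. "
        "Batteries are covered for two years. "
        "Returns must arrive within thirty days of delivery."
    ),
    "chargers": "Chargers are covered for two years.",
    "holidays": "The office is closed on public holidays.",
}
query = "Batteries are covered for two years."

print("vector only:", vector_retrieve(query, nodes, top_k=1))
print("keyword only:", keyword_retrieve(query, nodes))
print("hybrid (OR):", hybrid_retrieve(query, nodes, top_k=1, mode="OR"))
```

In this sketch the union mode favors recall, while the intersection mode only keeps nodes that both methods agree on, which can return nothing when the two disagree, as they do here.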
