Hello, I use VectorStoreIndex to

At a glance

The community member is using VectorStoreIndex to evaluate an embedding model and asks about the default similarity function. The node scores they see all fall between 0 and 1, not in the -1 to 1 range expected for cosine similarity, so they suspect a normalized L2 function is being used instead. Another community member explains that the default can be changed by subclassing the embedding model and changing its similarity mode. The original poster then asks again why the scores stay within 0 to 1 under the default cosine function. Logan M replies with the default cosine calculation, and the thread ends without further explanation of the range.


Azathoth

Hello, I use VectorStoreIndex to evaluate an embedding model. I think the default similarity function is cosine similarity, but when I check the node scores, they are all between (0, 1) rather than (-1, 1). It looks like a normalized L2 function may be in use.
Why are all the similarity scores between 0 and 1? Can I change the default similarity function of VectorStoreIndex to a specific function instead of the default cosine function?

6 comments

WhiteFang_Jr

Hi, yes, you can change the default similarity mode used by the embedding model. See the similarity mode definition here:
https://github.com/run-llama/llama_index/blob/79518c4cc39981140b2e87c9a701b09d74d47e9a/llama-index-core/llama_index/core/base/embeddings/base.py#L37

WhiteFang_Jr

You'll have to subclass your embedding model and change the similarity mode to suit your needs.
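WhiteFang's suggestion comes down to changing the function that turns two embeddings into a score. As a rough, dependency-free illustration, the sketch below mirrors the three modes we understand the `SimilarityMode` enum in the linked base.py to define: cosine (the default), dot product, and Euclidean, where the distance is negated so that a higher score still means "more similar". `Mode` and `similarity` are hypothetical stand-ins, not llama_index's API, and whether a given vector store actually applies the embedding model's mode when scoring nodes is an assumption worth checking.

```python
import math
from enum import Enum


class Mode(str, Enum):
    # Hypothetical stand-in for the modes in the linked base.py
    COSINE = "cosine"
    DOT_PRODUCT = "dot_product"
    EUCLIDEAN = "euclidean"


def similarity(a, b, mode=Mode.COSINE):
    dot = sum(x * y for x, y in zip(a, b))
    if mode == Mode.DOT_PRODUCT:
        # Unbounded: grows with the vectors' lengths
        return dot
    if mode == Mode.EUCLIDEAN:
        # Negated distance, so a higher score still means "closer"
        return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Default: cosine, bounded to [-1, 1]
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


a, b = [3.0, 4.0], [6.0, 8.0]
print(similarity(a, b))                    # 1.0
print(similarity(a, b, Mode.DOT_PRODUCT))  # 50.0
print(similarity(a, b, Mode.EUCLIDEAN))    # -5.0
```

Only the cosine branch is bounded to [-1, 1]; dot-product and negated-Euclidean scores depend on vector length, so a score range on its own is a weak hint about which mode produced it.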

Azathoth

Thanks a lot. With your kind help, I now know how to change the similarity function. But I still have one more question: why are all the similarity scores between (0, 1)? Since I am using the default COSINE function, they should be in (-1, 1), right?

Attachment: image (PNG)

WhiteFang_Jr

Maybe @Logan M can help with this query

Logan M

This is the default (cosine) similarity calculation:

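import numpy as np
def cosine_similarity(embedding1, embedding2):
    # Dot product divided by the product of the two L2 norms
    product = np.dot(embedding1, embedding2)
    norm = np.linalg.norm(embedding1) * np.linalg.norm(embedding2)
    return product / norm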
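This formula is ordinary cosine similarity, whose range is [-1, 1]. A negative score only appears when two vectors point more than 90 degrees apart. The dependency-free sketch below uses illustrative helper names (`cosine_similarity`, `normalize`, `squared_l2`), not llama_index functions. It checks the three reference points of that range, and it checks the identity ‖a − b‖² = 2 − 2·cos(a, b) for unit-length vectors, which is why cosine and normalized L2 rank neighbours identically.

```python
import math


def cosine_similarity(a, b):
    # Same formula as the snippet above: dot product over the product of L2 norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def normalize(v):
    # Rescale v to unit L2 length
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def squared_l2(a, b):
    # Squared Euclidean distance between a and b
    return sum((x - y) ** 2 for x, y in zip(a, b))


# The three reference points of the [-1, 1] range
print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))    # 1.0  (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))    # 0.0  (orthogonal)
print(cosine_similarity([3.0, 4.0], [-3.0, -4.0]))  # -1.0 (opposite direction)

# For unit vectors, squared L2 distance = 2 - 2 * cosine
a, b = normalize([3.0, 4.0]), normalize([1.0, 0.0])
print(math.isclose(squared_l2(a, b), 2 - 2 * cosine_similarity(a, b)))  # True
```

Scores that stay within (0, 1) are therefore still consistent with the cosine formula. They only mean that the compared embeddings never point in opposing directions. The sketch does not explain why a particular embedding model behaves that way.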

Azathoth

OK, thanks.
