Updated 7 months ago

I am trying to get the node score, but it always shows as 1, where a score of 0 means 0% relevant and 1 means 100% relevant.
11 comments
It should be working, but there seems to be a bug since Weaviate v4. I will try to debug it at some point today.
I see, I just started recently so I didn't know it is a bug after v4. Thanks
@Logan M: Came across this in the Weaviate forum. I debugged things on my end and reached the following conclusion: https://forum.weaviate.io/t/retrieved-document-score-returns-1-0-100-relevant-when-used-with-llamaindex/2396/8?u=othmane_hamzaoui
Did you arrive at something on your end ?
As for me, I used an interim solution.

After the query through LlamaIndex pipeline, I also did a direct query with Weaviate to get the document scores and did a replacement based on the retrieved docs
Got it. Does what I mentioned in the Weaviate forum answer match what you found?
FYI: I still went through LlamaIndex, I just reverted part of the code from the v4 migration to make it work.
Yeah it's more clear now on why it doesn't give the correct scores rather than being a blackbox to me.

Integrations with changed or incompatible libraries are still a nightmare for me
I guess a monkey patch works better since it removes the need for an extra query.
What's the extra query you're making? 👀
I created a function to get weaviate scores by directly querying with the database:

Plain Text
# Within my custom class
import weaviate
from weaviate.classes.query import MetadataQuery

def get_weaviate_scores(self, collection_name, question, query_vector, limit, hybrid_score):
  """
  params:
    self - init with Weaviate client
    collection_name - name of your collection
    question - text based question
    query_vector - your question converted to vector with the embed model used
    limit - number of docs to retrieve
    hybrid_score - also known as alpha in Weaviate to select between bm25 and vector
  """
  collection = self.client.collections.get(collection_name)

  response = collection.query.hybrid(
    query=question,
    vector=query_vector,
    limit=limit,
    alpha=hybrid_score,
    # Keyword argument is return_metadata (with an underscore)
    return_metadata=MetadataQuery(
      distance=True,
      certainty=True,
      score=True,
      explain_score=True,
    ),
  )

  weaviate_score_list = {}
  for obj in response.objects:
    # Get your scores here and add them to weaviate_score_list,
    # e.g. weaviate_score_list[file_path] = score.
    # Keying by object UUID is only a placeholder; swap in your own key.
    weaviate_score_list[str(obj.uuid)] = obj.metadata.score
  return weaviate_score_list
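For anyone following along, the replacement step mentioned earlier could look roughly like the sketch below. It is not the poster's actual code: `ScoredDoc` is a hypothetical stand-in for whatever retrieved-node object your pipeline returns, `replace_scores` is an illustrative name, and the `file_path` metadata key is assumed from the comment in the snippet above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for a retrieved node carrying metadata and a score."""
    metadata: Dict[str, str] = field(default_factory=dict)
    score: Optional[float] = None


def replace_scores(
    docs: List[ScoredDoc],
    weaviate_scores: Dict[str, float],
    key: str = "file_path",  # assumed metadata key; use whatever identifies your docs
) -> List[ScoredDoc]:
    """Overwrite each doc's score with the one fetched directly from Weaviate."""
    for doc in docs:
        doc_key = doc.metadata.get(key)
        # Leave the original score untouched when there is no match.
        if doc_key in weaviate_scores:
            doc.score = weaviate_scores[doc_key]
    return docs


docs = [
    ScoredDoc({"file_path": "a.txt"}, 1.0),
    ScoredDoc({"file_path": "b.txt"}, 1.0),
]
replace_scores(docs, {"a.txt": 0.42})
```

Docs with no matching key keep their original score, so a partial result from the direct query does not wipe anything out.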
I see. I didn't do this. I just went through the LlamaIndex code base and changed the query sent to Weaviate so it always returns a distance. Then I added a helper function that converts distance to score (score = 1 - distance). So I only make one query to the vector DB. I will open a PR in the LlamaIndex repo when I get some bandwidth.
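A minimal sketch of that kind of conversion helper, assuming the distance comes back as a float (or as None when Weaviate does not report one). The name `distance_to_score` is illustrative and is not taken from the LlamaIndex code base.

```python
from typing import Optional


def distance_to_score(distance: Optional[float]) -> Optional[float]:
    """Map a distance to a similarity score using score = 1 - distance."""
    if distance is None:
        # No distance reported for this object; nothing to convert.
        return None
    return 1.0 - distance


print(distance_to_score(0.25))
```

With this mapping a distance of 0 becomes a score of 1, and larger distances give lower scores.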
Oh, give me a ping when you post your solution then