I am trying to get the node score but it

I am trying to get the node score, but it always shows as 1, where a score of 0 means 0% relevant and 1 means 100% relevant.

11 comments

It should be working, but there seems to be a bug after Weaviate v4. Will try to debug at some point today

I see, I just started recently so I didn't know it is a bug after v4. Thanks

@Logan M: Came across this in the Weaviate forum. Debugged things on my end and came to the following conclusion: https://forum.weaviate.io/t/retrieved-document-score-returns-1-0-100-relevant-when-used-with-llamaindex/2396/8?u=othmane_hamzaoui
Did you arrive at something on your end?

xxKwan

As for me, I used an interim solution.

After the query through the LlamaIndex pipeline, I also ran a direct query against Weaviate to get the document scores, then replaced the LlamaIndex scores with the Weaviate ones for the retrieved docs

OOH

Got it, does what I mentioned in the Weaviate forum answer resonate with what you found?
FYI: I still went through LlamaIndex, I just reverted part of the code from the v4 migration to make it work

xxKwan

Yeah, it's clearer now why it doesn't give the correct scores, rather than being a black box to me.

Integrations with changed or incompatible libraries are still a nightmare for me

xxKwan

I guess a monkey patch works better since it removes the need for an extra query

OOH

What's the extra query you're making? 👀

xxKwan

I created a function to get the Weaviate scores by querying the database directly:

# Within my custom class
import weaviate
from weaviate.classes.query import MetadataQuery

def get_weaviate_scores(self, collection_name, question, query_vector, limit, hybrid_score):
  """
  params:
    self - init with Weaviate client
    collection_name - name of your collection
    question - text based question
    query_vector - your question converted to vector with the embed model used
    limit - number of docs to retrieve
    hybrid_score - also known as alpha in Weaviate; weights bm25 vs. vector
                   search (0 = pure bm25, 1 = pure vector)
  """
  collection = self.client.collections.get(collection_name)

  response = collection.query.hybrid(
    query=question,
    vector=query_vector,
    limit=limit,
    alpha=hybrid_score,
    # Ask Weaviate to return the scoring metadata with each object
    return_metadata=MetadataQuery(
      distance=True,
      certainty=True,
      score=True,
      explain_score=True,
    ),
  )

  weaviate_score_list = {}
  for obj in response.objects:
    # Get your scores here and add them to weaviate_score_list.
    # Assumption: each object has a "file_path" property to key on;
    # swap in whatever identifies your docs.
    weaviate_score_list[obj.properties["file_path"]] = obj.metadata.score
  return weaviate_score_list
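
For the replacement step, here is a rough sketch of how the returned dict could be merged back into the retrieved docs. `RetrievedDoc` and `replace_scores` are hypothetical names made up for illustration (a stand-in for whatever node objects your pipeline returns, not a LlamaIndex class), and it assumes the docs can be matched on `file_path`:

```python
from dataclasses import dataclass


@dataclass
class RetrievedDoc:
    # Hypothetical stand-in for a retrieved node; not a LlamaIndex class.
    file_path: str
    score: float


def replace_scores(docs, weaviate_score_list):
    """Overwrite each doc's score with the one Weaviate reported, if present."""
    for doc in docs:
        if doc.file_path in weaviate_score_list:
            doc.score = weaviate_score_list[doc.file_path]
    return docs


# Both docs come back from the pipeline with the bogus score of 1.0
docs = [RetrievedDoc("a.pdf", 1.0), RetrievedDoc("b.pdf", 1.0)]
replace_scores(docs, {"a.pdf": 0.82})
```

Docs that have no entry in the dict keep their original score, so it is worth checking for any that still show 1.0 afterwards.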

OOH

I see. I didn't do this, I just went through the LlamaIndex code base and changed the query sent to Weaviate so it always returns a distance. Then I added a helper function that converts distance to score (score = 1 - distance). So I only make one query to the vector DB. Will open a PR in the LlamaIndex repo when I get some bandwidth
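
Roughly, the conversion is along these lines. This is only a sketch of the idea, not the actual patch, and the name `distance_to_score` is made up for illustration:

```python
def distance_to_score(distance):
    """Convert a distance from the vector DB into a score as 1 - distance."""
    return 1.0 - distance


# A distance of 0.0 (identical vectors) maps to a score of 1.0
perfect = distance_to_score(0.0)
partial = distance_to_score(0.25)
```

Note there is no clamping here, so any distance above 1 comes out as a negative score.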

xxKwan

Oh, give me a ping when you post your solution then
