Score is None

Are you querying a graph? Or just a single vector index?

A graph @Logan M

It's still returning a relevant document from the query

What does the structure of the graph look like?

Very simple: a very ham-fisted collection of Simple Vector indexes.

The query is vanilla, with nothing fancy configured on it:

```python
response = graph.query(query_string_from_input_widget)
```

Hmm, the only thing I can think of off the top of my head is that the node is actually the summary of one of the indexes?

That's odd. He is one of 6 people in that index. Strange that he would be the result of the summary. Even so, don't summaries get an embedding?

Yeah, they don't get an embedding, which might be why it doesn't have a score? 🤔
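To make that guess concrete: if a retriever scores nodes by embedding similarity, a node that was never embedded has nothing to be compared against, so the natural result is a score of None. This is a minimal sketch under that assumption; `Node`, `cosine` and `score_nodes` are hypothetical stand-ins, not GPT Index / LlamaIndex internals.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical stand-in for an index node; not the GPT Index / LlamaIndex class.
    text: str
    embedding: Optional[List[float]] = None


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_nodes(query_emb: List[float], nodes: List[Node]) -> List[Tuple[Node, Optional[float]]]:
    """Score each node against the query; a node with no embedding gets None."""
    results: List[Tuple[Node, Optional[float]]] = []
    for node in nodes:
        if node.embedding is None:
            # Nothing to compare against (e.g. a summary that was never embedded).
            results.append((node, None))
        else:
            results.append((node, cosine(query_emb, node.embedding)))
    return results


# Usage: one embedded chunk and one un-embedded summary node.
nodes = [Node("bio chunk", [1.0, 0.0]), Node("index summary")]
for node, score in score_nodes([1.0, 0.0], nodes):
    print(node.text, score)
```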

Very strange I agree. Just trying to guess how the score could be none lol

When you say summary

do you mean an interstitial representation of the document?

(done live, not while indexing)

How many source nodes did your query return, out of curiosity?

And by summary, I mean the index_summaries that you passed in when creating the index

checking

@Logan M 4

OK, one per sub-index; at least that makes sense lol

Hmmmm

Idk man, this is pretty spooky. Maybe some of the embeddings failed to be created? 🤔

What would be a good way to trace a document ID back to the source document? Does the index keep track of where its elements came from?

That might be it. Maybe it bugged out on .doc/.docx processing or something?

just saw this in the debugger I built

You can trace it back to the ref doc, but this is usually only useful if you assigned the documents meaningful doc_ids.

response.source_nodes[0].node.ref_doc_id should map it back I think

checking

It gives an ID like 7118a538-ebe6-4aa0-87a9-9d053bdee95f, but my files are named like bio_sketches/LoganM.pdf. I assume the reference to the path is thrown away after the data-loading step?

Yeah, when you load the documents, the doc ID is randomly generated unless you set it manually to a unique value.

Alternatively, you can set the extra_info dict of each document object with other info that should get passed to the nodes.
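A sketch of what "passed to the nodes" means in practice, assuming the loader copies each document's doc_id and extra_info onto every chunk it produces. `Document`, `Node` and `chunk()` are hypothetical stand-ins, not the library's classes, and the fixed-size character split is a simplification of a real text splitter.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    # Hypothetical stand-in for a loaded document; not the library's Document class.
    text: str
    doc_id: str
    extra_info: Dict[str, str] = field(default_factory=dict)


@dataclass
class Node:
    # Hypothetical stand-in for a chunk; it keeps a pointer back to its document.
    text: str
    ref_doc_id: str
    extra_info: Dict[str, str] = field(default_factory=dict)


def chunk(doc: Document, size: int) -> List[Node]:
    """Split into fixed-size character chunks, copying doc_id and extra_info onto each."""
    return [
        Node(doc.text[i:i + size], doc.doc_id, dict(doc.extra_info))
        for i in range(0, len(doc.text), size)
    ]


doc = Document(
    "placeholder biography text",
    doc_id="bio_sketches/LoganM.pdf",
    extra_info={"file_path": "bio_sketches/LoganM.pdf"},
)
for node in chunk(doc, size=10):
    print(node.ref_doc_id, node.extra_info["file_path"], repr(node.text))
```

Copying the dict per chunk (`dict(...)`) keeps one chunk's metadata edits from leaking into its siblings.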

Ah OK, so this would perhaps change how the index is constructed. Instead of pointing a SimpleDirectoryReader at the directory, I would probably load the documents one at a time and attach the path data to each one, either by overriding the document ID or by passing in the extra info. 🤔
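The one-at-a-time loading idea could look roughly like this: walk the directory yourself and use each file's relative path as both the doc_id and an extra_info entry. This is a sketch under stated assumptions: `Document` and `load_with_paths` are hypothetical, and it only reads plain-text files; real .pdf or .docx files would need a parser that isn't shown here.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class Document:
    # Hypothetical stand-in; with the real library you'd build its Document class instead.
    text: str
    doc_id: str
    extra_info: Dict[str, str] = field(default_factory=dict)


def load_with_paths(root: str, suffix: str = ".txt") -> List[Document]:
    """Load matching files one at a time, keeping each file's relative path."""
    base = Path(root)
    docs: List[Document] = []
    for path in sorted(base.rglob(f"*{suffix}")):
        rel = path.relative_to(base).as_posix()
        docs.append(
            Document(
                text=path.read_text(encoding="utf-8"),
                doc_id=rel,                     # readable ID instead of a random UUID
                extra_info={"file_path": rel},  # metadata intended to reach the nodes
            )
        )
    return docs


# Usage: a throwaway directory that mirrors the bio_sketches/ layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "bio_sketches").mkdir()
    (Path(tmp) / "bio_sketches" / "example.txt").write_text("placeholder bio", encoding="utf-8")
    docs = load_with_paths(tmp)

for d in docs:
    print(d.doc_id, d.extra_info)
```

Using the relative path keeps IDs stable across machines; it only stays unique as long as no two files share a path, which a filesystem already guarantees.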

The node score can be None when the node is selected based on its relationship with another node.

ohh so if a document is chunked

So, for example, a node with a score can have its next node also in the source nodes, but with a score of None.

This next node might not have an embedding score, but it's being queried because it's related to the node that has the score.
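A minimal sketch of that mechanism, assuming a step that pulls in each scored hit's next chunk without re-scoring it. `Node` and `expand_with_next` are hypothetical; whether the library actually does this depends on the version and on which postprocessors are configured.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical stand-in for a chunk with prev/next links; not the library class.
    node_id: str
    text: str
    prev_node_id: Optional[str] = None
    next_node_id: Optional[str] = None


Scored = List[Tuple[Node, Optional[float]]]


def expand_with_next(hits: Scored, store: Dict[str, Node]) -> Scored:
    """Append each hit's next node; the neighbor was never scored, so it gets None."""
    results: Scored = list(hits)
    seen = {node.node_id for node, _ in hits}
    for node, _ in hits:
        nxt = node.next_node_id
        if nxt is not None and nxt in store and nxt not in seen:
            results.append((store[nxt], None))
            seen.add(nxt)
    return results


# Usage: one document split into two linked chunks; only the first matched the query.
a = Node("n1", "first chunk", next_node_id="n2")
b = Node("n2", "second chunk", prev_node_id="n1")
store = {a.node_id: a, b.node_id: b}
for node, score in expand_with_next([(a, 0.82)], store):
    print(node.node_id, score)
```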

that is a good explanation

it'd make sense

I hope that is what’s happening in your case

You can confirm this by looking at the next and previous relationships of the source nodes.
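That check can be written as a small helper: for every source node whose score is None, see whether it is linked by prev/next to a node that does have a score. `Node` and `explain_unscored` are hypothetical; with the real response you would read the IDs off the source nodes instead of building them by hand.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical stand-in holding only the IDs needed for the check.
    node_id: str
    prev_node_id: Optional[str] = None
    next_node_id: Optional[str] = None


def explain_unscored(results: List[Tuple[Node, Optional[float]]]) -> List[Tuple[str, bool]]:
    """For each node with score None, report whether it neighbors a scored node."""
    scored = [node for node, score in results if score is not None]
    report: List[Tuple[str, bool]] = []
    for node, score in results:
        if score is not None:
            continue
        linked = any(
            s.next_node_id == node.node_id
            or s.prev_node_id == node.node_id
            or node.prev_node_id == s.node_id
            or node.next_node_id == s.node_id
            for s in scored
        )
        report.append((node.node_id, linked))
    return report


results = [
    (Node("n1", next_node_id="n2"), 0.8),
    (Node("n2", prev_node_id="n1"), None),  # neighbor of n1: explained by the relationship
    (Node("x9"), None),                     # no links: something else is going on
]
print(explain_unscored(results))
```

A `False` in the report means the relationship explanation does not cover that node.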

I'll try that

Happy hacking

The None-score nodes have None for their ref_doc_id.

@BioHacker No luck there, but thank you for the suggestion. I never knew about prev_node_id and next_node_id.

useful to be able to reference that

Something is potentially wrong with those nodes. The easiest way to deal with this is to set the similarity_cutoff argument in the query call. It can also be done with a node postprocessor that filters out these None-score nodes before the response is generated.
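A plain-Python sketch of the filtering rule suggested above: drop anything without a score, then anything below the cutoff. `apply_similarity_cutoff` is a hypothetical helper standing in for a node postprocessor. Whether the library's own similarity_cutoff handling drops None scores this way may vary by version, so check the docs for the version you're on. Note that this hides the unscored nodes rather than explaining why they showed up.

```python
from typing import List, Optional, Tuple, TypeVar

T = TypeVar("T")


def apply_similarity_cutoff(
    results: List[Tuple[T, Optional[float]]], cutoff: float
) -> List[Tuple[T, float]]:
    """Keep only results whose score is known and at least `cutoff`.

    Results with score None are dropped, since they can't be compared to the cutoff.
    """
    return [(node, score) for node, score in results if score is not None and score >= cutoff]


results = [("chunk A", 0.83), ("chunk B", None), ("chunk C", 0.41)]
print(apply_similarity_cutoff(results, cutoff=0.5))
```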