Retrieving Relevant Metadata and Sentence-Level Context for Responses

At a glance

The community member is using VectorStoreIndex and wants to know how to get information about which sentences or paragraphs in the context the answer is generated from, not just the source node score. They believe they can achieve this by customizing the prompt and inserting extra metadata information associated with each node. However, they are unsure if they can do this at a high level or if they need to build a response synthesis from scratch.

The comments suggest using fuzzy matching, for example with the thefuzz library, to find the sentences or paragraphs closest to the generated answer. Another approach mentioned is to embed each sentence and pick the one with the smallest cosine distance to the answer. The community members also mention per-sentence BM25 as a faster alternative.

haedamon

I am using VectorStoreIndex. When I generate a response, I want to find out where in a node the answer was generated from. Checking the source_node score is not enough: I want to know which sentences or paragraphs of the context the answer was pulled from.
I believe I can get this by customizing the prompt and inserting extra metadata associated with each node. Then I want to add an instruction to the prompt: "pull the relevant metadata and the sentences the answer was generated from".
I'm not sure whether I can do this at a high level or whether I need to build response synthesis from scratch. Any help on this?
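
The prompt idea in the question could look roughly like the sketch below. It is plain Python and does not use any LlamaIndex API: `build_cited_context`, `PROMPT`, `parse_sources`, and the `SOURCES:` output convention are all hypothetical names made up for illustration, and the sentence splitter is deliberately naive. Whether a given LLM reliably follows the citation instruction is an assumption that would need checking.

```python
import re


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_cited_context(nodes):
    """Tag every sentence with an id of the form 'node_id:sentence_index'.

    nodes: list of (node_id, text) pairs (a hypothetical input shape).
    Returns the tagged context string and an id -> sentence lookup table.
    """
    lines, lookup = [], {}
    for node_id, text in nodes:
        for i, sentence in enumerate(split_sentences(text)):
            tag = f"{node_id}:{i}"
            lookup[tag] = sentence
            lines.append(f"[{tag}] {sentence}")
    return "\n".join(lines), lookup


# Hypothetical prompt template; the wording is an assumption, not a tested prompt.
PROMPT = (
    "Context sentences, each prefixed with an id in brackets:\n"
    "{context}\n\n"
    "Answer the question using only the context. After the answer, list the "
    "ids of the sentences you relied on, as 'SOURCES: id1, id2'.\n"
    "Question: {question}\n"
)


def parse_sources(llm_output, lookup):
    """Map the ids on the 'SOURCES:' line back to sentences; unknown ids are dropped."""
    match = re.search(r"SOURCES:\s*(.+)", llm_output)
    if not match:
        return []
    tags = [t.strip() for t in match.group(1).split(",")]
    return [(t, lookup[t]) for t in tags if t in lookup]
```

The lookup table keeps the check on the caller's side: any id the model invents is simply ignored rather than trusted.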

5 comments

Logan M

Use fuzzy matching against the source node text instead

Logan M

https://github.com/seatgeek/thefuzz

Logan M

Nice library for this. Split your source nodes and responses into sentences (or on some other delimiter) and compute a similarity matrix to find the closest matches
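
A minimal sketch of that split-and-compare idea, under some assumptions: it uses the standard library's `difflib.SequenceMatcher` as a stand-in so it runs without extra installs, and with thefuzz installed `fuzz.ratio(a, b)` should be usable in place of `similarity`. The sentence splitter and the function names are made up for illustration.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Case-insensitive similarity on a 0-100 scale (stand-in for thefuzz)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def match_sentences(response, source_text):
    """For each response sentence, find the closest source sentence.

    Returns a list of (response_sentence, best_source_sentence, score).
    """
    source_sentences = split_sentences(source_text)
    if not source_sentences:
        return []
    results = []
    for resp_sentence in split_sentences(response):
        # One row of the response-by-source similarity matrix.
        row = [similarity(resp_sentence, s) for s in source_sentences]
        best = max(range(len(row)), key=row.__getitem__)
        results.append((resp_sentence, source_sentences[best], row[best]))
    return results
```

In practice `source_text` would be the text of each source node, and a score threshold would decide whether a match is worth showing at all.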

haedamon

@Logan M
I am finding that my first pass at the above technique, using Levenshtein distance, is yielding mediocre results.
***
With kapa.ai/phorm.ai, getting the reference URL is fairly basic, but when I look at search results, Google is able to provide the precise paragraph that best fits an answer. I'm still not sure how to do that easily.
I imagine I could compute embeddings for each sentence and pick the one with the smallest cosine distance to the provided answer.
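
A rough sketch of the embedding idea, with one big assumption: `toy_embed` is a placeholder bag-of-words "embedding" so the example runs on its own, and a real embedding model would be swapped in through the `embed` parameter (returning dense vectors, with the cosine computed accordingly). Ranking by highest cosine similarity is the same as ranking by smallest cosine distance, since distance is 1 minus similarity. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_embed(text):
    """Placeholder embedding: sparse word-count vector. Not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_sentences(answer, source_text, embed=toy_embed, top_k=3):
    """Return up to top_k (similarity, sentence) pairs, most similar first."""
    answer_vec = embed(answer)
    scored = [
        (cosine_similarity(answer_vec, embed(sentence)), sentence)
        for sentence in split_sentences(source_text)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Unlike Levenshtein-style matching, a real embedding model should also pick up paraphrases that share few characters with the answer, which the word-count placeholder here cannot do.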

Logan M

Yeah, that's another approach for sure. Even per-sentence BM25 is probably good too if it needs to be faster
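
For reference, per-sentence BM25 can be sketched in a few lines without any dependencies. This is a hand-written version of the common Okapi BM25 formula, treating each sentence as a "document"; the tokenizer, the default `k1` and `b` values, and the function names are assumptions chosen for illustration, not something taken from the thread.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, sentences, k1=1.5, b=0.75):
    """Score each sentence against the query with Okapi BM25."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    if n == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many sentences each term appears.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))  # repeated query terms count once
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def best_sentence(query, sentences):
    """Return the highest-scoring sentence, or None if there are none."""
    scores = bm25_scores(query, sentences)
    if not scores:
        return None
    return sentences[max(range(len(scores)), key=scores.__getitem__)]
```

Here the query would be the generated answer and `sentences` the split source node text. There is no model call involved, which is where the speed advantage over embeddings comes from.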
