Is there a notebook on source retrieval for chunks? For example, if my chunks are 512 tokens and my query engine returns the top 3 chunks, I can't show those to the user as sources, because 512 tokens is several paragraphs.
I can see two options: chunk small and auto-merge the small chunks into larger ones at retrieval time, but return the small chunks to the user as sources. Or keep the chunks large and, after finding the right nodes, run a second retrieval pass to pick out the sentences in each chunk that are most relevant. Any thoughts?
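To make the second option concrete, here is a minimal sketch of the "second pass" idea: split an already-retrieved chunk into sentences and rank them against the query. It is not a LlamaIndex API. The helper names (`split_sentences`, `tokenize`, `top_sentences`) are hypothetical, the sentence splitter is deliberately naive, and plain word overlap (Jaccard) stands in for whatever embedding similarity you would really use.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercased set of alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_sentences(query, chunk, k=2):
    """Return up to k sentences from chunk with the highest word overlap with query."""
    q = tokenize(query)
    scored = []
    for i, sent in enumerate(split_sentences(chunk)):
        s = tokenize(sent)
        union = q | s
        # Jaccard similarity as a stand-in for embedding similarity.
        score = len(q & s) / len(union) if union else 0.0
        scored.append((score, i, sent))
    # Highest score first; ties keep original sentence order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [sent for score, _, sent in scored[:k] if score > 0]


chunk = (
    "The company reported revenue of 10 million dollars. "
    "The weather was sunny in Seattle. "
    "Employees enjoyed the picnic."
)
print(top_sentences("What revenue did the company report?", chunk, k=1))
```

With something like this, the large chunk still goes to the LLM as context, and only the top-ranked sentences are shown to the user as the source.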
Well, if each chunk is around 512 tokens, then it will return roughly a page of text, and with 3 of those that's three pages. Think back to the SEC PDF example LlamaIndex had: when you asked a question, it would highlight a few sentences as its source.
Hey @Logan M, just ran into this thread. Do you know of a Streamlit example that demonstrates PDF highlighting of citations (e.g. by using fuzzy matching)?
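For the fuzzy matching part, a rough sketch of one way it could work, using only Python's standard `difflib`: slide a window over the extracted page text and keep the span that best matches the cited sentence. The function name `find_fuzzy_span` and the `min_ratio` threshold are assumptions for illustration. This is not taken from any Streamlit or LlamaIndex example, and the brute-force scan is slow on long pages.

```python
from difflib import SequenceMatcher


def find_fuzzy_span(page_text, quote, min_ratio=0.8):
    """Return (start, end, ratio) for the window of page_text that best
    matches quote, or None if nothing reaches min_ratio."""
    n = len(quote)
    if n == 0 or not page_text:
        return None
    matcher = SequenceMatcher(autojunk=False)
    matcher.set_seq2(quote)  # seq2 is cached, so only seq1 changes per window
    best_ratio, best_start, best_end = 0.0, 0, 0
    # Brute force: try every window of len(quote) characters.
    for start in range(0, max(1, len(page_text) - n + 1)):
        window = page_text[start:start + n]
        matcher.set_seq1(window)
        ratio = matcher.ratio()
        if ratio > best_ratio:
            best_ratio, best_start, best_end = ratio, start, start + len(window)
    if best_ratio < min_ratio:
        return None
    return best_start, best_end, best_ratio


# Extracted PDF text often differs slightly from the cited text (extra spaces, etc.).
page = "Total revenue  was $10.5 million in fiscal 2022, up from 2021."
match = find_fuzzy_span(page, "Total revenue was $10.5 million in fiscal 2022")
if match is not None:
    start, end, ratio = match
    print(repr(page[start:end]), round(ratio, 2))
```

The returned character offsets would then need to be mapped to whatever highlighting mechanism the PDF viewer in the app supports, which this sketch does not cover.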