
Updated 2 months ago

Is there a notebook regarding source retrieval for chunks?

Is there a notebook regarding source retrieval for chunks? For example, if my chunks are 512 tokens and my query engine returns the top 3 chunks, I can't show those to the user as sources, because 512 tokens is several paragraphs.
I can see two options: chunk small, and at retrieval time auto-merge the small chunks into larger chunks, but return the small chunks to the user.
OR
Keep the chunks large, but after finding the right nodes, perform a second retrieval that looks for the specific sentences in each chunk that are most relevant.
Any thoughts?
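A minimal sketch of the first option (small chunks shown to the user as citations, merged into their parent chunk for the LLM), assuming paragraphs are the parent chunks and sentences are the child chunks. It uses only the standard library, not any LlamaIndex API. The `score` function is a hypothetical word-overlap stand-in for embedding similarity, and `merge_ratio` (merge into the parent once at least that fraction of its children were retrieved) is an illustrative threshold, not a recommended value.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercased word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(paragraph):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def score(query, text):
    """Hypothetical relevance score: fraction of query words that appear in text.
    A real pipeline would use embedding similarity instead."""
    q = set(tokenize(query))
    return len(q & set(tokenize(text))) / len(q) if q else 0.0


def retrieve(query, paragraphs, top_k=3, merge_ratio=0.5):
    """Return (context_for_llm, citations_for_user)."""
    # Each paragraph is a parent chunk; each of its sentences is a child chunk.
    children = [(pid, s) for pid, p in enumerate(paragraphs) for s in split_sentences(p)]
    scored = [(score(query, text), pid, text) for pid, text in children]
    # Keep only matching children, best first (sorted is stable, so ties keep document order).
    top = sorted((c for c in scored if c[0] > 0), key=lambda c: c[0], reverse=True)[:top_k]

    children_per_parent = defaultdict(int)
    for pid, _ in children:
        children_per_parent[pid] += 1
    hits = defaultdict(list)
    for _, pid, text in top:
        hits[pid].append(text)

    context = []
    for pid, texts in hits.items():
        if len(texts) / children_per_parent[pid] >= merge_ratio:
            context.append(paragraphs[pid])  # enough children hit: merge up to the parent
        else:
            context.extend(texts)  # otherwise pass the small chunks through unchanged
    citations = [text for _, _, text in top]  # the user only ever sees small chunks
    return context, citations


paragraphs = [
    "LlamaIndex splits documents into nodes. Each node has text and metadata.",
    "Revenue grew 12 percent in 2022. Operating costs fell. Revenue growth came from cloud sales.",
    "The weather was mild.",
]
context, citations = retrieve("What drove revenue growth?", paragraphs)
print(context)
print(citations)
```

The LLM receives the merged `context`, while the user only sees the short sentences in `citations`.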
Why is returning those 3 chunks an issue in this example?
Are you trying to identify more specific pieces of text?
Well, if each chunk is around 512 tokens, it returns a page of text, and with 3 of those that's three pages. Think back to the SEC PDF example LlamaIndex had: when you asked a question, it would highlight a few sentences as its source.
Yes, basically: identify the most relevant pieces in the selected node.
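The second option (and the sentence highlighting described above) could be sketched as a second, sentence-level ranking over the text of the nodes the query engine already returned. This is an illustration under stated assumptions: sentences are split with a crude regex, and bag-of-words cosine similarity stands in for whatever embedding model you would actually use. It also returns character offsets into each chunk, which is what a highlighter needs.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word-count vector; a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_sentences(query, chunks, top_n=2):
    """Rank sentences inside already-retrieved chunks.
    Returns (score, chunk_index, start, end, sentence) tuples, best first;
    chunks[chunk_index][start:end] == sentence, so the offsets can drive highlighting."""
    query_vec = bag_of_words(query)
    candidates = []
    for ci, chunk in enumerate(chunks):
        # Crude splitter: a run of non-terminators plus an optional '.', '!' or '?'.
        for m in re.finditer(r"[^.!?]+[.!?]?", chunk):
            raw = m.group()
            sentence = raw.strip()
            if not sentence:
                continue
            start = m.start() + len(raw) - len(raw.lstrip())  # skip leading whitespace
            candidates.append(
                (cosine(query_vec, bag_of_words(sentence)), ci, start, start + len(sentence), sentence)
            )
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_n]


chunks = [
    "The company was founded in 1998. Revenue grew 12 percent in 2022, driven by cloud sales. Headcount was flat.",
    "Risk factors include currency exposure. Cloud sales may slow.",
]
top = best_sentences("What drove revenue growth in 2022?", chunks)
for score, ci, start, end, sentence in top:
    print(f"chunk {ci} [{start}:{end}] {score:.2f}: {sentence}")
```

Because only the retrieved nodes are re-ranked, this second pass is cheap even with a real embedding model.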
@Logan M Any thoughts on this?
This is sooo cool.
Thank you so much @Logan M
Hey @Logan M, just ran into this thread. Do you know of a Streamlit example that demonstrates PDF highlighting of citations (i.e., by using fuzzy matching)?
I'm not aware of anything like that for Streamlit, no.
Thank you @Logan M
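For the fuzzy-matching step only (no Streamlit UI or PDF rendering), a sketch using Python's `difflib.SequenceMatcher`: it slides a window of roughly the quote's word count over page text that is assumed to have already been extracted from the PDF, and returns the character span with the highest similarity ratio. The function name and the `slack` parameter are hypothetical; mapping the returned offsets to highlight coordinates depends on the PDF library and is not shown.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse all whitespace (including PDF line breaks) to single spaces."""
    return " ".join(text.lower().split())


def fuzzy_locate(quote, page_text, slack=1):
    """Return (start, end, ratio) for the span of page_text most similar to quote.
    Windows are whole words, from (quote word count - slack) to (+ slack) words long."""
    words = [(m.start(), m.end()) for m in re.finditer(r"\S+", page_text)]
    n = len(quote.split())
    target = normalize(quote)
    best = (0, 0, 0.0)
    for size in range(max(1, n - slack), n + slack + 1):
        for i in range(len(words) - size + 1):
            start, end = words[i][0], words[i + size - 1][1]
            ratio = SequenceMatcher(None, target, normalize(page_text[start:end])).ratio()
            if ratio > best[2]:
                best = (start, end, ratio)
    return best


page = (
    "Net revenue increased 12% year over year,\n"
    "primarily due to higher cloud-\nservice sales. Costs were flat."
)
quote = "Net revenue increased 12% year over year, primarily due to higher cloud-service sales."
start, end, ratio = fuzzy_locate(quote, page)
print(repr(page[start:end]), round(ratio, 3))
```

Fuzzy matching helps here because the cited sentence rarely matches the extracted page text exactly (line breaks, hyphenation). The brute-force window search is fine for a single page but would need indexing for long documents.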