
Hello everyone, after parsing a PDF into structured data, I segmented its content into nodes made of text chunks, with their corresponding headings and subheadings preserved as metadata. Using the LlamaIndex framework, I generated vectors for each node to support semantic search. However, searches for content tied to a specific version number lack precision. For instance, a query for 'Issues resolved in Version 4.1' incorrectly retrieves nodes related to 'Version 4.1.1'. What strategies can I use within LlamaIndex so that the results correspond strictly to the exact version number specified in the query?

The PDF's headings look like this:

- Issues resolved in Version 4.0
- Changes/Enhancements in Version 4.0
- Issues resolved in Version 4.0.1
- Changes/Enhancements in Version 4.0.1
- Issues resolved in Version 4.0.2
- Changes/Enhancements in Version 4.0.2
- New Feature in Version 4.0.2
- Issues resolved in Version 4.0.3
- Changes/Enhancements in Version 4.0.3
- Issues resolved in Version 4.0.4
- Issues resolved in Version 4.1
- Changes/Enhancements in Version 4.1
- New features in Version 4.1
- Issues resolved in Version 4.1.1
- Changes/Enhancements in Version 4.1.1

I tried metadata extraction and metadata-based search, but I still get the same problem. How can I solve this in LlamaIndex?
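One possible cause of the "same problem" after metadata extraction (an assumption, since the post does not show the extraction code) is that the version ends up as free text or is compared by substring or prefix, and `"4.1"` is both a substring and a prefix of `"4.1.1"`. A minimal stdlib-only sketch of turning headings like the ones above into separate, exact `section` and `version` fields might look like this; `parse_heading`, `HEADING_RE`, and the field names are hypothetical:

```python
import re

# Assumes headings follow the pattern "<section> in Version <x.y[.z]>", as in the list above.
HEADING_RE = re.compile(
    r"^(?P<section>Issues resolved|Changes/Enhancements|New features?)"
    r" in Version (?P<version>\d+(?:\.\d+)*)$",
    re.IGNORECASE,
)


def parse_heading(heading: str) -> dict:
    """Split a heading into separate section and version metadata; {} if it does not match."""
    match = HEADING_RE.match(heading.strip())
    if match is None:
        return {}
    section = match.group("section").lower()
    if section.startswith("new feature"):
        # Normalize "New Feature" and "New features" to a single label.
        section = "new features"
    # Keep the version as an exact string so "4.1" and "4.1.1" stay distinct values.
    return {"section": section, "version": match.group("version")}


meta_a = parse_heading("Issues resolved in Version 4.1")
meta_b = parse_heading("Issues resolved in Version 4.1.1")
print(meta_a["version"] == meta_b["version"])  # False: exact comparison separates them
print(meta_a["version"] in meta_b["version"])  # True: a substring check would conflate them
```

Storing `version` as its own metadata key is what makes an exact-equality filter possible later; a substring or prefix check on the heading text would still lump the two releases together.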
Probably your best bet is to have the LLM write your filters, using an auto-retriever.

This assumes you attached metadata to your chunks that is useful for filtering (for example, the version number).
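An auto-retriever (in LlamaIndex, `VectorIndexAutoRetriever`) asks the LLM to turn the user's query into a search string plus metadata filters, based on a description of the available metadata fields. The sketch below swaps the LLM for a crude regex-and-keyword stand-in purely to show the shape of that step; `infer_filters`, `VERSION_RE`, and `SECTION_KEYWORDS` are hypothetical names, and the keyword table assumes the `section` labels derived from the headings above:

```python
import re

# Captures the full dotted version after the word "version", e.g. "4.1" or "4.1.1".
VERSION_RE = re.compile(r"\bversion\s+(\d+(?:\.\d+)*)", re.IGNORECASE)

# Hypothetical keyword table standing in for the LLM's judgment; checked in order.
SECTION_KEYWORDS = [
    ("new feature", "new features"),
    ("issue", "issues resolved"),
    ("resolved", "issues resolved"),
    ("fix", "issues resolved"),
    ("change", "changes/enhancements"),
    ("enhancement", "changes/enhancements"),
]


def infer_filters(query: str) -> dict:
    """Map a natural-language query to exact-match metadata filters.

    Stand-in for the LLM step of an auto-retriever, not a real LlamaIndex API.
    """
    filters = {}
    match = VERSION_RE.search(query)
    if match:
        # The pattern is greedy, so "4.1.1" is kept whole instead of being cut to "4.1".
        filters["version"] = match.group(1)
    lowered = query.lower()
    for keyword, section in SECTION_KEYWORDS:
        if keyword in lowered:
            filters["section"] = section
            break
    return filters


print(infer_filters("Issues resolved in Version 4.1"))
# {'version': '4.1', 'section': 'issues resolved'}
print(infer_filters("What changed in version 4.1.1?"))
# {'version': '4.1.1', 'section': 'changes/enhancements'}
```

The key detail is that the version is captured in full, so a query about 4.1 yields a `version` filter of `"4.1"` that can only exact-match nodes tagged `4.1`. A real auto-retriever delegates this step to the LLM, which copes with paraphrases a keyword table will miss.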
Or you can write the filters yourself.
(Pinecone is just the example here; most vector databases support some form of metadata filtering using the same approach.)
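Writing the filters yourself amounts to restricting the candidate set with exact metadata matches before ranking by similarity; in LlamaIndex, passing `MetadataFilters` to a retriever serves this purpose, and stores such as Pinecone evaluate the filter as part of the query. The stdlib-only sketch below mimics that behavior over in-memory nodes; `retrieve`, `cosine`, `NODES`, and the 3-dimensional embeddings are made up for illustration:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], nodes: list[dict], filters: dict, top_k: int = 2) -> list[dict]:
    """Keep only nodes whose metadata equals every filter value, then rank by similarity."""
    candidates = [
        node
        for node in nodes
        if all(node["metadata"].get(key) == value for key, value in filters.items())
    ]
    candidates.sort(key=lambda node: cosine(query_vec, node["embedding"]), reverse=True)
    return candidates[:top_k]


# Toy nodes; in practice the embeddings come from your embedding model.
NODES = [
    {"id": "a", "metadata": {"section": "issues resolved", "version": "4.1"}, "embedding": [0.9, 0.1, 0.0]},
    {"id": "b", "metadata": {"section": "issues resolved", "version": "4.1.1"}, "embedding": [1.0, 0.0, 0.0]},
    {"id": "c", "metadata": {"section": "changes/enhancements", "version": "4.1"}, "embedding": [0.0, 1.0, 0.0]},
]

q = [1.0, 0.0, 0.0]
print([n["id"] for n in retrieve(q, NODES, {})])  # ['b', 'a']
print([n["id"] for n in retrieve(q, NODES, {"version": "4.1", "section": "issues resolved"})])  # ['a']
```

Without filters, the 4.1.1 node ranks first because its toy embedding is closest to the query, which mirrors the behavior described in the question. With the exact-match filter, only the 4.1 node survives, since equality never treats `"4.1"` as matching `"4.1.1"`.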