Updated 9 months ago

Node Parser

Hey guys, is there any method built into LlamaIndex to remove similar nodes, or nodes that have too few words? For example, a node extracted from a PDF might have lots of symbols and few words; I'd like to delete nodes like that.
You can drop the symbol-heavy and very small nodes by iterating over the parsed nodes, filtering out the ones you don't want, and then inserting the remaining nodes into your index.

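# import path for llama-index >= 0.10; older versions use llama_index.node_parser
from llama_index.core.node_parser import SentenceSplitter

# parse nodes (assumes `documents` has already been loaded)
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# perform the removal here: keep only nodes with enough words
MIN_WORDS = 20  # arbitrary example threshold, tune for your data
nodes = [node for node in nodes if len(node.get_content().split()) >= MIN_WORDS]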
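
For the "lots of symbols" case, one rough way to decide which nodes to drop is to measure how much of the text is actual letters. This is only a sketch: `alpha_ratio`, `is_symbol_heavy`, and the 0.6 cutoff are made-up names and values for illustration, not anything built into LlamaIndex.

```python
def alpha_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are letters."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0  # treat empty text as containing no letters
    return sum(ch.isalpha() for ch in chars) / len(chars)


def is_symbol_heavy(text: str, min_alpha_ratio: float = 0.6) -> bool:
    """True when too little of the text is letters (hypothetical cutoff)."""
    return alpha_ratio(text) < min_alpha_ratio
```

With parsed nodes you could then do something like `nodes = [n for n in nodes if not is_symbol_heavy(n.get_content())]`. Note that digits count as non-letters here, so tables of numbers would also be dropped unless you loosen the check.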


For similar nodes, you'll have to do it manually as well, by comparing them with each other. I'm not sure there's a built-in method for this!
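
A minimal sketch of that manual comparison, assuming plain word overlap (Jaccard similarity) is good enough to call two nodes "similar". The function names and the 0.9 threshold are hypothetical, and this compares every text against every kept one, so it gets slow for large collections. If you need semantic similarity you'd compare embeddings instead, which this does not do.

```python
def jaccard(a: set, b: set) -> float:
    """Word-set overlap: size of intersection over size of union."""
    if not a and not b:
        return 1.0  # two empty texts count as identical
    return len(a & b) / len(a | b)


def dedupe_texts(texts: list[str], threshold: float = 0.9) -> list[int]:
    """Return indices of texts to keep, dropping later near-duplicates."""
    kept_indices = []
    kept_word_sets = []
    for i, text in enumerate(texts):
        words = set(text.lower().split())
        # skip this text if it overlaps heavily with one we already kept
        if any(jaccard(words, seen) >= threshold for seen in kept_word_sets):
            continue
        kept_indices.append(i)
        kept_word_sets.append(words)
    return kept_indices
```

For nodes, you'd pass in `[n.get_content() for n in nodes]` and then keep `[nodes[i] for i in dedupe_texts(...)]`. The first occurrence of each near-duplicate group is the one that survives.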
It'd be great to have some built-in methods to "maintain" the vector store,
but yeah, I'll iterate over all my nodes.