Hi all I have a list of Documents that I

Hi all, I have a list of Documents that I want to parse into nodes, generating metadata about each node. Right now I am using the SimpleNodeParser, paired with some pre-built metadata extractors.

My question is about the SummaryExtractor. I want to create "prev" and "self" summaries for each node, so that each Node carries the local context of its Document. However, I do not want a "prev" summary generated for the first Node of a new Document: if I understand the functionality correctly, that summary would describe the last node of the previous Document, which is irrelevant context. I tried using include_prev_next_rel, but that does not seem to resolve my issue. Should I write a custom metadata extractor for this functionality?

3 comments

Logan M

hmm you could just remove that summary from the resulting nodes?

Otherwise yea, creating your own metadata extractor is an option. Tbh though, the current extractor probably shouldn't be doing that in the first place

OverclockedClock

I was thinking about just writing a simple check in the metadata_extractor.process_nodes() call to check if the ref id matches the previous, if not, skip. Although a temporary solution it would probably resolve it too ig
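A minimal sketch of that check, written as a post-processing pass over the nodes after metadata extraction. The `Node` dataclass here is a hypothetical stand-in rather than the library's node class, and the metadata key `prev_section_summary` is an assumption about how the "prev" summary is stored; adjust both to match what your nodes actually contain.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical stand-in for a parsed node, not the library's class.
    text: str
    ref_doc_id: Optional[str]  # id of the source Document
    metadata: dict = field(default_factory=dict)


# Assumed name of the metadata key holding the "prev" summary.
PREV_KEY = "prev_section_summary"


def drop_cross_document_prev_summaries(
    nodes: List[Node], prev_key: str = PREV_KEY
) -> List[Node]:
    """Remove the "prev" summary from the first node of each Document.

    Assumes nodes are in parse order, so a change in ref_doc_id marks
    the start of a new Document.
    """
    previous_doc_id = None
    for index, node in enumerate(nodes):
        starts_new_document = index == 0 or node.ref_doc_id != previous_doc_id
        if starts_new_document:
            # The "prev" summary would describe another Document; drop it.
            node.metadata.pop(prev_key, None)
        previous_doc_id = node.ref_doc_id
    return nodes
```

This only works if the nodes stay in the order the parser produced them; if they get shuffled or filtered first, comparing against the previous ref id is no longer a reliable way to spot a Document boundary.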

OverclockedClock

I'll think of something, thanks for the response regardless!
