
Chapterwise nodes

At a glance

The community members are discussing how to create a book embedding where each chapter is a separate node, so that a user can query for a summary of a specific chapter. The main points are:

- Creating a single node per chapter may not be feasible due to token limits, so a set of nodes with defined relations for each chapter is suggested instead.

- The community members discuss using metadata or the DocumentSummaryIndex to help with queries like "summarize chapter 3".

- There is a discussion around using a vector database and a retriever to get the relevant nodes based on metadata filters, but there are concerns about the token limit still being an issue.

There is no explicitly marked answer in the comments.

Hi all, if I am embedding a book, is there a way to make each chapter a node, so that when I ask "summarize chapter x" it works through this node only? Would appreciate guidance. Thank you!
11 comments
I wouldn't recommend a single node for each chapter, as chapters can be very long and there's a high chance the LLM token limit will be exceeded.

You can create a set of nodes for each chapter and define relationships between them. Still, it may not be possible for all of the nodes to be used when creating the summary.

If the chapters are small, you can follow this link and customise the nodes or Document objects as per your requirement:
https://gpt-index.readthedocs.io/en/latest/end_to_end_tutorials/usage_pattern.html#basic-usage-pattern
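As a rough sketch of that pattern: one Document per chapter, with the chapter name in its metadata, split into smaller nodes. The chapters dict, the "chapter" key, and the chunk size are placeholders, and exact import paths and parameter names vary between llama_index releases.
Python
from llama_index import Document, VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser

# Placeholder: in practice each value is the full text of one chapter.
chapters = {
    "Chapter 1": "Text of chapter 1 ...",
    "Chapter 2": "Text of chapter 2 ...",
}

# One Document per chapter, tagged with the chapter name in its metadata.
documents = [
    Document(text=text, metadata={"chapter": name})
    for name, text in chapters.items()
]

# Split each chapter into smaller nodes; every node inherits the chapter
# metadata, so a later query can be restricted to a single chapter.
parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=50)
nodes = parser.get_nodes_from_documents(documents)

index = VectorStoreIndex(nodes)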
Thank you, the chapters are not small, 3-4 pages of text each. Will the basic usage pattern be enough to address queries like "summarize chapter 3", for example?
Not entirely. You can try adding metadata to each chapter, or try DocumentSummaryIndex along with metadata. That may help with queries like these.
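A minimal sketch of what the DocumentSummaryIndex route could look like, reusing the per-chapter documents from above; tree_summarize is used so a chapter does not have to fit into a single LLM call (class and parameter names may differ between llama_index versions).
Python
from llama_index import DocumentSummaryIndex
from llama_index.response_synthesizers import get_response_synthesizer

# Build one summary per document (here, per chapter). tree_summarize works
# hierarchically, so a chapter longer than the context window is handled in pieces.
response_synthesizer = get_response_synthesizer(response_mode="tree_summarize")
doc_summary_index = DocumentSummaryIndex.from_documents(
    documents,  # the per-chapter Documents with chapter metadata
    response_synthesizer=response_synthesizer,
)

query_engine = doc_summary_index.as_query_engine(response_mode="tree_summarize")
print(query_engine.query("Summarize chapter 3"))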
Are you using a vector db, @MitchMcD?
Will it help @bmax to tackle this scenario?
I was just wondering. I was thinking that if he's using a vector db, he could do what you said: put the chapter in each node's metadata, use a retriever to get all of those nodes, and then pass them into DocumentSummaryIndex.
trying to figure out how to do it myself lol
like can you just do
Python
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

filters = MetadataFilters(filters=[ExactMatchFilter(key="name", value="Chapter 1")])
retriever = index.as_retriever(filters=filters)
retriever.retrieve()

to get all nodes
but you can't do an empty retrieve(), so wondering how
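One possible workaround, as a sketch only: skip the retriever and pull the chapter's nodes straight out of the index's docstore by their metadata, then summarize them with tree_summarize. The "name" key matches the filter above; docstore access and metadata attributes vary a bit between llama_index versions.
Python
from llama_index.response_synthesizers import get_response_synthesizer
from llama_index.schema import NodeWithScore

# Instead of an empty retrieve(), take every node in the docstore whose
# metadata marks it as belonging to the chapter.
chapter_nodes = [
    node
    for node in index.docstore.docs.values()
    if node.metadata.get("name") == "Chapter 1"
]

# tree_summarize condenses the nodes hierarchically, so the whole chapter
# does not have to fit into a single LLM context window.
synthesizer = get_response_synthesizer(response_mode="tree_summarize")
response = synthesizer.synthesize(
    "Summarize this chapter",
    nodes=[NodeWithScore(node=n, score=1.0) for n in chapter_nodes],
)
print(response)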
Yeah, but during response generation all of the nodes may not get used if the combined node length crosses the token limit, that's what I'm thinking. So the response may end up looking half-cooked.
that's the plan