
Updated 2 years ago

I'm using the PDFReader to read in my

I'm using the PDFReader to read in my document. Is there a way to specify how many pages you want in one chunk? I think for now, the default is to have each page as one document object. I'd like to have 5-10 pages in one document object.
6 comments
Wouldn't 5-10 pages just get split into many nodes anyways? Curious what the use case is 👀
So I'd like to try out the document summary index with a 100-page document. With the default setup, generating the document summary index is very slow since it generates a summary for each doc ID.
I guess the main question would be how to use the document summary index for long documents
ohhhh ok that makes sense...

For now, you might have to just manually concat the document objects for each page into a single document 🙂 Something like this: Document(text="\n".join(doc.text for doc in documents))


It would also be a pretty easy PR to the reader to make the number of pages per document configurable
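To get 5-10 pages per document rather than one document for the whole file, the same concatenation can be done in fixed-size groups. This is only a rough sketch: `PageDoc` and `merge_pages` are made-up names for illustration, and the only thing assumed about the reader's output is that each per-page object has a `.text` attribute.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageDoc:
    """Hypothetical stand-in for a per-page document; only `.text` is assumed."""
    text: str


def merge_pages(pages: List[PageDoc], pages_per_doc: int = 5) -> List[str]:
    """Join the text of every `pages_per_doc` consecutive pages into one string."""
    if pages_per_doc < 1:
        raise ValueError("pages_per_doc must be at least 1")
    merged = []
    for start in range(0, len(pages), pages_per_doc):
        # The last group may be shorter if the page count is not a multiple.
        group = pages[start:start + pages_per_doc]
        merged.append("\n".join(page.text for page in group))
    return merged
```

Each returned string could then be wrapped in a document object (for example Document(text=chunk), assuming that constructor matches your installed version) before building the summary index, so a 100-page file yields about 20 summaries instead of 100.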
Yea that's kinda what I was trying to do. Thanks for the instruction. I guess to go one step further, would the document summary index be a good fit for a multi-document chatbot? I was using a custom setup (kinda like QASummaryGraph) for a single-document chatbot and it worked super well. Now I want to expand it to multi-document chat, so I was looking for a better index struct to handle both questions that require routing and questions that require synthesis.
hmm, maybe! Or maybe a router query engine on top of some indexes would also be a good approach