Hi! I'm really looking to pick someone's brain about indexing an online document that is structured like the picture attached (but much larger lol!). The 'leaf node' files are nested at very different depths (some sit under 4 layers of sections, others are on the top level), and each one varies in length (some are short paragraphs, others are large pieces of text). I've tried concatenating the text from all the HTML files into one document, with limited success, and I've also tried treating each 'leaf node' as its own document.
One thing I do want to be able to do is reference which section was used as context for the answer, so the user can follow a link to the relevant docs.
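For concreteness, here is a minimal sketch of the "each leaf node as its own document" approach in plain Python. It assumes the site has already been parsed into a tree; the `Section`/`LeafDoc` classes, the field names and the idea of prefixing the breadcrumb to the text are hypothetical illustrations, not part of any particular library. Each leaf keeps its section path and URL, so whatever retrieves it can also cite it.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Section:
    # Hypothetical tree node: a section with its own URL, optional text, and children.
    title: str
    url: str
    text: Optional[str] = None
    children: List["Section"] = field(default_factory=list)


@dataclass
class LeafDoc:
    text: str
    section_path: Tuple[str, ...]  # breadcrumb from the root down to this leaf
    url: str                       # link shown to the user as the source


def iter_leaves(node: Section, path: Tuple[str, ...] = ()) -> Iterator[LeafDoc]:
    """Depth-first walk yielding one document per leaf, whatever its depth."""
    current = path + (node.title,)
    if not node.children:
        if node.text:  # skip empty leaves
            yield LeafDoc(text=node.text, section_path=current, url=node.url)
        return
    for child in node.children:
        yield from iter_leaves(child, current)


def to_indexable_text(doc: LeafDoc) -> str:
    # Assumption: prefixing the breadcrumb gives short leaves some context when embedded.
    return " > ".join(doc.section_path) + "\n\n" + doc.text
```

Very long leaves would still need to be split into smaller chunks, with every chunk inheriting the same `section_path` and `url`.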
Has anyone dealt with a file structure like this? Any suggestions on a method or things to try?
NB: to add to the complexity, I also have a number of these main documents ('collections') that I want to build the index over, and I also need to build an index spanning a few of these collections!
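One way to picture the multi-collection case is a single store where every chunk is tagged with its collection, and a query can be restricted to a subset of collections. The sketch below is a toy under that assumption: `Chunk` and `MultiCollectionIndex` are hypothetical names, and the naive word-overlap scoring only stands in for real embedding similarity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Chunk:
    collection: str  # which top-level 'collection' this came from
    section: str     # breadcrumb, e.g. "Setup > Linux"
    url: str         # link back to the source section
    text: str


class MultiCollectionIndex:
    """Toy index over several collections; queries can target a subset."""

    def __init__(self) -> None:
        self._by_collection: Dict[str, List[Chunk]] = defaultdict(list)

    def add(self, chunks: Iterable[Chunk]) -> None:
        for chunk in chunks:
            self._by_collection[chunk.collection].append(chunk)

    def search(self, query: str, collections: Optional[Set[str]] = None,
               k: int = 3) -> List[Chunk]:
        terms = set(query.lower().split())
        names = self._by_collection.keys() if collections is None else collections
        # .get avoids creating empty entries for unknown collection names
        candidates = [c for name in names for c in self._by_collection.get(name, [])]
        # Placeholder relevance: number of query words that appear in the chunk
        scored = [(len(terms & set(c.text.lower().split())), c) for c in candidates]
        scored = [(s, c) for s, c in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for _, c in scored[:k]]
```

Because each result carries `collection`, `section` and `url`, the same tags cover both scoping a query to a few collections and linking the user to the section that was used.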