Updated 2 years ago

Sorry if this has been asked before, and I suppose the answer depends on each use case, but...

Is it important to split the documents fed to llama_index into chapters, sections, sub-sections, etc.? Is it important to use titles and subtitles? Are these features detected automatically based on the document format? I suppose MS Word and PDF files could benefit from their specific paragraph styles. What about plain TXT files? What about Markdown files? Does the structure of the document matter at all?
1 comment
It's a good question, and I wouldn't say it has been settled yet. To some extent, it depends on what questions you want to ask over your data and how complicated your data is.

You could try running your documents through different types of parsers before indexing them. We offer a bunch of these at https://llamahub.ai/. If you want more detailed parsing of each doc, check out Unstructured as well: https://llamahub.ai/l/file-unstructured
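
To make "parsing before indexing" concrete, here is a minimal sketch of one structure-aware approach for Markdown files: split the text at its headings and keep the heading trail with each chunk. It uses only the Python standard library and is not the llama_index or LlamaHub API; `Section` and `split_markdown_by_heading` are hypothetical names made up for this illustration. It also assumes `#`-style headings and does not skip `#` lines inside fenced code blocks.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Matches "#"-style Markdown headings such as "## Setup" (levels 1 to 6).
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Section:
    """Hypothetical container: one chunk of text plus its heading trail."""
    path: Tuple[str, ...]  # e.g. ("Guide", "Setup")
    text: str


def split_markdown_by_heading(markdown: str) -> List[Section]:
    """Split Markdown into sections, one per heading that has body text."""
    sections: List[Section] = []
    trail: List[Tuple[int, str]] = []  # open headings as (level, title)
    buffer: List[str] = []             # body lines under the current heading

    def flush() -> None:
        # Emit the buffered body (if any) under the current heading trail.
        body = "\n".join(buffer).strip()
        if body:
            sections.append(Section(tuple(title for _, title in trail), body))
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Close headings at the same or a deeper level than the new one.
            while trail and trail[-1][0] >= level:
                trail.pop()
            trail.append((level, match.group(2)))
        else:
            buffer.append(line)
    flush()
    return sections


if __name__ == "__main__":
    doc = "# Guide\nIntro text.\n## Setup\nInstall it.\n# Appendix\nExtra."
    for section in split_markdown_by_heading(doc):
        print(" > ".join(section.path), "|", section.text)
```

Each chunk's heading trail could then be prepended to its text or stored as metadata before indexing, so a retrieved chunk still carries the chapter and section it came from. Whether that improves answers depends on your data and questions, so it is worth comparing against plain fixed-size chunking.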