It's a good question, and I wouldn't say the problem has been solved yet. To some extent, it depends on what questions you want to ask over your data and how complicated your data is.
You could try running your documents through different types of parsers before indexing them. We offer a bunch of these at
https://llamahub.ai/. If you want more detailed parsing of each doc, check out Unstructured as well:
https://llamahub.ai/l/file-unstructured
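
If it helps, here is a rough sketch of what "trying different parsers" can look like in practice: run the same document through each candidate and compare the chunks before you index anything. The two parser functions and the `compare_parsers` helper are hypothetical stand-ins I made up for illustration, not LlamaHub or Unstructured APIs. You would swap in whichever loaders you are actually evaluating.

```python
import re
from typing import Callable, Dict, List

Parser = Callable[[str], List[str]]


def parse_whole(text: str) -> List[str]:
    """Hypothetical parser: treat the entire document as a single chunk."""
    stripped = text.strip()
    return [stripped] if stripped else []


def parse_paragraphs(text: str) -> List[str]:
    """Hypothetical parser: split on blank lines, one chunk per paragraph."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def compare_parsers(text: str, parsers: Dict[str, Parser]) -> Dict[str, List[str]]:
    """Run every parser on the same text so the outputs can be inspected side by side."""
    return {name: parser(text) for name, parser in parsers.items()}


if __name__ == "__main__":
    sample = "Title\n\nFirst paragraph.\n\nSecond paragraph."
    results = compare_parsers(
        sample, {"whole": parse_whole, "paragraphs": parse_paragraphs}
    )
    for name, chunks in results.items():
        # Print the chunk count and a preview of the first chunks for each parser.
        print(name, len(chunks), chunks[:2])
```

Looking at the chunks each parser produces for a few representative docs is usually enough to tell which one suits the questions you want to ask.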