there's no perfect answer tbh. Unstructured.io tries to do this nicely, but I find it hit/miss
Ideal approaches detect tables/images/text and apply specific functions to properly parse them
I've also seen this, but haven't tried it yet
https://facebookresearch.github.io/nougat/