Hello there, I have a question about enhancing data extraction quality from scanned documents over a typical RAG + reranker pipeline. I am currently using LlamaParse to convert tabular data into markdown tables and then indexing them. In some cases the tables are not converted properly (e.g. the table font is too small, or the document was scanned poorly), which makes the markdown table unusable. Since I am using gpt-4o in my pipeline, my questions are:
- Can I also extract each table as an image and pass it into my pipeline, so that if the markdown table is unusable, gpt-4o can fall back to the image for data extraction?
- If a document has more than one table, do I also have to manage how I chunk the markdown tables and images so that each image stays in sequence with its matching table?
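
To make the two questions concrete, here is a rough Python sketch of what I have in mind. It is not a working pipeline. `TableChunk`, `markdown_looks_unusable` and `build_user_content` are names I made up for illustration, the "unusable" check is just a guess at a heuristic, and I am assuming a PNG crop of each table is already available (how to get that crop is part of what I am asking). The content-part layout is the one the OpenAI chat API accepts for image inputs; no API call is made here.

```python
import base64
from dataclasses import dataclass


@dataclass
class TableChunk:
    """One table kept as a unit: parsed markdown plus the image crop of the same table."""
    doc_id: str
    table_index: int   # position of the table within the document
    markdown: str      # parser output for this table
    image_png: bytes   # raw PNG bytes of the table crop (assumed to exist already)


def markdown_looks_unusable(markdown: str, min_rows: int = 2) -> bool:
    """Hypothetical heuristic: too few table rows, or rows with differing column counts."""
    rows = [line.strip() for line in markdown.splitlines() if line.strip().startswith("|")]
    if len(rows) < min_rows:
        return True
    # Count the inner pipe separators per row; a well-formed table has one value.
    column_counts = {row.strip("|").count("|") for row in rows}
    return len(column_counts) > 1


def build_user_content(question: str, chunks: list[TableChunk]) -> list[dict]:
    """Build the content parts of one user message, keeping tables in document order.

    Each table's markdown is always included; its image is attached right after it
    only when the markdown looks broken, so text and image stay paired.
    """
    parts: list[dict] = [{"type": "text", "text": question}]
    for chunk in sorted(chunks, key=lambda c: c.table_index):
        parts.append({
            "type": "text",
            "text": f"Table {chunk.table_index} (markdown):\n{chunk.markdown}",
        })
        if markdown_looks_unusable(chunk.markdown):
            encoded = base64.b64encode(chunk.image_png).decode("ascii")
            parts.append({
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{encoded}"},
            })
    return parts
```

The returned list would go in as the `content` of a single user message to gpt-4o. Storing the image alongside the markdown in the same chunk is what I assume would keep them in sequence when there are several tables, but I am not sure that is the right way to index them.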